What Actually Happens to Your System When Traffic Suddenly Spikes

by Arif Ikhsanudin, Backend Developer

The Spike Arrives

Your product gets featured somewhere — a newsletter, a social post, a press article. Traffic goes from 50 requests per second to 800 in under two minutes. Your monitoring shows a response time increase, then a surge of 502 errors, then total unavailability. The spike lasts 20 minutes. The recovery takes 45 minutes. You never get those users back.

The question is not just "why did the system go down." The question is what specifically happened in those two minutes between the spike starting and the system falling over — because that sequence determines what you need to change.

The Failure Cascade

Traffic spikes produce failure through a cascade, not a single event. Understanding the cascade tells you where to intervene.

Stage 1: Thread pool saturation. Your application server has a fixed thread pool — typically 200 threads in a default Tomcat or similar configuration. Each incoming request consumes a thread. When requests arrive faster than they complete, the pool fills. New connections start queuing. Response times climb as requests wait for an available thread.

Stage 2: Database connection pool exhaustion. An application thread that checks out a database connection typically holds it until the request finishes. As threads queue up and requests slow, connections stay checked out longer. The database connection pool, commonly sized somewhere between 10 and 100 connections, runs dry. Application threads now block waiting for a connection, which makes every request slower and backs up threads further.

Stage 3: Timeout cascades. Requests that have been waiting too long hit their timeout. The client retries. Now you have the original traffic plus retry traffic hitting an already-saturated system. Retry storms are one of the most common causes of recovery taking longer than the original spike.

Stage 4: Memory pressure and GC pause. Queued requests are objects in memory. As the queue grows, heap usage climbs. The JVM — or equivalent runtime — spends increasing time in garbage collection. GC pauses stop the world momentarily. During a pause, no requests complete. When the pause ends, all queued responses attempt to complete simultaneously.
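Stages 1 and 2 can be put into numbers with Little's law: average concurrency equals arrival rate times mean time in the system. The sketch below is a back-of-the-envelope helper, not a capacity model. The function names are made up for illustration, and the latencies in the demo loop are assumptions. Only the 200-thread pool and the 800 req/s figure come from this article.

```python
def threads_in_use(arrival_rate: float, mean_latency_s: float) -> float:
    """Little's law: average busy threads = arrival rate x mean request time."""
    return arrival_rate * mean_latency_s


def max_sustainable_rate(pool_size: int, mean_latency_s: float) -> float:
    """Highest arrival rate a fixed pool can serve before requests start to queue."""
    return pool_size / mean_latency_s


if __name__ == "__main__":
    # Assumed latencies; pool size and spike rate are the article's figures.
    for latency_s in (0.1, 0.25, 0.5):
        busy = threads_in_use(800, latency_s)
        limit = max_sustainable_rate(200, latency_s)
        print(f"latency={latency_s}s busy_threads={busy:.0f} max_rate={limit:.0f} req/s")
```

The same arithmetic applies to the database connection pool: substitute the pool size and the time each request holds a connection. Because latency rises as the pools fill, the sustainable rate falls at exactly the moment traffic is climbing.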

# What happens to a typical Java/Tomcat stack under a spike (illustrative numbers):

t=0s:   50 req/s  | threads: 30/200 | db conns: 20/50
t=30s:  800 req/s | threads: 180/200 | db conns: 48/50  <- near limit
t=60s:  800 req/s | threads: 200/200 | db conns: 50/50  <- SATURATED
                  | queue depth: 400 requests waiting
                  | p50 latency: 4s (was 80ms)
t=90s:  800 req/s | new requests time out or are refused outright
                  | retry storm begins
                  | GC pressure increasing
t=120s: OUTAGE    | 502s or connection refused
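The retry storm in the timeline can be estimated with simple arithmetic. This is a rough sketch under a strong assumption: every failed attempt is retried immediately, and retries fail at the same rate as first attempts. The function name and the 50% failure fraction used in the note below are hypothetical.

```python
def effective_load(base_rate: float, failure_fraction: float, max_retries: int) -> float:
    """Total request rate when each failed attempt is retried immediately.

    Assumes retries fail at the same rate as first attempts (illustrative only).
    """
    if not 0.0 <= failure_fraction <= 1.0:
        raise ValueError("failure_fraction must be between 0 and 1")
    # Attempt k (k = 0 is the original request) is made only by the
    # fraction of traffic that has already failed k times.
    return sum(base_rate * failure_fraction ** k for k in range(max_retries + 1))
```

With 800 req/s of real traffic, half of it failing, and clients that retry up to three times, the system sees 1,500 req/s. The extra load arrives when the system is least able to serve it, which is why recovery can outlast the spike itself.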

Where Each Intervention Fits

Connection pool sizing: Bigger is not better. A large database connection pool under spike can overwhelm the database itself, which has its own connection limit and query concurrency ceiling. The right pool size is a function of your database's max_connections and the number of application instances that share it: the pools of all instances, added together, must stay below that limit. Over-sizing moves the bottleneck to the database without removing it.
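As a minimal sketch of that relationship, the helper below computes an upper bound for each instance's pool. The function name and the reserved parameter (connections held back for migrations, admin sessions, and monitoring) are assumptions for illustration. The result is a ceiling, not a recommendation: the best size is often well below it.

```python
def pool_size_per_instance(db_max_connections: int, app_instances: int, reserved: int = 5) -> int:
    """Largest per-instance pool that keeps the whole fleet under the database limit.

    `reserved` is a hypothetical allowance for admin tools, migrations and monitoring.
    """
    if app_instances < 1:
        raise ValueError("app_instances must be at least 1")
    usable = db_max_connections - reserved
    if usable < app_instances:
        raise ValueError("database limit too low for this many instances")
    # Integer division: rounding up would let the fleet exceed the limit.
    return usable // app_instances
```

If the fleet autoscales, pass the maximum instance count rather than the current one. Otherwise a scale-out during a spike pushes the total past the database limit.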

Request timeouts: Set aggressive timeouts on all outbound calls: database queries, external service calls, cache operations. A request that times out at 500ms frees its thread. A request that waits 30 seconds for a database connection holds a thread for 30 seconds and feeds the cascade. Circuit breakers (Resilience4j, or the older Hystrix, which Netflix has placed in maintenance mode) wrap this in a policy: after N failures in a window, stop sending requests to the failing dependency and fail fast.
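To make the circuit breaker policy concrete, here is a simplified sketch of the idea. It is not how Resilience4j or Hystrix are implemented. The class name, parameter names, and defaults are all hypothetical, and the code is not thread-safe. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=5, window_s=10.0, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = deque()  # timestamps of recent failures
        self.opened_at = None    # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at < self.cooldown_s:
            # Open: fail fast without touching the dependency.
            raise CircuitOpenError("dependency marked unhealthy")
        # Either closed, or the cooldown has elapsed and this is a trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure(now)
            raise
        if self.opened_at is not None:
            # Trial call succeeded: close the breaker and forget old failures.
            self.opened_at = None
            self.failures.clear()
        return result

    def _record_failure(self, now):
        if self.opened_at is not None:
            # Trial call failed: restart the cooldown.
            self.opened_at = now
            return
        self.failures.append(now)
        # Keep only failures inside the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.opened_at = now
```

The wrapped function still needs its own timeout. The breaker only decides whether to make the call at all, so a call with no timeout can hold a thread indefinitely before the breaker ever sees a failure.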

Load shedding: When the system is saturated, the right behavior is to reject new requests immediately with a 429 or 503 rather than queue them. A fast rejection is recoverable. A queued request that waits 10 seconds and then fails has consumed thread time, connection time, and memory for no benefit. The load balancer or API gateway is usually the right place to enforce this, through rate limits or concurrency limits, because a request rejected there never occupies an application thread.
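Where shedding has to happen inside the application, the core of it is a concurrency limit that never waits. The sketch below assumes a hypothetical handler callable and returns a plain (status, body) tuple rather than using any real web framework's response type.

```python
import threading


class LoadShedder:
    """Admit at most `max_in_flight` requests; reject the rest immediately."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, handler):
        # Non-blocking acquire: a request never waits for a free slot.
        if not self._slots.acquire(blocking=False):
            return 503, "overloaded, retry later"
        try:
            return 200, handler()
        finally:
            # Always give the slot back, even if the handler raises.
            self._slots.release()
```

Setting max_in_flight below the thread pool size leaves a few threads free to answer the rejections themselves, so the 503 goes out quickly instead of sitting in the same queue it was meant to avoid.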

Autoscaling lag: Cloud autoscaling reacts to metrics such as CPU and request rate, and instance spin-up commonly adds a lag of 2–5 minutes. The cascade described above completes in about two minutes, so the damage is often done before new instances are healthy. Autoscaling helps with sustained load growth, not sharp spikes. For spikes, you need headroom: run at 40–50% of capacity rather than 80%, so there is room to absorb part of a spike while autoscaling responds.
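The value of headroom is easy to quantify if capacity is assumed to scale linearly with utilization, which is a simplification. The function name below is made up for illustration.

```python
def absorbable_multiplier(steady_utilization: float) -> float:
    """How many times steady-state traffic the current fleet can serve before reaching 100%.

    Assumes throughput scales linearly with utilization (illustrative only).
    """
    if not 0.0 < steady_utilization <= 1.0:
        raise ValueError("steady_utilization must be in (0, 1]")
    return 1.0 / steady_utilization
```

At 80% utilization the fleet absorbs 1.25 times normal traffic. At 50% it absorbs 2 times, and at 40% it absorbs 2.5 times. The jump from 50 to 800 requests per second in the opening scenario is 16 times, so headroom alone does not cover a spike of that size. It buys time, and the excess still has to be shed.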

The Design Implication

Systems that handle spikes gracefully have two properties: they fail fast rather than queue up, and they degrade gracefully rather than collapse completely. Failing fast means setting timeouts everywhere and refusing to hold threads waiting indefinitely. Degrading gracefully means identifying which features can return cached or approximate responses when the system is under pressure, so core functionality continues even when the expensive operations are shedding load.
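Graceful degradation can be sketched as a fallback chain: live data when healthy, cached data under pressure, and a safe default when there is nothing cached. Everything named here is hypothetical, including the recommendations feature, the under_pressure signal, and the dict used as a cache. A real system would also bound the cache and expire entries.

```python
def get_recommendations(user_id, fetch_live, cache, under_pressure):
    """Return (data, source): live when healthy, else cached, else an empty default."""
    if not under_pressure():
        try:
            fresh = fetch_live(user_id)
            cache[user_id] = fresh  # keep a copy for later degraded responses
            return fresh, "live"
        except Exception:
            pass  # the expensive call failed; fall through to the degraded path
    if user_id in cache:
        return cache[user_id], "cached"
    # Nothing cached: serve an approximate response rather than an error.
    return [], "default"
```

The under_pressure callable could be backed by any saturation signal, such as in-flight request count or connection pool wait time. The expensive call is skipped entirely while the signal is raised, which is what frees capacity for core functionality.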

Design for the spike before the spike arrives. The window between a spike starting and the cascade completing is measured in seconds to minutes. There is no time to intervene manually.
