What Nobody Tells You About Scaling a Backend System
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Scaling Mythology
The industry has built a mythology around scaling. The myth is that it's primarily about infrastructure: more servers, bigger machines, better cloud architecture. The reality is that most systems fail to scale because of application-layer problems that no amount of horizontal scaling can fix. More servers running the same N+1 query pattern just hit the database from more connections simultaneously.
Real scaling work is about understanding limits — where the system runs out of a resource — and eliminating that limit before the next one becomes the bottleneck.
The Resources That Run Out First
Different system components hit different ceilings:
Database connections run out before almost anything else in typical web applications. Each connection consumes memory on both the application and database sides. A PostgreSQL instance with 200 max connections supports roughly 200 concurrent database operations, regardless of how many application servers are running. Scaling from 5 to 50 application servers while each holds 10 idle connections means 500 connections to a 200-connection database — and connection errors at load.
Connection pooling (PgBouncer in transaction mode for PostgreSQL, the HikariCP pool in JVM applications) is the standard response. PgBouncer transaction-mode pooling allows hundreds of application instances to share a small number of database connections by multiplexing at the transaction boundary. At scale, this is not optional.
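On the application side, the same discipline applies: cap the pool per instance so that (instances × pool size) stays within what the database or the PgBouncer front-end can absorb. A minimal HikariCP sketch, with placeholder connection details and sizes that are illustrative rather than recommendations:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        // Placeholder connection details -- substitute your own.
        config.setJdbcUrl("jdbc:postgresql://db.internal:5432/app");
        config.setUsername("app");
        config.setPassword(System.getenv("DB_PASSWORD"));
        // Keep (instance count x pool size) within the connection budget
        // of the database or the PgBouncer tier in front of it.
        config.setMaximumPoolSize(10);
        config.setMinimumIdle(2);            // don't hold idle connections you don't need
        config.setConnectionTimeout(3_000);  // fail fast (ms) instead of queueing indefinitely
        return new HikariDataSource(config);
    }
}
```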
Thread pools are the mechanism by which blocking I/O limits concurrent requests. In a thread-per-request model, each request waiting for a database response, an HTTP call, or a file read holds a thread. Threads consume memory (~1MB stack by default on JVM). A service configured with 200 threads can handle 200 concurrent blocking operations. Beyond that, requests queue.
The escape from this constraint is reactive or async I/O (Spring WebFlux, Vert.x, Node.js event loop). These models don't dedicate a thread per request — they use non-blocking I/O to handle many concurrent operations on a small thread pool. The tradeoff is programming model complexity: debugging async stack traces is harder, and not all libraries support non-blocking operation.
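To make the contrast concrete, here is a minimal sketch using the JDK's built-in HttpClient: sendAsync returns a CompletableFuture, so many calls can be in flight at once without one blocked thread per call. The URL and request count are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.IntStream;

public class AsyncCalls {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // 1,000 concurrent requests, but no thread sits blocked waiting on a response:
        // the client uses non-blocking I/O and completes each future when the reply arrives.
        List<CompletableFuture<HttpResponse<String>>> inFlight = IntStream.range(0, 1_000)
                .mapToObj(i -> HttpRequest.newBuilder(
                        URI.create("https://api.example.internal/items/" + i)).build())
                .map(req -> client.sendAsync(req, HttpResponse.BodyHandlers.ofString()))
                .toList();

        // Wait for everything to finish, only for the sake of the demo.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        System.out.println("completed " + inFlight.size() + " requests");
    }
}
```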
Memory typically fails in one of two modes: steady-state growth (a memory leak), or per-request allocations that multiply with concurrent load. An endpoint that allocates 100MB to process a large payload supports at most (available memory) / 100MB concurrent requests of that type before running out.
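The arithmetic is worth making explicit: with a hypothetical 8GB heap, an endpoint that buffers a 100MB payload tops out around 80 concurrent requests of that type. The usual fix is to stream the payload through instead of materializing it; a sketch, assuming a handler that receives the body as an InputStream:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class UploadHandler {
    // Buffers the whole payload: roughly 100MB of heap per in-flight request.
    static void handleBuffered(InputStream body, Path dest) throws Exception {
        byte[] all = body.readAllBytes();   // entire payload held in memory at once
        Files.write(dest, all);
    }

    // Streams in fixed-size chunks: heap usage stays flat regardless of payload size.
    static void handleStreaming(InputStream body, Path dest) throws Exception {
        try (OutputStream out = Files.newOutputStream(dest)) {
            body.transferTo(out);           // small internal buffer, not the full 100MB
        }
    }
}
```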
The Stateless Prerequisite
Horizontal scaling — adding more instances — requires stateless instances. Any state that lives in an instance (in-memory session data, local file uploads, in-process caches of per-user data) creates asymmetry between instances. Requests for a user on instance A behave differently than the same requests on instance B.
The practical path to stateless instances:
- Sessions in Redis (or a database) rather than in process memory
- File uploads to object storage (S3, GCS) rather than local disk
- Configuration from environment variables or a config service rather than files that vary per instance
- Per-request correlation IDs in a distributed tracing system rather than request state on the instance
This is not complex to implement and does not require a major refactor for most services. It does require doing it before you need to scale, not as an emergency response to a traffic spike.
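For the first item on that list, a minimal sketch of session data in Redis using the Jedis client. The key prefix and TTL are illustrative, and most services would reach for something like Spring Session rather than hand-rolling this:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class RedisSessionStore {
    private static final int SESSION_TTL_SECONDS = 1800;   // 30 minutes, illustrative
    private final JedisPool pool;

    public RedisSessionStore(JedisPool pool) {
        this.pool = pool;
    }

    public void save(String sessionId, String serializedSession) {
        try (Jedis jedis = pool.getResource()) {
            // SETEX writes the value and its expiry in one call,
            // so abandoned sessions clean themselves up.
            jedis.setex("session:" + sessionId, SESSION_TTL_SECONDS, serializedSession);
        }
    }

    public String load(String sessionId) {
        try (Jedis jedis = pool.getResource()) {
            // Any instance can serve the request, because the session
            // lives in Redis rather than in this process's memory.
            return jedis.get("session:" + sessionId);
        }
    }
}
```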
What Actually Changes at Each Order of Magnitude
The architectural concerns shift as load increases:
~100 req/sec: A single well-tuned application instance, a single database. The bottleneck is usually application logic or database queries. Fix your N+1 queries, add the missing indexes, and you have headroom.
~1,000 req/sec: Horizontal application scaling. Database read replicas for read-heavy workloads. Caching for frequently-read, infrequently-changing data. The bottleneck shifts to the database write path.
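For the "frequently-read, infrequently-changing" case, the usual shape is cache-aside. A sketch with Redis via Jedis; the key naming, TTL, and loader function are placeholders:

```java
import java.util.function.Supplier;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class CacheAside {
    private static final int TTL_SECONDS = 300;   // 5 minutes; tune to how stale is acceptable
    private final JedisPool pool;

    public CacheAside(JedisPool pool) {
        this.pool = pool;
    }

    // Read through the cache; fall back to the database loader on a miss.
    public String getProfile(String userId, Supplier<String> loadFromDatabase) {
        String key = "profile:" + userId;
        try (Jedis jedis = pool.getResource()) {
            String cached = jedis.get(key);
            if (cached != null) {
                return cached;                      // cache hit: no database round trip
            }
            String fresh = loadFromDatabase.get();  // cache miss: one database read
            if (fresh != null) {
                jedis.setex(key, TTL_SECONDS, fresh);  // populate for subsequent requests
            }
            return fresh;
        }
    }
}
```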
~10,000 req/sec: Database sharding or partitioning starts becoming relevant. Connection pooling infrastructure (PgBouncer) is essential. CDN for static assets removes a category of load entirely. The bottleneck is often a specific hot database table or a chatty network path.
~100,000 req/sec: Specialized infrastructure at every layer. At this point, most teams have moved past generic advice — the bottlenecks are specific to your data model and access patterns. Twitter's famous move from MySQL to a tiered object store for tweet storage, Slack's migration from a shared Redis cluster to a partitioned one — these are solutions to specific, measured constraints.
The Measurement Imperative
Nobody told you this because it's not a memorable principle: scaling work is 80% measurement and 20% change. Before you touch anything, you need: current throughput and latency percentiles (p50, p95, p99), resource utilization per component (CPU, memory, connections, I/O), and a load test that reproduces production traffic patterns.
k6 and Gatling are both practical choices for load testing. Run them against a staging environment that mirrors production as closely as possible. Establish your baseline. Then make one change at a time and re-measure. This is slower than making five changes at once. It's the only way to know what worked.
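For orientation, a sketch of what a baseline scenario can look like in Gatling's Java DSL (assuming Gatling 3.7 or later); the host, endpoint, rate, and duration are stand-ins that you would replace with your own traffic profile:

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;
import java.time.Duration;

public class BaselineSimulation extends Simulation {
    // Point at a staging environment that mirrors production; placeholder URL.
    HttpProtocolBuilder httpProtocol = http.baseUrl("https://staging.example.internal");

    // One representative read path; a real baseline replays the production traffic mix.
    ScenarioBuilder browse = scenario("baseline-read")
            .exec(http("get_order").get("/orders/123"));

    {
        setUp(
            browse.injectOpen(constantUsersPerSec(50).during(Duration.ofMinutes(5)))
        ).protocols(httpProtocol);
    }
}
```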
The Practical Takeaway
Before your next scaling effort, answer: what is the specific resource that runs out first under load? You should be able to name it — database connections, thread pool slots, memory, a specific slow query. If you can't name it from measurement, you haven't done the measurement yet. Do that before writing a line of scaling code.