What Nobody Tells You About Scaling a Backend System

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Scaling Mythology

Scaling has a mythology built around it in the industry. The myth is that it's primarily about infrastructure: more servers, bigger machines, better cloud architecture. The reality is that most systems fail to scale because of application-layer problems that no amount of horizontal scaling can fix. More servers running the same N+1 query pattern just hit the database from more connections simultaneously.

Real scaling work is about understanding limits — where the system runs out of a resource — and eliminating that limit before the next one becomes the bottleneck.

The Resources That Run Out First

Different system components hit different ceilings:

Database connections run out before almost anything else in typical web applications. Each connection consumes memory on both the application and database sides. A PostgreSQL instance with 200 max connections supports roughly 200 concurrent database operations, regardless of how many application servers are running. Scaling from 5 to 50 application servers while each holds 10 idle connections means 500 connections against a 200-connection database, and connection errors under load.
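The arithmetic above can be sketched as a quick capacity check. The numbers are the hypothetical ones from this scenario, not defaults of any real system:

```python
# Hypothetical fleet: 50 app servers, each holding a 10-connection pool,
# against a database with max_connections = 200.
def total_connections(app_servers: int, pool_size_per_server: int) -> int:
    """Connections the application fleet will try to hold open."""
    return app_servers * pool_size_per_server

db_max_connections = 200
demand = total_connections(app_servers=50, pool_size_per_server=10)
print(demand)                       # 500
print(demand > db_max_connections)  # True: connection errors under load
```

The check is trivial on purpose: the point is that the demand side scales with instance count while the database-side limit does not.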

Connection pooling (PgBouncer in transaction mode for PostgreSQL, the HikariCP pool in JVM applications) is the standard response. PgBouncer transaction-mode pooling allows hundreds of application instances to share a small number of database connections by multiplexing at the transaction boundary. At scale, this is not optional.
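A minimal PgBouncer configuration for transaction-mode pooling might look like the following. All hostnames, paths, and sizes here are illustrative placeholders, not recommendations:

```ini
[databases]
; Clients connect to PgBouncer on port 6432; it forwards to the real database.
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Transaction mode: a server connection is assigned per transaction,
; then returned to the pool, so many clients multiplex few connections.
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

With `pool_mode = transaction`, 1,000 client connections can share 20 server connections, at the cost of losing session-level features (prepared statements at the session level, `SET` state) across transactions.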

Thread pools are the mechanism by which blocking I/O limits concurrent requests. In a thread-per-request model, each request waiting for a database response, an HTTP call, or a file read holds a thread. Threads consume memory (~1MB stack by default on JVM). A service configured with 200 threads can handle 200 concurrent blocking operations. Beyond that, requests queue.
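The same back-of-envelope arithmetic applies to thread pools, assuming the ~1MB default JVM thread stack mentioned above (the actual stack size is configurable and varies by platform):

```python
STACK_MB_PER_THREAD = 1  # assumption: ~1MB default JVM thread stack

def thread_stack_memory_mb(threads: int) -> int:
    """Memory consumed by thread stacks alone, before any heap usage."""
    return threads * STACK_MB_PER_THREAD

def requests_queued(concurrent_blocking_requests: int, pool_size: int) -> int:
    """Requests that wait for a free thread once the pool is saturated."""
    return max(0, concurrent_blocking_requests - pool_size)

print(thread_stack_memory_mb(200))  # 200 MB of stack for a 200-thread pool
print(requests_queued(450, 200))    # 250 requests queued behind the pool
```

Queued requests are where tail latency comes from: the p99 of a saturated thread pool is dominated by queue wait, not by the work itself.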

The escape from this constraint is reactive or async I/O (Spring WebFlux, Vert.x, Node.js event loop). These models don't dedicate a thread per request — they use non-blocking I/O to handle many concurrent operations on a small thread pool. The tradeoff is programming model complexity: debugging async stack traces is harder, and not all libraries support non-blocking operation.
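A minimal sketch of the event-loop model, using Python's asyncio as a stand-in for the frameworks named above. The `fake_io_call` coroutine is a placeholder for a non-blocking database or HTTP call:

```python
import asyncio
import time

async def fake_io_call(i: int) -> int:
    # Stands in for a non-blocking database or HTTP call.
    await asyncio.sleep(0.1)
    return i

async def main() -> None:
    start = time.monotonic()
    # 100 concurrent "I/O waits" share one event-loop thread; in a
    # thread-per-request model this would require 100 threads.
    results = await asyncio.gather(*(fake_io_call(i) for i in range(100)))
    elapsed = time.monotonic() - start
    print(len(results))   # 100
    print(elapsed < 1.0)  # True: far less than the 10s a sequential run would take

asyncio.run(main())
```

The same property is what makes debugging harder: 100 logical operations in flight means 100 suspended continuations instead of 100 inspectable thread stacks.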

Memory typically fails in two modes: steady-state growth (a memory leak), or per-request allocations that multiply with concurrent load. An endpoint that allocates 100MB to process a large payload supports at most (available memory ÷ 100MB) concurrent requests of that type before running out.
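The second failure mode, as arithmetic. The 4GB figure is illustrative:

```python
def max_concurrent_heavy_requests(available_mb: int, per_request_mb: int) -> int:
    """How many large-payload requests fit in memory at once."""
    return available_mb // per_request_mb

# An instance with 4GB free for request handling and a 100MB-per-request
# endpoint tops out at 40 such requests in flight, regardless of CPU headroom.
print(max_concurrent_heavy_requests(4096, 100))  # 40
```

This ceiling is invisible under normal load and arrives all at once when a traffic spike happens to concentrate on the heavy endpoint.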

The Stateless Prerequisite

Horizontal scaling — adding more instances — requires stateless instances. Any state that lives in an instance (in-memory session data, local file uploads, in-process caches of per-user data) creates asymmetry between instances. Requests for a user on instance A behave differently than the same requests on instance B.

The practical path to stateless instances:

  • Sessions in Redis (or a database) rather than in process memory
  • File uploads to object storage (S3, GCS) rather than local disk
  • Configuration from environment variables or a config service rather than files that vary per instance
  • Per-request correlation IDs in a distributed tracing system rather than request state on the instance

This is not complex to implement and does not require a major refactor for most services. It does require doing it before you need to scale, not as an emergency response to a traffic spike.
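The first bullet, as a sketch. The in-process dict here stands in for Redis; a real deployment would use a Redis client with the same get/set shape, but the structure is the point: the instance holds no session state of its own, only the session ID the client presents.

```python
import json
import uuid
from typing import Optional

class SessionStore:
    """Sessions live in an external store, so any instance can serve
    any request. The dict below is a stand-in for Redis."""

    def __init__(self) -> None:
        self._backend: dict = {}  # stand-in for a shared Redis instance

    def create(self, data: dict) -> str:
        session_id = str(uuid.uuid4())
        self._backend[session_id] = json.dumps(data)
        return session_id

    def load(self, session_id: str) -> Optional[dict]:
        raw = self._backend.get(session_id)
        return json.loads(raw) if raw is not None else None

store = SessionStore()
sid = store.create({"user_id": 42})
# Any instance holding only `sid` (e.g. from a cookie) can recover the state:
print(store.load(sid))  # {'user_id': 42}
```

Serializing to JSON at the boundary is deliberate: it forces session data to stay small and portable rather than accumulating live object references that only one process can interpret.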

What Actually Changes at Each Order of Magnitude

The architectural concerns shift as load increases:

~100 req/sec: A single well-tuned application instance, a single database. The bottleneck is usually application logic or database queries. Fix your N+1 queries, add the missing indexes, and you have headroom.

~1,000 req/sec: Horizontal application scaling. Database read replicas for read-heavy workloads. Caching for frequently-read, infrequently-changing data. The bottleneck shifts to the database write path.

~10,000 req/sec: Database sharding or partitioning starts becoming relevant. Connection pooling infrastructure (PgBouncer) is essential. CDN for static assets removes a category of load entirely. The bottleneck is often a specific hot database table or a chatty network path.
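The core of sharding is a stable routing function from a key to a shard, so all reads and writes for one entity land on one database. A minimal hash-routing sketch, with an illustrative shard count:

```python
import hashlib

NUM_SHARDS = 8  # illustrative; real counts are chosen per data model

def shard_for(user_id: str) -> int:
    """Stable hash routing: the same key always maps to the same shard."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

print(shard_for("user-42") == shard_for("user-42"))  # True: deterministic
```

The hard part is not this function; it is everything the modulus forbids, such as cross-shard joins and transactions, and resharding when `NUM_SHARDS` has to change. That is why sharding belongs at this order of magnitude and not earlier.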

~100,000 req/sec: Specialized infrastructure at every layer. At this point, most teams have moved past generic advice — the bottlenecks are specific to your data model and access patterns. Twitter's famous move from MySQL to a tiered object store for tweet storage, Slack's migration from a shared Redis cluster to a partitioned one — these are solutions to specific, measured constraints.

The Measurement Imperative

Nobody told you this because it's not a memorable principle: scaling work is 80% measurement and 20% change. Before you touch anything, you need: current throughput and latency percentiles (p50, p95, p99), resource utilization per component (CPU, memory, connections, I/O), and a load test that reproduces production traffic patterns.
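Computing latency percentiles from raw samples is simple enough to do inline. A nearest-rank sketch, with fabricated example numbers; real samples come from your metrics system, which typically uses fancier streaming estimators:

```python
def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: dependency-free and good enough
    for establishing a baseline."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 13, 220, 16, 18, 14, 13, 950]
for p in (50, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)}ms")
```

Note how two slow outliers leave the p50 untouched while dominating the p95 and p99. This is why averages hide scaling problems and percentiles expose them.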

k6 and Gatling are both practical choices for load testing. Run them against a staging environment that mirrors production as closely as possible. Establish your baseline. Then make one change at a time and re-measure. This is slower than making five changes at once. It's the only way to know what worked.

The Practical Takeaway

Before your next scaling effort, answer: what is the specific resource that runs out first under load? You should be able to name it — database connections, thread pool slots, memory, a specific slow query. If you can't name it from measurement, you haven't done the measurement yet. Do that before writing a line of scaling code.

