What Nobody Tells You About Scaling a Backend System

by Arif Ikhsanudin, Backend Developer

The Scaling Mythology

Scaling has a mythology built around it in the industry. The myth is that it's primarily about infrastructure: more servers, bigger machines, better cloud architecture. The reality is that most systems fail to scale because of application-layer problems that no amount of horizontal scaling can fix. More servers running the same N+1 query pattern just hit the database from more connections simultaneously.

Real scaling work is about understanding limits — where the system runs out of a resource — and eliminating that limit before the next one becomes the bottleneck.

The Resources That Run Out First

Different system components hit different ceilings:

Database connections run out before almost anything else in typical web applications. Each connection consumes memory on both the application and database sides. A PostgreSQL instance with 200 max connections supports roughly 200 concurrent database operations, regardless of how many application servers are running. Scaling from 5 to 50 application servers while each holds 10 idle connections means 500 connections to a 200-connection database — and connection errors at load.

Connection pooling (PgBouncer in transaction mode for PostgreSQL, the HikariCP pool in JVM applications) is the standard response. PgBouncer transaction-mode pooling allows hundreds of application instances to share a small number of database connections by multiplexing at the transaction boundary. At scale, this is not optional.

Thread pools are the mechanism by which blocking I/O limits concurrent requests. In a thread-per-request model, each request waiting for a database response, an HTTP call, or a file read holds a thread. Threads consume memory (~1MB stack by default on JVM). A service configured with 200 threads can handle 200 concurrent blocking operations. Beyond that, requests queue.

The escape from this constraint is reactive or async I/O (Spring WebFlux, Vert.x, Node.js event loop). These models don't dedicate a thread per request — they use non-blocking I/O to handle many concurrent operations on a small thread pool. The tradeoff is programming model complexity: debugging async stack traces is harder, and not all libraries support non-blocking operation.

Memory typically fails in two modes: steady-state growth (a memory leak), or per-request allocations that multiply with concurrent load. An endpoint that allocates 100MB to process a large payload scales to exactly (available memory) / 100 concurrent requests of that type before running out.

The Stateless Prerequisite

Horizontal scaling — adding more instances — requires stateless instances. Any state that lives in an instance (in-memory session data, local file uploads, in-process caches of per-user data) creates asymmetry between instances. Requests for a user on instance A behave differently than the same requests on instance B.

The practical path to stateless instances:

  • Sessions in Redis (or a database) rather than in process memory
  • File uploads to object storage (S3, GCS) rather than local disk
  • Configuration from environment variables or a config service rather than files that vary per instance
  • Per-request correlation IDs in a distributed tracing system rather than request state on the instance

This is not complex to implement and does not require a major refactor for most services. It does require doing it before you need to scale, not as an emergency response to a traffic spike.

What Actually Changes at Each Order of Magnitude

The architectural concerns shift as load increases:

~100 req/sec: A single well-tuned application instance, a single database. The bottleneck is usually application logic or database queries. Fix your N+1 queries, add the missing indexes, and you have headroom.

~1,000 req/sec: Horizontal application scaling. Database read replicas for read-heavy workloads. Caching for frequently-read, infrequently-changing data. The bottleneck shifts to the database write path.

~10,000 req/sec: Database sharding or partitioning starts becoming relevant. Connection pooling infrastructure (PgBouncer) is essential. CDN for static assets removes a category of load entirely. The bottleneck is often a specific hot database table or a chatty network path.

~100,000 req/sec: Specialized infrastructure at every layer. At this point, most teams have moved past generic advice — the bottlenecks are specific to your data model and access patterns. Twitter's famous move from MySQL to a tiered object store for tweet storage, Slack's migration from a shared Redis cluster to a partitioned one — these are solutions to specific, measured constraints.

The Measurement Imperative

Nobody told you this because it's not a memorable principle: scaling work is 80% measurement and 20% change. Before you touch anything, you need: current throughput and latency percentiles (p50, p95, p99), resource utilization per component (CPU, memory, connections, I/O), and a load test that reproduces production traffic patterns.

k6 and Gatling are both practical choices for load testing. Run them against a staging environment that mirrors production as closely as possible. Establish your baseline. Then make one change at a time and re-measure. This is slower than making five changes at once. It's the only way to know what worked.

The Practical Takeaway

Before your next scaling effort, answer: what is the specific resource that runs out first under load? You should be able to name it — database connections, thread pool slots, memory, a specific slow query. If you can't name it from measurement, you haven't done the measurement yet. Do that before writing a line of scaling code.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Recovering From a Public Mistake (Like a Website Crash)

Seeing your website go down in front of everyone is a stomach-dropping moment. But a public mistake doesn’t have to be a career-ender—it can be a chance to show professionalism and resilience.

Read more

The Difference Between Continuous Integration and Continuous Delivery Most Teams Blur

CI and CD are not a single acronym — they are two distinct disciplines with different goals, failure modes, and organizational requirements. Conflating them explains why most pipelines deliver neither well.

Read more

Why Singapore Tech Startups Hire Async Backend Contractors From Across Southeast Asia

Your backend roadmap has six months of work on it. Your team has two engineers, and one of them just gave notice.

Read more

What If Frontend Engineers Were Treated Like Backend? Chaos Would Start on Day One

Imagine frontend engineers starting a sprint with no design, no specs—just vibes and pressure. Now add extra responsibilities that have nothing to do with UI. That’s where things break fast.

Read more