What Actually Happens When You Put a Load Balancer in Front of Your App

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Architecture Diagram Lie

In architecture diagrams, the load balancer is a rectangle with arrows pointing at a cluster of identical boxes. Traffic comes in, gets distributed, problem solved. It looks mechanical and obvious. In practice, adding a load balancer introduces a set of behavioral changes to your application that you need to understand before they surprise you in production.

This is not a documentation exercise. These are specific, concrete behaviors that cause production incidents for teams that haven't thought them through.

Session State Assumptions

The most common surprise: your application stores user session data in memory. User authenticates, session object lives in the application process. Works perfectly with one instance. Add a second instance behind a load balancer with round-robin routing, and the user's next request may land on the other instance. No session. User is logged out. This is not a load balancer bug. It's an application that was never designed for horizontal scaling.

The fix is usually one of:

  • Sticky sessions (session affinity in load balancer config): all requests from the same client route to the same instance. Solves the immediate problem, undermines even load distribution, and creates a single point of failure per user session.
  • Distributed session storage: move session state out of process into Redis or a database. Stateless instances. Any request can land anywhere. This is the correct architecture for horizontally scaled systems; it's also more complex. A minimal sketch follows this list.
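
For a Spring application, the lowest-friction version of the Redis-backed approach is Spring Session. A minimal sketch, assuming the spring-session-data-redis dependency is on the classpath; the Redis host and port are placeholders:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.session.data.redis.config.annotation.web.http.EnableRedisHttpSession;

// Replaces in-memory HttpSession storage with Redis, so any instance
// behind the load balancer can serve any request.
@Configuration
@EnableRedisHttpSession
public class SessionConfig {

    @Bean
    public LettuceConnectionFactory connectionFactory() {
        // Placeholder host and port; point this at your shared Redis.
        return new LettuceConnectionFactory("redis.internal", 6379);
    }
}

The HttpSession API the application code already uses stays the same; only the storage moves out of process, which is exactly what makes the instances stateless.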

The load balancer didn't break your application. It exposed a design assumption that was always there.

Health Checks and What "Healthy" Means

Load balancers remove unhealthy instances from rotation. They determine health by polling a health check endpoint you define. This sounds simple but has several non-obvious implications.

Health checks that only verify the process is running are nearly useless. An instance can be running and completely unable to serve traffic — its database connection pool exhausted, its downstream dependencies unreachable, its thread pool saturated. A health check at /health that returns 200 because the HTTP server is alive will keep a broken instance in rotation.

A meaningful health check verifies readiness, not just liveness:

@GetMapping("/health/ready")
public ResponseEntity<Map<String, String>> readiness() {
    Map<String, String> status = new LinkedHashMap<>();

    // Check DB connectivity with a lightweight query
    try {
        jdbcTemplate.queryForObject("SELECT 1", Integer.class);
        status.put("database", "ok");
    } catch (Exception e) {
        status.put("database", "error: " + e.getMessage());
        return ResponseEntity.status(503).body(status);
    }

    // Probe the pool directly. getConnection() never returns null; it
    // throws (or blocks until the checkout timeout) when the pool is
    // exhausted, so borrow a connection and release it immediately.
    try (Connection conn = dataSource.getConnection()) {
        status.put("pool", "ok");
    } catch (SQLException e) {
        status.put("pool", "error: " + e.getMessage());
        return ResponseEntity.status(503).body(status);
    }

    return ResponseEntity.ok(status);
}

The Kubernetes distinction between livenessProbe and readinessProbe exists for exactly this reason: a failed liveness probe gets the instance restarted, while a failed readiness probe only pulls it from rotation. An instance that is alive but not ready should stop receiving traffic, not be restarted.
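
The liveness counterpart should stay deliberately minimal: a restart won't revive an unreachable database, so dependency failures must not fail liveness. A sketch, assuming the same controller as the readiness example (the /health/live path is a convention, not a requirement):

// Liveness answers only "can this process respond at all?"
// No dependency checks here: a failed liveness probe triggers a restart,
// and restarting this instance won't fix a down database.
@GetMapping("/health/live")
public ResponseEntity<Void> liveness() {
    return ResponseEntity.ok().build();
}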

Connection Draining and In-Flight Requests

When a load balancer removes an instance from rotation — for a deploy, a scale-down, or a health check failure — requests that are already in flight to that instance don't stop. If you kill the instance immediately, those requests fail. Users see errors.

Connection draining (called "deregistration delay" in AWS ALB, configurable in most load balancers) allows a grace period: the instance stops receiving new connections but finishes serving existing ones. Defaults vary by platform; ALB's deregistration delay defaults to 300 seconds, and Kubernetes gives pods a 30-second termination grace period by default. Whether a given window is enough depends on your longest requests.

For a service where the 99th-percentile request duration is 200ms, 30 seconds is generous. For a service that processes batch jobs that can run for 10 minutes, you need either a much longer drain window or a different strategy for long-running requests.
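
The application's side of draining can be sketched with two pieces of shared state: a readiness flag that the health endpoint reports, and an in-flight request counter. Both names below are hypothetical, and Spring Boot users get an equivalent built in by setting server.shutdown=graceful:

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public final class DrainState {

    // Readiness flag: the /health/ready endpoint returns 503 once this
    // flips to false, so the load balancer stops routing new requests here.
    public static final AtomicBoolean READY = new AtomicBoolean(true);

    // Incremented when a request starts, decremented when it completes,
    // e.g. from a servlet filter.
    public static final AtomicInteger IN_FLIGHT = new AtomicInteger(0);

    public static void installDrainHook(long drainWindowMillis) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            READY.set(false);
            long deadline = System.currentTimeMillis() + drainWindowMillis;
            // Wait out in-flight requests, up to the drain window.
            while (IN_FLIGHT.get() > 0 && System.currentTimeMillis() < deadline) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }));
    }
}

The drain window passed in should match the load balancer's deregistration delay: draining longer than the balancer waits just means the balancer gives up first.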

Timeouts at Every Layer

A load balancer introduces a new timeout boundary. Most load balancers have their own idle connection timeout (AWS ALB defaults to 60 seconds) and a request timeout. If your backend service takes longer than the load balancer's timeout to respond, the load balancer closes the connection and returns an error to the client — regardless of whether your backend eventually produces a valid response.

This means your application's own timeout settings need to be shorter than the load balancer's timeouts, which need to be shorter than the client's timeouts. Timeout hierarchy:

Client timeout > Load balancer timeout > Application timeout > Downstream timeout

Violating this hierarchy produces symptoms that look like intermittent failures: requests succeed most of the time but fail for some users, with no apparent pattern. The pattern is response time — it's the requests that take longer than the most restrictive timeout in the chain.
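
In code, the hierarchy means every downstream call your service makes needs an explicit budget well inside the load balancer's window, so the service can still return a useful error before the balancer cuts the connection. A sketch using the JDK's built-in HttpClient; the URL is a placeholder and the numbers are illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class DownstreamCall {

    // Assumes a 60-second load balancer timeout in front of this service.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))  // fail fast on dead hosts
            .build();

    public static String fetch() throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://downstream.internal/api/resource"))
                .timeout(Duration.ofSeconds(10))  // well under the LB's 60s
                .build();
        // Throws HttpTimeoutException once the 10-second budget is spent,
        // leaving time to return a clean error to the caller instead of a
        // dropped connection from the load balancer.
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}

The same reasoning applies one layer down: a database driver's query timeout should sit inside the request budget above it.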

The HTTP vs TCP Layer Choice

Layer 4 load balancers (TCP) route connections without inspecting HTTP. They're fast and simple. They also don't understand HTTP concepts like host headers, URLs, or response codes. You can't do path-based routing, you can't terminate TLS at the balancer, and you can't route based on request attributes.

Layer 7 load balancers (HTTP) — AWS ALB, nginx, HAProxy in HTTP mode — understand HTTP. They can route /api/* to one backend and /static/* to another, terminate TLS, inject headers, and make routing decisions based on request content. They're slower and more complex. They're also what most applications actually need.
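
The routing decision an L7 balancer makes is easy to state in code, even though real balancers express it in their own configuration syntax (nginx, HAProxy, and ALB each have their own). A conceptual sketch, with hypothetical pool names:

import java.net.URI;

// Conceptual sketch of an L7 routing decision: choose a backend pool from
// the request path. The point is that this requires parsing HTTP, which
// is exactly what an L4 balancer never does.
public class Layer7Routing {

    static String chooseBackend(URI requestUri) {
        String path = requestUri.getPath();
        if (path.startsWith("/api/")) {
            return "app-pool";      // dynamic backend
        }
        if (path.startsWith("/static/")) {
            return "static-pool";   // static file servers
        }
        return "app-pool";          // default
    }

    public static void main(String[] args) {
        System.out.println(chooseBackend(URI.create("https://example.com/api/users")));    // app-pool
        System.out.println(chooseBackend(URI.create("https://example.com/static/a.css"))); // static-pool
    }
}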

The Practical Takeaway

Before adding a load balancer to your stack, audit your application for three things: where session state lives, whether your health check endpoint reflects actual readiness, and what your longest-running requests are. Those three answers determine the majority of load balancer configuration decisions you'll need to make — and the incidents you'll have if you get them wrong.
