What Actually Happens When You Put a Load Balancer in Front of Your App
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Architecture Diagram Lie
In architecture diagrams, the load balancer is a rectangle with arrows pointing at a cluster of identical boxes. Traffic comes in, gets distributed, problem solved. It looks mechanical and obvious. In practice, adding a load balancer introduces a set of behavioral changes to your application that you need to understand before they surprise you in production.
This is not a documentation exercise. These are specific, concrete behaviors that cause production incidents for teams that didn't think them through.
Session State Assumptions
The most common surprise: your application stores user session data in memory. User authenticates, session object lives in the application process. Works perfectly with one instance. Add a second instance behind a load balancer with round-robin routing, and the user's next request may land on the other instance. No session. User is logged out. This is not a load balancer bug. It's an application that was never designed for horizontal scaling.
The fix is usually one of:
- Sticky sessions (session affinity in load balancer config): all requests from the same client route to the same instance. Solves the immediate problem, undermines even load distribution, and creates a single point of failure per user session.
- Distributed session storage: move session state out of process into Redis or a database. Stateless instances. Any request can land anywhere. This is the correct architecture for horizontally scaled systems; it's also more complex.
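The round-robin failure mode is easy to reproduce in a few lines of plain Java. This is a sketch, not a real load balancer: the two instances and the round-robin counter are stand-ins for the real thing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: two app instances, each with its own in-memory session map,
// behind round-robin distribution.
public class SessionDemo {
    static class AppInstance {
        private final Map<String, String> sessions = new HashMap<>();

        void login(String sessionId, String user) {
            sessions.put(sessionId, user);
        }

        // Returns the logged-in user, or null if this instance has no session
        String whoAmI(String sessionId) {
            return sessions.get(sessionId);
        }
    }

    public static void main(String[] args) {
        List<AppInstance> instances = List.of(new AppInstance(), new AppInstance());
        int next = 0;

        // Request 1: user logs in; round-robin sends it to instance 0
        AppInstance first = instances.get(next++ % instances.size());
        first.login("sess-42", "alice");

        // Request 2: same session cookie; round-robin sends it to instance 1
        AppInstance second = instances.get(next++ % instances.size());
        System.out.println("instance 0 sees: " + first.whoAmI("sess-42"));  // alice
        System.out.println("instance 1 sees: " + second.whoAmI("sess-42")); // null -> logged out
    }
}
```

Moving the Map into Redis or a database is exactly the distributed-session fix: both instances would then read the same store and the second request would succeed.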
The load balancer didn't break your application. It exposed a design assumption that was always there.
Health Checks and What "Healthy" Means
Load balancers remove unhealthy instances from rotation. They determine health by polling a health check endpoint you define. This sounds simple but has several non-obvious implications.
Health checks that only verify the process is running are nearly useless. An instance can be running and completely unable to serve traffic — its database connection pool exhausted, its downstream dependencies unreachable, its thread pool saturated. A health check at /health that returns 200 because the HTTP server is alive will keep a broken instance in rotation.
A meaningful health check verifies readiness, not just liveness:
@GetMapping("/health/ready")
public ResponseEntity<Map<String, String>> readiness() {
    Map<String, String> status = new LinkedHashMap<>();
    // Check DB connectivity with a lightweight query
    try {
        jdbcTemplate.queryForObject("SELECT 1", Integer.class);
        status.put("database", "ok");
    } catch (Exception e) {
        status.put("database", "error: " + e.getMessage());
        return ResponseEntity.status(503).body(status);
    }
    // Check the pool can still hand out a connection. getConnection() never
    // returns null; it blocks or throws when the pool is exhausted, so catch
    // the failure and close whatever we borrow.
    try (Connection conn = dataSource.getConnection()) {
        status.put("pool", "ok");
    } catch (SQLException e) {
        status.put("pool", "exhausted: " + e.getMessage());
        return ResponseEntity.status(503).body(status);
    }
    return ResponseEntity.ok(status);
}
The Kubernetes distinction between livenessProbe and readinessProbe exists for exactly this reason: an instance that is alive but not ready should be restarted or pulled from rotation, not treated identically.
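In Kubernetes terms, that distinction looks roughly like the following pod spec fragment (a sketch: the /health/live path, container name, and port are placeholders; /health/ready matches the endpoint above). A failed liveness probe restarts the container; a failed readiness probe only removes the pod from Service endpoints.

```yaml
containers:
  - name: app
    livenessProbe:          # process is dead -> restart the container
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:         # process is up but not ready -> pull from rotation
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
```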
Connection Draining and In-Flight Requests
When a load balancer removes an instance from rotation — for a deploy, a scale-down, or a health check failure — requests that are already in flight to that instance don't stop. If you kill the instance immediately, those requests fail. Users see errors.
Connection draining (called "deregistration delay" in AWS ALB, configurable in most load balancers) allows a grace period: the instance stops receiving new connections but finishes serving existing ones. Defaults vary: AWS ALB's deregistration delay defaults to 300 seconds, while Kubernetes gives pods a 30-second termination grace period. Whether the window is long enough depends on your longest requests.
For a service where the 99th-percentile request duration is 200ms, 30 seconds is generous. For a service that processes batch jobs that can run for 10 minutes, you need either a much longer drain window or a different strategy for long-running requests.
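Draining also needs application-side cooperation: the instance has to keep serving in-flight requests after it's told to stop. A sketch using the JDK's built-in HttpServer (an illustration, not a production server; the port and 30-second window are placeholders): `stop(delay)` stops accepting new connections, then waits up to `delay` seconds for current exchanges to finish.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Sketch of application-side draining with the JDK's built-in HttpServer.
public class DrainingServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok\n".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // On SIGTERM (e.g. during a deploy), drain for up to 30 seconds.
        // Keep this window aligned with the load balancer's drain setting.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> server.stop(30)));
    }
}
```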
Timeouts at Every Layer
A load balancer introduces a new timeout boundary. Most load balancers have their own idle connection timeout (AWS ALB defaults to 60 seconds) and a request timeout. If your backend service takes longer than the load balancer's timeout to respond, the load balancer closes the connection and returns an error to the client — regardless of whether your backend eventually produces a valid response.
This means your application's own timeout settings need to be shorter than the load balancer's timeouts, which need to be shorter than the client's timeouts. Timeout hierarchy:
Client timeout > Load balancer timeout > Application timeout > Downstream timeout
Violating this hierarchy produces symptoms that look like intermittent failures: requests succeed most of the time but fail for some users, with no apparent pattern. The pattern is response time — it's the requests that take longer than the most restrictive timeout in the chain.
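One cheap guard is to assert the hierarchy when the service starts. A sketch, assuming the four values come from your configuration (the names and numbers here are illustrative):

```java
// Illustrative startup-time check that the timeout hierarchy holds:
// client > load balancer > application > downstream.
public class TimeoutCheck {
    static void requireDescending(long clientMs, long lbMs, long appMs, long downstreamMs) {
        if (!(clientMs > lbMs && lbMs > appMs && appMs > downstreamMs)) {
            throw new IllegalStateException(
                "timeout hierarchy violated: client=" + clientMs
                + " lb=" + lbMs + " app=" + appMs + " downstream=" + downstreamMs);
        }
    }

    public static void main(String[] args) {
        // e.g. client 90s > ALB 60s > app 30s > downstream call 10s
        requireDescending(90_000, 60_000, 30_000, 10_000);
        System.out.println("timeout hierarchy ok");
    }
}
```

Failing fast at startup turns the "intermittent failures with no apparent pattern" above into a single obvious configuration error.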
The HTTP vs TCP Layer Choice
Layer 4 load balancers (TCP) route connections without inspecting HTTP. They're fast and simple. They also don't understand HTTP concepts like host headers, URLs, or response codes. You can't do path-based routing, you can't terminate TLS and inspect the decrypted traffic, and you can't route based on request attributes.
Layer 7 load balancers (HTTP) — AWS ALB, nginx, HAProxy in HTTP mode — understand HTTP. They can route /api/* to one backend and /static/* to another, terminate TLS, inject headers, and make routing decisions based on request content. They're slower and more complex. They're also what most applications actually need.
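The routing decision itself is conceptually small. A sketch of the prefix matching a layer 7 balancer performs on request paths (the pool names and the default-pool fallback are invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the layer-7 decision a TCP balancer cannot make:
// pick a backend pool from the HTTP request path.
public class PathRouter {
    private final Map<String, String> prefixToPool = new LinkedHashMap<>();

    void route(String prefix, String pool) {
        prefixToPool.put(prefix, pool);
    }

    // First matching prefix wins; fall back to a default pool
    String poolFor(String path) {
        for (Map.Entry<String, String> e : prefixToPool.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "default-pool";
    }

    public static void main(String[] args) {
        PathRouter r = new PathRouter();
        r.route("/api/", "api-backend");
        r.route("/static/", "cdn-backend");
        System.out.println(r.poolFor("/api/users"));     // api-backend
        System.out.println(r.poolFor("/static/app.js")); // cdn-backend
        System.out.println(r.poolFor("/index.html"));    // default-pool
    }
}
```

This only works because the balancer can see the decrypted HTTP request, which is exactly what a layer 4 balancer gives up.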
The Practical Takeaway
Before adding a load balancer to your stack, audit your application for three things: where session state lives, whether your health check endpoint reflects actual readiness, and what your longest-running requests are. Those three answers determine the majority of load balancer configuration decisions you'll need to make — and the incidents you'll have if you get them wrong.