Load Balancing Is Not Just Distributing Traffic. Here Is What It Really Does.
by Eric Hanson, Backend Developer at Clean Systems Consulting
What Engineers Think Load Balancers Do
Ask most engineers what a load balancer does and you get: "It distributes traffic across multiple servers." That is accurate the way "a database stores data" is accurate — technically correct, operationally incomplete.
The mental model of a load balancer as a simple traffic splitter causes real problems. Teams configure round-robin across three instances and consider the problem solved. Then they get surprised when a backend instance dies and requests continue hitting it for 30 seconds. Or when sticky sessions cause one instance to handle 70% of the traffic. Or when TLS configuration at the load balancer does not match their security requirements. These are not edge cases — they are the operational reality of running a load balancer in production.
What Load Balancers Actually Do
Health checking and failure removal. A load balancer continuously checks backend health and removes failing instances from the pool. The critical settings are the health check parameters: interval (how often to check), threshold (how many failures before removal), and timeout (how long to wait for a response). A health check with a 30-second interval and a threshold of three failures means a dead backend handles traffic for up to 90 seconds before removal. For a system handling 500 req/s, that is 45,000 failed requests during that window.
# ALB health check configuration
# Default settings fail slowly:
HealthCheckIntervalSeconds: 30 # checks every 30s
HealthyThresholdCount: 3 # needs 3 successful checks
UnhealthyThresholdCount: 3 # needs 3 failed checks to remove
# Worst case: 90 seconds of traffic to a dead backend
# Aggressive settings for faster failover:
HealthCheckIntervalSeconds: 10
UnhealthyThresholdCount: 2
# Worst case: 20 seconds -- much more acceptable
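The worst-case window is simply interval times unhealthy threshold. A quick sketch of the arithmetic, using the 500 req/s figure from above (the function name is mine, for illustration):

```python
def failover_exposure(interval_s: int, unhealthy_threshold: int, req_per_s: int):
    """Worst-case seconds a dead backend keeps receiving traffic,
    and roughly how many requests fail during that window."""
    window_s = interval_s * unhealthy_threshold
    return window_s, window_s * req_per_s

# Default-style settings: 30s interval, 3 failed checks to remove
print(failover_exposure(30, 3, 500))   # (90, 45000)

# Aggressive settings: 10s interval, 2 failed checks
print(failover_exposure(10, 2, 500))   # (20, 10000)
```

Even the aggressive settings lose 10,000 requests at this traffic level, which is why health checks alone are not enough and retries at the client or load balancer still matter.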
TLS termination. The load balancer handles the TLS handshake with the client and forwards decrypted traffic to backends over the internal network. This offloads CPU-intensive cryptographic operations from application servers. It also centralizes certificate management — you renew the certificate in one place rather than on every instance. The tradeoff: traffic between the load balancer and backends is unencrypted unless you configure end-to-end TLS (re-encrypting traffic from the load balancer to the backends, optionally with mutual TLS so backends can authenticate the load balancer), which adds complexity.
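A minimal nginx sketch of this pattern — server names, addresses, and certificate paths are placeholders, not from the original:

```nginx
upstream app_backends {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

server {
    listen 443 ssl;
    server_name example.com;

    # One certificate to renew, instead of one per instance
    ssl_certificate     /etc/nginx/certs/example.com.pem;
    ssl_certificate_key /etc/nginx/certs/example.com.key;

    location / {
        # Past this point traffic is plain HTTP on the internal network;
        # end-to-end TLS would mean proxy_pass https://... plus proxy_ssl_* settings
        proxy_pass http://app_backends;
    }
}
```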
Connection pooling and HTTP/2 multiplexing. Modern load balancers like nginx and AWS ALB maintain persistent connection pools to backends. A client makes a request; the load balancer may use an existing connection to the backend rather than opening a new one. For HTTP/2, the load balancer multiplexes multiple client streams onto fewer backend connections. This matters significantly for high-concurrency workloads where connection setup overhead is non-trivial.
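In nginx terms, connection reuse to backends is the `keepalive` upstream directive — a sketch with illustrative numbers:

```nginx
upstream app_backends {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;

    # Cache up to 32 idle connections per worker for reuse
    keepalive 32;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backends;
        # Both lines are required for upstream connection reuse:
        # HTTP/1.1 plus a cleared Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```

Without the last two directives, nginx speaks HTTP/1.0 to upstreams and opens a fresh connection per request, which silently negates the pooling.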
Session affinity (sticky sessions). The load balancer can route all requests from the same client to the same backend instance, typically based on a cookie (or, more crudely, the client IP). This is required when backends hold session state locally. The problem: it creates uneven load distribution. A client that makes 10x more requests than average disproportionately loads one backend. It also complicates failover — if the pinned backend dies, that client's session is lost. The better solution is stateless backends with centralized session storage (Redis), making sticky sessions unnecessary.
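On ALB, stickiness is a set of target group attributes — the values here are illustrative, not a recommendation:

```yaml
# ALB target group attributes (illustrative values)
stickiness.enabled: "true"
stickiness.type: lb_cookie                   # ALB-generated cookie
stickiness.lb_cookie.duration_seconds: 3600  # pin each client for one hour

# Once session state lives in Redis, this becomes:
# stickiness.enabled: "false"
```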
Balancing Algorithms That Matter
Round robin: requests cycle through backends in order. Simple, even distribution when all requests have similar cost. Fails when backends have different capacities or when request cost varies significantly.
Least connections: new requests go to the backend with the fewest active connections. Better for variable-cost requests: while a long-running query ties up one backend, new short requests route to the others. AWS ALB's "least outstanding requests" algorithm is a variant of this.
Weighted: backends receive traffic proportional to assigned weights. Used during deployments — gradually shift traffic from old to new instances by adjusting weights rather than a hard cutover.
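In nginx, both of the latter are one-line upstream directives; the weights below sketch a gradual deployment shift (addresses are placeholders):

```nginx
upstream app_backends {
    # Route each new request to the backend with the fewest active connections
    least_conn;

    # Weighted shift during a deployment: roughly 75% old, 25% new
    server 10.0.1.10:8080 weight=3;   # old version
    server 10.0.2.10:8080 weight=1;   # new version
}
```

Weights and least_conn compose: nginx considers weights when comparing connection counts, so you can shift traffic gradually while still protecting against slow requests piling up on one instance.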
What This Changes About Your Design
If you design your system assuming the load balancer is a dumb traffic splitter, you will be surprised by health check lag, session distribution problems, and TLS configuration gaps. Design instead with the assumption that the load balancer is a configurable policy layer between clients and your application.
Configure health checks aggressively. Use least-connections for variable-cost workloads. Remove local session state from application servers. Configure appropriate connection draining (the time the load balancer gives in-flight requests to complete before removing an instance during a deployment). These are not advanced configurations — they are the baseline for a load balancer configuration that behaves correctly under real conditions.
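Pulled together in ALB terms, that baseline checklist might look like this — the values are illustrative starting points, not universal defaults:

```yaml
# Target group health checks -- fail fast
HealthCheckIntervalSeconds: 10
UnhealthyThresholdCount: 2

# Target group attributes
load_balancing.algorithm.type: least_outstanding_requests
stickiness.enabled: "false"                # session state lives in Redis instead
deregistration_delay.timeout_seconds: 30   # connection draining window for deploys
```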