Load Testing Your Backend Before It Hits Production Is Not Optional
by Eric Hanson, Backend Developer at Clean Systems Consulting
What You Are Actually Shipping Without Load Tests
A backend that has never been load tested has an unknown performance envelope. You do not know where it starts to degrade. You do not know which endpoints fail first under pressure. You do not know whether the database connection pool is sized correctly, whether the thread pool exhausts under modest concurrency, or whether there is a memory leak that only becomes visible after hours of sustained load.
Shipping without load testing does not mean you ship a fast backend. It means you ship a backend that may be fast, may be fine, or may fall over at 200 concurrent users — and you will find out which one on launch day.
The Minimum Viable Load Test
A load test does not need to be elaborate to be useful. The minimum you need before shipping a new backend service:
- Baseline response time — What is the p50, p95, p99 latency for your most critical endpoint under a single user?
- Concurrency target — At your expected peak concurrent users, do those latencies hold?
- Sustained load — Over 10 minutes at peak concurrency, does latency increase? (Memory leaks and connection exhaustion show up here.)
- Failure mode — What happens at 2x your peak? The system should degrade gracefully, not crash.
Gatling, k6, and Locust are the standard tools. k6 has the lowest barrier to entry for backend engineers already comfortable with JavaScript. Gatling's Scala DSL is powerful for complex scenarios. Locust is Python-native and well-suited for teams already in that ecosystem.
# Locust: minimum viable load test for a REST API
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)  # Simulate realistic think time

    def on_start(self):
        # Authenticate once per simulated user
        response = self.client.post(
            "/auth/token",
            json={"username": "testuser", "password": "testpass"},
        )
        self.token = response.json()["access_token"]

    @task(3)  # 3x more likely than other tasks
    def list_products(self):
        self.client.get(
            "/api/products",
            headers={"Authorization": f"Bearer {self.token}"},
            name="/api/products",  # name groups URL-param variants
        )

    @task(1)
    def get_product_detail(self):
        self.client.get(
            "/api/products/12345",
            headers={"Authorization": f"Bearer {self.token}"},
            name="/api/products/:id",
        )
Run with locust --headless -u 200 -r 10 --run-time 10m --host http://staging.example.com — 200 users, ramping at 10/second, for 10 minutes.
Reading the Results
The metrics that matter:
p99 latency, not average. Average latency hides the long tail. A p99 of 3 seconds means 1 in 100 requests takes over 3 seconds — which at 1000 requests per second is 10 slow requests per second. Averages can look fine while p99 is unacceptable.
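To see the gap concretely, here is a purely illustrative calculation with invented numbers: a sample of one hundred response times where a single slow outlier barely moves the mean but dominates the p99.

# Illustrative only: one slow request in a hundred barely moves the mean
# but dominates the p99. The numbers are invented for the example.
import statistics

latencies_ms = [120] * 99 + [3200]   # 99 fast requests, one 3.2-second outlier

mean_ms = statistics.mean(latencies_ms)                  # 150.8 ms, looks healthy
p99_ms = statistics.quantiles(latencies_ms, n=100)[98]   # ~3169 ms, the real tail

print(f"mean={mean_ms:.0f}ms p99={p99_ms:.0f}ms")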
Error rate as load increases. A service that returns 0.1% errors at 50 users and 5% errors at 200 users has a concurrency-related failure mode. Find it before users do.
Saturation point. Load tests should identify the point where latency starts increasing non-linearly or error rate climbs. That is your service's current ceiling. If the ceiling is below your expected peak, you have a problem to address. If it is well above, you have margin.
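One way to find that ceiling with Locust is a stepped ramp: hold each user count for a couple of minutes, step up, and note the step at which p99 or the error rate bends upward. The sketch below uses Locust's LoadTestShape; the step size, hold time, and ceiling are placeholder values to adjust for your own traffic.

# Stepped ramp for finding the saturation point. Step size, hold duration,
# and the ceiling are placeholders -- tune them to your traffic profile.
from locust import LoadTestShape

class StepLoadShape(LoadTestShape):
    step_users = 50        # add 50 users per step
    step_duration = 120    # hold each step for two minutes
    max_users = 1000       # stop once this ceiling is reached

    def tick(self):
        run_time = self.get_run_time()
        step = int(run_time // self.step_duration) + 1
        users = step * self.step_users
        if users > self.max_users:
            return None                      # returning None ends the test
        return (users, self.step_users)      # (target user count, spawn rate)

Locust picks the shape class up from the same locustfile as the user classes and uses it in place of the -u and -r flags; the step at which latency or errors jump is your current ceiling.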
Resource utilization at the saturation point. What was the CPU, memory, and database connection count when the service started degrading? This points you at the constraint. CPU-bound: look at compute-intensive code paths. Memory-bound: look for heap growth. Connection-bound: tune the pool size or look for connection leaks.
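If you do not already have host metrics in a monitoring stack, a throwaway sampler running alongside the test is enough to line up the resource picture with the load timeline. A minimal sketch, assuming psutil is installed on the service host and the interval and duration are placeholders:

# Throwaway resource sampler to run on the service host during a load test.
# Assumes psutil is available; interval and duration are placeholder values.
import csv
import time

import psutil

SAMPLE_INTERVAL_S = 5     # one sample every five seconds
RUN_FOR_S = 15 * 60       # run alongside a 15-minute test

end = time.time() + RUN_FOR_S
with open("resource_samples.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "cpu_percent", "mem_percent"])
    while time.time() < end:
        writer.writerow([
            int(time.time()),
            psutil.cpu_percent(interval=SAMPLE_INTERVAL_S),  # blocks for the interval
            psutil.virtual_memory().percent,
        ])
        f.flush()

A proper metrics stack (Prometheus, CloudWatch, whatever you already run) does this better; the point is only to have resource usage on the same timeline as the load test results. Database connection counts come from the database itself, for example pg_stat_activity on Postgres.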
The Infrastructure That Makes This Routine
Load tests that require manual setup and a specialist to run will be skipped. Load tests that are automated and run on a schedule will be run.
The operational target: a lightweight load test that runs against a staging environment on every release candidate, with automated pass/fail against a latency threshold. This does not need to be a full stress test — a 5-minute run at expected peak concurrency, with a p99 threshold, is sufficient to catch regressions before they reach production.
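With Locust, the pass/fail can live in the locustfile itself: a listener on the quitting event checks the aggregate stats and sets the process exit code, which is what the CI job keys off. The threshold values below are placeholders, not recommendations.

# Automated pass/fail: set a non-zero exit code if thresholds are exceeded.
# Threshold values are placeholders -- set them from your own latency budget.
import logging

from locust import events

P99_THRESHOLD_MS = 500    # assumed latency budget
MAX_FAIL_RATIO = 0.01     # assumed tolerance: up to 1% failed requests

@events.quitting.add_listener
def check_thresholds(environment, **kwargs):
    stats = environment.stats.total
    p99_ms = stats.get_response_time_percentile(0.99)
    if stats.fail_ratio > MAX_FAIL_RATIO:
        logging.error("Failing run: error ratio %.2f%%", stats.fail_ratio * 100)
        environment.process_exit_code = 1
    elif p99_ms > P99_THRESHOLD_MS:
        logging.error("Failing run: p99 %.0f ms over %d ms budget", p99_ms, P99_THRESHOLD_MS)
        environment.process_exit_code = 1
    else:
        environment.process_exit_code = 0

The CI job then runs the same headless command from earlier against staging and fails the build on a non-zero exit code.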
The full stress test — finding the saturation point, characterizing the failure mode — runs monthly or before a significant traffic event (product launch, marketing campaign). This one requires manual review of results.
Neither requires a dedicated performance engineering team. Both require roughly fifteen minutes of k6 or Locust scripting and a CI job to run the script.