Performance Testing Is Not Something You Do Right Before Launch
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Pre-Launch Load Test That Found Everything Too Late
The launch is in ten days. Someone schedules a load test. The test runs. Response times at 100 concurrent users are acceptable, but at 500 concurrent users, about 20% of your expected peak, p99 latency climbs to 12 seconds and the error rate hits 15%.
The root cause: a database query in the critical path that does a full table scan. No index. The table has 400 rows in the test environment and 4 million rows in production. The query takes 2ms with 400 rows and 8 seconds with 4 million.
The fix requires adding an index, running a migration on a production database with 4 million rows, and verifying that the query plan changes. This is a two-hour fix in isolation. With ten days to launch, stakeholders involved, and risk aversion at its peak, it will take three days of coordination.
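For concreteness, here is what the two-hour version of that fix looks like. This is a minimal sketch, not the team's actual migration: it assumes Postgres, node-postgres, and stand-in names (products, category_id) for the real table and column.

// Sketch: add the missing index and confirm the query plan changed.
// Assumes Postgres and node-postgres; table and column names are hypothetical.
const { Client } = require('pg');

async function fixMissingIndex() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  // CONCURRENTLY builds the index without locking out writes, which is what
  // you want on a 4-million-row table serving production traffic.
  await client.query(
    'CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_products_category_id ON products (category_id)'
  );

  // Verify the planner now uses the index instead of a sequential scan.
  const { rows } = await client.query(
    'EXPLAIN SELECT * FROM products WHERE category_id = 42'
  );
  console.log(rows.map((r) => r['QUERY PLAN']).join('\n'));

  await client.end();
}

fixMissingIndex().catch((err) => { console.error(err); process.exit(1); });

The EXPLAIN output is the verification step: the plan should show an index scan where the sequential scan used to be.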
If the load test had run six weeks earlier — against a dataset that was representative of production scale — the fix would have been routine. There was no reason it could not have.
The Problems That Only Show Up Under Load
Performance problems fall into two categories: problems that were always present but only matter at scale, and problems that emerge from concurrency. Both categories need to be found before launch, not the week of it.
Always-present, scale-dependent problems:
- N+1 query patterns (1 query to fetch N records, then N queries to fetch related data; sketched after this list)
- Missing indexes on large tables
- Unoptimized joins that scan disproportionate row counts
- Serialization overhead that is negligible for small payloads and significant for large ones
- In-memory operations that hold too much data for small datasets and OOM for large ones
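That first pattern is worth seeing concretely, because it is invisible at 400 rows and dominant at 4 million. A minimal sketch, assuming node-postgres and hypothetical products and reviews tables:

// Sketch: the N+1 pattern and its batched fix. Assumes node-postgres and
// hypothetical products/reviews tables.
const { Pool } = require('pg');
const pool = new Pool();

// N+1: after fetching a page of products, one query per product.
async function reviewsNPlusOne(productIds) {
  const byProduct = {};
  for (const id of productIds) {
    const { rows } = await pool.query(
      'SELECT * FROM reviews WHERE product_id = $1', [id]
    );
    byProduct[id] = rows;
  }
  return byProduct;
}

// Batched: one query for the whole page, grouped in memory.
async function reviewsBatched(productIds) {
  const { rows } = await pool.query(
    'SELECT * FROM reviews WHERE product_id = ANY($1)', [productIds]
  );
  const byProduct = {};
  for (const row of rows) {
    (byProduct[row.product_id] ??= []).push(row);
  }
  return byProduct;
}

At 25 products per page, the first version makes 25 round trips where the second makes one.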
Concurrency problems:
- Thread pool exhaustion under parallel requests
- Connection pool saturation (typically visible as queued connections and latency spikes)
- Lock contention in the database
- Race conditions that only manifest with concurrent writes
- Cache stampede when many requests miss a cold cache simultaneously (a fix is sketched below)
Neither category can be found by functional testing, code review, or running the application with a handful of manual requests.
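Some of these do have small, well-known fixes once you name them. Cache stampede, for example, is usually solved by collapsing concurrent misses into a single backend call. A sketch of that idea for an in-process cache; fetchValue is a hypothetical stand-in for whatever loads the data:

// Sketch: collapse concurrent cache misses into one backend call
// (sometimes called single-flight). In-process only; fetchValue is
// a hypothetical loader passed in by the caller.
const cache = new Map();    // key -> cached value
const inFlight = new Map(); // key -> Promise for a fetch already underway

async function getWithSingleFlight(key, fetchValue) {
  if (cache.has(key)) return cache.get(key);
  if (inFlight.has(key)) return inFlight.get(key); // join the fetch in progress
  const promise = fetchValue(key)
    .then((value) => { cache.set(key, value); return value; })
    .finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

With this in place, 500 requests hitting a cold key produce one database query instead of 500.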
When Performance Testing Should Happen
During development, not after. The appropriate time to run a load test on a new endpoint is when that endpoint is written, not when the application is ready to ship. An endpoint-level load test takes an hour to set up with k6, Gatling, or Locust, and it will find the N+1 query before it becomes a launch-blocking issue.
// k6: load test for a new endpoint during development
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // Ramp up to 50 users
    { duration: '3m', target: 50 },  // Hold at 50
    { duration: '1m', target: 200 }, // Ramp to 200
    { duration: '3m', target: 200 }, // Hold at 200
    { duration: '1m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // Fail if p95 > 500ms or p99 > 1s
    http_req_failed: ['rate<0.01'],                 // Fail if error rate > 1%
  },
};

export default function () {
  const res = http.get('http://localhost:3000/api/products?page=1&per_page=25');
  check(res, { 'status was 200': (r) => r.status === 200 });
  sleep(1);
}
Run this when you write the /api/products endpoint. Not when you are preparing to ship. The threshold failure will surface problems immediately, in a context where fixing them is cheap.
With representative data. A load test against a database with 500 rows is not a load test; it verifies the happy path under concurrency while saying nothing about behavior at production data volume. Performance testing requires data volumes that approximate production. If your production database will have 5 million records in the first year, seed your test environment to at least that scale.
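Seeding to that scale does not have to be slow if you let the database generate the rows. A sketch, assuming Postgres and the same hypothetical products table as above:

// Sketch: seed a test database to production scale. Assumes Postgres,
// node-postgres, and a hypothetical products table; adjust to your schema.
const { Client } = require('pg');

async function seed() {
  const client = new Client({ connectionString: process.env.TEST_DATABASE_URL });
  await client.connect();
  // generate_series keeps row generation inside Postgres, which is far
  // faster than inserting from application code one row at a time.
  await client.query(`
    INSERT INTO products (name, category_id, price_cents)
    SELECT
      'product-' || n,
      (n % 50) + 1,            -- spread rows across 50 categories
      (random() * 10000)::int
    FROM generate_series(1, 5000000) AS n
  `);
  await client.end();
}

seed().catch((err) => { console.error(err); process.exit(1); });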
Continuously in CI, at reduced scale. A subset of performance tests can run in CI — not the full 500-concurrent-user stress test, but a baseline: does this endpoint respond in under 200ms for a single user with 1 million rows in the database? A regression at this level, caught in CI, is a one-line fix. Caught two weeks before launch, it is a crisis.
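That baseline translates directly into a k6 script CI can run on every merge. Same tooling as above; the iteration count is an arbitrary choice, while the 200ms budget and the single user come from the paragraph above:

// k6: reduced-scale baseline for CI. One user, fixed iteration count;
// a threshold failure exits non-zero, which fails the CI job.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 1,         // a single user: a regression baseline, not a stress test
  iterations: 50, // enough samples for a stable percentile
  thresholds: {
    http_req_duration: ['p(95)<200'], // the 200ms budget
  },
};

export default function () {
  const res = http.get('http://localhost:3000/api/products?page=1&per_page=25');
  check(res, { 'status was 200': (r) => r.status === 200 });
}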
The pattern that works: performance benchmarks as part of the definition of done for new endpoints, combined with full stress tests run monthly against production-scale data. Neither requires waiting until launch.