The Difference Between a Fast Test Suite and a Useful Test Suite
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Suite That Runs in 90 Seconds and Catches Nothing
A team spent three weeks optimizing their test suite. They removed slow integration tests, parallelized the remaining unit tests, mocked everything that touched the database. CI now runs in 90 seconds. They ship a bug that week — a regression in the payment flow that the deleted integration tests would have caught.
The optimization was technically successful. The suite is fast. It is also largely decorative.
This is the trap of treating test suite speed as the primary metric. Speed matters, but only in the context of the suite being worth waiting for. A 90-second suite that misses real bugs is less valuable than a 10-minute suite that doesn't.
What Makes a Test Useful
A test is useful if — and only if — it would have caught at least one real defect in the last 12 months. This sounds like a high bar, but it's the right one. Tests that have never caught a real defect are either testing behavior that doesn't fail (low-value), testing behavior so thoroughly covered by other tests that they're redundant (wasteful), or testing in a way that can't catch real failures (broken).
The evaluation is retrospective: go through your postmortems and production incidents from the last year. For each one, identify the test category that should have caught it. If it's a category you have, investigate why the tests didn't catch it. If it's a category you're missing, you have a gap.
Example gap analysis from 6 months of incidents:
Incident | Category that should have caught it | Existed?
-------------------------------|--------------------------------------|----------
Null reference on legacy field | Edge-case unit tests with null inputs | No
Payment timeout not handled | Integration test with delayed mock | No
Silent DB column truncation | Data validation integration test | No
Race condition in cache | Concurrent access test | No
Wrong locale in date formatting | Locale-specific unit test | Yes (didn't run)
This table is uncomfortable. It should be — it shows which tests you're missing. Build those missing categories. Don't optimize the existing ones until the gaps are filled.
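To make the first row concrete, here is a minimal sketch of the missing category: an edge-case unit test that feeds a null legacy field into the code that formats it. The NameFormatter object and its displayName function are hypothetical stand-ins for whatever handled the legacy field, not the actual incident code.

    import org.junit.jupiter.api.Assertions.assertEquals
    import org.junit.jupiter.api.Tag
    import org.junit.jupiter.api.Test

    // Hypothetical stand-in for the code behind the "null reference on
    // legacy field" incident: the legacy nickname may be null or blank.
    object NameFormatter {
        fun displayName(legacyNickname: String?, fullName: String): String =
            legacyNickname?.takeIf { it.isNotBlank() } ?: fullName
    }

    @Tag("fast")
    class NameFormatterTest {
        @Test
        fun `null legacy nickname falls back to the full name`() {
            assertEquals("Ada Lovelace", NameFormatter.displayName(null, "Ada Lovelace"))
        }

        @Test
        fun `blank legacy nickname falls back to the full name`() {
            assertEquals("Ada Lovelace", NameFormatter.displayName("  ", "Ada Lovelace"))
        }
    }

The specific assertion matters less than the fact that the category now exists and is tagged to run on every commit.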
The Speed-Usefulness Spectrum
Different test types sit at different points on the speed-usefulness tradeoff:
Unit tests are fast and good at catching logic errors in isolation. They're not useful for catching integration failures, data edge cases that appear at the boundary between components, or configuration problems.
Integration tests are slower and catch integration failures that unit tests can't. They require real (or realistic) infrastructure — a database, a message broker, an HTTP service. They're the tests most commonly sacrificed for speed.
End-to-end tests are slow and flaky but catch failure modes that only appear when the full system is assembled. They're useful specifically because they test what the user actually experiences.
The right balance depends on your system's failure modes. An API with complex business logic needs more unit tests. A data pipeline that moves records between systems needs more integration tests. A web application where user-facing flows are the primary risk needs more end-to-end tests.
The commonly cited "testing pyramid" (many unit tests, some integration, few end-to-end) is a useful starting heuristic but not a rule. Some systems have inverted pyramids that serve them well.
Optimizing Without Losing Coverage
When you do optimize for speed, the constraint is: don't remove signal. The techniques that reduce speed without removing signal:
Parallelize, don't delete. Running 200 integration tests in parallel across 4 runners takes roughly a quarter of the time. Deleting 150 of them takes less time but removes coverage.
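Parallelism can come from CI-level sharding across runners or from Gradle running test classes in parallel JVM forks on a single runner. A minimal sketch of the latter, in the same Gradle Kotlin DSL used later in this article (the fork count of 4 is an assumption; size it to your hardware):

    // build.gradle.kts: spread test classes across parallel JVM forks
    // instead of deleting tests. The count of 4 is illustrative.
    tasks.withType<Test>().configureEach {
        maxParallelForks = 4
    }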
Mock at the right boundary. Mock external services (payment gateways, SMS providers), not internal components. Mocking your own database to speed up tests eliminates the tests' ability to catch database-specific failures (type coercion, constraint violations, query performance).
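A sketch of where that boundary sits in a JUnit 5 integration test, assuming WireMock stands in for the external payment gateway; the class name, endpoint path, and response body are hypothetical, and the persistence layer underneath would still hit a real (containerized) database rather than a mock:

    import com.github.tomakehurst.wiremock.WireMockServer
    import com.github.tomakehurst.wiremock.client.WireMock.okJson
    import com.github.tomakehurst.wiremock.client.WireMock.post
    import com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo
    import com.github.tomakehurst.wiremock.core.WireMockConfiguration.options
    import org.junit.jupiter.api.AfterAll
    import org.junit.jupiter.api.BeforeAll
    import org.junit.jupiter.api.Tag
    import org.junit.jupiter.api.Test

    @Tag("integration")
    class CheckoutIntegrationTest {
        companion object {
            // External boundary: the payment gateway is stubbed, not called.
            private val paymentGateway = WireMockServer(options().dynamicPort())

            @JvmStatic @BeforeAll
            fun startStub() {
                paymentGateway.start()
                paymentGateway.stubFor(
                    post(urlEqualTo("/v1/charges")) // hypothetical gateway endpoint
                        .willReturn(okJson("""{"status": "succeeded"}"""))
                )
            }

            @JvmStatic @AfterAll
            fun stopStub() = paymentGateway.stop()
        }

        @Test
        fun `order is persisted after a successful charge`() {
            // Internal boundary stays real: point the checkout service at
            // paymentGateway.baseUrl() for charges, while its order repository
            // runs against a containerized database, so constraint violations
            // and type coercion problems still surface here.
        }
    }

The gateway stub keeps the test fast and deterministic; the database stays real, so the failures integration tests exist to catch can still happen.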
Separate speed tiers explicitly. Don't randomly sample or skip tests; build explicit test categories. Tests tagged "fast" (no I/O, no containers) run on every commit. Tests tagged "integration" run on every PR. Tests tagged "slow" run post-merge. Every test lives in exactly one category, with a known justification.
@Tag("fast") // Unit tests: in-memory, no I/O
@Tag("integration") // Integration tests: Testcontainers, WireMock
@Tag("slow") // E2E, load, chaos tests
// In build.gradle.kts:
tasks.test {
useJUnitPlatform {
includeTags("fast") // CI: critical path
}
}
tasks.register<Test>("integrationTest") {
useJUnitPlatform {
includeTags("integration")
}
}
The key property: every test has a category that determines when it runs, and every category has a purpose. No tests exist outside of that structure.
The Right Optimization Question
Before optimizing any test, ask: does removing or speeding up this test reduce the probability of detecting a real defect? If yes, don't optimize it; fix the underlying slowness (usually slow Testcontainers startup, an unmocked network call, or excessive data setup). If no, optimize aggressively.
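For the first of those, one common fix is starting the container once per test JVM rather than once per test class. A minimal sketch assuming Testcontainers with Postgres (the object name and image tag are illustrative):

    import org.testcontainers.containers.PostgreSQLContainer
    import org.testcontainers.utility.DockerImageName

    // Started lazily the first time any integration test touches it, then
    // shared for the rest of the JVM, so container startup is paid once.
    // Tests still need their own data isolation (truncate/rollback, not shown).
    object SharedPostgres {
        val instance: PostgreSQLContainer<Nothing> =
            PostgreSQLContainer<Nothing>(DockerImageName.parse("postgres:16-alpine"))
                .apply { start() }
    }

Tests then point their datasource at instance.jdbcUrl, instance.username, and instance.password instead of each spinning up a container of their own.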
Fast and useful aren't mutually exclusive. They require different questions.