The Difference Between a Fast Test Suite and a Useful Test Suite

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Suite That Runs in 90 Seconds and Catches Nothing

A team spent three weeks optimizing their test suite. They removed slow integration tests, parallelized the remaining unit tests, mocked everything that touched the database. CI now runs in 90 seconds. They ship a bug that week — a regression in the payment flow that the deleted integration tests would have caught.

The optimization was technically successful. The suite is fast. It is also largely decorative.

This is the trap of treating test suite speed as the primary metric. Speed matters, but only in the context of the suite being worth waiting for. A 90-second suite that misses real bugs is less valuable than a 10-minute suite that doesn't.

What Makes a Test Useful

A test is useful if — and only if — it would have caught at least one real defect in the last 12 months. This sounds like a high bar, but it's the right one. Tests that have never caught a real defect are either testing behavior that doesn't fail (low-value), testing behavior so thoroughly covered by other tests that they're redundant (wasteful), or testing in a way that can't catch real failures (broken).

The evaluation is retrospective: go through your postmortems and production incidents from the last year. For each one, identify the test category that should have caught it. If it's a category you have, investigate why the tests didn't catch it. If it's a category you're missing, you have a gap.

Example gap analysis from 6 months of incidents:

Incident                         | Category that should have caught it    | Existed?
---------------------------------|----------------------------------------|-----------------
Null reference on legacy field   | Edge-case unit tests with null inputs  | No
Payment timeout not handled      | Integration test with delayed mock     | No
Silent DB column truncation      | Data validation integration test       | No
Race condition in cache          | Concurrent access test                 | No
Wrong locale in date formatting  | Locale-specific unit test              | Yes (didn't run)

This table is uncomfortable. It should be — it shows which tests you're missing. Build those missing categories. Don't optimize the existing ones until the gaps are filled.
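The first missing row, edge-case unit tests with null inputs, is usually the cheapest category to add. A minimal sketch in plain Java; `LegacyRecord` and its fields are invented names for illustration, not the incident's actual code:

```java
// Hypothetical sketch: a display helper that assumed a legacy field was
// always populated. LegacyRecord and its fields are invented names.
class LegacyRecord {
    final String name;        // current field, always set
    final String legacyName;  // legacy field, null on rows migrated long ago

    LegacyRecord(String name, String legacyName) {
        this.name = name;
        this.legacyName = legacyName;
    }

    // Guarded version: fall back to the current field when the legacy
    // one is absent. The unguarded original dereferenced legacyName directly.
    String displayName() {
        return legacyName != null ? legacyName : name;
    }

    public static void main(String[] args) {
        // The missing test: feed the null that production data actually contains.
        LegacyRecord migrated = new LegacyRecord("Ada", null);
        if (!"Ada".equals(migrated.displayName())) throw new AssertionError();
        System.out.println("null-input edge case covered");
    }
}
```

The point is not this particular guard; it is that the test input comes from the incident, not from the happy path the original author imagined.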

The Speed-Usefulness Spectrum

Different test types sit at different points on the speed-usefulness tradeoff:

Unit tests are fast and good at catching logic errors in isolation. They're not useful for catching integration failures, data edge cases that appear at the boundary between components, or configuration problems.

Integration tests are slower and catch integration failures that unit tests can't. They require real (or realistic) infrastructure — a database, a message broker, an HTTP service. They're the tests most commonly sacrificed for speed.
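The "integration test with delayed mock" category from the table can be sketched with nothing beyond the JDK: a stub HTTP server that responds too slowly, and a client with a timeout budget. The `/charge` path and the timings are invented for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

class DelayedMockTest {
    public static void main(String[] args) throws Exception {
        // Stub for the slow external dependency (path and timings are invented).
        HttpServer slowGateway = HttpServer.create(new InetSocketAddress(0), 0);
        slowGateway.createContext("/charge", exchange -> {
            try { Thread.sleep(500); } catch (InterruptedException ignored) { }
            exchange.sendResponseHeaders(200, -1);  // -1: no response body
            exchange.close();
        });
        slowGateway.start();

        int port = slowGateway.getAddress().getPort();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:" + port + "/charge"))
                .timeout(Duration.ofMillis(200))  // the caller's latency budget
                .build();

        boolean timedOut = false;
        try {
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
        } catch (HttpTimeoutException e) {
            timedOut = true;  // the path the incident showed was unhandled
        } finally {
            slowGateway.stop(0);
        }
        if (!timedOut) throw new AssertionError("expected the call to time out");
        System.out.println("timeout path exercised");
    }
}
```

In a real suite the stub would stand in for the payment gateway and the assertion would check the caller's fallback behavior, but the shape is the same: make the dependency misbehave on purpose and verify the handling exists.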

End-to-end tests are slow and flaky but catch failure modes that only appear when the full system is assembled. They're useful specifically because they test what the user actually experiences.

The right balance depends on your system's failure modes. A service with complex business logic needs more unit tests. A data pipeline that moves records between systems needs more integration tests. A web application where user-facing flows are the primary risk needs more end-to-end tests.

The commonly cited "testing pyramid" (many unit tests, some integration, few end-to-end) is a useful starting heuristic but not a rule. Some systems have inverted pyramids that serve them well.

Optimizing Without Losing Coverage

When you do optimize for speed, the constraint is: don't remove signal. The following techniques cut runtime without discarding it:

Parallelize, don't delete. Running 200 integration tests in parallel across 4 runners takes roughly one-quarter of the wall-clock time. Deleting 150 of them is faster still, but removes coverage.
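In Gradle, which the build snippet later in this article assumes, forked test JVMs give this parallelism with one property; the value 4 here is an assumption matching the four-runner example:

```kotlin
tasks.withType<Test>().configureEach {
    // Split test classes across forked JVMs. With 4 forks and evenly
    // sized, independent test classes, wall time drops to roughly 1/4.
    maxParallelForks = 4
}
```

The caveat is independence: tests that share a database schema or fixed ports will start failing under parallelism, which is itself useful signal about hidden coupling.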

Mock at the right boundary. Mock external services (payment gateways, SMS providers), not internal components. Mocking your own database to speed up tests eliminates the tests' ability to catch database-specific failures (type coercion, constraint violations, query performance).
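A minimal sketch of the boundary rule in plain Java; all of the names (`PaymentGateway`, `CheckoutService`) are invented for illustration:

```java
// Sketch: fake the external service behind an interface you own;
// leave the database real. All names are invented for illustration.
interface PaymentGateway {
    boolean charge(String customerId, long amountCents);
}

// Test double for the external service: deterministic, no network.
class FakeGateway implements PaymentGateway {
    boolean charged = false;

    @Override
    public boolean charge(String customerId, long amountCents) {
        charged = true;
        return true;  // simulate a successful charge
    }
}

class CheckoutService {
    private final PaymentGateway gateway;  // external boundary: fake this

    CheckoutService(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    // Any repository behind this service is deliberately NOT faked in the
    // integration tier; it runs against a real database so constraint and
    // type-coercion failures stay detectable.
    boolean checkout(String customerId, long amountCents) {
        if (amountCents <= 0) return false;  // local business rule
        return gateway.charge(customerId, amountCents);
    }

    public static void main(String[] args) {
        FakeGateway fake = new FakeGateway();
        CheckoutService checkout = new CheckoutService(fake);
        if (!checkout.checkout("c-1", 499)) throw new AssertionError();
        if (!fake.charged) throw new AssertionError();
        System.out.println("external boundary faked; checkout logic verified");
    }
}
```

The fake replaces the network call you don't control; everything you do control stays real enough to fail realistically.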

Separate speed tiers explicitly. Don't randomly sample or skip tests; build explicit test categories. Tests tagged fast (no I/O, no containers) run on every commit. Tests tagged integration run on every PR. Tests tagged slow run post-merge. Every test lives in exactly one category, with a known justification.

@Tag("fast")         // Unit tests: in-memory, no I/O
@Tag("integration")  // Integration tests: Testcontainers, WireMock
@Tag("slow")         // E2E, load, chaos tests

// In build.gradle.kts:
tasks.test {
    useJUnitPlatform {
        includeTags("fast")     // CI: critical path
    }
}

tasks.register<Test>("integrationTest") {
    // A custom Test task doesn't inherit the built-in wiring; point it
    // at the test classes and classpath explicitly.
    testClassesDirs = sourceSets["test"].output.classesDirs
    classpath = sourceSets["test"].runtimeClasspath
    useJUnitPlatform {
        includeTags("integration")
    }
}

The key property: every test has a category that determines when it runs, and every category has a purpose. No tests exist outside of that structure.

The Right Optimization Question

Before optimizing any test, ask: does removing or speeding up this test reduce the probability of detecting a real defect? If yes — don't optimize it, fix the underlying slowness (usually a slow Testcontainers startup, an unmocked network call, or excessive data setup). If no — optimize aggressively.

Fast and useful aren't mutually exclusive. They require different questions.
