The Difference Between a Test Suite That Gives Confidence and One That Just Passes

by Eric Hanson, Backend Developer at Clean Systems Consulting

Two Suites, One Green Build

Both test suites pass in under two minutes. Both have 85% coverage. Both live in the same CI pipeline that turns green before every merge. From the outside, they look identical.

The difference shows up in production. With the first suite, regressions ship regularly — things that worked three weeks ago stop working and nobody catches it until a customer reports it. With the second, the last production regression was seven months ago, and when it happened, it was caught within hours.

The distinction is not the number of tests, or the coverage percentage, or the test framework. It is whether the suite was designed to detect failures or to satisfy a process.

The Anatomy of a Test That Just Passes

Tests designed to pass rather than to catch failures share recognizable patterns.

Assertions that cannot fail. assertNotNull, assertTrue(list.size() > 0), expect(result).toBeDefined() — these assertions pass for almost any non-crashing output. They execute the code and establish that it did not throw, but they verify almost nothing about what it actually did.

Tests that mirror the implementation. When a test is essentially a re-statement of the production code — calling the same operations in the same order and asserting intermediate values that only make sense if you have read the implementation — it will pass whenever the implementation runs and will only fail if the implementation is completely broken. It does not catch subtle behavioral regressions.

Setup that is more complex than the assertion. When a test spends 40 lines constructing objects and mocking dependencies, and the assertion is a single assertEquals, the test is often testing the mocking infrastructure rather than the actual behavior. If the test passes when the mock returns wrong data, it is not testing anything real.

Tests that avoid the hard cases. Happy-path-only test suites pass constantly and are useless in production, because production is where the unhappy paths live.

What a Confidence-Building Suite Looks Like

The tests that catch real bugs share a different set of characteristics.

Assertions are specific and derived from requirements. Not "the result is not null" but "the result is exactly 94.50 for a 5% discount applied to 99.47, rounded to the nearest cent." The assertion encodes a specific expectation that would fail if the behavior changed.

Tests are written from the user's perspective, not the implementation's. The test describes what should happen given a particular input — not how the code achieves it. When the implementation changes, the test should not break unless the behavior changed.

// Test written from implementation perspective — brittle
it('should call calculateTax then applyDiscount then formatCurrency', () => {
  const calcSpy = jest.spyOn(service, 'calculateTax');
  const discountSpy = jest.spyOn(service, 'applyDiscount');
  const formatSpy = jest.spyOn(service, 'formatCurrency');

  processOrder(order);

  expect(calcSpy).toHaveBeenCalled();
  expect(discountSpy).toHaveBeenCalled();
  expect(formatSpy).toHaveBeenCalled();
});

// Test written from behavior perspective — resilient
it('should return correct total for a discounted taxable order', () => {
  const order = { subtotal: 100.00, taxRate: 0.08, discountRate: 0.10 };

  const result = processOrder(order);

  // 100 * 0.90 = 90.00 discounted, then * 1.08 tax = 97.20
  expect(result.total).toBe(97.20);
  expect(result.currency).toBe('USD');
});

The first test breaks every time you refactor the internals, even when the behavior is correct. The second survives internal refactors and fails only when the observable behavior changes.

The suite includes tests for known failure modes. If you have had a production incident, there should be a test for it. If a customer reported a bug, there should be a test that would have caught it. These tests encode institutional memory and ensure the same failures do not recur.

The suite is trusted enough that a red build stops work. This is the practical indicator. If developers routinely push through a failing test because "it's just that flaky test" or "it's not related to my change," the suite has lost trust. Restoring trust means fixing or deleting the flaky tests, not ignoring them.

The Practical Audit

Once a month, look at your last ten production incidents. For each one, ask: would the current test suite have caught this before it shipped? If the answer is usually no, you have a suite that passes but does not give confidence.

The fix is not to add more tests uniformly. It is to write tests specifically for the failure modes your system has actually exhibited. Start there. Those tests will do more for your confidence in the next release than any coverage percentage will.

