Your Unit Tests Are Testing the Wrong Thing

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Refactor That Broke 40 Tests

You rename a private method. Or you extract a helper function from a larger one. Or you replace a loop with a stream operation. The behavior of the public API is unchanged — inputs produce the same outputs, side effects are identical. But 40 unit tests fail.

You fix the tests. They pass again. Nothing in production changed. You just spent an afternoon maintaining tests that were never testing anything a user cares about.

This is the wrong-thing testing problem. The tests were coupled to implementation details (the specific methods called, the specific intermediate values produced, the specific order of operations) rather than to the observable behavior of the unit under test.

Implementation Testing vs. Behavior Testing

Implementation testing verifies how the code works. Behavior testing verifies what the code does.

The distinction sounds subtle but has large practical consequences. Implementation tests break whenever the code is refactored, even correctly. Behavior tests break only when the observable output changes — which is exactly when you want a test to break.

from unittest.mock import Mock

import pytest

# PriceCalculator is the class under test; import it from your own module.

# Implementation test: verifies internal structure
def test_price_calculator_calls_tax_service():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    calculator = PriceCalculator(tax_service=mock_tax)

    calculator.compute(100.0, "US")

    mock_tax.get_rate.assert_called_once_with("US")  # Testing the HOW

# Behavior test: verifies observable output
def test_price_calculator_applies_correct_tax():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    calculator = PriceCalculator(tax_service=mock_tax)

    result = calculator.compute(100.0, "US")

    # pytest.approx avoids spurious failures from float rounding
    assert result.total == pytest.approx(108.0)  # Testing the WHAT
    assert result.tax_amount == pytest.approx(8.0)

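Both tests use a mock, but the first pins down the exact interaction. It fails if the calculator looks up the rate twice, passes the region as a keyword argument, or otherwise changes how it calls get_rate, even when the totals are identical. The second test survives any internal change that preserves the output, and fails the moment the output is wrong. (Renaming get_rate itself changes the collaborator's interface, so the stub in both tests would need updating.)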

The Specific Patterns That Indicate Wrong-Thing Testing

Verifying method call sequences. If your test asserts that method A was called before method B, you are testing execution order, not outcome. Unless the ordering has an observable effect on the output, this is implementation coupling.
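
As a rough sketch of the difference, assume a hypothetical register function that makes two independent calls. The names register, users, metrics, and CountingMetrics are invented for illustration. The first test pins the call order; the second checks only the end state.

```python
from unittest.mock import Mock, call


def register(username, users, metrics):
    """Hypothetical unit under test: store the user and count the signup."""
    users.add(username)
    metrics.increment("signups")


def test_register_call_order():
    # Brittle: swapping the two independent calls fails this test,
    # although nothing a caller can observe has changed.
    parent = Mock()
    register("alice", parent.users, parent.metrics)
    assert parent.mock_calls == [
        call.users.add("alice"),
        call.metrics.increment("signups"),
    ]


class CountingMetrics:
    """Tiny hand-written fake that records counter values."""

    def __init__(self):
        self.counts = {}

    def increment(self, name):
        self.counts[name] = self.counts.get(name, 0) + 1


def test_register_outcome():
    # Sturdier: checks the end state, whatever order produced it.
    users, metrics = set(), CountingMetrics()
    register("alice", users, metrics)
    assert users == {"alice"}
    assert metrics.counts == {"signups": 1}
```

Both tests pass today. Only the first one fails if someone reorders the two lines inside register.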

Asserting on private state. Tests that reach into an object's internals — via reflection, by making fields package-private "for testing," or by exposing getters that only exist for the test — are testing private implementation details. If those details change, the tests break.
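
A small illustration, assuming a hypothetical RateLimiter whose bookkeeping lives in a private dictionary. The class and its _hits field are made up for this sketch.

```python
class RateLimiter:
    """Hypothetical class: allows at most `limit` calls per key."""

    def __init__(self, limit):
        self._limit = limit
        self._hits = {}  # private bookkeeping, free to change shape

    def allow(self, key):
        count = self._hits.get(key, 0)
        if count >= self._limit:
            return False
        self._hits[key] = count + 1
        return True


def test_private_state():
    # Brittle: breaks if _hits becomes a list of timestamps
    # or starts using a different key format.
    limiter = RateLimiter(limit=2)
    limiter.allow("alice")
    assert limiter._hits == {"alice": 1}


def test_public_behavior():
    # Sturdier: only uses what callers can see.
    limiter = RateLimiter(limit=2)
    assert limiter.allow("alice") is True
    assert limiter.allow("alice") is True
    assert limiter.allow("alice") is False
    assert limiter.allow("bob") is True
```

The second test describes the limit itself, so it keeps passing if the storage is later swapped for something else.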

One test per method rather than one test per behavior. Organizing tests around the structure of the production code rather than around the behaviors it provides is a structural sign that the tests are mirroring the implementation. One behavior might span multiple methods; multiple behaviors might live in one method.
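
One way this can look in practice, using a hypothetical Cart class with two public methods. There is no test_add or test_total here; each test is named after a behavior and is free to call both methods.

```python
class Cart:
    """Hypothetical class with two public methods."""

    def __init__(self):
        self._items = {}  # sku -> (price, quantity)

    def add(self, sku, price, qty=1):
        _, old_qty = self._items.get(sku, (price, 0))
        self._items[sku] = (price, old_qty + qty)

    def total(self):
        return sum(price * qty for price, qty in self._items.values())


def test_empty_cart_totals_zero():
    assert Cart().total() == 0


def test_adding_same_sku_twice_accumulates_quantity():
    cart = Cart()
    cart.add("pen", 2, qty=1)
    cart.add("pen", 2, qty=2)
    assert cart.total() == 6


def test_different_skus_are_summed():
    cart = Cart()
    cart.add("pen", 2)
    cart.add("ink", 5)
    assert cart.total() == 7
```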

Mocking collaborators you own and then asserting the mock was called. If you own both the class under test and its collaborator, mocking the collaborator and asserting on the interaction is often testing internal wiring. Testing the final outcome through real collaborators (or fakes you control) tests the actual behavior.
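
A sketch of the fake-based alternative. InMemoryTaxService is a hypothetical hand-written fake, and this PriceCalculator is a simplified stand-in that works in integer cents and returns a dict; the real calculator may look quite different.

```python
class InMemoryTaxService:
    """Hand-written fake for a collaborator we own (hypothetical)."""

    def __init__(self, rates):
        self._rates = rates

    def get_rate(self, region):
        return self._rates[region]


class PriceCalculator:
    """Simplified stand-in for the calculator; assumed, not the real class."""

    def __init__(self, tax_service):
        self._tax = tax_service

    def compute(self, net_cents, region):
        tax = round(net_cents * self._tax.get_rate(region))
        return {"tax": tax, "total": net_cents + tax}


def test_total_includes_regional_tax():
    # No assertion on how the tax service was called, only on the result.
    calculator = PriceCalculator(InMemoryTaxService({"US": 0.08}))
    assert calculator.compute(10_000, "US") == {"tax": 800, "total": 10_800}
```

Because the test never inspects the interaction, the calculator can cache rates or look them up differently without the test noticing.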

What to Test Instead

Test the contract of the unit: given this input, I expect this output or this side effect. The contract is what callers depend on. Refactoring internals should not change the contract.
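
When the contract includes a side effect, the same idea applies: check the effect where the caller would see it. A minimal sketch, assuming a hypothetical export_users function that writes a JSON file and returns a count:

```python
import json
import tempfile
from pathlib import Path


def export_users(users, path):
    """Hypothetical unit: write active users to `path` as JSON, return the count."""
    active = [u for u in users if u["active"]]
    Path(path).write_text(json.dumps(active))
    return len(active)


def test_export_writes_only_active_users():
    users = [
        {"name": "alice", "active": True},
        {"name": "bob", "active": False},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "users.json"
        count = export_users(users, target)       # the output
        written = json.loads(target.read_text())  # the side effect
    assert count == 1
    assert written == [{"name": "alice", "active": True}]
```

The test reads the file back rather than asserting that a particular write call happened, so the function is free to change how it writes.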

For a function that parses a CSV and returns a list of records, the contract is: given this CSV string, return these records. The internal parsing logic — how strings are split, how edge cases are handled — is implementation detail. Test the inputs and outputs.

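import (
    "testing"

    // assert and require are assumed to be the testify packages.
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

// ParseCSV and Record are the code under test, defined elsewhere in the package.
func TestParseCSV(t *testing.T) {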
    tests := []struct {
        name     string
        input    string
        expected []Record
        wantErr  bool
    }{
        {
            name:     "standard input",
            input:    "alice,30\nbob,25",
            expected: []Record{{Name: "alice", Age: 30}, {Name: "bob", Age: 25}},
        },
        {
            name:     "empty input",
            input:    "",
            expected: []Record{},
        },
        {
            name:    "malformed age field",
            input:   "alice,notanumber",
            wantErr: true,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result, err := ParseCSV(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.expected, result)
        })
    }
}

This test survives any internal refactor of ParseCSV. The implementation can change entirely as long as the behavior is preserved.
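
To make that concrete, here is a Python sketch with two hypothetical implementations of the same parser, one splitting strings by hand and one delegating to the standard csv module. Records are modeled as plain dicts for brevity, which is an assumption of this sketch. The same contract check runs against both.

```python
import csv
import io


def parse_csv_manual(text):
    """Hypothetical implementation A: split strings by hand."""
    records = []
    for line in text.splitlines():
        name, age = line.split(",")
        records.append({"name": name, "age": int(age)})
    return records


def parse_csv_stdlib(text):
    """Hypothetical implementation B: delegate to the csv module."""
    return [
        {"name": name, "age": int(age)}
        for name, age in csv.reader(io.StringIO(text))
    ]


def check_contract(parse):
    """Same inputs and expected outputs, whichever implementation is passed in."""
    assert parse("alice,30\nbob,25") == [
        {"name": "alice", "age": 30},
        {"name": "bob", "age": 25},
    ]
    assert parse("") == []
    try:
        parse("alice,notanumber")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a malformed age")


for implementation in (parse_csv_manual, parse_csv_stdlib):
    check_contract(implementation)
```

Swapping one implementation for the other is exactly the kind of refactor a contract-level test should not notice.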

The practical question to ask before writing any assertion: "Would this assertion fail if a developer refactored the internals without changing the behavior?" If the answer is yes, rethink the assertion. The test should be an ally to the developer refactoring the code, not an obstacle.
