Your Unit Tests Are Testing the Wrong Thing

April 3, 2026

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Refactor That Broke 40 Tests

You rename a private method. Or you extract a helper function from a larger one. Or you replace a loop with a stream operation. The behavior of the public API is unchanged — inputs produce the same outputs, side effects are identical. But 40 unit tests fail.

You fix the tests. They pass again. Nothing in production changed. You just spent an afternoon maintaining tests that were never testing anything a user cares about.

This is the wrong-thing testing problem. The tests were coupled to implementation details (the specific methods called, the specific intermediate values produced, the specific order of operations) rather than to the observable behavior of the unit under test.

Implementation Testing vs. Behavior Testing

Implementation testing verifies how the code works. Behavior testing verifies what the code does.

The distinction sounds subtle but has large practical consequences. Implementation tests break whenever the code is refactored, even correctly. Behavior tests break only when the observable output changes — which is exactly when you want a test to break.

from unittest.mock import Mock

import pytest

# PriceCalculator is the class under test; import it from your own module.

# Implementation test: verifies internal structure
def test_price_calculator_calls_tax_service():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    calculator = PriceCalculator(tax_service=mock_tax)

    calculator.compute(100.0, "US")

    mock_tax.get_rate.assert_called_once_with("US")  # Testing the HOW

# Behavior test: verifies observable output
def test_price_calculator_applies_correct_tax():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    calculator = PriceCalculator(tax_service=mock_tax)

    result = calculator.compute(100.0, "US")

    # approx() avoids spurious failures from floating-point rounding
    assert result.total == pytest.approx(108.0)  # Testing the WHAT
    assert result.tax_amount == pytest.approx(8.0)

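Both tests use a mock. But the first test fails if the calculator starts passing the country as a keyword argument or looks up the rate twice, even though callers see exactly the same totals. The second test survives any internal change that preserves the output, and fails the moment the output is wrong. One caveat: both tests stub get_rate, so renaming that method to fetch_rate means updating the stub in either test. The difference is that the first test additionally pins down exactly how the call is made.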
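To make the difference concrete, here is a minimal sketch. It assumes a hypothetical PriceCalculator and PriceResult (the article does not show the real ones) and a harmless refactor: the calculator now passes the country as a keyword argument. The two helper functions replay the assertions from the tests above and report whether each would still pass.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class PriceResult:
    tax_amount: float
    total: float


class PriceCalculator:
    """Hypothetical calculator, shown after a behavior-preserving refactor."""

    def __init__(self, tax_service):
        self._tax_service = tax_service

    def compute(self, amount, country):
        # Before the refactor this was: self._tax_service.get_rate(country)
        rate = self._tax_service.get_rate(country=country)
        tax = round(amount * rate, 2)
        return PriceResult(tax_amount=tax, total=round(amount + tax, 2))


def make_calculator():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    return PriceCalculator(tax_service=mock_tax), mock_tax


def implementation_test_passes():
    calculator, mock_tax = make_calculator()
    calculator.compute(100.0, "US")
    try:
        # The call is now get_rate(country="US"), so this no longer matches.
        mock_tax.get_rate.assert_called_once_with("US")
    except AssertionError:
        return False
    return True


def behavior_test_passes():
    calculator, _ = make_calculator()
    result = calculator.compute(100.0, "US")
    return result.total == 108.0 and result.tax_amount == 8.0
```

Under these assumptions the implementation check reports a failure while the behavior check still passes, although nothing a caller can observe has changed.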

The Specific Patterns That Indicate Wrong-Thing Testing

Verifying method call sequences. If your test asserts that method A was called before method B, you are testing execution order, not outcome. Unless the ordering has an observable effect on the output, this is implementation coupling.
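As an illustration, the sketch below contrasts an order-coupled test with an outcome-focused one. OrderProcessor, its repository, and its audit log are hypothetical names invented for this example.

```python
from unittest.mock import Mock, call


class OrderProcessor:
    """Hypothetical unit: rejects empty orders, stores valid ones."""

    def __init__(self, repository, audit_log):
        self._repository = repository
        self._audit_log = audit_log

    def submit(self, order_id, items):
        if not items:
            raise ValueError("order has no items")
        self._audit_log.record(order_id)
        self._repository.save(order_id, items)


def test_submit_call_order():
    # Order-coupled: fails if record() and save() are swapped, although
    # callers cannot observe which of the two happened first.
    deps = Mock()
    OrderProcessor(deps.repository, deps.audit_log).submit("o-1", ["book"])
    assert deps.mock_calls == [
        call.audit_log.record("o-1"),
        call.repository.save("o-1", ["book"]),
    ]


def test_submit_stores_order():
    # Outcome-focused: only checks what ends up stored.
    class InMemoryRepository:
        def __init__(self):
            self.orders = {}

        def save(self, order_id, items):
            self.orders[order_id] = list(items)

    repository = InMemoryRepository()
    OrderProcessor(repository, Mock()).submit("o-1", ["book"])
    assert repository.orders == {"o-1": ["book"]}
```

Both tests pass against the code as written. Only the first would start failing if the two calls inside submit were reordered.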

Asserting on private state. Tests that reach into an object's internals — via reflection, by making fields package-private "for testing," or by exposing getters that only exist for the test — are testing private implementation details. If those details change, the tests break.
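A small sketch of the same contrast, assuming a hypothetical ShoppingCart whose _lines attribute is an internal detail:

```python
class ShoppingCart:
    """Hypothetical cart; _lines is private and free to change shape."""

    def __init__(self):
        self._lines = []  # could become a dict keyed by SKU later

    def add(self, sku, unit_price_cents, quantity=1):
        self._lines.append((sku, unit_price_cents, quantity))

    def total_cents(self):
        return sum(price * qty for _, price, qty in self._lines)


def test_cart_internal_list():
    # Coupled to private state: breaks as soon as _lines changes shape.
    cart = ShoppingCart()
    cart.add("pen", 150, quantity=2)
    assert cart._lines == [("pen", 150, 2)]


def test_cart_total():
    # Uses only the public surface of the cart.
    cart = ShoppingCart()
    cart.add("pen", 150, quantity=2)
    cart.add("pad", 400)
    assert cart.total_cents() == 700
```

Switching _lines to a dictionary would break the first test and leave the second untouched, as long as total_cents keeps returning the same value.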

One test per method rather than one test per behavior. Organizing tests around the structure of the production code rather than around the behaviors it provides is a structural sign that the tests are mirroring the implementation. One behavior might span multiple methods; multiple behaviors might live in one method.
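For example, a single function can carry several behaviors, each of which deserves its own named test. The validate_password function and its rules below are hypothetical, chosen only to show the naming style.

```python
def validate_password(password):
    """Hypothetical function: one method, several distinct behaviors."""
    problems = []
    if len(password) < 12:
        problems.append("too short")
    if password.lower() == password:
        problems.append("needs an uppercase letter")
    if not any(ch.isdigit() for ch in password):
        problems.append("needs a digit")
    return problems


# One test per behavior, named after what the caller can rely on,
# instead of a single test_validate_password() mirroring the method.
def test_rejects_short_passwords():
    assert "too short" in validate_password("Ab1")


def test_requires_an_uppercase_letter():
    assert "needs an uppercase letter" in validate_password("lowercase-only-123")


def test_requires_a_digit():
    assert "needs a digit" in validate_password("NoDigitsInThisOne")


def test_accepts_a_password_meeting_every_rule():
    assert validate_password("Correct-Horse-42") == []
```

If the rules were later split across several helper functions, none of these tests would need to change.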

Mocking collaborators you own and then asserting the mock was called. If you own both the class under test and its collaborator, mocking the collaborator and asserting on the interaction is often testing internal wiring. Testing the final outcome through real collaborators (or fakes you control) tests the actual behavior.
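A minimal sketch of testing through a real collaborator, assuming two hypothetical classes we own, InvoiceBuilder and TaxTable. Amounts are in cents to keep the arithmetic simple.

```python
class TaxTable:
    """Hypothetical collaborator we own: cheap enough to use for real in tests."""

    def __init__(self, rates):
        self._rates = dict(rates)

    def rate_for(self, country):
        return self._rates[country]


class InvoiceBuilder:
    """Hypothetical class under test."""

    def __init__(self, tax_table):
        self._tax_table = tax_table

    def build(self, net_cents, country):
        tax = round(net_cents * self._tax_table.rate_for(country))
        return {"net": net_cents, "tax": tax, "gross": net_cents + tax}


def test_invoice_includes_country_tax():
    # No mock and no assertion about which TaxTable method was called:
    # the test only states what invoice a caller gets back.
    builder = InvoiceBuilder(TaxTable({"US": 0.08, "DE": 0.19}))
    invoice = builder.build(10_000, "DE")
    assert invoice == {"net": 10_000, "tax": 1_900, "gross": 11_900}
```

How InvoiceBuilder talks to TaxTable can now change freely. The test only notices if the resulting invoice is wrong.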

What to Test Instead

Test the contract of the unit: given this input, I expect this output or this side effect. The contract is what callers depend on. Refactoring internals should not change the contract.
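When the contract is a side effect rather than a return value, the same idea applies: assert on the effect a caller can observe. The sketch below assumes a hypothetical export_report function whose contract is "a JSON file with the records sorted by name exists at the given path".

```python
import json
import tempfile
from pathlib import Path


def export_report(records, path):
    """Hypothetical unit whose contract is a side effect: a JSON file at `path`."""
    ordered = sorted(records, key=lambda record: record["name"])
    Path(path).write_text(json.dumps(ordered), encoding="utf-8")


def test_export_writes_sorted_records():
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "report.json"
        export_report([{"name": "bob"}, {"name": "alice"}], target)
        # Read the file back the way a consumer would.
        written = json.loads(target.read_text(encoding="utf-8"))
    assert written == [{"name": "alice"}, {"name": "bob"}]
```

The test says nothing about how the sorting or serialization is done, so either can be rewritten without touching it.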

For a function that parses a CSV and returns a list of records, the contract is: given this CSV string, return these records. The internal parsing logic — how strings are split, how edge cases are handled — is implementation detail. Test the inputs and outputs.

import (
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestParseCSV(t *testing.T) {
    tests := []struct {
        name     string
        input    string
        expected []Record
        wantErr  bool
    }{
        {
            name:     "standard input",
            input:    "alice,30\nbob,25",
            expected: []Record{{Name: "alice", Age: 30}, {Name: "bob", Age: 25}},
        },
        {
            name:     "empty input",
            input:    "",
            expected: []Record{},
        },
        {
            name:    "malformed age field",
            input:   "alice,notanumber",
            wantErr: true,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result, err := ParseCSV(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.expected, result)
        })
    }
}

This test survives any internal refactor of ParseCSV. The implementation can change entirely as long as the behavior is preserved.

The practical question to ask before writing any assertion: "Would I want this test to break if a developer refactors the internals without changing the behavior?" If the answer is no, rethink the assertion. The test should be an ally to the developer refactoring the code, not an obstacle.
