Your Unit Tests Are Testing the Wrong Thing

by Arif Ikhsanudin, Backend Developer

The Refactor That Broke 40 Tests

You rename a private method. Or you extract a helper function from a larger one. Or you replace a loop with a stream operation. The behavior of the public API is unchanged — inputs produce the same outputs, side effects are identical. But 40 unit tests fail.

You fix the tests. They pass again. Nothing in production changed. You just spent an afternoon maintaining tests that were never testing anything a user cares about.

This is the wrong-thing testing problem. The tests were coupled to implementation details (the specific methods called, the specific intermediate values produced, the specific order of operations) rather than to the observable behavior of the unit under test.

Implementation Testing vs. Behavior Testing

Implementation testing verifies how the code works. Behavior testing verifies what the code does.

The distinction sounds subtle but has large practical consequences. Implementation tests tend to break when the code is refactored, even when the refactor is correct. Behavior tests break only when the observable output changes, which is exactly when you want a test to break.

from unittest.mock import Mock

import pytest

# PriceCalculator is the class under test; its import is omitted here.

# Implementation test: verifies internal structure
def test_price_calculator_calls_tax_service():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    calculator = PriceCalculator(tax_service=mock_tax)

    calculator.compute(100.0, "US")

    mock_tax.get_rate.assert_called_once_with("US")  # Testing the HOW

# Behavior test: verifies observable output
def test_price_calculator_applies_correct_tax():
    mock_tax = Mock()
    mock_tax.get_rate.return_value = 0.08
    calculator = PriceCalculator(tax_service=mock_tax)

    result = calculator.compute(100.0, "US")

    # approx() keeps float rounding inside compute() from failing the test
    assert result.total == pytest.approx(108.0)  # Testing the WHAT
    assert result.tax_amount == pytest.approx(8.0)

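Both tests use a mock. But the first test fails on any change to how the calculator talks to the tax service: caching the rate, calling get_rate twice, or passing the country as a keyword argument. It also passes when the total is wrong, because it never looks at the result. The second test survives any internal change that preserves the output, and fails the moment the output is wrong. One caveat: both tests stub get_rate by name, so renaming that method to fetch_rate on the tax service means updating the stub in either test.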
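To make the difference concrete, here is a minimal sketch with two implementations of the calculator. Everything in it is an assumption made for illustration: the PriceResult type, the rounding to two decimals, and the CachingPriceCalculator refactor are hypothetical, not taken from a real codebase.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class PriceResult:
    total: float
    tax_amount: float


class PriceCalculator:
    """Hypothetical implementation: asks the tax service on every call."""

    def __init__(self, tax_service):
        self._tax_service = tax_service

    def compute(self, amount, country):
        rate = self._tax_service.get_rate(country)
        tax = round(amount * rate, 2)
        return PriceResult(total=round(amount + tax, 2), tax_amount=tax)


class CachingPriceCalculator(PriceCalculator):
    """Hypothetical refactor: same outputs, but rates are cached per country."""

    def __init__(self, tax_service):
        super().__init__(tax_service)
        self._rates = {}

    def compute(self, amount, country):
        if country not in self._rates:
            self._rates[country] = self._tax_service.get_rate(country)
        tax = round(amount * self._rates[country], 2)
        return PriceResult(total=round(amount + tax, 2), tax_amount=tax)


def make_tax_stub():
    stub = Mock()
    stub.get_rate.return_value = 0.08
    return stub


def behavior_holds(calculator_cls):
    """Behavior check: two purchases in a row are both priced correctly."""
    calc = calculator_cls(make_tax_stub())
    results = [calc.compute(100.0, "US"), calc.compute(50.0, "US")]
    return [(r.total, r.tax_amount) for r in results] == [(108.0, 8.0), (54.0, 4.0)]


def interaction_holds(calculator_cls):
    """Implementation check: exactly one tax-service call per compute() call."""
    stub = make_tax_stub()
    calc = calculator_cls(stub)
    calc.compute(100.0, "US")
    calc.compute(50.0, "US")
    return stub.get_rate.call_count == 2
```

The behavior check holds for both classes. The interaction check holds only for the first one, even though a caller cannot tell the two apart.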

The Specific Patterns That Indicate Wrong-Thing Testing

Verifying method call sequences. If your test asserts that method A was called before method B, you are testing execution order, not outcome. Unless the ordering has an observable effect on the output, this is implementation coupling.
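A small sketch of what this looks like in practice, assuming a hypothetical ProfilePage class with two collaborators whose lookup order has no effect on the result:

```python
from unittest.mock import Mock, call


class ProfilePage:
    """Hypothetical class: combines data from two independent lookups."""

    def __init__(self, users, prefs):
        self._users = users
        self._prefs = prefs

    def render(self, user_id):
        name = self._users.get_name(user_id)
        theme = self._prefs.get_theme(user_id)
        return f"{name} ({theme})"


def make_collaborators():
    # Children of one parent mock share a single recorded call timeline.
    parent = Mock()
    parent.users.get_name.return_value = "alice"
    parent.prefs.get_theme.return_value = "dark"
    return parent


def test_render_coupled_to_call_sequence():
    parent = make_collaborators()

    ProfilePage(parent.users, parent.prefs).render(1)

    # Breaks if render() fetches the theme first, although the page is identical.
    assert parent.mock_calls == [
        call.users.get_name(1),
        call.prefs.get_theme(1),
    ]


def test_render_shows_name_and_theme():
    parent = make_collaborators()

    # Survives any reordering of the two lookups.
    assert ProfilePage(parent.users, parent.prefs).render(1) == "alice (dark)"
```

Swapping the two lookups inside render() fails the first test and leaves the second one green.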

Asserting on private state. Tests that reach into an object's internals — via reflection, by making fields package-private "for testing," or by exposing getters that only exist for the test — are testing private implementation details. If those details change, the tests break.
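As an illustration, consider a hypothetical RateLimiter (the class and its _counts field are assumptions for this sketch). The first test reaches into the bookkeeping; the second asks only the questions a caller would ask.

```python
class RateLimiter:
    """Hypothetical limiter: allows at most `limit` calls per key."""

    def __init__(self, limit):
        self._limit = limit
        self._counts = {}  # internal bookkeeping, free to change

    def allow(self, key):
        used = self._counts.get(key, 0)
        if used >= self._limit:
            return False
        self._counts[key] = used + 1
        return True


def test_limiter_coupled_to_private_state():
    limiter = RateLimiter(limit=2)
    limiter.allow("alice")

    # Breaks if _counts is renamed or replaced by, say, a list of timestamps.
    assert limiter._counts == {"alice": 1}


def test_limiter_blocks_after_limit():
    limiter = RateLimiter(limit=2)

    # Only the public answers matter: two allowed calls, then a refusal.
    assert [limiter.allow("alice") for _ in range(3)] == [True, True, False]
    # A different key has its own budget.
    assert limiter.allow("bob") is True
```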

One test per method rather than one test per behavior. Organizing tests around the structure of the production code rather than around the behaviors it provides is a structural sign that the tests are mirroring the implementation. One behavior might span multiple methods; multiple behaviors might live in one method.
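One way to picture the alternative is a test file whose names read as a list of behaviors. The Cart class below is a hypothetical example; note that none of the tests is named after add(), remove(), or total(), and most of them exercise more than one method.

```python
class Cart:
    """Hypothetical cart used only to illustrate test organization."""

    def __init__(self):
        self._lines = {}

    def add(self, sku, unit_price, quantity=1):
        # Keep the first unit price seen for a SKU and accumulate quantity.
        price, current = self._lines.get(sku, (unit_price, 0))
        self._lines[sku] = (price, current + quantity)

    def remove(self, sku):
        self._lines.pop(sku, None)

    def total(self):
        return sum(price * qty for price, qty in self._lines.values())


def test_adding_same_item_twice_accumulates_quantity():
    cart = Cart()
    cart.add("pen", 2, quantity=1)
    cart.add("pen", 2, quantity=2)
    assert cart.total() == 6


def test_removed_item_no_longer_counts_toward_total():
    cart = Cart()
    cart.add("pen", 2)
    cart.add("book", 10)
    cart.remove("pen")
    assert cart.total() == 10


def test_empty_cart_totals_zero():
    assert Cart().total() == 0
```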

Mocking collaborators you own and then asserting the mock was called. If you own both the class under test and its collaborator, mocking the collaborator and asserting on the interaction is often testing internal wiring. Testing the final outcome through real collaborators (or fakes you control) tests the actual behavior.
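A sketch of the fake-based approach, assuming a hypothetical SignupService and a hand-written InMemoryUserRepository (both names are invented for this example). The tests never ask whether save() was called; they ask whether the user can be found afterwards.

```python
class InMemoryUserRepository:
    """Hand-written fake: real behavior, no database."""

    def __init__(self):
        self._users = {}

    def save(self, email, name):
        self._users[email] = name

    def find(self, email):
        return self._users.get(email)


class SignupService:
    """Hypothetical service under test."""

    def __init__(self, repository):
        self._repository = repository

    def register(self, email, name):
        email = email.strip().lower()
        if self._repository.find(email) is not None:
            raise ValueError(f"{email} is already registered")
        self._repository.save(email, name)


def test_registered_user_can_be_found():
    repo = InMemoryUserRepository()
    SignupService(repo).register(" Alice@Example.com ", "Alice")
    assert repo.find("alice@example.com") == "Alice"


def test_duplicate_email_is_rejected():
    service = SignupService(InMemoryUserRepository())
    service.register("alice@example.com", "Alice")
    try:
        service.register("ALICE@example.com", "Someone else")
    except ValueError:
        pass
    else:
        raise AssertionError("expected a duplicate registration to be rejected")
```

If register() later switches to a single upsert-style call on the repository, these tests need no changes as long as the fake supports the new call.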

What to Test Instead

Test the contract of the unit: given this input, I expect this output or this side effect. The contract is what callers depend on. Refactoring internals should not change the contract.

For a function that parses a CSV and returns a list of records, the contract is: given this CSV string, return these records. The internal parsing logic (how strings are split, which library does the work, how rows are buffered) is implementation detail. Edge cases that a caller can observe, such as empty or malformed input, are part of the contract. Test the inputs and outputs.

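import (
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

// Record and ParseCSV live in the package under test.

func TestParseCSV(t *testing.T) {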
    tests := []struct {
        name     string
        input    string
        expected []Record
        wantErr  bool
    }{
        {
            name:     "standard input",
            input:    "alice,30\nbob,25",
            expected: []Record{{Name: "alice", Age: 30}, {Name: "bob", Age: 25}},
        },
        {
            name:     "empty input",
            input:    "",
            expected: []Record{},
        },
        {
            name:    "malformed age field",
            input:   "alice,notanumber",
            wantErr: true,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result, err := ParseCSV(tt.input)
            if tt.wantErr {
                require.Error(t, err)
                return
            }
            require.NoError(t, err)
            assert.Equal(t, tt.expected, result)
        })
    }
}

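This test survives any internal refactor of ParseCSV. The implementation can change entirely as long as the behavior is preserved. One caveat: assert.Equal distinguishes a nil slice from an empty one, so the empty-input case pins ParseCSV to returning an empty, non-nil slice. If callers do not depend on that, assert.Empty is the looser check.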
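The same idea can be checked directly by running one contract against two different implementations. The sketch below is a Python analogue, not a port of the Go code: parse_csv_split and parse_csv_stdlib are hypothetical, and raising ValueError on a malformed age is an assumption about the contract.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    age: int


def parse_csv_split(text):
    """Hypothetical first version: manual string splitting."""
    records = []
    for line in text.splitlines():
        name, age = line.split(",")
        records.append(Record(name=name, age=int(age)))
    return records


def parse_csv_stdlib(text):
    """Hypothetical refactor: delegates to the standard csv module."""
    rows = csv.reader(io.StringIO(text))
    return [Record(name=name, age=int(age)) for name, age in rows]


# (input, expected records) pairs; ValueError marks an expected failure.
CONTRACT = [
    ("alice,30\nbob,25", [Record("alice", 30), Record("bob", 25)]),
    ("", []),
    ("alice,notanumber", ValueError),
]


def satisfies_contract(parse):
    """Return True if `parse` meets every case in CONTRACT."""
    for text, expected in CONTRACT:
        if expected is ValueError:
            try:
                parse(text)
            except ValueError:
                continue
            return False
        if parse(text) != expected:
            return False
    return True
```

Both implementations satisfy the contract, so swapping one for the other would not touch a single test case.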

The practical question to ask before writing any assertion: "Would I want this test to break if a developer refactors the internals without changing the behavior?" If the answer is no, rethink the assertion. The test should be an ally to the developer refactoring the code, not an obstacle.
