Stop Writing Unit Tests That Only Work When Nothing Goes Wrong
by Eric Hanson, Backend Developer at Clean Systems Consulting
Where Production Bugs Actually Live
Pull up your last ten production incidents. Not hypothetical failures, but actual incidents that caused user impact, alerts at 2am, rollbacks, or support tickets. For most backend systems, the distribution will look roughly like this: less than 20% are bugs in the happy-path logic. The rest are failures under unexpected input, dependency timeouts, resource limits, race conditions, or edge cases in data that production receives but your tests never modeled.
Yet most test suites are inverted from this distribution. The majority of tests cover the path where all inputs are valid, all dependencies respond correctly, and all assumptions hold. The failure paths — where the real incidents happen — have one test each, or none.
What a Happy-Path-Only Suite Looks Like
# These are the tests most suites have
from app.users import create_user, get_user, update_user  # hypothetical module under test

def test_create_user_success():
    result = create_user(email="alice@example.com", password="secure123")
    assert result.id is not None
    assert result.email == "alice@example.com"

def test_get_user_success():
    user = create_user(email="bob@example.com", password="secure123")
    result = get_user(user.id)
    assert result.email == "bob@example.com"

def test_update_user_success():
    user = create_user(email="charlie@example.com", password="secure123")
    result = update_user(user.id, email="charlie2@example.com")
    assert result.email == "charlie2@example.com"
These tests will pass as long as the database is up, the input is valid, and no constraints are violated. They tell you almost nothing about what happens when:
- The email is already taken
- The password does not meet requirements
- The user ID does not exist
- The database is unavailable
- The email field is null, empty, 500 characters long, or contains a null byte
- The update is called concurrently by two processes
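Several of those input cases are cheap to pin down with a single parametrized test. A minimal pytest sketch, assuming the same hypothetical app.users module as above and a ValidationError it raises on bad input (both names are placeholders, not a real API):

# The missing input-validation coverage, as one parametrized test
import pytest

from app.users import create_user, ValidationError  # hypothetical, as above

@pytest.mark.parametrize("bad_email", [
    None,                         # null
    "",                           # empty
    "a" * 500 + "@example.com",   # absurdly long
    "alice@example.com\x00",      # contains a null byte
])
def test_create_user_rejects_bad_email(bad_email):
    with pytest.raises(ValidationError):
        create_user(email=bad_email, password="secure123")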
The Error Paths That Matter Most
For any function that can fail in a meaningful way, these are the categories of tests you actually need:
Invalid input. What happens when the caller passes null, an empty string, a negative number, a value that is out of range? Does the function throw a meaningful exception? Return a sensible default? The behavior should be specified and tested.
Missing or unavailable resources. What happens when the database returns no rows for a lookup? When a cache miss falls through to a slow backend that is timing out? When a file the code expects to read does not exist?
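Forcing these states in a test usually means stubbing the dependency rather than waiting for it to misbehave. A sketch, assuming a hypothetical profile_service whose backend client is a requests session and which is supposed to raise its own ServiceUnavailable on timeout (all names invented for illustration):

# Forcing the timeout path instead of hoping it never happens
from unittest.mock import patch

import pytest
import requests

from app import profile_service  # hypothetical module under test

def test_get_profile_backend_timeout_raises_service_unavailable():
    # Cache miss falls through to the backend, and the backend hangs
    with patch.object(profile_service.backend, "get",
                      side_effect=requests.exceptions.Timeout):
        with pytest.raises(profile_service.ServiceUnavailable):
            profile_service.get_profile("user-123")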
Dependency failures. What happens when the downstream service returns a 500? A 503? A response that looks like 200 but contains malformed JSON? The error handling logic for dependencies is often the most important untested code in a system.
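The 200-with-garbage case deserves special attention, because a naive client fails inside JSON parsing rather than in any error handler. A sketch along the same lines, assuming a hypothetical billing_client whose fetch_account is supposed to wrap parse failures in a typed UpstreamError (again, invented names):

# A 200 response that is not valid JSON must surface as a typed error
import json
from unittest.mock import MagicMock, patch

import pytest

from app import billing_client  # hypothetical module under test

def test_fetch_account_with_malformed_json_raises_upstream_error():
    fake_response = MagicMock(status_code=200)
    fake_response.json.side_effect = json.JSONDecodeError("Expecting value", "<html>", 0)
    with patch.object(billing_client.session, "get", return_value=fake_response):
        # Not an unhandled JSONDecodeError leaking to the caller
        with pytest.raises(billing_client.UpstreamError):
            billing_client.fetch_account("acct-42")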
Boundary conditions. Off-by-one errors live at boundaries: pagination that skips the last item, fee calculations that are wrong at exactly $10,000, date logic that fails on the last day of the month.
// The tests that catch real bugs
@Test
void createUser_withDuplicateEmail_throwsDuplicateEmailException() {
    userService.createUser("alice@example.com", "password");

    assertThrows(DuplicateEmailException.class, () ->
        userService.createUser("alice@example.com", "differentpassword")
    );
}

@Test
void createUser_withNullEmail_throwsValidationException() {
    assertThrows(ValidationException.class, () ->
        userService.createUser(null, "password")
    );
}

@Test
void getUser_withNonExistentId_throwsUserNotFoundException() {
    assertThrows(UserNotFoundException.class, () ->
        userService.getUser(UUID.randomUUID())
    );
}

@Test
void createUser_whenDatabaseUnavailable_throwsServiceUnavailableException() {
    // DataAccessResourceFailureException is a concrete subclass of
    // Spring's abstract DataAccessException
    doThrow(new DataAccessResourceFailureException("connection refused"))
        .when(userRepository).save(any());

    assertThrows(ServiceUnavailableException.class, () ->
        userService.createUser("test@example.com", "password")
    );
}
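Those tests cover invalid input, missing resources, and dependency failures; boundary conditions deserve the same explicit treatment. Returning to the $10,000 fee example, a pytest sketch assuming a hypothetical calculate_fee that charges 2% below the threshold and 1% at and above it (the tier rule is invented for illustration):

# Pinning the tier boundary from both sides
from decimal import Decimal

import pytest

from app.fees import calculate_fee  # hypothetical module under test

@pytest.mark.parametrize("amount,expected", [
    (Decimal("9999.00"), Decimal("199.98")),   # just below the boundary: 2%
    (Decimal("10000.00"), Decimal("100.00")),  # exactly at the boundary: 1%
    (Decimal("10001.00"), Decimal("100.01")),  # just above: still 1%
])
def test_calculate_fee_at_tier_boundary(amount, expected):
    assert calculate_fee(amount) == expected

Three cases pin the boundary from both sides, which is exactly where an off-by-one in the comparison (> versus >=) would surface.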
The Ratio to Aim For
There is no hard rule, but a rough target: for every happy-path test, you should have at least two to three tests covering failure modes, edge cases, or boundary conditions. If your suite has 50 tests and 45 of them cover the sunny path, the ratio is wrong.
The easiest way to identify what is missing is to read each function you care about and ask: what inputs could a caller pass that would cause this to behave unexpectedly? What states could the world be in when this function runs that would cause it to fail? Each of those is a test case.
Another useful practice: read your function's error handling code. Every catch block, every null check, every if err != nil has corresponding behavior that should be specified. If there is no test that exercises a catch block, you do not know what that catch block actually does under fire.
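Concretely, that means pairing every except or catch clause with a test that forces execution through it. A self-contained sketch of the pattern, with the config loader invented for illustration:

# Every except branch is behavior that deserves its own test
import json

DEFAULT_CONFIG = {"timeout_seconds": 30}

def load_config(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # This fallback is behavior, not decoration: it must have a test
        return DEFAULT_CONFIG

def test_load_config_falls_back_when_file_missing(tmp_path):
    # Force execution through the except branch
    result = load_config(tmp_path / "does-not-exist.json")
    assert result == DEFAULT_CONFIG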
Production does not send your code well-formed inputs and perfectly available dependencies. Your tests should not assume it does.