Integration Tests Are Not Just Bigger Unit Tests
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Confusion That Makes Both Tests Worse
A common misunderstanding: integration tests are unit tests that test multiple units together, so they are just "bigger." By this logic, you write integration tests by taking a unit test and removing some of the mocks.
This framing produces a hybrid that does the job of neither. It is too slow to run constantly like a unit test, but it is still full of mocks, so it does not test real integration behavior. The real database behavior, the real network response handling, the real serialization edge cases — all still mocked away. Just with more production code executing before the mock boundary.
Unit tests and integration tests are answers to different questions. Conflating them produces tests that cannot answer either question well.
What Question Each Test Is Answering
A unit test answers: "Given these inputs, does this logic produce the correct output?" Everything outside the unit's responsibility is replaced with doubles (mocks, stubs, fakes). The test is fast because it has no I/O. It is precise because it is isolated.
An integration test answers: "Do these components work correctly together when connected to real infrastructure?" The database is real (or containerized). The serialization layer is exercised. The SQL queries run against an actual query planner. The HTTP client sends to a real server (or WireMock). The integration test catches the bugs that unit tests cannot: query planner behavior on production-like data, serialization round-trip issues, connection pool behavior, transaction isolation, and the specific failure modes of real dependencies.
from decimal import Decimal

# Unit test: is the discount calculation logic correct?
def test_discount_calculation():
    # No database, no HTTP. Pure logic.
    result = calculate_discounted_price(base_price=100.0, discount_rate=0.15)
    assert result == 85.0

# Integration test: does the discount persist and round-trip correctly?
def test_discount_saved_and_retrieved_correctly(db_session):
    # Real database (e.g., PostgreSQL in Docker via pytest-docker or Testcontainers)
    product = Product(name="widget", base_price=100.0, discount_rate=0.15)
    db_session.add(product)
    db_session.commit()

    retrieved = db_session.query(Product).filter_by(name="widget").first()

    # Catches precision issues, decimal type mismatches, ORM mapping bugs
    assert retrieved.base_price == Decimal("100.00")
    assert retrieved.discount_rate == Decimal("0.15")
    assert retrieved.calculate_discounted_price() == Decimal("85.00")
The unit test runs in microseconds. The integration test runs in seconds. They both belong in the suite, but they run at different times and catch different classes of bugs.
The Bugs That Only Integration Tests Catch
ORM and query planner surprises. An ORM mapping that looks correct in isolation may produce unexpected SQL. Lazy-loading that works in tests with small datasets causes N+1 queries under realistic data volumes. An index that you believe is being used is not, and the query is doing a full table scan.
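One way an integration test can surface the N+1 pattern is to assert on the number of queries a code path actually issues. The sketch below is illustrative rather than taken from any particular service: it uses SQLite's `set_trace_callback` hook as a stand-in for a real database's query log, and the function and table names are invented.

```python
import sqlite3

def fetch_items_per_order(conn, order_ids):
    # One query per order: the shape lazy-loading quietly produces
    return {
        oid: conn.execute(
            "SELECT item FROM order_items WHERE order_id = ?", (oid,)
        ).fetchall()
        for oid in order_ids
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(50)],
)

statements = []                       # records every statement the connection runs
conn.set_trace_callback(statements.append)

fetch_items_per_order(conn, range(50))
assert len(statements) == 50          # 50 orders -> 50 queries: the N+1 smell

# The fix is a single query (a JOIN, or eager loading in an ORM):
statements.clear()
conn.execute("SELECT order_id, item FROM order_items").fetchall()
assert len(statements) == 1
```

Real suites apply the same idea with SQLAlchemy's event hooks or Django's `assertNumQueries`. The principle is the same: count real queries against a real database, which a mocked repository can never do.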
Serialization and type coercion. JSON serialization rounds floating-point numbers differently across languages and libraries. A BigDecimal in Java becomes a float in the JSON response and loses precision by the time it reaches the client. A timestamp stored as UTC is returned as local time because the JDBC driver or ORM is doing an implicit conversion.
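The float drift described above is easy to demonstrate with nothing but the standard library; this is a generic illustration, not an example from any particular codebase:

```python
import json
from decimal import Decimal

# Server-side money math done exactly with Decimal
subtotal = Decimal("0.1") + Decimal("0.2")
assert subtotal == Decimal("0.3")        # exact

# The same values as JSON numbers become binary floats on the way through
data = json.loads(json.dumps({"a": 0.1, "b": 0.2}))
assert data["a"] + data["b"] != 0.3      # 0.30000000000000004

# And stdlib json will not silently guess a representation for Decimal
try:
    json.dumps({"price": Decimal("19.99")})
    raised = False
except TypeError:
    raised = True
assert raised
```

An integration test that round-trips a response body through the real serializer catches exactly this class of bug; a unit test that mocks the serialization layer never sees it.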
Transaction and isolation level behavior. Two operations that pass individually may produce incorrect results when run concurrently due to transaction isolation. A unit test with mocked repositories cannot catch this. A test that runs two concurrent requests against a real database can.
Connection pool behavior under load. Connection pool exhaustion, leaked connections that are not returned, and connection timeout handling are all invisible to unit tests and only manifest under realistic connection pressure.
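To make the leak failure mode concrete, here is a deliberately tiny pool sketch. `TinyPool` is invented for illustration; real pools such as SQLAlchemy's QueuePool or HikariCP fail in essentially the same way when a handler forgets to return a connection.

```python
import queue

class TinyPool:
    """A minimal fixed-size pool of reusable objects."""

    def __init__(self, size, factory):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())

    def acquire(self, timeout):
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("pool exhausted: no connection within timeout")

    def release(self, conn):
        self._q.put(conn)

pool = TinyPool(size=2, factory=object)   # plain objects stand in for connections

a = pool.acquire(timeout=0.1)
b = pool.acquire(timeout=0.1)
# Neither handler calls pool.release(...) -- both slots are now leaked.

try:
    pool.acquire(timeout=0.1)
    leaked = False
except TimeoutError:
    leaked = True
assert leaked   # the next caller starves, exactly like a leaked-connection bug
```

An integration test that hammers an endpoint with more concurrent requests than the pool has slots is the only kind of test that observes this behavior.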
How to Structure Integration Tests Effectively
Use a real but isolated database. Testcontainers (available for Java, Python, Go, and others) spins up a Docker container with the production database version for your tests and tears it down after. This is substantially more reliable than a shared test database and eliminates "works on my machine" problems caused by schema drift.
Reset data between tests, not the schema. Dropping and rebuilding the entire database schema after every single test is slow. Build the schema once, then either wrap each test in a transaction that rolls back at the end, or truncate the tables between tests. Both approaches are much faster than rebuilding the schema.
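The transaction-rollback approach can be sketched as a context manager over SQLite; a pytest fixture would wrap the same `yield`, and the table and names here are illustrative.

```python
import sqlite3
from contextlib import contextmanager

# Schema is built once, outside the per-test cycle (autocommit mode)
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (name TEXT)")

@contextmanager
def rollback_after_test(conn):
    """Open a transaction for the test body, then discard everything it wrote."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.rollback()

with rollback_after_test(conn) as db:
    db.execute("INSERT INTO users VALUES ('alice')")
    count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert count == 1            # the test sees its own writes

count_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
assert count_after == 0          # rolled back: the next test starts clean
```

The rollback discards the test's writes without touching the schema, which is why this is so much cheaper than rebuilding tables.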
Keep integration tests out of the unit suite. They run in a separate suite, triggered on pull requests and pre-deploy, not on every file save.
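With pytest, one conventional way to split the suites is a registered marker; this is a common pattern, not a prescription.

```ini
# pytest.ini -- register the marker so typos fail loudly with --strict-markers
[pytest]
markers =
    integration: tests that need real infrastructure (database, network)
```

Tag the slow tests with `@pytest.mark.integration`, run `pytest -m "not integration"` on every change, and `pytest -m integration` on pull requests and pre-deploy.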
The goal is a unit suite fast enough to run on every change and an integration suite thorough enough to catch what unit tests cannot. Both are essential. But they are tools for different jobs, and treating them as the same tool degrades the performance of both.