What Integration Tests Should Actually Be Testing
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Expensive Category
Integration tests are the most expensive tests to write and maintain. They require infrastructure — databases, message brokers, external services — and they are slower to run than unit tests by an order of magnitude or more. The investment has to be justified by what the tests actually provide.
Many integration test suites waste this investment by testing things that unit tests cover better, or by not testing the things integration tests uniquely can. The result is a slow suite that does not add much value over a well-written unit suite — and a false sense of coverage.
What Integration Tests Uniquely Cover
The value of integration tests is in verifying behavior that requires real infrastructure — behavior that cannot be meaningfully tested with mocks. Specifically:
Database query correctness and performance. SQL queries that look logically correct can be semantically wrong. Queries that use the wrong join type, mishandle NULL values, fail on edge cases in the data, or miss the index they were written to use are caught only by running them against a real database with representative data.
-- This query looks fine but returns wrong results when a customer has no orders:
-- the LEFT JOIN produces a row whose o.total is NULL, and SUM over only NULLs returns NULL, not 0.
SELECT c.id, SUM(o.total) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id;
-- The integration test that catches this:
-- insert a customer with no orders, assert lifetime_value == 0.00.
-- Not NULL, and not a zero of a different type or scale: the exact typed zero the application expects.
-- One fix: COALESCE(SUM(o.total), 0) AS lifetime_value
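As a minimal sketch, the test described in the comments above could look like the following. It uses Python's built-in sqlite3 module as a stand-in so the example runs anywhere; a real suite would target the production engine, since NULL and numeric typing rules differ between databases. The table definitions and the `lifetime_values` helper are assumptions made for illustration.

```python
import sqlite3

BUGGY = """
    SELECT c.id, SUM(o.total) AS lifetime_value
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
"""
# Same query with the aggregate wrapped so that "no orders" becomes 0.
FIXED = BUGGY.replace("SUM(o.total)", "COALESCE(SUM(o.total), 0)")


def lifetime_values(query):
    """Run the query against a fresh database holding one customer with no orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total NUMERIC NOT NULL
        );
        INSERT INTO customers (id) VALUES (1);
    """)
    rows = conn.execute(query).fetchall()
    conn.close()
    return dict(rows)  # {customer_id: lifetime_value}


buggy_result = lifetime_values(BUGGY)
fixed_result = lifetime_values(FIXED)

assert buggy_result == {1: None}  # the bug: NULL instead of zero
assert fixed_result == {1: 0}     # the behavior the application expects
```

A mock of the data layer would return whatever the test author told it to return, so it could not surface this difference.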
ORM mapping fidelity. The gap between what an ORM generates and what the database actually stores is a constant source of bugs — precision loss in decimal types, timezone handling in timestamps, enum mapping failures, bidirectional relationship management. These require a real database to catch.
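A round-trip test is one way to expose this kind of gap. The sketch below does not use an ORM; it imitates two possible column mappings for a decimal value (a floating-point column and a text column) with sqlite3, purely as an assumption for illustration. The `round_trip` helper and the `prices` table are hypothetical.

```python
import sqlite3
from decimal import Decimal


def round_trip(value, as_float):
    """Store a Decimal and read it back via a float (lossy) or string (exact) mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (amount_real REAL, amount_text TEXT)")
    if as_float:
        # Imitates a mapping layer that silently converts decimals to floats.
        conn.execute("INSERT INTO prices (amount_real) VALUES (?)", (float(value),))
        (stored,) = conn.execute("SELECT amount_real FROM prices").fetchone()
    else:
        # Imitates a mapping that preserves the exact decimal digits.
        conn.execute("INSERT INTO prices (amount_text) VALUES (?)", (str(value),))
        (stored,) = conn.execute("SELECT amount_text FROM prices").fetchone()
    conn.close()
    return Decimal(str(stored))


original = Decimal("12345678901234567.89")
via_float = round_trip(original, as_float=True)
via_text = round_trip(original, as_float=False)

assert via_text == original   # exact mapping survives the round trip
assert via_float != original  # float mapping drops digits
```

The same write-then-read-then-compare shape applies to timestamps with time zones and to enum columns: the assertion is always that what comes back equals what went in.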
Message queue producer/consumer contracts. If your service publishes events to Kafka or RabbitMQ, integration tests should verify that the events are serialized correctly, that the consumer can deserialize them, and that schema evolution does not break existing consumers. This cannot be done with mocks of the broker.
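The compatibility assertions themselves might be sketched as below. This shows only the assertion half: in an integration suite the payload bytes would be published to and consumed from a real broker, which is omitted here so the example stays self-contained. The `OrderPlaced` event, its fields, and the JSON encoding are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total_cents: int
    currency: str = "USD"  # assumed to be added in a later schema version, hence the default


def serialize(event):
    """Producer side: encode the event as UTF-8 JSON bytes."""
    return json.dumps(asdict(event)).encode("utf-8")


def deserialize(payload):
    """Consumer side: decode, ignoring fields this consumer version does not know."""
    data = json.loads(payload.decode("utf-8"))
    known = {f.name for f in fields(OrderPlaced)}
    return OrderPlaced(**{k: v for k, v in data.items() if k in known})


# Current producer and current consumer agree.
current = deserialize(serialize(OrderPlaced("o-1", 1999, "EUR")))
# An older producer that predates the currency field.
from_old_producer = deserialize(b'{"order_id": "o-2", "total_cents": 500}')
# A newer producer that has added a field this consumer has never seen.
from_new_producer = deserialize(
    b'{"order_id": "o-3", "total_cents": 700, "currency": "GBP", "channel": "web"}'
)

assert current == OrderPlaced("o-1", 1999, "EUR")
assert from_old_producer.currency == "USD"
assert from_new_producer == OrderPlaced("o-3", 700, "GBP")
```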
HTTP client configuration and behavior. Retry logic, timeout handling, redirect following, TLS certificate validation, connection pooling: all of this has to be exercised against a server that accepts real HTTP connections, even if that server is a controllable stub and not the actual dependency. WireMock and MockServer are standard tools for standing up such servers in integration tests.
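To illustrate the idea without WireMock or MockServer, the sketch below stands up a small stub server from Python's standard library and checks retry behavior against it over a real socket. The `FlakyHandler` stub, the `get_with_retry` client, and the retry policy (retry only on 503) are assumptions made for this example, not a recommended policy.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FlakyHandler(BaseHTTPRequestHandler):
    """Stub server: answers 503 to the first two requests, then 200."""
    calls = 0

    def do_GET(self):
        FlakyHandler.calls += 1
        status, body = (503, b"busy") if FlakyHandler.calls <= 2 else (200, b"ok")
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def get_with_retry(url, attempts=3, timeout=2.0):
    """Hypothetical client under test: retries on 503, up to `attempts` tries."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    for attempt in range(1, attempts + 1):
        try:
            with opener.open(url, timeout=timeout) as resp:
                return resp.read(), attempt
        except urllib.error.HTTPError as err:
            err.close()
            if err.code != 503 or attempt == attempts:
                raise


server = HTTPServer(("127.0.0.1", 0), FlakyHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    body, attempts_used = get_with_retry(f"http://127.0.0.1:{server.server_port}/status")
finally:
    server.shutdown()
    server.server_close()

assert body == b"ok"
assert attempts_used == 3
```

Because the request crosses a real socket, the test also exercises the client's timeout and connection handling, which an in-process mock of the client library would bypass.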
Transaction boundary behavior. Tests that verify that multiple operations within a transaction either all succeed or all roll back require a real database with real transaction support.
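A rollback test can be sketched as follows, again with sqlite3 standing in for the production database. The `accounts` table, its CHECK constraint, and the `transfer` function are hypothetical. The credit runs before the debit on purpose, so the first statement succeeds and the test can show that it is undone when the second fails.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move funds between two accounts inside a single transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

try:
    # The debit would take account 1 below zero and violates the CHECK constraint.
    transfer(conn, from_id=1, to_id=2, amount=250)
    outcome = "committed"
except sqlite3.IntegrityError:
    outcome = "rolled back"

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

assert outcome == "rolled back"
assert balances == {1: 100, 2: 0}  # the successful credit was undone as well
```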
What Integration Tests Should Not Be Testing
Business logic that does not depend on infrastructure. If you are testing a discount calculation, a sorting algorithm, or a validation rule, use a unit test. Running these through an integration test just to exercise more code is not adding value — it is adding latency and flakiness for the same assertion.
Error handling that can be exercised with a mock. If you want to verify that your code returns a 500 when the database throws an exception, a mock that throws an exception tells you this just as well as a real database. Reserve the real database for tests where the specific behavior of the real database is what you are testing.
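For comparison, the mock-based version of that check might look like this. The handler, the repository interface, and the `DatabaseError` type are hypothetical stand-ins; the point is only that no database is needed to verify the error-to-response mapping.

```python
from unittest.mock import Mock


class DatabaseError(Exception):
    """Stand-in for whatever exception type the database driver raises."""


def get_user_handler(repo, user_id):
    """Hypothetical handler: maps repository failures to a 500 response."""
    try:
        user = repo.find_user(user_id)
    except DatabaseError:
        return 500, {"error": "internal error"}
    if user is None:
        return 404, {"error": "not found"}
    return 200, user


failing_repo = Mock()
failing_repo.find_user.side_effect = DatabaseError("connection reset")

status, body = get_user_handler(failing_repo, 42)

assert status == 500
failing_repo.find_user.assert_called_once_with(42)
```

This runs in milliseconds and fails for exactly one reason, which is what a test of handler logic should do.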
User-facing workflows end-to-end. A test that simulates a full user registration flow, from form submission through the API and the database to the email service, is an end-to-end test. It belongs in a separate suite with different infrastructure (real SMTP or a mail catcher, full application stack). Mixing it into the integration suite blurs that suite's scope and slows it down.
Structuring the Integration Suite
Organize integration tests by the boundary they are testing, not by the feature they relate to. A repository-layer test suite tests all database interactions. An HTTP client test suite tests all external service interactions. This structure makes it easy to see what boundaries are covered and to run only the relevant subset when changing a specific layer.
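Use Testcontainers for databases rather than shared test databases. Shared test databases drift in schema, accumulate stale data, and create ordering dependencies between tests. A containerized PostgreSQL or MySQL that is provisioned fresh for each test run removes the schema drift and the stale data left over from earlier runs, and the startup overhead (typically 5–15 seconds for a PostgreSQL container) is amortized across the full test run. Tests within a run still share that one database, so ordering dependencies go away only if each test also cleans up after itself, for example by rolling back its transaction or truncating the tables it touched.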
The integration test suite is where you invest in confidence about the infrastructure layer. Spend it on the behavior that only the real infrastructure can verify.