The Testing Pyramid Is Not a Rule. It Is a Guideline.
by Eric Hanson, Backend Developer at Clean Systems Consulting
Where the Pyramid Comes From
Mike Cohn introduced the testing pyramid in Succeeding with Agile in 2009. The idea was straightforward: unit tests are fast and cheap to run, so you should have a lot of them. End-to-end tests are slow and brittle, so you should have fewer. Integration tests sit in the middle.
This was useful guidance for a specific era of software, particularly CRUD applications with thick business logic, well-defined service boundaries, and relatively stable APIs. The pyramid works well there. The problem is that it gets taught as a universal law, applied to system architectures that look nothing like what Cohn had in mind.
When the Pyramid Does Not Fit
Consider a backend service that is primarily an integration layer — it receives events, transforms them slightly, and forwards them to three downstream APIs. The business logic is thin. The critical risk is whether the service correctly handles the various response states from those downstream APIs: rate limits, partial failures, malformed responses, retries.
A test pyramid applied literally here would produce hundreds of unit tests for the event transformation logic (which is simple and rarely breaks) and a handful of integration tests for the downstream interactions (which are where every production incident has occurred). That is the wrong shape for the risk profile.
Or consider a data pipeline service where the core complexity is in the SQL queries that aggregate and transform records. Unit tests cannot meaningfully test SQL correctness without an actual database. The test that matters runs the full query against a realistic dataset and checks the output. A pyramid-shaped suite would marginalize exactly the tests that provide real value.
Classic Pyramid: Better Shape for Integration-Heavy Services:
/\ ___________
/E2E\ / \
/------\ / Integration \
/ Integ \ /_______________\
/------------\ / \
/ Unit Tests \ / Unit Tests \
/__________________\ /___________________\
The right shape for your test suite follows the risk profile of your system, not the shape of a diagram from a 2009 book.
What the Pyramid Gets Right
The pyramid's underlying principle is still sound: favor tests that are fast and deterministic. Slow, flaky tests that require network calls and real databases are costly to maintain and erode trust in the suite. When a test fails, you want to know immediately whether it is a real failure or a timing issue with a shared test database.
So the real question is not "how many of each type?" but "what is the cheapest test that gives me confidence about this specific behavior?"
For pure computation — parsing logic, calculation functions, state machines — unit tests are the cheapest confident test. For database query correctness — integration tests against a real (or containerized) database are the cheapest. For user-facing flows in a frontend-heavy application — maybe a small number of end-to-end tests via Playwright or Cypress are actually cheaper in the long run than maintaining hundreds of component tests that mock the API layer.
Applying This to a Real System
Before deciding on the shape of your test suite, map out where your bugs actually come from. Pull up your incident history. Look at your PR comments. Ask your team where they spend time debugging.
If bugs cluster in business logic, lean into unit tests. If they cluster in service boundaries, invest in integration tests. If they cluster in user workflows, you need end-to-end coverage.
# For a service where the logic is simple but I/O is complex,
# the high-value test is at the integration boundary
def test_downstream_rate_limit_triggers_retry_with_backoff():
with responses.RequestsMock() as rsps:
# First call: 429 Too Many Requests
rsps.add(responses.POST, "https://api.downstream.com/events",
status=429, headers={"Retry-After": "2"})
# Second call: 200 OK
rsps.add(responses.POST, "https://api.downstream.com/events",
json={"status": "accepted"}, status=200)
result = forward_event({"type": "order.created", "id": "123"})
assert result.success is True
assert len(rsps.calls) == 2 # Confirm retry happened
This test is not unit or end-to-end — it is integration-level, and it is the most valuable test in this hypothetical service because it directly addresses the failure mode that matters most.
The pyramid is a starting heuristic. It tells you that before you write an end-to-end test, ask yourself whether a unit test would do the job. That question is worth asking every time. The answer is not always yes.
Write tests in the shape of your risks. Use the pyramid as a prompt for that conversation, not as a quota to fulfill.