The Best Architecture Decision Is the One You Can Explain to Your Team
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Architecture That Only One Person Understood
A team inherited a microservices architecture designed by a particularly talented engineer who had since left. The system worked well in production, but nobody on the current team could confidently trace how a user request flowed from entry point to database write. The service mesh configuration was complex and undocumented. Adding a new service required understanding a Kubernetes operator whose source code nobody on the team had read.
Every non-trivial change required escalating to the engineer who'd been around longest, who understood the most but was far from omniscient. New engineers spent weeks before they felt safe making changes. The architecture was technically excellent and organizationally disabling.
This is the failure mode of optimizing for technical elegance over team-level operability.
Explainability as a Design Constraint
Explainability means: can you walk a competent engineer through the architecture in under thirty minutes, and will they feel confident making changes afterward?
This is a concrete test, not a philosophical position. If the answer is no — if understanding the system requires reading a thesis-length document or accumulating months of context — the architecture has a problem that is independent of whether it's technically correct.
The constraint cuts across all architecture levels:
- A microservices topology where the service dependencies and data ownership are clear is explainable. One where services have circular dependencies and shared databases is not.
- A deployment pipeline where each step is a visible, ordered action is explainable. One that relies on implicit triggers and shared state between Kubernetes operators is not.
- A data flow where events go from service A to queue B to service C is explainable. One where events fan out through a complex topology of topics, filters, and routers requires a map.
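The first contrast above, clear ownership versus circular dependencies and shared databases, can be checked mechanically. What follows is a minimal sketch, assuming the team keeps a small manifest of which services call which and which datastores each one owns. The manifest format and every service name in it are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical manifest: service names, calls, and datastores are illustrative only.
SERVICES = {
    "orders": {"calls": ["payments", "inventory"], "owns": ["orders_db"]},
    "payments": {"calls": ["ledger"], "owns": ["payments_db"]},
    "inventory": {"calls": [], "owns": ["inventory_db"]},
    "ledger": {"calls": ["payments"], "owns": ["payments_db"]},
}

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_cycle(services):
    """Return one dependency cycle as a list of service names, or None."""
    color = {name: WHITE for name in services}
    path = []

    def visit(name):
        color[name] = GREY
        path.append(name)
        for dep in services[name]["calls"]:
            # A GREY dependency is already on the current path: that is a cycle.
            if color.get(dep, BLACK) == GREY:
                return path[path.index(dep):] + [dep]
            if color.get(dep) == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[name] = BLACK
        return None

    for name in services:
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


def shared_datastores(services):
    """Return datastores claimed by more than one service."""
    owners = defaultdict(list)
    for name, spec in services.items():
        for store in spec["owns"]:
            owners[store].append(name)
    return {store: names for store, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    print("cycle:", find_cycle(SERVICES))
    print("shared:", shared_datastores(SERVICES))
```

Run in CI against a manifest like this, a check of this kind would flag the two properties the list calls out before they become tribal knowledge. It says nothing about implicit triggers or event fan-out, which need their own map.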
The Common Failure Modes
Abstraction for its own sake: Frameworks, meta-frameworks, and generic engines that wrap the actual business logic so thoroughly that reading the code gives you no information about what the system does. The classic example: a rule engine that processes XML-configured business rules. Incredibly flexible. Completely opaque to anyone who doesn't know the rule engine's internals.
Over-reliance on implicit behavior: Framework magic that does things behind the scenes, such as Spring's bean lifecycle, Hibernate's proxy behavior, or event sourcing frameworks with implicit projections. These are powerful tools. They also demand significant context from anyone new to the codebase.
Complexity smuggled in through dependencies: A service that depends on a customized version of an internal framework that has diverged significantly from its public documentation. Anyone troubleshooting will read the docs, find them wrong, and spend hours debugging framework internals.
Decision records that don't exist: When the architecture relies on decisions made by people who are no longer present, and those decisions weren't documented, every person who encounters a non-obvious design choice must either reverse-engineer the reasoning or guess. Guesses are frequently wrong.
What Makes an Architecture Explainable
Visible structure: The topology of the system — what services exist, what data they own, how they communicate — should be visible in a single diagram that can be kept current. Not a sprawling enterprise architecture map, but a service dependency graph that a new engineer can orient on.
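One way to keep such a diagram current is to generate it from data the team already maintains instead of drawing it by hand. The sketch below assumes a small hand-written catalog of services and emits Graphviz DOT text. The catalog shape and the service names are hypothetical; only the DOT syntax itself is standard.

```python
# Hypothetical service catalog: names, calls, and datastores are illustrative only.
CATALOG = {
    "checkout": {"calls": ["billing"], "owns": ["orders_db"]},
    "billing": {"calls": [], "owns": ["billing_db"]},
}


def to_dot(services):
    """Render services, the data they own, and their calls as Graphviz DOT text."""
    lines = ["digraph services {", "  rankdir=LR;"]
    for name in sorted(services):
        spec = services[name]
        owns = ", ".join(spec.get("owns", []))
        # "\\n" emits a literal backslash-n, which DOT renders as a line break.
        label = f"{name}\\n[{owns}]" if owns else name
        lines.append(f'  "{name}" [label="{label}"];')
        for dep in spec.get("calls", []):
            lines.append(f'  "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(CATALOG))
```

The output can be piped to the Graphviz `dot` command to produce an image. Because the diagram is derived from the catalog, it goes stale only when the catalog does.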
Locally understandable components: Each service, module, or component should be understandable in isolation without requiring deep knowledge of how the entire system works. The interface a component exposes, and the interfaces it depends on, should be sufficient context to understand it.
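As an illustration of local understandability, here is a minimal sketch in which a component declares the one interface it depends on and can be read, run, and tested without the rest of the system. `PaymentGateway`, `CheckoutService`, and `FakeGateway` are hypothetical names invented for this example, not part of any real system.

```python
from dataclasses import dataclass
from typing import Protocol


class PaymentGateway(Protocol):
    """The only thing checkout needs to know about payments (hypothetical)."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        """Charge the customer and return a receipt id."""
        ...


@dataclass
class CheckoutService:
    """Depends on the PaymentGateway interface, not on a concrete service."""

    gateway: PaymentGateway

    def place_order(self, customer_id: str, amount_cents: int) -> dict:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        receipt_id = self.gateway.charge(customer_id, amount_cents)
        return {"customer_id": customer_id, "receipt_id": receipt_id}


class FakeGateway:
    """In-memory stand-in used to explore checkout in isolation."""

    def __init__(self) -> None:
        self.charges = []

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"receipt-{len(self.charges)}"


if __name__ == "__main__":
    service = CheckoutService(FakeGateway())
    print(service.place_order("customer-1", 500))
```

A reader needs only the exposed method and the `PaymentGateway` protocol to follow `CheckoutService`. How payments are actually processed stays someone else's concern.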
Documented non-obvious decisions: An ADR (Architecture Decision Record) for any decision that someone encountering the system would reasonably wonder about. "Why is this a queue instead of a direct call?" "Why does this service own this data instead of that service?" If these questions exist, the answers should be findable.
Runnable locally: If an engineer cannot run the system or a representative subset of it locally, they cannot explore and understand it by doing. Local development setup is a significant part of architectural explainability.
The Tradeoff
Explainability sometimes costs technical elegance. The most composable, flexible, or performant solution may also be the hardest to explain. When this tradeoff arises, the right question is: for the team size and turnover rate we have, what is the ongoing cost of the explainability gap?
A five-person team that has worked together for three years may be able to maintain a complex architecture effectively. A twenty-person team with regular turnover cannot afford the same complexity without significant documentation and tooling investment.
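The comparison can be made roughly quantitative with a back-of-envelope estimate. The sketch below assumes the ongoing cost is dominated by ramp-up time for new hires; the function name and every input figure are hypothetical placeholders, not measurements.

```python
def ramp_up_weeks_per_year(team_size, annual_turnover_rate, weeks_to_productive):
    """Estimate engineer-weeks spent each year getting replacement hires productive.

    All three inputs are assumptions to be replaced with the team's own numbers.
    """
    expected_new_hires = team_size * annual_turnover_rate
    return expected_new_hires * weeks_to_productive


if __name__ == "__main__":
    # Illustrative figures only.
    stable_team = ramp_up_weeks_per_year(5, 0.0, 6)
    churning_team = ramp_up_weeks_per_year(20, 0.25, 6)
    print(stable_team, churning_team)
```

The estimate ignores escalations, slower reviews, and incidents caused by misunderstanding, so it is at best a lower bound. Its use is in comparing a design's ramp-up cost against the cost of the documentation and tooling that would reduce it.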
The Practical Takeaway
The next time you're proposing a significant architectural decision, test it against this criterion: can you explain it to the least-experienced engineer on your team in thirty minutes, and will they feel equipped to work within it? If not, either simplify the design or plan for the documentation and education investment required to close the gap. Architecture that the team can't explain is architecture the team can't own.