CAP Theorem Is Not Just Interview Knowledge. It Affects Real Decisions.
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Theorem You Memorized and Then Ignored
Eric Brewer proposed it as a conjecture in 2000, Seth Gilbert and Nancy Lynch proved it in 2002, and it became a staple of technical interview prep. The claim: a distributed system can provide at most two of three properties (Consistency, Availability, and Partition Tolerance). You learn the three letters, you learn that network partitions are inevitable so you're really choosing between C and A, and you move on.
Then you're choosing between DynamoDB and PostgreSQL for a new service and the theorem is nowhere in your reasoning. You're thinking about managed vs self-hosted, cost, team familiarity. These are legitimate concerns. But the fundamental data guarantee question — what happens to this system when a network partition occurs — is what CAP is actually about, and it should be in the room.
What Partition Tolerance Actually Means
A network partition is when nodes in a distributed system can't communicate with each other. This is not a theoretical failure — it happens. Network switches fail. Availability zones lose connectivity. Cloud providers have regional events. The question is not "will partitions happen" but "how often, and for how long."
Partition Tolerance means the system continues to operate during a partition. Because partitions happen, any production distributed system must be partition tolerant. So the real choice is: during a partition, does the system sacrifice Consistency (return potentially stale data) or Availability (refuse to respond until the partition heals)?
This is not a philosophical question. It maps directly to database behavior under failure.
CP Systems: Consistency Under Partition
A CP system refuses to serve requests that it cannot guarantee are consistent. During a network partition, a minority-side node (one that cannot reach a quorum) stops accepting reads or writes rather than risk returning stale or conflicting data.
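The refusal behavior can be sketched in a few lines. This is a minimal illustration, not how any particular database implements it: `CPNode`, `QuorumUnavailable`, and the `reachable_peers` count are hypothetical names, and a real system would discover reachability through heartbeats and a consensus protocol rather than a plain integer.

```python
from dataclasses import dataclass, field


class QuorumUnavailable(Exception):
    """Raised when a node cannot reach a majority of the cluster."""


@dataclass
class CPNode:
    cluster_size: int      # total nodes, including this one
    reachable_peers: int   # peers this node can currently contact (assumed known)
    data: dict = field(default_factory=dict)

    def has_quorum(self) -> bool:
        # This node plus its reachable peers must be strictly more than half.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def read(self, key):
        if not self.has_quorum():
            raise QuorumUnavailable("cannot confirm the latest value; refusing read")
        return self.data.get(key)

    def write(self, key, value):
        if not self.has_quorum():
            raise QuorumUnavailable("cannot replicate to a majority; refusing write")
        self.data[key] = value
```

In a five-node cluster, a node that can see two peers is on the majority side and keeps serving. A node that can see only one peer raises `QuorumUnavailable` on every call, which is the "refuse rather than risk it" behavior described above.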
HBase, ZooKeeper, and systems built on the Raft or Paxos consensus algorithms are CP. Google Spanner and CockroachDB are designed for strong consistency and are generally classified as CP as well.
When to choose CP: financial transactions, inventory management, any domain where stale reads lead to incorrect business decisions. The cost is potential unavailability during partition events. If the minority partition is 5% of your nodes and the event lasts 30 seconds, that's 30 seconds of errors for the affected traffic. You decide whether that's acceptable.
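To make "acceptable" concrete, it helps to turn the scenario into a rough request count. The sketch below assumes traffic is spread evenly across nodes and that clients do not retry; the request rate is a hypothetical figure, not something from a real system.

```python
def failed_requests(requests_per_second: float,
                    affected_percent: float,
                    partition_seconds: float) -> float:
    """Rough count of requests a CP system rejects during one partition event."""
    return requests_per_second * affected_percent / 100 * partition_seconds


# Hypothetical load of 2,000 req/s, 5% of traffic on the minority side, 30-second event.
print(failed_requests(2_000, 5, 30))
```

Comparing that number against your error budget is usually more useful than debating CP versus AP in the abstract.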
AP Systems: Availability Under Partition
An AP system continues responding during a partition, potentially returning stale data. After the partition heals, nodes reconcile and converge. Conflicting writes are resolved by a defined merge strategy (last-write-wins, vector clocks, application-defined conflict resolution).
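Last-write-wins is the simplest of those strategies to sketch. The version below is an illustration under stated assumptions: `Versioned` and `lww_merge` are hypothetical names, timestamps come from each writer's wall clock, and the node ID is used only as a tie-breaker so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float   # writer's wall-clock time; clock skew can reorder writes
    node_id: str       # tie-breaker so all replicas converge on the same value


def lww_merge(a: dict[str, Versioned], b: dict[str, Versioned]) -> dict[str, Versioned]:
    """Merge two replica states, keeping the newest write for each key."""
    merged = dict(a)
    for key, candidate in b.items():
        current = merged.get(key)
        if current is None or (
            (candidate.timestamp, candidate.node_id)
            > (current.timestamp, current.node_id)
        ):
            merged[key] = candidate
    return merged
```

Because the merge gives the same result regardless of argument order, two replicas that exchange state after the partition heals end up identical. The price is that the losing write is silently discarded, which is why vector clocks or application-defined resolution are preferred when both sides of a conflict matter.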
DynamoDB (with eventually consistent reads), Cassandra, CouchDB, and most CDN edge caches behave as AP systems.
When to choose AP: user profiles, content feeds, product catalogs, anything where slightly stale data is acceptable and the cost of unavailability is higher than the cost of occasional inconsistency. Amazon's shopping cart (a motivating use case in Amazon's 2007 Dynamo paper, which Werner Vogels co-authored) was intentionally AP: it's better to let a user add items to a potentially stale cart than to refuse to show them a cart at all.
The Decision Is Per-Data-Type, Not Per-Service
This is the nuance that the interview-prep version of CAP misses: you don't choose CP or AP for a service. You choose it for a piece of data, based on the consistency requirements for that specific data.
A single e-commerce service might reasonably use:
- PostgreSQL with serializable transactions for order creation and inventory decrements (CP behavior matters because stale reads cause overselling)
- DynamoDB with eventually consistent reads for user browsing history and recommendations (AP is fine because stale data has little business impact)
- Redis with replication for session data (eventual consistency is acceptable; session data goes stale after minutes anyway)
Treating "what database do we use for this service" as a single decision loses the distinction. The right question is "what consistency guarantees does this specific data require when the network has a bad day?"
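One way to keep that question visible is to write the answer down per data type instead of per service. The mapping below is a hypothetical sketch for the e-commerce example; the names `OnPartition`, `DATA_POLICY`, and `policy_for` are illustrative and not part of any library.

```python
from enum import Enum


class OnPartition(Enum):
    REFUSE = "refuse requests (CP)"
    SERVE_STALE = "serve possibly stale data (AP)"


# Hypothetical classification of the data types from the example above.
DATA_POLICY = {
    "orders": OnPartition.REFUSE,
    "inventory": OnPartition.REFUSE,
    "browsing_history": OnPartition.SERVE_STALE,
    "recommendations": OnPartition.SERVE_STALE,
    "sessions": OnPartition.SERVE_STALE,
}


def policy_for(data_type: str) -> OnPartition:
    # No default on purpose: unclassified data raises KeyError,
    # so someone has to make the decision explicitly.
    return DATA_POLICY[data_type]
```

The lack of a default is a deliberate design choice here. Falling back to "refuse" would quietly pick whatever sounds safest, and falling back to "serve stale" would quietly pick whatever is most convenient.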
The PACELC Extension
CAP's limitation is that it describes behavior only during a partition, which is a failure condition. Most of the time there is no partition, and the relevant question is: what is the latency vs consistency tradeoff during normal operation?
PACELC (Daniel Abadi, 2012) extends CAP to address this: during a Partition (P), choose Availability (A) or Consistency (C); Else (E), during normal operation, choose Latency (L) or Consistency (C).
DynamoDB is PA/EL: available under partition, low latency under normal operation (at the cost of eventual consistency). CockroachDB is PC/EC: consistent under partition, consistent under normal operation (at the cost of higher latency from consensus rounds).
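The PA/EL and PC/EC labels are just two independent choices written side by side, which a small sketch makes explicit. `Pacelc` and `SYSTEMS` are hypothetical names, and the two entries only restate the classifications given in the text.

```python
from typing import NamedTuple


class Pacelc(NamedTuple):
    on_partition: str   # "A" (availability) or "C" (consistency)
    otherwise: str      # "L" (latency) or "C" (consistency)

    def label(self) -> str:
        return f"P{self.on_partition}/E{self.otherwise}"


# Classifications as stated in the text; extend with your own candidates.
SYSTEMS = {
    "DynamoDB": Pacelc("A", "L"),
    "CockroachDB": Pacelc("C", "C"),
}

for name, tradeoff in SYSTEMS.items():
    print(f"{name}: {tradeoff.label()}")
```

Filling in both fields for each candidate store forces the normal-operation question to be answered alongside the failure-mode one.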
This framing is more useful for practical database selection because it acknowledges that partition events are rare and normal-operation tradeoffs dominate your daily experience.
The Practical Takeaway
The next time your team is selecting a data store, add one question to the evaluation: what does this system do when it can't reach a quorum of nodes? If the answer is "it refuses requests," you have a CP system. If the answer is "it serves potentially stale data," you have an AP system. Match that behavior to what your data's consistency requirements actually need — not to what sounds safest in the abstract.