Eventual Consistency: The Trade-off You Make When You Scale
by Eric Hanson, Backend Developer at Clean Systems Consulting
What "Eventually Consistent" Actually Means
Eventual consistency means: if no new updates are made to an item, all replicas will eventually converge to the same value. The key word is eventually. There is no specified bound on how long "eventually" takes. During the convergence window — which can be milliseconds or seconds under normal conditions, and much longer under network partitions — different replicas of the same data may return different values.
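A toy model can make the convergence window concrete. The sketch below is an illustration under simplifying assumptions, not any real database's replication protocol: `Replica` and `ReplicatedRegister` are hypothetical names, and replication is triggered by hand rather than by a network.

```python
class Replica:
    """One copy of a single value, plus a queue of updates not yet applied."""

    def __init__(self, value):
        self.value = value
        self.pending = []  # replicated writes that have not arrived yet

    def apply_pending(self):
        # Apply queued updates in order; the last one wins.
        for value in self.pending:
            self.value = value
        self.pending.clear()


class ReplicatedRegister:
    """Hypothetical primary with asynchronously updated replicas."""

    def __init__(self, initial, replica_count=2):
        self.primary = initial
        self.replicas = [Replica(initial) for _ in range(replica_count)]

    def write(self, value):
        # The primary acknowledges immediately; replicas only queue the change.
        self.primary = value
        for replica in self.replicas:
            replica.pending.append(value)

    def read(self, replica_index):
        return self.replicas[replica_index].value

    def converge(self):
        # Stands in for replication catching up once writes stop.
        for replica in self.replicas:
            replica.apply_pending()


register = ReplicatedRegister("v1")
register.write("v2")
stale = register.read(0)                      # inside the convergence window
register.replicas[0].apply_pending()          # only replica 0 catches up
mixed = (register.read(0), register.read(1))  # replicas disagree
register.converge()
final = (register.read(0), register.read(1))  # replicas agree again
```

Between `write` and `converge`, which value a reader sees depends entirely on which replica answers. That is the window the rest of this article is about.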
This is not a failure mode. It is the defined behavior of eventually consistent systems. DynamoDB with eventually consistent reads, Cassandra at consistency level ONE, MongoDB with secondary reads: all of these systems document this behavior. The question is whether your application code handles it correctly, or whether it assumes consistency it does not have.
The CAP Theorem in Practice
The CAP theorem (Consistency, Availability, Partition tolerance) states that a distributed system can guarantee at most two of the three properties simultaneously. Since network partitions are a reality in distributed systems, the practical choice is between consistency and availability during a partition.
Systems that choose consistency (CP systems such as ZooKeeper, etcd, and HBase) refuse requests on nodes that cannot reach a quorum during a partition rather than return potentially stale data. Systems that choose availability (AP systems such as Cassandra, DynamoDB, and CouchDB) continue serving reads and writes during a partition, accepting that different nodes may diverge and reconcile later.
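The difference in behavior can be sketched with a toy node that is cut off from its peers. This is a deliberately simplified model, not how any of the systems named above are implemented: `Node`, `Unavailable`, and the `mode` flag are hypothetical, and "quorum" here means a simple majority.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request rather than risk inconsistency."""


class Node:
    def __init__(self, mode, cluster_size, value=None):
        self.mode = mode                      # "CP" or "AP" (illustrative flag)
        self.cluster_size = cluster_size
        self.reachable_peers = cluster_size - 1
        self.value = value

    def has_majority(self):
        # Count this node itself plus the peers it can still reach.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def write(self, value):
        if self.mode == "CP" and not self.has_majority():
            raise Unavailable("no quorum: refusing write")
        self.value = value  # AP: accept locally and reconcile later

    def read(self):
        if self.mode == "CP" and not self.has_majority():
            raise Unavailable("no quorum: refusing read")
        return self.value  # AP: may be stale or divergent


cp_node = Node("CP", cluster_size=3, value="old")
ap_node = Node("AP", cluster_size=3, value="old")
cp_node.reachable_peers = 0  # partitioned away from both peers
ap_node.reachable_peers = 0

ap_node.write("new")         # accepted; the other nodes have not seen it
try:
    cp_node.write("new")
    cp_refused = False
except Unavailable:
    cp_refused = True
```

The AP node stays usable but now holds a value its peers do not have. The CP node keeps its data consistent by becoming unavailable.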
The choice is not a product feature to be preferred based on marketing. It is a fundamental trade-off that determines how your system behaves under network failure.
The Bugs Eventual Consistency Creates
Read-your-writes violations. A user updates their profile picture. The write goes to the primary. The user immediately reloads the page. The read hits a replica with replication lag. The old profile picture is returned. The user thinks the update failed and submits it again.
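One common mitigation is to remember, per session, the newest version the user has written and to fall back to the primary whenever the replica is behind it. The sketch below assumes every record carries a monotonically increasing version; `Store` and `Session` are hypothetical in-memory stand-ins, not a real driver API.

```python
class Store:
    """A versioned key-value copy (stands in for a primary or a replica)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))

    def put(self, key, version, value):
        self.data[key] = (version, value)


class Session:
    """Remembers the newest version this user has written for each key."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.last_written = {}  # key -> version

    def write(self, key, value):
        version, _ = self.primary.get(key)
        self.primary.put(key, version + 1, value)
        self.last_written[key] = version + 1

    def read(self, key):
        version, value = self.replica.get(key)
        if version >= self.last_written.get(key, 0):
            return value  # replica has caught up with this session's writes
        # Replica is behind our own write: fall back to the primary.
        return self.primary.get(key)[1]


primary, replica = Store(), Store()
primary.put("avatar", 1, "old.png")
replica.put("avatar", 1, "old.png")

session = Session(primary, replica)
session.write("avatar", "new.png")         # replica has not received this yet
seen_by_author = session.read("avatar")    # served from the primary
seen_by_others = replica.get("avatar")[1]  # still the old picture
```

Only the author pays the cost of a primary read, and only until the replica catches up. Other users may still see the old picture briefly, which is usually acceptable for this kind of data.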
Lost updates. Two concurrent writers read the same record (both see version 1), both modify it, and both write version 2. One write overwrites the other, and the overwritten update is silently lost. Preventing this requires optimistic concurrency control: read the record version, then write with a condition that the version has not changed since the read.
# Optimistic concurrency control to prevent lost updates.
# `db` is assumed to be a client that supports conditional writes.
class ConcurrentModificationException(Exception):
    """Raised when the record changed between our read and our write."""


def update_inventory(product_id, quantity_delta):
    record = db.get(product_id)
    current_version = record["version"]
    new_quantity = record["quantity"] + quantity_delta
    success = db.update(
        key=product_id,
        values={"quantity": new_quantity, "version": current_version + 1},
        condition=f"version = {current_version}",  # Conditional write
    )
    if not success:
        # Another writer modified the record between our read and write.
        # Signal the conflict so the caller can re-read and retry.
        raise ConcurrentModificationException()
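The function above reports a conflict but leaves the retry to the caller. A self-contained sketch of the full read, conditional write, and retry cycle might look like the following. Everything here is an assumption made for illustration: `VersionedStore` is an in-memory stand-in for a database with conditional writes, and `update_if_version`, `adjust_quantity`, and `ConcurrentModificationError` are hypothetical names.

```python
import threading


class ConcurrentModificationError(Exception):
    """Raised when every retry attempt lost the race to another writer."""


class VersionedStore:
    """In-memory stand-in for a database with conditional writes."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, key, quantity):
        with self._lock:
            self._rows[key] = {"quantity": quantity, "version": 1}

    def get(self, key):
        with self._lock:
            return dict(self._rows[key])  # copy, so callers cannot mutate the row

    def update_if_version(self, key, quantity, expected_version):
        # Atomic compare-and-set: succeeds only if nobody wrote since our read.
        with self._lock:
            row = self._rows[key]
            if row["version"] != expected_version:
                return False
            row["quantity"] = quantity
            row["version"] = expected_version + 1
            return True


def adjust_quantity(store, key, delta, max_attempts=5):
    for _ in range(max_attempts):
        row = store.get(key)  # fresh read on every attempt
        new_quantity = row["quantity"] + delta
        if store.update_if_version(key, new_quantity, row["version"]):
            return new_quantity
    raise ConcurrentModificationError(f"gave up after {max_attempts} attempts")


store = VersionedStore()
store.put("sku-1", 10)

stale = store.get("sku-1")              # writer A reads version 1
adjust_quantity(store, "sku-1", -3)     # writer B commits version 2
a_succeeded = store.update_if_version(  # writer A's stale write is rejected
    "sku-1", stale["quantity"] - 2, stale["version"]
)
adjust_quantity(store, "sku-1", -2)     # writer A retries from a fresh read
```

Bounding the number of attempts is a design choice: under heavy contention an unbounded loop can spin indefinitely, so it is usually better to surface the conflict after a few tries.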
Stale aggregations. An analytics dashboard aggregates across replicas. During high write load, replicas lag, and the dashboard shows totals that trail the actual values. If the dashboard is used for business decisions such as inventory counts or sales totals, stale data causes real problems.
When to Accept It and When Not To
Accept eventual consistency for: user-generated content (a post being visible a second late), analytics and reporting (slight staleness is fine), product catalog reads, non-financial counters.
Do not accept eventual consistency for: financial transactions (account balances, payment state), inventory operations where overselling is a business problem, access control decisions (whether a user is allowed to see data), any operation where a stale read leads to an incorrect write.
For the cases where consistency is required, route reads to the primary or use strongly consistent read options where available (DynamoDB strongly consistent reads, Cassandra with QUORUM on both reads and writes). Accept the latency cost. The alternative is correctness bugs that are hard to detect and hard to explain to users.
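For quorum-based stores, the commonly cited rule is that a read is guaranteed to see the latest acknowledged write only when the read and write replica sets must overlap, that is, when R + W > N. The helper below is a minimal sketch of that arithmetic. The function names are hypothetical, and the check ignores real-world complications such as sloppy quorums and timestamp-based conflict resolution.

```python
def quorum(replication_factor):
    """Size of a majority quorum for the given replication factor."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor, write_acks, read_acks):
    """True when every read set must share a replica with every write set."""
    return read_acks + write_acks > replication_factor


n = 3
q = quorum(n)

quorum_both = reads_overlap_writes(n, write_acks=q, read_acks=q)
one_write_quorum_read = reads_overlap_writes(n, write_acks=1, read_acks=q)
one_both = reads_overlap_writes(n, write_acks=1, read_acks=1)
```

With three replicas, quorum writes plus quorum reads overlap, while a quorum read after a write acknowledged by a single replica does not. A strong read level only helps if the write level is chosen to match it.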
Eventual consistency is not something that "just works." It requires the application to be written with awareness of the consistency model. Write the code for the consistency level you actually have, not the one you wish you had.