Cache Invalidation: The Problem That Makes Caching Hard
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Cache Works Until the Data Changes
Your product page loads in 80ms because the product data is cached in Redis with a 5-minute TTL. A product manager updates the price. The new price is in the database. For the next 5 minutes, every customer sees the old price. If the price went down, that is a customer trust issue. If the price went up and someone completes a purchase at the old price, that is a business problem.
This is not a bug in the cache. It is the intended behavior of TTL-based invalidation. The question is whether the TTL you chose is appropriate for the consistency requirement of the data you are caching.
Most caching implementations use TTL as the primary invalidation mechanism because it is simple. It is also imprecise — it guarantees that the cache will be refreshed eventually, not that it will be refreshed when the data changes.
The Invalidation Strategies and Their Trade-offs
TTL-only. Set an expiration time. The cache self-invalidates after the TTL expires. Simple to implement. Consistency window equals the TTL. Appropriate when staleness for the TTL duration is acceptable and invalidation coordination is not worth the complexity.
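TTL-only invalidation is usually paired with the cache-aside read pattern: try the cache, fall back to the database on a miss, and repopulate with a fresh TTL. A minimal sketch, using an in-memory stand-in for a TTL cache like Redis (the names `get_product` and `fetch_from_db` are illustrative):

```python
import time

class TTLCache:
    """Minimal in-memory stand-in for a TTL cache such as Redis SETEX."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entry behaves like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

cache = TTLCache()

def get_product(product_id, fetch_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the DB, repopulate."""
    key = f"product:{product_id}"
    value = cache.get(key)
    if value is None:
        value = fetch_from_db(product_id)   # cache miss: hit the database
        cache.set(key, value, ttl_seconds)  # repopulate with a fresh TTL
    return value
```

Note that the consistency window is visible in the code: between `set` and expiry, nothing checks whether the database has changed.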
Write-through invalidation. On every write to the database, also delete or update the corresponding cache entry. The cache is fresh immediately after writes. Requires the write path to know which cache keys are affected.
The problem: if the database write and the cache invalidation do not happen atomically, there is a window between them where the cache holds stale data. If the application crashes between the write and the cache delete, the stale entry persists until TTL expiry.
def update_product_price(product_id, new_price):
    db.execute("UPDATE products SET price = %s WHERE id = %s", [new_price, product_id])
    # Race condition: if the process dies here, cache stays stale until TTL
    cache.delete(f"product:{product_id}")
Mitigation: set a short TTL as a backstop even when using write-through invalidation. A 60-second TTL backstop catches failures in invalidation without holding stale data for 5 minutes.
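The backstop works because both mechanisms run independently: writes delete the key, and every cache fill carries a short TTL, so a lost delete self-heals within the backstop window. A sketch with in-memory stand-ins for the database and cache (the `Store` class and the `crash_before_delete` flag exist only to make the failure mode demonstrable):

```python
import time

CACHE_TTL_BACKSTOP = 60  # seconds: bounds staleness if a delete is lost

class Store:
    """In-memory stand-ins for the database and cache (illustrative)."""
    def __init__(self):
        self.db = {}
        self.cache = {}  # key -> (value, expires_at)

store = Store()

def read_price(product_id):
    key = f"product:{product_id}"
    entry = store.cache.get(key)
    if entry and time.monotonic() < entry[1]:
        return entry[0]
    price = store.db[product_id]
    # Every fill carries the backstop TTL, even though writes also invalidate.
    store.cache[key] = (price, time.monotonic() + CACHE_TTL_BACKSTOP)
    return price

def update_price(product_id, new_price, crash_before_delete=False):
    store.db[product_id] = new_price
    if crash_before_delete:
        return  # simulates the process dying between write and delete
    store.cache.pop(f"product:{product_id}", None)  # write-through delete
```

When the simulated crash skips the delete, readers see the stale price, but only until the 60-second backstop expires rather than indefinitely.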
Event-driven invalidation. Database changes produce events (via CDC — Change Data Capture — tools like Debezium with Kafka, or PostgreSQL logical replication). A cache invalidation consumer listens for change events and invalidates affected cache keys. Decouples the write path from cache management. Handles invalidation for all consumers including non-application writers (migrations, batch jobs, admin tools).
This is the right architecture for systems where multiple writers exist and the write path cannot reliably trigger cache invalidation. The trade-off is operational complexity — you are running a change data capture pipeline and a consumer.
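The consumer's core job is a mapping from change events to cache keys. A sketch of that mapping, assuming a simplified event shape (real CDC messages, e.g. from Debezium, are richer JSON envelopes; the table-to-key mapping below is illustrative):

```python
def invalidate_from_change_event(event, cache_delete):
    """Map a row-change event to the cache keys it affects and delete them.

    `event` mimics the shape of a CDC row-change message (table name plus
    the row's primary key); the key mapping below is illustrative.
    """
    table = event["table"]
    row_id = event["id"]
    # Table -> cache-key mapping lives in one place, outside the write path,
    # so migrations, batch jobs, and admin tools all get invalidation for free.
    key_builders = {
        "products": lambda i: [f"product:{i}"],
        "users": lambda i: [f"user:{i}", f"user_profile:{i}"],
    }
    keys = key_builders.get(table, lambda i: [])(row_id)
    for key in keys:
        cache_delete(key)
    return keys
```

Centralizing the mapping in the consumer is what decouples the writers from cache management: no writer needs to know which keys exist.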
Cache-aside with conditional reads (ETag). Rather than a fixed TTL, the cache stores an ETag (version identifier) alongside the cached data. On cache miss, the client fetches from the database and updates the cache. On cache hit, the client can conditionally revalidate by sending the ETag to the database — if the data has not changed, the database returns a lightweight "not modified" response. This shifts consistency checking to the read path rather than managing invalidation on the write path.
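The read-path revalidation can be sketched as follows, assuming a cheap version-lookup function stands in for the database's "not modified" check (all names here are illustrative, not a library API):

```python
def read_with_revalidation(key, cache, fetch, current_etag):
    """Cache-aside read with ETag revalidation (all names illustrative).

    `fetch(key)` returns (data, etag), standing in for a full database
    read; `current_etag(key)` is a cheap version lookup, standing in for
    the database's lightweight "not modified" response.
    """
    cached = cache.get(key)  # (data, etag) or None
    if cached is not None:
        data, etag = cached
        if current_etag(key) == etag:
            return data  # "not modified": reuse the cached copy
    data, etag = fetch(key)  # miss or stale: full fetch
    cache[key] = (data, etag)
    return data
```

The trade-off is explicit in the sketch: every cache hit costs a version lookup, but the cache never serves data whose version the database has moved past.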
The Cases That Break Every Strategy
Dependent cache entries. A user profile cache entry depends on data from three tables. A write to any of those tables should invalidate the cache entry. Tracking these dependencies across the application is difficult. Either you invalidate too aggressively (high cache miss rate) or you miss invalidations (stale data).
Batch writes. A nightly job updates prices for 50,000 products. Write-through invalidation would delete 50,000 cache keys during the batch. If those keys are popular, you have a thundering herd problem the next morning when they all miss on first access. Coordinated cache warming after batch writes — proactively populating the cache before traffic hits — addresses this.
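A warming pass can be sketched as: after the batch completes, repopulate only the keys that are both updated and known to be hot, in chunks, before morning traffic arrives. The function names and the source of `hot_ids` (e.g. access-frequency stats) are assumptions:

```python
def warm_cache_after_batch(updated_ids, fetch_many, cache_set, hot_ids,
                           batch_size=500):
    """Repopulate cache entries for popular keys touched by a batch write.

    Only warms keys that are both updated and known to be hot; everything
    else is left to fault in lazily. All names here are illustrative.
    """
    to_warm = [i for i in updated_ids if i in hot_ids]
    warmed = 0
    # Chunked fetches keep the warming pass from hammering the database.
    for start in range(0, len(to_warm), batch_size):
        chunk = to_warm[start:start + batch_size]
        for product_id, row in fetch_many(chunk).items():
            cache_set(f"product:{product_id}", row)
            warmed += 1
    return warmed
```

Warming only the hot subset matters: proactively filling all 50,000 keys trades the morning thundering herd for an overnight load spike of similar size.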
Multi-region caches. A write in one region must invalidate caches in other regions. Cross-region cache invalidation introduces latency and complexity. Until the invalidation propagates, different regions serve different data.
The Practical Rule
Use TTL as your baseline. Choose the longest TTL your consistency requirement permits. Add write-through invalidation for data where staleness causes user-facing problems and the write path is centralized. Add CDC-based invalidation when you have multiple writers. Never rely solely on write-through invalidation without a TTL backstop. Test your invalidation under failure conditions — specifically, what happens when the cache delete fails after the database write succeeds.