Cache Invalidation: The Problem That Makes Caching Hard

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Cache Works Until the Data Changes

Your product page loads in 80ms because the product data is cached in Redis with a 5-minute TTL. A product manager updates the price. The new price is in the database. For up to 5 minutes, every customer sees the old price. If the price went down, that is a customer trust issue. If the price went up and someone completes a purchase at the old price, that is a business problem.

This is not a bug in the cache. It is the intended behavior of TTL-based invalidation. The question is whether the TTL you chose is appropriate for the consistency requirement of the data you are caching.

Most caching implementations use TTL as the primary invalidation mechanism because it is simple. It is also imprecise — it guarantees that the cache will be refreshed eventually, not that it will be refreshed when the data changes.

The Invalidation Strategies and Their Trade-offs

TTL-only. Set an expiration time. The cache self-invalidates after the TTL expires. Simple to implement. Consistency window equals the TTL. Appropriate when staleness for the TTL duration is acceptable and invalidation coordination is not worth the complexity.
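The TTL-only pattern can be sketched as cache-aside with an expiry. This is an illustrative stand-in: the TTLCache class is a minimal in-memory substitute for Redis, and a plain dict stands in for the database — none of these names come from a real client library.

```python
import time

class TTLCache:
    """Minimal in-memory stand-in for a TTL cache such as Redis."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiry on access
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

def get_product(cache, db, product_id, ttl_seconds=300):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # may be up to ttl_seconds stale
    row = db[product_id]  # cache miss: read the source of truth
    cache.set(key, row, ttl_seconds)
    return row
```

Note that nothing in this read path ever learns about writes — the consistency window is exactly the TTL you pass in.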

Write-through invalidation. On every write to the database, also delete or update the corresponding cache entry. The cache is fresh immediately after writes. Requires the write path to know which cache keys are affected.

The problem: if the write path and the cache invalidation do not happen atomically, there is a window between the database write and the cache invalidation where the cache holds stale data. If the application crashes between the write and the cache delete, the stale entry persists until TTL expiry.

def update_product_price(product_id, new_price):
    # db and cache are assumed clients (e.g. a DB-API connection and redis-py)
    db.execute("UPDATE products SET price = %s WHERE id = %s", [new_price, product_id])
    # Race condition: if the process dies here, cache stays stale until TTL expiry
    cache.delete(f"product:{product_id}")

Mitigation: set a short TTL as a backstop even when using write-through invalidation. A 60-second TTL backstop catches failures in invalidation without holding stale data for 5 minutes.
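A minimal sketch of write-through invalidation with that TTL backstop. The Cache class, read_price, and update_price are illustrative names, not a real client library; the point is that the read path always sets a short TTL, so a lost delete bounds staleness instead of making it indefinite.

```python
import time

BACKSTOP_TTL = 60  # seconds: caps staleness if an invalidation is lost

class Cache:
    """In-memory stand-in for Redis with TTL and delete."""
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            return entry[0]
        self._store.pop(key, None)  # expired or absent
        return None

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

    def delete(self, key):
        self._store.pop(key, None)

def read_price(cache, db, product_id):
    key = f"product:{product_id}"
    price = cache.get(key)
    if price is None:
        price = db[product_id]
        cache.set(key, price, BACKSTOP_TTL)  # the short TTL is the backstop
    return price

def update_price(cache, db, product_id, new_price):
    db[product_id] = new_price
    # If the process dies before this line, the cached entry stays stale
    # only until BACKSTOP_TTL expires -- not indefinitely.
    cache.delete(f"product:{product_id}")
```

The backstop costs you one extra database read per key per minute at worst, which is usually cheap insurance against a permanently stale entry.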

Event-driven invalidation. Database changes produce events (via CDC — Change Data Capture — tools like Debezium with Kafka, or PostgreSQL logical replication). A cache invalidation consumer listens for change events and invalidates affected cache keys. Decouples the write path from cache management. Handles invalidation for all consumers including non-application writers (migrations, batch jobs, admin tools).

This is the right architecture for systems where multiple writers exist and the write path cannot reliably trigger cache invalidation. The trade-off is operational complexity — you are running a change data capture pipeline and a consumer.
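The consumer side can be sketched roughly as follows, assuming Debezium-shaped change events (source.table, before, after) arriving as dicts. The KEY_DERIVERS mapping and the set-based cache are illustrative stand-ins, not part of any real API.

```python
# Maps a table name to a function deriving affected cache keys from a row.
KEY_DERIVERS = {
    "products": lambda row: [f"product:{row['id']}"],
    "prices":   lambda row: [f"product:{row['product_id']}"],
}

def handle_change_event(cache, event):
    """Invalidate cache keys for one Debezium-style change event."""
    table = event["source"]["table"]
    derive = KEY_DERIVERS.get(table)
    if derive is None:
        return []  # table is not cached; nothing to invalidate
    # 'after' carries the row for inserts/updates, 'before' for deletes.
    row = event.get("after") or event.get("before")
    keys = derive(row)
    for key in keys:
        cache.discard(key)  # stand-in for a Redis DEL
    return keys
```

Because the consumer reads from the change log rather than the application, it invalidates correctly even when the write came from a migration or an admin tool that never touches your application code.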

Cache-aside with conditional reads (ETag). Rather than a fixed TTL, the cache stores an ETag (version identifier) alongside the cached data. On cache miss, the client fetches from the database and updates the cache. On cache hit, the client can conditionally revalidate by sending the ETag to the database — if the data has not changed, the database returns a lightweight "not modified" response. This shifts consistency checking to the read path rather than managing invalidation on the write path.
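One way to sketch the ETag pattern, assuming each row carries a version identifier. fetch_if_modified is a hypothetical helper standing in for a conditional query (in practice, a comparison against a version or updated_at column); the dict-based cache and db are illustrative.

```python
def fetch_if_modified(db, product_id, etag):
    """Hypothetical conditional query: returns ('not_modified', None, etag)
    when the stored version matches, else the fresh row and its new version."""
    row, version = db[product_id]
    if etag == version:
        return ("not_modified", None, version)
    return ("modified", row, version)

def get_product(cache, db, product_id):
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        value, etag = cached
        status, row, version = fetch_if_modified(db, product_id, etag)
        if status == "not_modified":
            return value  # cheap revalidation: no row payload transferred
        cache[key] = (row, version)  # version changed: refresh the cache
        return row
    row, version = db[product_id]  # cold miss: full fetch
    cache[key] = (row, version)
    return row
```

The trade-off is visible in the code: every cache hit still costs a round trip to the database, but it is a cheap version comparison rather than a full row fetch, and staleness is eliminated rather than bounded.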

The Cases That Break Every Strategy

Dependent cache entries. A user profile cache entry depends on data from three tables. A write to any of those tables should invalidate the cache entry. Tracking these dependencies across the application is difficult. Either you invalidate too aggressively (high cache miss rate) or you miss invalidations (stale data).

Batch writes. A nightly job updates prices for 50,000 products. Write-through invalidation would delete 50,000 cache keys during the batch. If those keys are popular, you have a thundering herd problem the next morning when they all miss on first access. Coordinated cache warming after batch writes — proactively populating the cache before traffic hits — addresses this.
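Coordinated warming can be sketched like this, assuming the batch job knows which IDs it touched. Dicts stand in for the database and cache, and warm_cache is an illustrative name; the chunking keeps the warming itself from hammering the database.

```python
def warm_cache(cache, db, product_ids, batch_size=1000):
    """Repopulate cache entries after a batch write, so popular keys
    do not all miss at once when morning traffic arrives."""
    warmed = 0
    for i in range(0, len(product_ids), batch_size):
        chunk = product_ids[i:i + batch_size]
        rows = {pid: db[pid] for pid in chunk}  # stand-in for one bulk SELECT
        for pid, row in rows.items():
            cache[f"product:{pid}"] = row  # stand-in for SETEX with a TTL
        warmed += len(chunk)
    return warmed
```

Running this at the end of the nightly job converts 50,000 cold misses into 50 bulk reads done off-peak, at the cost of warming some keys nobody will request.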

Multi-region caches. A write in one region must invalidate caches in other regions. Cross-region cache invalidation introduces latency and complexity. Until the invalidation propagates, different regions serve different data.

The Practical Rule

Use TTL as your baseline. Choose the longest TTL your consistency requirement permits. Add write-through invalidation for data where staleness causes user-facing problems and the write path is centralized. Add CDC-based invalidation when you have multiple writers. Never rely solely on write-through invalidation without a TTL backstop. Test your invalidation under failure conditions — specifically, what happens when the cache delete fails after the database write succeeds.

