Caching Is Not a Performance Fix. It Is a Performance Tool.
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Cache That Made Things Worse
An e-commerce platform was experiencing slow product pages. Someone added Redis caching for product data with a 24-hour TTL. Response times improved dramatically. Six months later, a support ticket: customers were seeing outdated prices and stock levels, sometimes for hours after a price change. The cache was doing exactly what it was configured to do. The business wasn't getting what it needed.
This is not a story about caching being bad. It's a story about caching being applied without fully thinking through the consistency implications. Caching always trades freshness for speed. Whether that tradeoff is acceptable depends entirely on what the data is and how it's used.
What Caching Actually Solves
Caching is the right tool for a specific set of problems:
Repeated reads of data that changes infrequently relative to how often it's read. Country codes, product categories, configuration values, user preferences — these are read thousands of times per second and change at most a few times per day. The cache hit rate is high. The staleness window is short relative to the change frequency.
Expensive computations that produce deterministic results for the same input. If generating a report takes 8 seconds and the underlying data changes hourly, caching the result for an hour means paying the 8-second cost once per hour instead of on every request; at one request per second, that is once instead of 3,600 times (see the sketch after this list).
Results of third-party API calls where the external data changes slowly and the API has rate limits or latency you can't control.
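To make the second case concrete, here is a minimal, hypothetical sketch of memoizing an expensive report in process with a one-hour TTL. Report, ReportGenerator, and the reportGenerator field are illustrative stand-ins, not anything from the incident above.
// Memoize an expensive, deterministic report for one hour (hypothetical sketch)
private record CachedReport(Report report, Instant computedAt) {}
private static final Duration REPORT_TTL = Duration.ofHours(1);
private final Map<String, CachedReport> reportCache = new ConcurrentHashMap<>();
public Report getReport(String reportId) {
    CachedReport cached = reportCache.get(reportId);
    if (cached == null || cached.computedAt().plus(REPORT_TTL).isBefore(Instant.now())) {
        // The 8-second computation runs at most once per hour per report id.
        cached = new CachedReport(reportGenerator.generate(reportId), Instant.now());
        reportCache.put(reportId, cached);
    }
    return cached.report();
}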
What caching does not solve: slow queries caused by missing indexes, N+1 query patterns, connection pool exhaustion, or anything where the data changes at the same frequency it's read.
The Consistency Problem Is Not Optional
Every cache introduces a window of inconsistency between the cache and the source of truth. The size of that window is your TTL, plus however long it takes for invalidation to propagate. During that window, readers may see stale data. Whether this is acceptable is not a technical question — it is a product question.
For product page pricing, 24-hour staleness is probably not acceptable. For a dashboard showing "total users registered," staleness of five minutes is likely fine.
The discipline is to have this conversation before implementing the cache, not after the first customer complaint.
Cache Invalidation: The Hard Part
There are two hard problems in computer science: naming things, cache invalidation, and off-by-one errors. Cache invalidation is listed second but is arguably the hardest of the three in practice.
The basic strategies:
TTL-based expiry: Simple, predictable, always produces some staleness. Appropriate when you can tolerate the staleness window and when explicit invalidation is difficult.
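The read side of a TTL cache is typically cache-aside: check the cache, fall back to the database on a miss, and store the result with the TTL. A minimal sketch, reusing the illustrative cache and productRepository from the write-path example below; the get and set calls with a TTL are assumed client methods, and exact signatures vary by library.
// Cache-aside read with a TTL (sketch; cache.get and cache.set are assumed client methods)
public Product getProduct(long productId) {
    String key = "product:" + productId;
    Product cached = cache.get(key, Product.class);
    if (cached != null) {
        return cached; // hit: served without touching the database
    }
    Product product = productRepository.findById(productId);
    cache.set(key, product, Duration.ofMinutes(5)); // miss: repopulate with a 5-minute TTL
    return product;
}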
Event-driven invalidation: Write operations publish events that delete or update cache entries. This approach can achieve near-zero staleness but requires tight coordination between write and cache paths. A missed invalidation event leaves stale data indefinitely.
// Invalidate on write: the write path removes the cache entry synchronously
public void updateProduct(Product product) {
    productRepository.save(product);
    // Invalidate immediately; don't update in place. Let the next read repopulate.
    cache.delete("product:" + product.getId());
}
Version-based keys: Include a version number or content hash in the cache key. Stale entries are never served — instead, they're orphaned and expire naturally. This is safe but requires managing key space growth.
product:42:v7 -> { price: 29.99, ... }
When product 42 is updated, the write creates key product:42:v8. Old key expires on TTL. Zero chance of serving stale data. Downside: the first read after an update always misses, and you need to clean up orphaned keys.
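One way to implement this, sketched below with illustrative names: keep a version counter on the product, bump it on every write, and build cache keys from it. The currentVersion lookup is assumed to be cheap (for example, a small indexed column), and cache.get and cache.set are the same assumed client methods as above.
// Version-based keys (sketch): writes bump the version, reads build the key from it
public void updateProduct(Product product) {
    product.setVersion(product.getVersion() + 1); // e.g. v7 -> v8
    productRepository.save(product);
    // No invalidation call: the old key product:42:v7 simply stops being read and expires on TTL.
}
public Product getProduct(long productId) {
    long version = productRepository.currentVersion(productId); // assumed cheap lookup
    String key = "product:" + productId + ":v" + version;
    Product cached = cache.get(key, Product.class);
    if (cached == null) {
        cached = productRepository.findById(productId);
        cache.set(key, cached, Duration.ofHours(24)); // orphaned versions age out on TTL
    }
    return cached;
}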
The Thundering Herd Problem
A cache that expires all entries at the same time (or where a highly-requested entry expires while traffic is high) causes a thundering herd: all concurrent requests for that data hit the database simultaneously. For a key that was being served from cache at 10,000 reads/second, the cache expiration becomes an instant load spike.
Mitigations: jitter (randomize TTLs within a range to spread expiration), probabilistic early recomputation (the approach described in the XFetch algorithm — refresh the cache probabilistically as the TTL approaches), or a distributed lock that allows only one thread to recompute while others wait for the updated value.
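The cheapest of the three mitigations is jitter. A minimal sketch, assuming the TTL is applied through the same illustrative cache client as in the earlier examples:
// TTL jitter (sketch): randomize each entry's TTL within +/- 20% of a 10-minute base,
// so entries written at the same moment don't all expire in the same instant.
private Duration jitteredTtl() {
    long baseSeconds = Duration.ofMinutes(10).getSeconds();
    long jitter = ThreadLocalRandom.current().nextLong(-baseSeconds / 5, baseSeconds / 5 + 1);
    return Duration.ofSeconds(baseSeconds + jitter);
}
// ...then pass jitteredTtl() wherever the entry's TTL is set, e.g. cache.set(key, value, jitteredTtl()).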
Local Cache vs. Distributed Cache
An in-process cache (Caffeine for JVM, functools.lru_cache for Python) is fast — reads are in nanoseconds, no network. It is also per-instance: in a horizontally scaled service, each instance has a different cache state. A write to instance A doesn't invalidate instance B's cache. This is acceptable for data that is effectively immutable or when per-instance staleness is tolerable. It is not acceptable for user-specific data that changes across requests.
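For the JVM case, a short sketch with Caffeine; the productRepository loader is an assumed stand-in for a database lookup.
// In-process cache with Caffeine (sketch). Each service instance holds its own copy,
// so a write handled by another instance is not visible here until this entry expires.
LoadingCache<Long, Product> productCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(5))
    .build(productRepository::findById); // loader runs only on a miss
Product product = productCache.get(42L); // nanosecond-scale on a hit, no network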
Redis or Memcached provides a shared cache. Reads take ~1ms over the network. All instances share the same view. The consistency story is better; the latency is several orders of magnitude worse than local memory.
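The shared-cache counterpart, as a rough sketch with the Jedis client; the host name, the 5-minute TTL, and the loadProductAsJson helper are assumptions, and serialization is reduced to a JSON string for brevity.
// Shared cache in Redis via Jedis (sketch). Every instance sees the same entries,
// at the cost of a network round trip on each read.
try (Jedis jedis = new Jedis("redis.internal", 6379)) {
    String key = "product:42";
    String json = jedis.get(key); // null on a miss
    if (json == null) {
        json = loadProductAsJson(42L); // assumed helper that hits the database
        jedis.setex(key, 300, json); // TTL in seconds
    }
}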
The Practical Takeaway
Before implementing any cache: write down the TTL you're considering and then ask whether staleness of that duration is acceptable for the specific data being cached. If you can't answer that without involving a product decision-maker, involve one. Then decide whether event-driven invalidation is worth the implementation cost. If it is, implement it before the cache goes to production — retrofitting invalidation logic into an existing cache is significantly harder than building it in from the start.