Where You Put Your Cache Matters More Than You Think
by Arif Ikhsanudin, Backend Developer
The Cache Is In the Wrong Place
A team adds Redis in front of their database, sees a 10% improvement in p95 latency, and considers the problem solved. Six months later, user-facing latency is still unacceptably high. The Redis cache has a 95% hit rate. The problem was never the database — it was the three synchronous external API calls in the request path, none of which are cached.
Cache placement is a function of what you are trying to protect and from what. Placing a cache at the wrong layer either leaves the actual bottleneck unaddressed or creates consistency problems without meaningful performance benefit.
The Cache Layers and What They Protect
Client-side / in-process cache. Data cached in the application process's memory: a dictionary, an LRU cache, or an in-process store such as Caffeine (JVM) or functools.lru_cache (Python). Sub-millisecond access, no network hop. Appropriate for: reference data that changes rarely and can tolerate per-instance staleness (configuration values, feature flags, lookup tables).
Limitation: not shared across instances. Each instance has its own copy. A cache update must be propagated to all instances, usually via TTL expiration or a pub/sub invalidation event. Memory is bounded by the instance's heap.
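As an illustration of the TTL-based approach, here is a minimal sketch of an in-process cache with expiry. The class name TTLCache and its methods are hypothetical, not from any library. It assumes a single-threaded caller and has no size bound, so a real service would likely add a lock and an eviction policy.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Per-process cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no network hop, no origin call
        value = loader()  # missing or expired: go back to the origin
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop one key, for example from a pub/sub invalidation handler."""
        self._entries.pop(key, None)
```

Each application instance would hold its own TTLCache, so two instances can disagree for up to ttl_seconds unless invalidate is also called on every instance.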
Distributed cache (Redis, Memcached). A shared cache layer between application instances and the database. Millisecond-range latency. Appropriate for: session data, computed results shared across users, expensive query results with acceptable staleness.
Limitation: adds a network hop. Introduces an additional failure surface. Requires managing consistency with the origin.
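One common way to limit the extra failure surface is a cache-aside read that fails open. The sketch below is an assumption-laden illustration: InMemoryCacheClient is a hypothetical stand-in for a real Redis or Memcached client (its get/set interface is invented for this example and is not the API of any specific library), and values are assumed to be JSON-serializable.

```python
import json
from typing import Any, Callable, Dict, Optional


class InMemoryCacheClient:
    """Stand-in for a distributed cache client (hypothetical interface)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.available = True  # flip to False to simulate an outage

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise ConnectionError("cache unavailable")
        return self._data.get(key)

    def set(self, key: str, value: str, ttl_seconds: int) -> None:
        if not self.available:
            raise ConnectionError("cache unavailable")
        self._data[key] = value  # TTL is ignored by this stand-in


def cache_aside(client: InMemoryCacheClient, key: str, ttl_seconds: int,
                load_from_origin: Callable[[], Any]) -> Any:
    """Read through the cache; fall back to the origin on a miss or cache failure."""
    try:
        cached = client.get(key)
        if cached is not None:
            return json.loads(cached)
    except ConnectionError:
        pass  # fail open: a cache outage should not become a request failure

    value = load_from_origin()
    try:
        client.set(key, json.dumps(value), ttl_seconds)
    except ConnectionError:
        pass  # best effort: the next request simply misses again
    return value
```

Failing open trades one risk for another: during a cache outage every request reaches the origin, so the origin must be able to absorb that load or be protected by other means.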
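Database query cache. Some databases have shipped built-in query result caches: MySQL had one until it was deprecated in 5.7.20 and removed in 8.0, while PostgreSQL has never had one and caches data pages in shared buffers rather than query results. Rarely the right layer: invalidation is coarse-grained (MySQL discarded every cached result for a table on any write to that table), and the performance benefit is typically better achieved at the application layer.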
CDN / edge cache. Caches responses at the network edge, close to the client. Zero application server load for cache hits. Appropriate for: public, non-user-specific content — API responses, static data, media. HTTP cache headers (Cache-Control, ETag, Last-Modified) control CDN behavior.
# HTTP cache headers for a public API response:
Cache-Control: public, max-age=300, stale-while-revalidate=60
# max-age=300: serve from CDN for up to 5 minutes
# stale-while-revalidate=60: for up to 60s after that, serve the stale copy while refetching in the background
# Result: near-zero origin load for popular public endpoints
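To make the two directives concrete, the following sketch classifies a cached response by its age using the values from the header above. The function name cdn_decision and its return labels are hypothetical. Real CDNs apply additional rules (other directives, vendor-specific behavior), so treat this as a simplified model.

```python
def cdn_decision(age_seconds: float, max_age: int = 300,
                 stale_while_revalidate: int = 60) -> str:
    """Classify a cached response by age (simplified model of the two directives)."""
    if age_seconds < max_age:
        return "fresh"  # served from the edge, origin not contacted
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-revalidate"  # served stale, refreshed in the background
    return "expired"  # the client waits for a fetch from the origin
```

Under this model, a response that is 330 seconds old is still served immediately, and only responses older than 360 seconds make a client wait on the origin.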
Matching the Layer to the Problem
The question to ask first: where is the latency coming from? Use an APM or request profiler to identify the slowest segments.
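Where no APM is available, a rough per-segment timer can give a first answer. This is a minimal sketch, not a substitute for a real profiler. SegmentTimer and the segment names used with it are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, DefaultDict, Iterator


class SegmentTimer:
    """Accumulates wall-clock time per named segment of a request (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable for deterministic tests
        self.totals: DefaultDict[str, float] = defaultdict(float)

    @contextmanager
    def segment(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the wrapped code raises.
            self.totals[name] += self._clock() - start

    def slowest(self) -> str:
        """Return the name of the segment with the largest accumulated time."""
        return max(self.totals, key=self.totals.__getitem__)
```

Wrapping each stage of a handler in `with timer.segment("db_query"):` or `with timer.segment("external_api"):` and then checking `timer.slowest()` shows which stage a cache would actually help.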
If latency is in database queries: a distributed cache (Redis) in front of those queries is appropriate. Configure TTL based on acceptable staleness. Ensure cache keys are scoped to the query parameters.
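Scoping a key to the query parameters can be sketched as follows. The function query_cache_key and the "query:" prefix are assumptions made for illustration. The approach assumes the parameters can be serialized to JSON, with str() as a fallback for values such as dates.

```python
import hashlib
import json
from typing import Any, Mapping


def query_cache_key(query_name: str, params: Mapping[str, Any]) -> str:
    """Build a deterministic cache key from a query name and its parameters."""
    # sort_keys makes the key independent of dict ordering; default=str covers
    # values that json cannot serialize natively.
    canonical = json.dumps(params, sort_keys=True, default=str, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"query:{query_name}:{digest}"
```

Two calls with the same parameters in a different order produce the same key, while any change to a parameter value produces a different one, so a result cached for one filter is never served for another.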
If latency is in external API calls: cache the API responses in-process or in Redis with a TTL appropriate to the external data's change frequency. If the external API returns per-user data, scope the cache key to the user.
If latency is in repeated computation (rendering, aggregation, encoding): cache the computed result. Ensure the cache key captures all inputs that affect the output.
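For pure computations, functools.lru_cache derives the cache key from the function arguments, so the rule becomes: pass every input as an argument. The sketch below uses a hypothetical render_price_label function as the "expensive" computation. The formatting logic is illustrative only.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def render_price_label(amount_cents: int, currency: str, locale: str) -> str:
    """Hypothetical computation; every input that affects the output is an argument."""
    amount = f"{amount_cents / 100:,.2f}"
    if locale == "de_DE":
        # Swap separators: 1,234.50 -> 1.234,50
        amount = amount.replace(",", "_").replace(".", ",").replace("_", ".")
        return f"{amount} {currency}"
    return f"{currency} {amount}"
```

If locale were read from a global or a request context instead of being passed in, the first caller's locale would be cached and served to everyone else, because the cache key would no longer capture it.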
If latency is geographic (users far from the origin experience high round-trip times): use edge caching via a CDN, or deploy application instances in multiple regions.
The Dangerous Placement
The most dangerous misplacement is caching user-specific data in a shared cache without proper key scoping. If a cache key does not include the user ID, two users can receive each other's data. This is not a theoretical concern: it has caused significant security incidents at real companies.
# Dangerous: cache key built only from the product ID
cache_key = f"product:{product_id}"
# If product data is user-specific (personalized pricing, entitlements),
# user A can receive user B's data.
# Safe: scope the cache key to the user when data is user-specific
cache_key = f"product:{product_id}:user:{user_id}"
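One way to make the scoping decision hard to forget is to route all key construction through a helper that requires it to be stated explicitly. This is a sketch under assumptions: scoped_cache_key and its parameters are hypothetical names, and the key layout simply mirrors the snippet above.

```python
from typing import Optional


def scoped_cache_key(resource: str, resource_id: str, *, user_specific: bool,
                     user_id: Optional[str] = None) -> str:
    """Build a cache key, refusing to cache user-specific data without a user ID."""
    if not user_specific:
        return f"{resource}:{resource_id}"
    if not user_id:
        # Fail loudly instead of silently producing a key shared across users.
        raise ValueError("user-specific data requires user_id in the cache key")
    return f"{resource}:{resource_id}:user:{user_id}"
```

Because user_specific is a required keyword argument, every call site has to say whether the data is personalized, and a missing user ID becomes an error at write time rather than a data leak at read time.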
Get the placement right first, then tune TTLs and invalidation. A cache in the wrong place cannot be fixed by tuning.