Caching Strategies Compared — In-Memory, Redis, and CDN: When to Use Each

by Eric Hanson, Backend Developer at Clean Systems Consulting

The cache that made the problem worse

Your database is getting hammered. Someone adds Redis caching to the product listing endpoint. Response times drop. Two weeks later, product managers report that updated product descriptions are not appearing on the site. Customers are seeing stale prices. The cache TTL is 24 hours. No invalidation logic was implemented because "we'll add that later." This is the most common caching failure I see in mid-stage products: caching without an invalidation strategy.

In-process memory cache: fast, local, disposable

In-process caching (Ruby's Rails.cache with the :memory_store, Java's Caffeine, .NET's IMemoryCache) stores data in the application process's heap. There is no network hop. Access is nanosecond-scale — orders of magnitude faster than Redis.

The cost: the cache is not shared between processes. If you run 10 Puma workers or 5 service pods, each has its own cache with potentially different state, so a cache invalidation event must reach every process or you get inconsistency. Cache size is bounded by the process memory budget, and the cache dies with the process — restarts, deployments, and crashes clear it entirely.

// Caffeine — in-process cache with size and TTL bounds
Cache<String, ProductCatalog> catalogCache = Caffeine.newBuilder()
    .maximumSize(500)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .recordStats()   // record hit/miss stats (can be bound to Micrometer)
    .build();

public ProductCatalog getCatalog(String categoryId) {
    return catalogCache.get(categoryId, id -> {
        // Only executed on cache miss
        return productRepository.findCatalogByCategory(id);
    });
}

Use in-process caching for: reference data that is the same across all requests (country codes, feature flags, configuration), computation results that are expensive to derive but rarely change (permission graph evaluation, compiled templates), and data where stale reads across processes are acceptable.
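The same idea without Caffeine, for teams not on the JVM: a minimal in-process TTL cache in pure standard-library Python. This is an illustrative sketch (real applications should reach for cachetools or their framework's cache store), but it shows the three properties that matter — a size bound, a per-entry TTL, and a loader that only runs on miss:

```python
import time

class TTLCache:
    """Minimal in-process cache: size-bounded, per-entry TTL, load-on-miss.
    Illustrative only -- prefer cachetools or a framework cache in production."""

    def __init__(self, max_size=500, ttl_seconds=300):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value), insertion-ordered

    def get(self, key, loader):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh hit, no loader call
        value = loader(key)                  # miss or expired: recompute
        if key not in self._store and len(self._store) >= self.max_size:
            self._store.pop(next(iter(self._store)))  # crude FIFO eviction
        self._store[key] = (now + self.ttl, value)
        return value

# usage: per-process feature flag cache (flag payload is a stand-in)
flags_cache = TTLCache(max_size=100, ttl_seconds=60)
flags = flags_cache.get("flags:global", lambda k: {"new_checkout": True})
```

Note that everything here lives in one process's heap — every worker pays its own misses, which is exactly the trade-off described above.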

Redis: shared cache with network cost

Redis gives you a shared cache visible to all application instances. Invalidation events propagate across the fleet. Data survives process restarts. Redis Cluster gives you horizontal scaling and data partitioning. Redis Sentinel gives you high availability with automatic failover.

The cost: network latency. A local Redis instance typically adds 0.5-2ms per operation. A Redis instance in the same data center region adds 1-5ms. Cross-region is worse. For high-frequency cache lookups, this adds up: an endpoint that makes 10 cache reads is adding 5-50ms of Redis latency on top of everything else.
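One mitigation is batching: a single MGET (or a redis-py pipeline) collapses N round-trips into one. A sketch of the pattern — the function is written against anything exposing redis-py's mget signature, and the FakeRedis class below is a demonstration stand-in I'm assuming here, not a real library; in production you would pass a redis.Redis client:

```python
import json

def get_users_cached(client, user_ids):
    """Fetch many cache entries in one round-trip via MGET.
    `client` is anything with redis-py's mget(keys) signature."""
    keys = [f"user:v1:{uid}" for uid in user_ids]
    raw = client.mget(keys)  # one network round-trip instead of len(keys)
    hits, misses = {}, []
    for uid, blob in zip(user_ids, raw):
        if blob is not None:
            hits[uid] = json.loads(blob)
        else:
            misses.append(uid)  # load these from the database, then SET
    return hits, misses

# stand-in client so the logic runs without a server; swap in redis.Redis(...)
class FakeRedis:
    def __init__(self, data): self.data = data
    def mget(self, keys): return [self.data.get(k) for k in keys]

client = FakeRedis({"user:v1:1": json.dumps({"name": "Ada"})})
hits, misses = get_users_cached(client, ["1", "2"])
```

Ten lookups batched this way cost roughly the same wall-clock latency as one.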

# Redis with proper key structure and TTL strategy
import redis
import json

r = redis.Redis(host='redis-cluster', port=6379, decode_responses=True)

def get_user_permissions(user_id: str) -> dict:
    cache_key = f"permissions:v2:{user_id}"  # version prefix for easy invalidation

    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # db and build_permission_map are application helpers, elided here
    permissions = db.query("SELECT * FROM permissions WHERE user_id = %s", user_id)
    result = build_permission_map(permissions)

    # TTL shorter than your longest acceptable stale window
    r.setex(cache_key, 300, json.dumps(result))  # 5 minute TTL
    return result

def invalidate_user_permissions(user_id: str):
    # Explicit invalidation on change — do not rely solely on TTL
    r.delete(f"permissions:v2:{user_id}")

The version prefix in the key (v2:) is a practical pattern for bulk invalidation: when you change the cache schema, increment the prefix version. All old keys become orphaned and expire naturally without needing to enumerate them.
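The pattern reduces to a single key-builder: bump one constant and every existing key becomes unreachable at once. Names here are illustrative, matching the snippet above:

```python
PERMISSIONS_CACHE_VERSION = 2  # bump to orphan every existing key at once

def permissions_key(user_id, version=None):
    """Build the versioned cache key for a user's permission map."""
    v = PERMISSIONS_CACHE_VERSION if version is None else version
    return f"permissions:v{v}:{user_id}"

# after bumping the constant to 3, all reads and writes move to v3 keys;
# the v2 keys are never read again and expire naturally via their TTL
```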

Use Redis for: session state, distributed rate limiting, shared application-level caches (product data, user preferences), queues and pub/sub, and any cache that must be consistent across application instances.
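One of those, distributed rate limiting, is worth sketching because it is only two commands: INCR a per-window counter, and EXPIRE it the first time so windows clean themselves up. The CounterStub below mimics just those two commands so the logic runs without a server — it is a stand-in, not a real library; in production you would pass a redis.Redis client, where INCR is atomic across all application instances:

```python
import time

def allow_request(client, user_id, limit=100, window_seconds=60, now=None):
    """Fixed-window rate limiter: at most `limit` requests per window."""
    now = time.time() if now is None else now
    window = int(now) // window_seconds
    key = f"ratelimit:{user_id}:{window}"
    count = client.incr(key)                # atomic in real Redis
    if count == 1:
        client.expire(key, window_seconds)  # window expires on its own
    return count <= limit

# minimal stand-in implementing just incr/expire for demonstration
class CounterStub:
    def __init__(self): self.counts = {}
    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]
    def expire(self, key, seconds): pass  # TTL handled by the Redis server

stub = CounterStub()
results = [allow_request(stub, "u1", limit=3, now=1000) for _ in range(5)]
```

Fixed windows have a burst-at-the-boundary weakness; sliding-window variants exist, but this is the shape of the idea.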

CDN caching: the largest lever, the least control

CDN caching (CloudFront, Fastly, Cloudflare) serves responses from edge nodes physically close to users. For cacheable content, this eliminates the origin round-trip entirely — the edge node serves directly from its cache. The latency improvement is dramatic: a New York user hitting a Singapore origin server sees a ~180ms round-trip; hitting a New York CDN edge, 5-10ms.

The cost: you are caching at the HTTP response level, not at the data level. Cache keys are URL-based. Invalidation requires a CDN API call (CloudFront invalidation takes 10-60 seconds to propagate globally and costs $0.005 per path after the first 1000 paths per month). Authenticated content generally cannot be CDN-cached without careful cache key configuration including the session token (which defeats the whole point).

# Origin response headers that drive CDN and browser caching behavior
Cache-Control: public, max-age=3600, s-maxage=86400

# s-maxage controls CDN TTL independently of browser TTL
# max-age=3600 tells browsers to cache for 1 hour
# s-maxage=86400 tells CDN to cache for 24 hours

# For API responses that change frequently:
Cache-Control: public, max-age=0, s-maxage=60, stale-while-revalidate=30
# CDN serves stale for up to 30s while fetching fresh — keeps hit rate high

Use CDN caching for: public static assets (images, JS, CSS), public API endpoints that return the same response for all unauthenticated users (product listings, public pricing, documentation), and any endpoint where the same URL → same response contract holds.
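Those header rules are easiest to audit when they live in one place instead of being scattered across handlers. A small illustrative helper — the content classes and policy strings are my assumptions for this sketch, not any CDN's API:

```python
def cache_control(kind):
    """Build a Cache-Control header per content class (illustrative policy)."""
    policies = {
        # fingerprinted static assets: cache aggressively everywhere
        "static_asset": "public, max-age=31536000, immutable",
        # public pages: short browser TTL, longer CDN TTL
        "public_page": "public, max-age=3600, s-maxage=86400",
        # fast-changing public APIs: CDN-only caching with SWR
        "public_api": "public, max-age=0, s-maxage=60, stale-while-revalidate=30",
        # authenticated responses: never cache at the CDN or browser
        "private": "private, no-store",
    }
    return policies[kind]

# attach in middleware: response.headers["Cache-Control"] = cache_control(kind)
```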

Layered caching: how these fit together

The production pattern that performs best combines all three layers:

  1. CDN absorbs the bulk of unauthenticated, public traffic at the edge. Cache-Control headers drive behavior.
  2. Redis handles application-level caching for authenticated or user-specific data that would not survive CDN (user preferences, personalized feeds, permission maps).
  3. In-process (Caffeine/Rails.cache) stores reference data and computation results that are the same across requests within a single process — configuration, feature flags, compiled templates.
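The read path through those layers can be sketched in a few lines. The in-memory stand-ins below (a dict for the process-local layer, DictStore for the shared one) are demonstration stubs; real code swaps in a redis.Redis client and a repository:

```python
def layered_get(key, local_cache, shared_cache, load_from_db):
    """L1 in-process -> L2 shared cache -> database, backfilling on the way out."""
    if key in local_cache:              # L1: no network hop
        return local_cache[key]
    value = shared_cache.get(key)       # L2: shared across instances
    if value is None:
        value = load_from_db(key)       # miss everywhere: hit the source
        shared_cache.set(key, value)    # backfill L2 for other instances
    local_cache[key] = value            # backfill L1 for this process
    return value

# demonstration with in-memory stand-ins
class DictStore:
    def __init__(self): self.d = {}
    def get(self, k): return self.d.get(k)
    def set(self, k, v): self.d[k] = v

db_calls = []
def load(k):
    db_calls.append(k)
    return f"row-for-{k}"

local, shared = {}, DictStore()
v1 = layered_get("cfg", local, shared, load)  # falls through to the DB
v2 = layered_get("cfg", local, shared, load)  # served from L1
```

Each layer shields the one below it: a second process with an empty local dict would still be served from the shared layer without touching the database.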

The mistake that creates incidents is treating TTL as your only invalidation strategy. Write-through invalidation — explicitly deleting or updating the cache key when the underlying data changes — is required for any cache that holds data where stale reads have business impact (prices, inventory levels, account status). TTL is your fallback, not your primary invalidation mechanism.
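Concretely, the write path invalidates the key in the same code path that commits the change, and the TTL set on reads remains only as a safety net. A sketch with dict stand-ins for the database and cache (key names are illustrative):

```python
def update_price(db, cache, product_id, new_price):
    """Write-through invalidation: commit the change, then delete the key
    immediately -- do not wait for the TTL to expire the stale value."""
    db[product_id] = new_price                 # 1. persist the change
    cache.pop(f"price:v1:{product_id}", None)  # 2. invalidate in the same path

def get_price(db, cache, product_id):
    key = f"price:v1:{product_id}"
    if key not in cache:
        cache[key] = db[product_id]  # cache-aside read backfills after invalidation
    return cache[key]
```

The next read after update_price repopulates the cache with the fresh price; the stale window shrinks from "up to one TTL" to "the gap between commit and delete".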

