Caching Strategies Compared — In-Memory, Redis, and CDN: When to Use Each

by Eric Hanson, Backend Developer at Clean Systems Consulting

The cache that made the problem worse

Your database is getting hammered. Someone adds Redis caching to the product listing endpoint. Response times drop. Two weeks later, product managers report that updated product descriptions are not appearing on the site. Customers are seeing stale prices. The cache TTL is 24 hours. No invalidation logic was implemented because "we'll add that later." This is the most common caching failure I see in mid-stage products: caching without an invalidation strategy.

In-process memory cache: fast, local, disposable

In-process caching (Ruby's Rails.cache with the :memory_store, Java's Caffeine, .NET's IMemoryCache) stores data in the application process's heap. There is no network hop. Access is nanosecond-scale — orders of magnitude faster than Redis.

The cost: the cache is not shared between processes. If you run 10 Puma workers or 5 service pods, each has its own cache with potentially different state, so a cache invalidation event must reach every process or you get inconsistency. Cache size is bounded by the process memory budget, and the cache dies with the process — restarts, deployments, and crashes clear it entirely.

// Caffeine — in-process cache with size and TTL bounds
Cache<String, ProductCatalog> catalogCache = Caffeine.newBuilder()
    .maximumSize(500)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .recordStats()   // record hit/miss stats (can be bound to Micrometer)
    .build();

public ProductCatalog getCatalog(String categoryId) {
    return catalogCache.get(categoryId, id -> {
        // Only executed on cache miss
        return productRepository.findCatalogByCategory(id);
    });
}

Use in-process caching for: reference data that is the same across all requests (country codes, feature flags, configuration), computation results that are expensive to derive but rarely change (permission graph evaluation, compiled templates), and data where stale reads across processes are acceptable.
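The same idea without Caffeine, for teams not on the JVM: a minimal in-process TTL cache in pure standard-library Python. This is an illustrative sketch (real applications should reach for cachetools or their framework's cache store), but it shows the three properties that matter — a size bound, a per-entry TTL, and a loader that only runs on miss:

```python
import time

class TTLCache:
    """Minimal in-process cache: size-bounded, per-entry TTL, load-on-miss.
    Illustrative only -- prefer cachetools or a framework cache in production."""

    def __init__(self, max_size=500, ttl_seconds=300):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value), insertion-ordered

    def get(self, key, loader):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh hit, no loader call
        value = loader(key)                  # miss or expired: recompute
        if key not in self._store and len(self._store) >= self.max_size:
            self._store.pop(next(iter(self._store)))  # crude FIFO eviction
        self._store[key] = (now + self.ttl, value)
        return value

# usage: per-process feature flag cache (flag payload is a stand-in)
flags_cache = TTLCache(max_size=100, ttl_seconds=60)
flags = flags_cache.get("flags:global", lambda k: {"new_checkout": True})
```

Note that everything here lives in one process's heap — every worker pays its own misses, which is exactly the trade-off described above.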

Redis: shared cache with network cost

Redis gives you a shared cache visible to all application instances. Invalidation events propagate across the fleet. Data survives process restarts. Redis Cluster gives you horizontal scaling and data partitioning. Redis Sentinel gives you high availability with automatic failover.

The cost: network latency. A local Redis instance typically adds 0.5-2ms per operation. A Redis instance in the same data center region adds 1-5ms. Cross-region is worse. For high-frequency cache lookups, this adds up: an endpoint that makes 10 cache reads is adding 5-50ms of Redis latency on top of everything else.
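One mitigation is batching: a single MGET (or a redis-py pipeline) collapses N round-trips into one. A sketch of the pattern — the function is written against anything exposing redis-py's mget signature, and the FakeRedis class below is a demonstration stand-in I'm assuming here, not a real library; in production you would pass a redis.Redis client:

```python
import json

def get_users_cached(client, user_ids):
    """Fetch many cache entries in one round-trip via MGET.
    `client` is anything with redis-py's mget(keys) signature."""
    keys = [f"user:v1:{uid}" for uid in user_ids]
    raw = client.mget(keys)  # one network round-trip instead of len(keys)
    hits, misses = {}, []
    for uid, blob in zip(user_ids, raw):
        if blob is not None:
            hits[uid] = json.loads(blob)
        else:
            misses.append(uid)  # load these from the database, then SET
    return hits, misses

# stand-in client so the logic runs without a server; swap in redis.Redis(...)
class FakeRedis:
    def __init__(self, data): self.data = data
    def mget(self, keys): return [self.data.get(k) for k in keys]

client = FakeRedis({"user:v1:1": json.dumps({"name": "Ada"})})
hits, misses = get_users_cached(client, ["1", "2"])
```

Ten lookups batched this way cost roughly the same wall-clock latency as one.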

# Redis with proper key structure and TTL strategy
import redis
import json

r = redis.Redis(host='redis-cluster', port=6379, decode_responses=True)

def get_user_permissions(user_id: str) -> dict:
    cache_key = f"permissions:v2:{user_id}"  # version prefix for easy invalidation

    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # db and build_permission_map are application helpers, elided here
    permissions = db.query("SELECT * FROM permissions WHERE user_id = %s", user_id)
    result = build_permission_map(permissions)

    # TTL shorter than your longest acceptable stale window
    r.setex(cache_key, 300, json.dumps(result))  # 5 minute TTL
    return result

def invalidate_user_permissions(user_id: str):
    # Explicit invalidation on change — do not rely solely on TTL
    r.delete(f"permissions:v2:{user_id}")

The version prefix in the key (v2:) is a practical pattern for bulk invalidation: when you change the cache schema, increment the prefix version. All old keys become orphaned and expire naturally without needing to enumerate them.
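The pattern reduces to a single key-builder: bump one constant and every existing key becomes unreachable at once. Names here are illustrative, matching the snippet above:

```python
PERMISSIONS_CACHE_VERSION = 2  # bump to orphan every existing key at once

def permissions_key(user_id, version=None):
    """Build the versioned cache key for a user's permission map."""
    v = PERMISSIONS_CACHE_VERSION if version is None else version
    return f"permissions:v{v}:{user_id}"

# after bumping the constant to 3, all reads and writes move to v3 keys;
# the v2 keys are never read again and expire naturally via their TTL
```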

Use Redis for: session state, distributed rate limiting, shared application-level caches (product data, user preferences), queues and pub/sub, and any cache that must be consistent across application instances.
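One of those, distributed rate limiting, is worth sketching because it is only two commands: INCR a per-window counter, and EXPIRE it the first time so windows clean themselves up. The CounterStub below mimics just those two commands so the logic runs without a server — it is a stand-in, not a real library; in production you would pass a redis.Redis client, where INCR is atomic across all application instances:

```python
import time

def allow_request(client, user_id, limit=100, window_seconds=60, now=None):
    """Fixed-window rate limiter: at most `limit` requests per window."""
    now = time.time() if now is None else now
    window = int(now) // window_seconds
    key = f"ratelimit:{user_id}:{window}"
    count = client.incr(key)                # atomic in real Redis
    if count == 1:
        client.expire(key, window_seconds)  # window expires on its own
    return count <= limit

# minimal stand-in implementing just incr/expire for demonstration
class CounterStub:
    def __init__(self): self.counts = {}
    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]
    def expire(self, key, seconds): pass  # TTL handled by the Redis server

stub = CounterStub()
results = [allow_request(stub, "u1", limit=3, now=1000) for _ in range(5)]
```

Fixed windows have a burst-at-the-boundary weakness; sliding-window variants exist, but this is the shape of the idea.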

CDN caching: the largest lever, the least control

CDN caching (CloudFront, Fastly, Cloudflare) serves responses from edge nodes physically close to users. For cacheable content, this eliminates the origin round-trip entirely — the edge node serves directly from its cache. The latency improvement is dramatic: a New York user hitting a Singapore origin server sees a ~180ms round-trip; hitting a New York CDN edge, 5-10ms.

The cost: you are caching at the HTTP response level, not at the data level. Cache keys are URL-based. Invalidation requires a CDN API call (CloudFront invalidation takes 10-60 seconds to propagate globally and costs $0.005 per path after the first 1000 paths per month). Authenticated content generally cannot be CDN-cached without careful cache key configuration including the session token (which defeats the whole point).

# Origin response headers that drive CDN and browser caching behavior
Cache-Control: public, max-age=3600, s-maxage=86400

# s-maxage controls CDN TTL independently of browser TTL
# max-age=3600 tells browsers to cache for 1 hour
# s-maxage=86400 tells CDN to cache for 24 hours

# For API responses that change frequently:
Cache-Control: public, max-age=0, s-maxage=60, stale-while-revalidate=30
# CDN serves stale for up to 30s while fetching fresh — keeps hit rate high

Use CDN caching for: public static assets (images, JS, CSS), public API endpoints that return the same response for all unauthenticated users (product listings, public pricing, documentation), and any endpoint where the same URL → same response contract holds.
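Those header rules are easiest to audit when they live in one place instead of being scattered across handlers. A small illustrative helper — the content classes and policy strings are my assumptions for this sketch, not any CDN's API:

```python
def cache_control(kind):
    """Build a Cache-Control header per content class (illustrative policy)."""
    policies = {
        # fingerprinted static assets: cache aggressively everywhere
        "static_asset": "public, max-age=31536000, immutable",
        # public pages: short browser TTL, longer CDN TTL
        "public_page": "public, max-age=3600, s-maxage=86400",
        # fast-changing public APIs: CDN-only caching with SWR
        "public_api": "public, max-age=0, s-maxage=60, stale-while-revalidate=30",
        # authenticated responses: never cache at the CDN or browser
        "private": "private, no-store",
    }
    return policies[kind]

# attach in middleware: response.headers["Cache-Control"] = cache_control(kind)
```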

Layered caching: how these fit together

The production pattern that performs best combines all three layers:

  1. CDN absorbs the bulk of unauthenticated, public traffic at the edge. Cache-Control headers drive behavior.
  2. Redis handles application-level caching for authenticated or user-specific data that would not survive CDN (user preferences, personalized feeds, permission maps).
  3. In-process (Caffeine/Rails.cache) stores reference data and computation results that are the same across requests within a single process — configuration, feature flags, compiled templates.
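The read path through those layers can be sketched in a few lines. The in-memory stand-ins below (a dict for the process-local layer, DictStore for the shared one) are demonstration stubs; real code swaps in a redis.Redis client and a repository:

```python
def layered_get(key, local_cache, shared_cache, load_from_db):
    """L1 in-process -> L2 shared cache -> database, backfilling on the way out."""
    if key in local_cache:              # L1: no network hop
        return local_cache[key]
    value = shared_cache.get(key)       # L2: shared across instances
    if value is None:
        value = load_from_db(key)       # miss everywhere: hit the source
        shared_cache.set(key, value)    # backfill L2 for other instances
    local_cache[key] = value            # backfill L1 for this process
    return value

# demonstration with in-memory stand-ins
class DictStore:
    def __init__(self): self.d = {}
    def get(self, k): return self.d.get(k)
    def set(self, k, v): self.d[k] = v

db_calls = []
def load(k):
    db_calls.append(k)
    return f"row-for-{k}"

local, shared = {}, DictStore()
v1 = layered_get("cfg", local, shared, load)  # falls through to the DB
v2 = layered_get("cfg", local, shared, load)  # served from L1
```

Each layer shields the one below it: a second process with an empty local dict would still be served from the shared layer without touching the database.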

The mistake that creates incidents is treating TTL as your only invalidation strategy. Write-through invalidation — explicitly deleting or updating the cache key when the underlying data changes — is required for any cache that holds data where stale reads have business impact (prices, inventory levels, account status). TTL is your fallback, not your primary invalidation mechanism.
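Concretely, the write path invalidates the key in the same code path that commits the change, and the TTL set on reads remains only as a safety net. A sketch with dict stand-ins for the database and cache (key names are illustrative):

```python
def update_price(db, cache, product_id, new_price):
    """Write-through invalidation: commit the change, then delete the key
    immediately -- do not wait for the TTL to expire the stale value."""
    db[product_id] = new_price                 # 1. persist the change
    cache.pop(f"price:v1:{product_id}", None)  # 2. invalidate in the same path

def get_price(db, cache, product_id):
    key = f"price:v1:{product_id}"
    if key not in cache:
        cache[key] = db[product_id]  # cache-aside read backfills after invalidation
    return cache[key]
```

The next read after update_price repopulates the cache with the fresh price; the stale window shrinks from "up to one TTL" to "the gap between commit and delete".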

