Caching Strategies Compared — In-Memory, Redis, and CDN: When to Use Each

by Arif Ikhsanudin, Backend Developer

The cache that made the problem worse

Your database is getting hammered. Someone adds Redis caching to the product listing endpoint. Response times drop. Two weeks later, product managers report that updated product descriptions are not appearing on the site. Customers are seeing stale prices. The cache TTL is 24 hours. No invalidation logic was implemented because "we'll add that later." This is the most common caching failure I see in mid-stage products: caching without an invalidation strategy.

In-process memory cache: fast, local, disposable

In-process caching (Ruby's Rails.cache with the :memory_store, Java's Caffeine, .NET's IMemoryCache) stores data in the application process's heap. There is no network hop. Access is nanosecond-scale — orders of magnitude faster than Redis.

The cost: cache is not shared between processes. If you run 10 Puma workers or 5 service pods, each has its own cache with potentially different state. A cache invalidation event must reach all processes or you get inconsistency. Cache size is bounded by the process memory budget. Cache dies with the process — restarts, deployments, and crashes clear it entirely.

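// Caffeine: in-process cache with size and TTL bounds
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;

Cache<String, ProductCatalog> catalogCache = Caffeine.newBuilder()
    .maximumSize(500)                       // evict once more than 500 entries are held
    .expireAfterWrite(5, TimeUnit.MINUTES)  // entries expire 5 minutes after being written
    .recordStats()   // collect hit/miss stats; bind them to Micrometer separately to export the hit rate
    .build();

public ProductCatalog getCatalog(String categoryId) {
    return catalogCache.get(categoryId, id -> {
        // Only executed on cache miss
        return productRepository.findCatalogByCategory(id);
    });
}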

Use in-process caching for: reference data that is the same across all requests (country codes, feature flags, configuration), computation results that are expensive to derive but rarely change (permission graph evaluation, compiled templates), and data where stale reads across processes are acceptable.
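
To make the "stale reads across processes" trade-off concrete, here is a minimal Python sketch of a bounded in-process cache. The class name LocalTTLCache and the two simulated workers are hypothetical, used only for illustration; in a real service you would more likely reach for an existing library than write this by hand.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LocalTTLCache:
    """Bounded in-process cache: LRU eviction plus a per-entry TTL."""

    def __init__(self, max_size: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries = OrderedDict()    # key -> (expires_at, value)

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)      # mark as recently used
            return entry[1]
        value = loader(key)                     # miss or expired: reload
        self._entries[key] = (now + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)   # evict least recently used
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)


# Two caches stand in for two worker processes reading the same source.
source = {"flags": "v1"}
worker_a = LocalTTLCache(max_size=100, ttl_seconds=300)
worker_b = LocalTTLCache(max_size=100, ttl_seconds=300)
worker_a.get("flags", source.__getitem__)
worker_b.get("flags", source.__getitem__)

source["flags"] = "v2"
worker_a.invalidate("flags")   # the invalidation only reached worker A

a_sees = worker_a.get("flags", source.__getitem__)  # reloaded from source
b_sees = worker_b.get("flags", source.__getitem__)  # still the old entry
```

Worker B keeps serving its old entry until the TTL runs out, which is acceptable for reference data and not acceptable for prices.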

Redis: shared cache with network cost

Redis gives you a shared cache visible to all application instances. Deleting a key once invalidates it for the whole fleet. Cached data survives application restarts and deployments. Redis Cluster gives you horizontal scaling and data partitioning. Redis Sentinel gives you high availability with automatic failover.

The cost: network latency. A local Redis instance typically adds 0.5-2ms per operation. A Redis instance in the same data center region adds 1-5ms. Cross-region is worse. For high-frequency cache lookups, this adds up: an endpoint that makes 10 sequential cache reads adds roughly 5-50ms of Redis latency on top of everything else.
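
The arithmetic above assumes one round trip per read. A small Python sketch of the difference between sequential reads and a single batched read follows. CountingFakeClient is a hypothetical in-memory stand-in so the example runs without a server; its get and mget methods mirror the redis-py client methods of the same names, and the per-trip latencies are the figures quoted above, not measurements.

```python
from typing import Dict, List, Optional, Sequence


class CountingFakeClient:
    """In-memory stand-in that counts round trips instead of using a network."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        self.round_trips += 1          # one network round trip per GET
        return self._data.get(key)

    def mget(self, keys: Sequence[str]) -> List[Optional[str]]:
        self.round_trips += 1          # one round trip for the whole batch
        return [self._data.get(k) for k in keys]


def read_sequential(client, keys: Sequence[str]) -> Dict[str, Optional[str]]:
    return {k: client.get(k) for k in keys}


def read_batched(client, keys: Sequence[str]) -> Dict[str, Optional[str]]:
    return dict(zip(keys, client.mget(keys)))


def added_latency_ms(round_trips: int, per_trip_ms: float) -> float:
    """Rough estimate: round trips multiplied by per-trip latency."""
    return round_trips * per_trip_ms


keys = [f"product:{i}" for i in range(10)]
client = CountingFakeClient({k: "cached" for k in keys})

read_sequential(client, keys)
sequential_trips = client.round_trips

client.round_trips = 0
read_batched(client, keys)
batched_trips = client.round_trips
```

With the figures above, ten sequential reads cost between added_latency_ms(10, 0.5) and added_latency_ms(10, 5), while the batched read pays for a single round trip.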

# Redis with proper key structure and TTL strategy
# `db` and `build_permission_map` are the application's own data-access helpers
import redis
import json

r = redis.Redis(host='redis-cluster', port=6379, decode_responses=True)

def get_user_permissions(user_id: str) -> dict:
    cache_key = f"permissions:v2:{user_id}"  # version prefix for easy invalidation

    cached = r.get(cache_key)
    if cached is not None:  # a key that does not exist comes back as None
        return json.loads(cached)

    permissions = db.query("SELECT * FROM permissions WHERE user_id = %s", user_id)
    result = build_permission_map(permissions)

    # TTL shorter than your longest acceptable stale window
    r.setex(cache_key, 300, json.dumps(result))  # 5 minute TTL
    return result

def invalidate_user_permissions(user_id: str):
    # Explicit invalidation on change: do not rely solely on TTL
    r.delete(f"permissions:v2:{user_id}")

The version prefix in the key (v2:) is a practical pattern for bulk invalidation: when you change the cache schema, increment the prefix version. All old keys become orphaned and, because every entry was written with a TTL, expire naturally without needing to enumerate them.
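
One way to keep the version in a single place is a small key-builder function. This is a sketch; the constant and function names are hypothetical, and a plain dict stands in for the shared cache.

```python
PERMISSIONS_SCHEMA_VERSION = 2  # bump when the cached structure changes


def permissions_key(user_id: str,
                    version: int = PERMISSIONS_SCHEMA_VERSION) -> str:
    """Build the cache key for one user's permission map."""
    return f"permissions:v{version}:{user_id}"


# A dict stands in for the shared cache in this illustration.
cache = {permissions_key("42"): '{"can_edit": true}'}

old_key = permissions_key("42")             # key written under version 2
new_key = permissions_key("42", version=3)  # key read after a schema bump

hit_before_bump = old_key in cache
hit_after_bump = new_key in cache   # the v2 entry is simply never read again
```

Reads and invalidations that both go through permissions_key cannot drift apart when the version changes.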

Use Redis for: session state, distributed rate limiting, shared application-level caches (product data, user preferences), queues and pub/sub, and any cache that must be consistent across application instances.

CDN caching: the largest lever, the least control

CDN caching (CloudFront, Fastly, Cloudflare) serves responses from edge nodes physically close to users. For cacheable content, this eliminates the round trip to the origin server entirely: the edge node serves directly from its cache. The latency improvement is dramatic: a New York user hitting a Singapore origin server sees a round trip on the order of 200ms; hitting a New York CDN edge typically takes 5-10ms.

The cost: you are caching at the HTTP response level, not at the data level. Cache keys are URL-based by default. Invalidation requires a CDN API call (CloudFront invalidation typically takes 10-60 seconds to propagate globally and costs $0.005 per path after the first 1000 paths per month). Authenticated content generally cannot be CDN-cached unless the cache key includes the session token, which gives every user a separate cache entry and defeats the whole point.
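
To see how the invalidation pricing quoted above scales, here is a small worked calculation in Python. The function name is hypothetical and the prices are the ones stated in this article; check current CloudFront pricing before relying on them.

```python
from decimal import Decimal

FREE_PATHS_PER_MONTH = 1000               # figure quoted in this article
PRICE_PER_EXTRA_PATH = Decimal("0.005")   # USD, figure quoted in this article


def monthly_invalidation_cost(paths: int) -> Decimal:
    """Cost in USD of invalidating `paths` paths in a single month."""
    billable = max(0, paths - FREE_PATHS_PER_MONTH)
    return billable * PRICE_PER_EXTRA_PATH


within_free_tier = monthly_invalidation_cost(800)
per_product_invalidation = monthly_invalidation_cost(50_000)
```

Invalidating one path per product update on a busy catalog adds up quickly. As far as I know, CloudFront counts a wildcard path such as /products/* as a single path, which is one reason to design URLs so related content shares a prefix.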

# Cache-Control response headers: sent by the origin, honored by the CDN
# (CloudFront applies them within the min/max TTL of its cache behavior)
Cache-Control: public, max-age=3600, s-maxage=86400

# s-maxage controls CDN TTL independently of browser TTL
# max-age=3600 tells browsers to cache for 1 hour
# s-maxage=86400 tells CDN to cache for 24 hours

# For API responses that change frequently:
Cache-Control: public, max-age=0, s-maxage=60, stale-while-revalidate=30
# CDN treats the response as fresh for 60s, then may serve it stale for up to
# 30s while fetching a fresh copy in the background, which keeps hit rate high

Use CDN caching for: public static assets (images, JS, CSS), public API endpoints that return the same response for all unauthenticated users (product listings, public pricing, documentation), and any endpoint where the same URL → same response contract holds.
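
On the origin side, these headers are usually built in one place so every public endpoint states its browser and CDN TTLs explicitly. A minimal Python sketch follows; the function name and parameters are hypothetical and not tied to any framework.

```python
from typing import Optional


def cache_control(browser_ttl: int, cdn_ttl: int,
                  stale_while_revalidate: Optional[int] = None) -> str:
    """Build a Cache-Control value for a public, CDN-cacheable response."""
    parts = ["public", f"max-age={browser_ttl}", f"s-maxage={cdn_ttl}"]
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    return ", ".join(parts)


static_assets = cache_control(browser_ttl=3600, cdn_ttl=86400)
fast_changing_api = cache_control(browser_ttl=0, cdn_ttl=60,
                                  stale_while_revalidate=30)
```

The two values reproduce the header examples above; the string is then set as the Cache-Control header on the response in whatever web framework the origin uses.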

Layered caching: how these fit together

The production pattern that usually performs best combines all three layers:

  1. CDN absorbs the bulk of unauthenticated, public traffic at the edge. Cache-Control headers drive behavior.
  2. Redis handles application-level caching for authenticated or user-specific data that cannot be cached at the CDN (user preferences, personalized feeds, permission maps).
  3. In-process (Caffeine/Rails.cache) stores reference data and computation results that are the same across requests within a single process — configuration, feature flags, compiled templates.
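
Behind the CDN, the application-side lookup order for layers 2 and 3 can be sketched as below. This is a simplified Python illustration: plain dicts stand in for the in-process cache and for Redis, TTLs are left out, and the function name layered_get is hypothetical.

```python
from typing import Callable, Dict, Tuple


def layered_get(key: str,
                local: Dict[str, str],
                shared: Dict[str, str],
                load_from_db: Callable[[str], str]) -> Tuple[str, str]:
    """Return (value, layer that answered): local, then shared, then database."""
    if key in local:
        return local[key], "local"
    value = shared.get(key)
    if value is not None:
        local[key] = value          # warm the in-process layer
        return value, "shared"
    value = load_from_db(key)       # both caches missed
    shared[key] = value             # populate the shared layer for other instances
    local[key] = value
    return value, "database"


database = {"config:currency": "USD"}
shared_cache: Dict[str, str] = {}
instance_1: Dict[str, str] = {}
instance_2: Dict[str, str] = {}

first = layered_get("config:currency", instance_1, shared_cache, database.__getitem__)
second = layered_get("config:currency", instance_1, shared_cache, database.__getitem__)
other = layered_get("config:currency", instance_2, shared_cache, database.__getitem__)
```

The first call falls through to the database, the repeat call on the same instance is answered locally, and a second instance is answered by the shared layer without touching the database.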

The mistake that creates incidents is treating TTL as your only invalidation strategy. Invalidation on write (explicitly deleting or updating the cache key whenever the underlying data changes) is required for any cache that holds data where stale reads have business impact (prices, inventory levels, account status). TTL is your fallback, not your primary invalidation mechanism.
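
A minimal Python sketch of invalidation on write, assuming a cache-aside read path. The PriceStore class is hypothetical, and dicts stand in for the database and the cache; in a real system the cache entry would also carry a TTL as the fallback.

```python
from typing import Dict


class PriceStore:
    """Cache-aside reads with explicit invalidation on every write."""

    def __init__(self) -> None:
        self.db: Dict[str, int] = {}      # stand-in for the database (price in cents)
        self.cache: Dict[str, int] = {}   # stand-in for the cache

    def read(self, sku: str) -> int:
        if sku in self.cache:
            return self.cache[sku]
        price = self.db[sku]              # cache miss: load from the database
        self.cache[sku] = price
        return price

    def update(self, sku: str, price: int) -> None:
        self.db[sku] = price              # write the source of truth first
        self.cache.pop(sku, None)         # then drop the cached copy


store = PriceStore()
store.db["sku-1"] = 1999
before = store.read("sku-1")   # populates the cache
store.update("sku-1", 1499)    # price change invalidates the entry
after = store.read("sku-1")    # reloaded from the database
```

Without the pop in update, the read after the price change would keep returning the old value until the TTL expired, which is the stale-price incident from the opening.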
