Caching at the API Level: The Performance Win Most Backends Skip
by Eric Hanson, Backend Developer at Clean Systems Consulting
The request that hits the database every time
A product detail page loads. The browser (or mobile app, or partner system) makes a GET /products/42 request. Your server executes three database queries, assembles the response, and returns it. The product has not changed in three days.
HTTP has a caching model that could have served this response from a CDN edge node in 8ms instead of your origin server in 120ms — without any database involvement. Most APIs have this opportunity and do not take it.
Cache-Control: the primary mechanism
The Cache-Control response header tells every caching layer between your server and the client (browser cache, CDN, reverse proxy) how to handle the response:
Cache-Control: public, max-age=300, stale-while-revalidate=60
public — this response can be stored by shared caches (CDN, proxy), not just the browser.
max-age=300 — the cached response is fresh for 300 seconds. Requests within this window are served from cache without hitting your origin.
stale-while-revalidate=60 means that once max-age expires, the cache may keep serving the stale response for up to 60 more seconds while it revalidates in the background. For frequently accessed resources this hides the revalidation round trip: clients get the cached copy immediately instead of waiting on the origin.
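The three states implied by max-age and stale-while-revalidate can be sketched as a small age check. This is an illustration of the timing arithmetic only, not how any particular CDN implements it; the Freshness enum and classify function are hypothetical names, and the defaults mirror the header above.

```python
from enum import Enum


class Freshness(Enum):
    FRESH = "fresh"                    # serve from cache, no origin contact
    STALE_REVALIDATE = "stale-ok"      # serve stale copy, refresh in background
    EXPIRED = "expired"                # must revalidate or refetch before serving


def classify(age_seconds: float, max_age: int = 300, swr: int = 60) -> Freshness:
    """Classify a cached response by its age in seconds.

    Assumes the response carried max-age=<max_age> and
    stale-while-revalidate=<swr>; both defaults are taken from the example header.
    """
    if age_seconds < max_age:
        return Freshness.FRESH
    if age_seconds < max_age + swr:
        return Freshness.STALE_REVALIDATE
    return Freshness.EXPIRED
```

With the example header, a response aged 120 seconds is fresh, one aged 330 seconds is served stale while the cache refreshes it, and one aged 400 seconds has to go back to the origin first.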
For user-specific data:
Cache-Control: private, max-age=60
private — only the user's browser cache can store this. CDNs will not cache it.
For data that must never be cached:
Cache-Control: no-store
no-cache is often misunderstood — it means "cache it but revalidate before use," not "do not cache." Use no-store if you truly want nothing cached.
ETags for conditional revalidation
When a cached response expires, the cache can ask the server: "Has this changed since I last fetched it?" This is the conditional request pattern using ETag:
Initial response
GET /products/42
→ 200 OK
ETag: "v3-7f8a9b"
Cache-Control: public, max-age=300
{product data}
After max-age expires, cache revalidates
GET /products/42
If-None-Match: "v3-7f8a9b"
→ 304 Not Modified
ETag: "v3-7f8a9b"
Cache-Control: public, max-age=300
(empty body — use cached copy)
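A 304 response has no body, so it transmits almost nothing. It is also cheap to generate when the server can check the ETag without building the full response (for example, from a stored version counter). For resources that are frequently polled but rarely changed, this dramatically reduces bandwidth and origin load.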
Generating ETags: hash the response content (MD5 or SHA256 of the serialized output), or use a version counter from your database. The hash approach is simpler; the version counter is cheaper if you have it available and want to avoid serializing just to compute a hash.
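import hashlib
import json

def compute_etag(data: dict) -> str:
    # Serialize with sorted keys so equal data always yields the same hash.
    content = json.dumps(data, sort_keys=True)
    digest = hashlib.md5(content.encode()).hexdigest()[:16]
    # ETag values are quoted strings on the wire, so include the quotes.
    return f'"{digest}"'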
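To connect ETag generation to the conditional request flow, here is a minimal, framework-free sketch of a GET handler that compares If-None-Match against a content-hash ETag and answers 304 or 200. The names make_etag and conditional_get are hypothetical, and the headers are assumed to arrive as a plain dict with exact-case keys. A real framework would normalize header names and parse the header more strictly.

```python
import hashlib
import json


def make_etag(data: dict) -> str:
    """Quoted ETag from a truncated SHA-256 of the canonical JSON body."""
    body = json.dumps(data, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    """Drop a leading W/ so weak and strong validators compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(request_headers: dict, data: dict, max_age: int = 300):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(data)
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}

    # Simplification: split on commas, which is safe for hex-only ETags.
    raw = request_headers.get("If-None-Match", "")
    candidates = [_strip_weak(t) for t in raw.split(",") if t.strip()]

    if "*" in candidates or etag in candidates:
        return 304, headers, b""  # client copy is still valid, send no body
    return 200, headers, json.dumps(data, sort_keys=True).encode()
```

Note that this version still loads and serializes the data to compute the hash, so it saves bandwidth but not database work. Swapping make_etag for a lookup of a stored version counter would let the handler skip that work on a match.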
What is safe to cache and for how long
Safe to cache with long TTL (hours):
- Product catalog data that changes infrequently
- Reference data (country lists, currency codes, taxonomy)
- Public user profiles
- Historical records (past orders, closed invoices), marked private when they belong to a single user
Safe to cache with short TTL (seconds to minutes):
- Pricing data (changes occasionally, but a 30-second stale read is usually acceptable)
- Inventory counts (approximate values are often fine)
- Dashboard aggregate metrics
Not safe to cache:
- Real-time data (live auction prices, ride availability)
- Authentication tokens
- User-specific cart and checkout state
- Write responses (POST, PUT, PATCH, DELETE should return Cache-Control: no-store)
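One way to keep these decisions consistent is a single lookup table that maps a resource category to its Cache-Control value. The sketch below is illustrative: the category names, the TTL values, and the cache_control_for function are all assumptions chosen to match the lists above, not recommendations for any specific system.

```python
# Hypothetical category -> Cache-Control mapping; TTLs are example values.
CACHE_POLICIES = {
    "catalog": "public, max-age=3600",      # changes infrequently
    "reference": "public, max-age=86400",   # country lists, currency codes
    "pricing": "public, max-age=30",        # brief staleness tolerated
    "inventory": "public, max-age=15",      # approximate counts are fine
    "cart": "no-store",                     # user-specific checkout state
    "auth": "no-store",                     # tokens must never be cached
}

SAFE_METHODS = {"GET", "HEAD"}


def cache_control_for(method: str, category: str) -> str:
    """Pick a Cache-Control value for a response.

    Write methods and unknown categories fall back to no-store, so a
    missing entry fails closed instead of caching by accident.
    """
    if method.upper() not in SAFE_METHODS:
        return "no-store"
    return CACHE_POLICIES.get(category, "no-store")
```

Defaulting to no-store is a deliberate choice here: forgetting to classify a new endpoint costs some performance, while accidentally caching it could leak or freeze data.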
Cache invalidation
The hard part. When the underlying data changes, stale caches need to be cleared or they will serve outdated responses.
CDN-level invalidation: Most CDNs (Cloudflare, CloudFront, Fastly) offer cache purge APIs. When product 42 is updated, issue a purge for /products/42. This is explicit invalidation: reliable, but it requires your application to know the affected cache keys and call the purge API on every write.
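Surrogate keys / cache tags: Cloudflare and Fastly support tagging responses with arbitrary keys. Cloudflare reads a comma-separated Cache-Tag header; Fastly reads a space-separated Surrogate-Key header: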
Cache-Tag: product-42, category-electronics
Surrogate-Key: product-42 category-electronics
Then purge by tag: all responses tagged product-42 are invalidated with a single API call. This handles the fan-out problem — a single product update might affect the product detail page, search results, and category listing pages. Tag all of them with product-42 and invalidate once.
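The fan-out behavior is easier to see with a toy model. The sketch below is an in-memory stand-in for a tag-aware cache, not a client for any CDN's purge API; the TaggedCache class and its methods are hypothetical and exist only to show how one tag purge removes every response that carried that tag.

```python
class TaggedCache:
    """Toy in-memory cache that supports purge-by-tag."""

    def __init__(self):
        self._entries = {}       # cache key -> response body
        self._keys_by_tag = {}   # tag -> set of cache keys
        self._tags_by_key = {}   # cache key -> set of tags

    def put(self, key, body, tags):
        self._drop(key)  # replace any previous entry and its tag links
        self._entries[key] = body
        self._tags_by_key[key] = set(tags)
        for tag in tags:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key):
        return self._entries.get(key)

    def purge_tag(self, tag):
        """Remove every entry carrying this tag; return how many were removed."""
        keys = list(self._keys_by_tag.pop(tag, set()))
        for key in keys:
            self._drop(key)
        return len(keys)

    def _drop(self, key):
        self._entries.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            self._keys_by_tag.get(tag, set()).discard(key)


cache = TaggedCache()
cache.put("/products/42", "detail page", ["product-42", "category-electronics"])
cache.put("/categories/electronics", "listing", ["product-42", "category-electronics"])
cache.put("/products/7", "other product", ["product-7"])

removed = cache.purge_tag("product-42")  # one call clears both tagged entries
```

After the purge, the detail page and the category listing are both gone while /products/7 is untouched, which is the property that makes tags preferable to tracking individual URLs on every write.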
TTL-based tolerance: The simplest approach — accept that caches will be stale for up to TTL seconds. No invalidation logic needed. Works when brief staleness is acceptable to the business.
The CDN placement question
Your Cache-Control: public headers do nothing unless something is caching them. Options:
- Cloudflare / Fastly / CloudFront as a CDN in front of your origin: global edge caching, integrated DDoS protection, straightforward setup.
- Varnish as a reverse proxy in your own infrastructure: more control, lower cost at scale, more operational complexity.
- Nginx with proxy_cache: simpler than Varnish, good for single-region setups.
Pick based on your operational capabilities and traffic geography. The CDN approach is faster to set up; self-hosted gives more control once you have the operational maturity.