Caching at the API Level: The Performance Win Most Backends Skip

by Eric Hanson, Backend Developer at Clean Systems Consulting

The request that hits the database every time

A product detail page loads. The browser (or mobile app, or partner system) makes a GET /products/42 request. Your server executes three database queries, assembles the response, and returns it. The product has not changed in three days.

HTTP has a caching model that could have served this response from a CDN edge node in 8ms instead of your origin server in 120ms — without any database involvement. Most APIs have this opportunity and do not take it.

Cache-Control: the primary mechanism

The Cache-Control response header tells every caching layer between your server and the client (browser cache, CDN, reverse proxy) how to handle the response:

Cache-Control: public, max-age=300, stale-while-revalidate=60

The three directives:

  • public: shared caches (CDN, proxy) may store this response, not just the browser.
  • max-age=300: the cached response is fresh for 300 seconds. Requests within this window are served from cache without hitting your origin.
  • stale-while-revalidate=60: for up to 60 seconds after max-age expires, a cache may serve the stale response immediately while it revalidates in the background. For frequently accessed resources, this hides the revalidation round trip that would otherwise show up as a latency spike each time the entry expires.
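To make the two time windows concrete, here is a minimal sketch of how a cache might classify a stored response by its age. The names Freshness and classify are illustrative, not part of any library, and real caches also account for the Age header, clock skew, and other directives.

```python
from enum import Enum


class Freshness(Enum):
    FRESH = "fresh"                        # serve from cache, no origin contact
    STALE_REVALIDATE = "stale-revalidate"  # serve stale copy, refresh in background
    EXPIRED = "expired"                    # must contact origin before responding


def classify(age_seconds: float, max_age: int, stale_while_revalidate: int = 0) -> Freshness:
    """Classify a cached response by age under max-age and stale-while-revalidate."""
    if age_seconds < max_age:
        return Freshness.FRESH
    if age_seconds < max_age + stale_while_revalidate:
        return Freshness.STALE_REVALIDATE
    return Freshness.EXPIRED
```

With max-age=300 and stale-while-revalidate=60, classify(120, 300, 60) is FRESH, classify(330, 300, 60) is STALE_REVALIDATE, and classify(400, 300, 60) is EXPIRED.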

For user-specific data:

Cache-Control: private, max-age=60

private — only the user's browser cache can store this. CDNs will not cache it.

For data that must never be cached:

Cache-Control: no-store

no-cache is often misunderstood — it means "cache it but revalidate before use," not "do not cache." Use no-store if you truly want nothing cached.
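One way to keep these header values consistent across handlers is a small builder function. This is a sketch under assumptions: build_cache_control and its parameters are hypothetical names, and it covers only the directives discussed above.

```python
from typing import Optional


def build_cache_control(
    scope: str,
    max_age: Optional[int] = None,
    stale_while_revalidate: Optional[int] = None,
) -> str:
    """Build a Cache-Control value; scope is 'public', 'private', or 'no-store'."""
    if scope == "no-store":
        # Nothing may be stored, so lifetime directives are meaningless here.
        return "no-store"
    if scope not in ("public", "private"):
        raise ValueError(f"unknown scope: {scope!r}")
    if max_age is None or max_age < 0:
        raise ValueError("public/private responses need a non-negative max_age")
    parts = [scope, f"max-age={max_age}"]
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    return ", ".join(parts)
```

For example, build_cache_control("public", 300, 60) reproduces the first header shown earlier, and build_cache_control("private", 60) reproduces the user-specific one.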

ETags for conditional revalidation

When a cached response expires, the cache can ask the server: "Has this changed since I last fetched it?" This is the conditional request pattern using ETag:

Initial response

GET /products/42
→ 200 OK
ETag: "v3-7f8a9b"
Cache-Control: public, max-age=300
{product data}

After max-age expires, cache revalidates

GET /products/42
If-None-Match: "v3-7f8a9b"
→ 304 Not Modified
ETag: "v3-7f8a9b"
Cache-Control: public, max-age=300
(empty body — use cached copy)

A 304 response has no body, so it transmits almost nothing. For resources that are frequently polled but rarely changed, this sharply reduces bandwidth. It also reduces origin load, provided the server can determine the current ETag without building the full response.
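The exchange above can be sketched as a framework-agnostic handler. The function names (etag_matches, respond) are hypothetical, and splitting If-None-Match on commas is a simplification that assumes the tags themselves contain no commas.

```python
from typing import Dict, Optional, Tuple


def _strip_weak(tag: str) -> str:
    # If-None-Match uses weak comparison, so the W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: Optional[str], current_etag: str) -> bool:
    """Return True when the client's cached copy is still current."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    candidates = [_strip_weak(t) for t in if_none_match.split(",")]
    return _strip_weak(current_etag) in candidates


def respond(
    if_none_match: Optional[str], current_etag: str, body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    headers = {"ETag": current_etag, "Cache-Control": "public, max-age=300"}
    if etag_matches(if_none_match, current_etag):
        return 304, headers, b""  # empty body: the cache reuses its copy
    return 200, headers, body
```

The 304 branch repeats the ETag and Cache-Control headers so the cache can restart the freshness window for its stored copy.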

Generating ETags: hash the response content (MD5 or SHA256 of the serialized output), or use a version counter from your database. The hash approach is simpler; the version counter is cheaper if you have it available and want to avoid serializing just to compute a hash.

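import hashlib
import json

def compute_etag(data: dict) -> str:
    # Serialize deterministically so equal data always yields the same tag.
    content = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    # ETag header values must be quoted strings.
    return f'"{digest}"'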
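For comparison, here is a sketch of the version-counter approach. It assumes a hypothetical products table with an integer version column that the application increments on every update; the schema and the helper names are illustrative only.

```python
import sqlite3


def etag_from_version(resource: str, resource_id: int, version: int) -> str:
    """Build a quoted ETag from a row version, without serializing the body."""
    return f'"{resource}-{resource_id}-v{version}"'


def current_etag(conn: sqlite3.Connection, product_id: int) -> str:
    # One narrow query instead of the full set of queries needed for the body.
    row = conn.execute(
        "SELECT version FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        raise KeyError(product_id)
    return etag_from_version("product", product_id, row[0])


# In-memory database standing in for the real one (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO products (id, name, version) VALUES (42, 'Widget', 3)")

before = current_etag(conn, 42)
conn.execute(
    "UPDATE products SET name = 'Widget Pro', version = version + 1 WHERE id = 42"
)
after = current_etag(conn, 42)
```

The tag changes only when the version column does, so a revalidation request can be answered with a single indexed lookup.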

What is safe to cache and for how long

Safe to cache with long TTL (hours):

  • Product catalog data that changes infrequently
  • Reference data (country lists, currency codes, taxonomy)
  • Public user profiles
  • Historical records (past orders, closed invoices)

Safe to cache with short TTL (seconds to minutes):

  • Pricing data (changes occasionally, but a 30-second stale read is usually acceptable)
  • Inventory counts (approximate values are often fine)
  • Dashboard aggregate metrics

Not safe to cache:

  • Real-time data (live auction prices, ride availability)
  • Authentication tokens
  • User-specific cart and checkout state
  • Write responses (POST, PUT, PATCH, DELETE should return Cache-Control: no-store)
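These categories can be captured in one lookup so that every handler applies the same policy. The sketch below is an assumption-laden example: the category names and TTL values are hypothetical placeholders, and unknown categories fall back to no-store as the conservative default.

```python
SAFE_METHODS = {"GET", "HEAD"}

# Hypothetical categories and TTLs; tune these to your own data.
POLICIES = {
    "reference": "public, max-age=86400",  # country lists, currency codes
    "catalog": "public, max-age=3600",     # infrequently changing product data
    "pricing": "public, max-age=30",       # brief staleness usually acceptable
    "inventory": "public, max-age=30",     # approximate counts are often fine
    "cart": "no-store",                    # user-specific checkout state
    "realtime": "no-store",                # live prices, availability
}


def cache_control_for(method: str, category: str) -> str:
    """Pick a Cache-Control value from the request method and data category."""
    if method.upper() not in SAFE_METHODS:
        # Responses to writes (POST, PUT, PATCH, DELETE) are never cached.
        return "no-store"
    return POLICIES.get(category, "no-store")
```

Defaulting to no-store means a new endpoint is uncached until someone makes an explicit decision about it.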

Cache invalidation

The hard part. When the underlying data changes, stale caches need to be cleared or they will serve outdated responses.

CDN-level invalidation: Most CDNs (Cloudflare, CloudFront, Fastly) offer cache purge APIs. When product 42 is updated, issue a purge for /products/42. This is synchronous invalidation — reliable but requires your application to know the affected cache keys and call the purge API on every write.

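Surrogate keys / cache tags: Cloudflare and Fastly support tagging responses with arbitrary keys. Cloudflare reads a comma-separated Cache-Tag header; Fastly reads a space-separated Surrogate-Key header: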

Cache-Tag: product-42, category-electronics
Surrogate-Key: product-42 category-electronics

Then purge by tag: all responses tagged product-42 are invalidated with a single API call. This handles the fan-out problem — a single product update might affect the product detail page, search results, and category listing pages. Tag all of them with product-42 and invalidate once.
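The fan-out behaviour can be illustrated with a toy in-memory cache that indexes entries by tag. This is not a CDN client and calls no real purge API; TaggedCache and its methods are hypothetical names used only to show the mechanism.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy cache that indexes entries by tag (a stand-in for a CDN)."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self._entries[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._entries.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every entry carrying the tag; return how many were removed."""
        removed = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if self._entries.pop(url, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.put("/products/42", b"detail", ["product-42", "category-electronics"])
cache.put("/categories/electronics", b"listing", ["product-42", "product-43"])
cache.put("/products/43", b"other", ["product-43", "category-electronics"])

# A single purge clears both the detail page and the listing that shows it.
removed = cache.purge_tag("product-42")
```

After the purge, /products/43 is still cached because it never carried the product-42 tag.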

TTL-based tolerance: The simplest approach — accept that caches will be stale for up to TTL seconds. No invalidation logic needed. Works when brief staleness is acceptable to the business.

The CDN placement question

Your Cache-Control: public headers do little beyond the browser cache unless a shared cache sits between clients and your origin. Options:

  • Cloudflare / Fastly / CloudFront as a CDN in front of your origin: global edge caching, integrated DDoS protection, straightforward setup.
  • Varnish as a reverse proxy in your own infrastructure: more control, lower cost at scale, more operational complexity.
  • Nginx with proxy_cache: simpler than Varnish, good for single-region setups.

Pick based on your operational capabilities and traffic geography. The CDN approach is faster to set up; self-hosted gives more control once you have the operational maturity.
