HTTP Response Caching in Spring Boot — Cache-Control Headers, ETags, and CDN Integration

by Eric Hanson, Backend Developer at Clean Systems Consulting

The difference between application caching and HTTP caching

@Cacheable is application-layer caching — it prevents repeated database queries within the application. The response is still assembled, serialized, and sent over the network on every request.

HTTP caching is network-layer caching — it prevents repeated requests from reaching the application at all. A browser, CDN, or reverse proxy holds the cached response and serves it without contacting the origin server. Done correctly, HTTP caching can serve thousands of requests per second with zero application load.

The two layers are complementary. @Cacheable for a product query reduces database calls. Cache-Control: max-age=300 on the product endpoint prevents the same browser from requesting the product again within five minutes.

Cache-Control — the primary directive

Cache-Control is the HTTP/1.1 header that controls caching behavior. The directives that matter for API responses:

max-age=N — the response is fresh for N seconds. Clients (browsers, CDNs) can serve the cached response without contacting the origin for this duration.

no-cache — the response can be cached, but the client must revalidate with the origin before serving it. The origin can respond with 304 Not Modified (cached response is still valid) or a new response. This is not "don't cache" — it's "cache but always check freshness."

no-store — the response must not be stored. Used for sensitive data (authentication responses, financial data, user-specific data).

private — the response can be cached by the client (browser) but not by shared caches (CDNs, reverse proxies). For user-specific data that should be cached browser-side but not in a CDN that serves all users.

public — the response can be cached by any cache, including CDNs.

s-maxage=N — like max-age but applies only to shared caches (CDNs). Allows different TTLs for browser cache and CDN cache.

stale-while-revalidate=N — the client may serve a stale response for N seconds while fetching a fresh one in the background. Eliminates the latency spike when a cached response expires.
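How a cache reads these directives can be sketched in plain Java. The class below is a hypothetical helper (CacheControlPolicy is not a Spring or CDN API) and a deliberately simplified parser: it assumes well-formed, unquoted directive values. It shows two decisions a cache makes: whether it may store the response at all, and how long the stored copy counts as fresh for a browser versus a shared cache.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper for illustration: interprets a Cache-Control header value. */
class CacheControlPolicy {
    private final Map<String, String> directives = new LinkedHashMap<>();

    CacheControlPolicy(String headerValue) {
        // Simplification: splits on commas and ignores quoted directive arguments.
        for (String part : headerValue.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv[0].isEmpty()) continue;
            directives.put(kv[0].toLowerCase(Locale.ROOT), kv.length > 1 ? kv[1].trim() : "");
        }
    }

    /** May this cache keep a copy? Shared caches must skip private responses. */
    boolean mayStore(boolean sharedCache) {
        if (directives.containsKey("no-store")) return false;
        return !(sharedCache && directives.containsKey("private"));
    }

    /** Seconds the stored copy counts as fresh; empty when no lifetime is given. */
    OptionalLong freshnessSeconds(boolean sharedCache) {
        if (directives.containsKey("no-cache")) return OptionalLong.of(0); // always revalidate
        if (sharedCache && directives.containsKey("s-maxage")) {
            return OptionalLong.of(Long.parseLong(directives.get("s-maxage")));
        }
        if (directives.containsKey("max-age")) {
            return OptionalLong.of(Long.parseLong(directives.get("max-age")));
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        CacheControlPolicy policy = new CacheControlPolicy("max-age=60, s-maxage=3600, public");
        System.out.println("browser TTL: " + policy.freshnessSeconds(false).getAsLong());
        System.out.println("CDN TTL: " + policy.freshnessSeconds(true).getAsLong());
    }
}
```

Real caches apply more rules than this (heuristic freshness, must-revalidate, request directives), so treat the sketch as a reading aid for the list above rather than a complete model.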

Setting these in Spring Boot:

@GetMapping("/products/{id}")
public ResponseEntity<ProductResponse> getProduct(@PathVariable Long id) {
    Product product = productService.findProduct(id);

    return ResponseEntity.ok()
        .cacheControl(CacheControl.maxAge(Duration.ofMinutes(10))
            .cachePublic()
            .staleWhileRevalidate(Duration.ofMinutes(2)))
        .body(ProductResponse.from(product));
}

This produces: Cache-Control: max-age=600, public, stale-while-revalidate=120

For user-specific data:

@GetMapping("/account/profile")
public ResponseEntity<ProfileResponse> getProfile(
        @AuthenticationPrincipal UserDetails user) {

    return ResponseEntity.ok()
        .cacheControl(CacheControl.maxAge(Duration.ofMinutes(5)).cachePrivate())
        .body(profileService.findProfile(user.getUsername()));
}

Cache-Control: max-age=300, private — browser caches for 5 minutes, CDN does not cache.

For sensitive data that must never be cached:

@GetMapping("/account/payment-methods")
public ResponseEntity<List<PaymentMethod>> getPaymentMethods(...) {
    return ResponseEntity.ok()
        .cacheControl(CacheControl.noStore())
        .body(paymentService.findPaymentMethods(userId));
}

ETags — conditional requests and validation

An ETag (entity tag) is a fingerprint of the response content. When a client has a cached response with an ETag, it sends If-None-Match: <etag> on subsequent requests. The server compares the ETag — if the content hasn't changed, it returns 304 Not Modified with no body. The client uses its cached copy.

A 304 Not Modified response carries no body, so the payload is never sent over the network. For large responses, this is significant. The application still executes the handler method (to compute or fetch the current ETag), but the response body is not transmitted.
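The server-side comparison is slightly more involved than a string equality check, because If-None-Match can list several tags, can be *, and can carry weak validators prefixed with W/. A minimal sketch of that matching logic, assuming a hypothetical IfNoneMatch helper (not a Spring class) and tag values that contain no commas:

```java
import java.util.Arrays;

/** Hypothetical helper for illustration: evaluates If-None-Match with weak comparison. */
class IfNoneMatch {

    /** Strips the weak-validator prefix so W/"abc" and "abc" compare equal. */
    static String opaqueTag(String etag) {
        String trimmed = etag.trim();
        return trimmed.startsWith("W/") ? trimmed.substring(2) : trimmed;
    }

    /** True when the server may answer 304 Not Modified for a GET or HEAD. */
    static boolean matches(String ifNoneMatchHeader, String currentEtag) {
        if (ifNoneMatchHeader == null || ifNoneMatchHeader.isBlank()) return false;
        if (ifNoneMatchHeader.trim().equals("*")) return true; // matches any current representation
        String current = opaqueTag(currentEtag);
        // Simplification: assumes no commas inside the quoted tag values.
        return Arrays.stream(ifNoneMatchHeader.split(","))
                .map(IfNoneMatch::opaqueTag)
                .anyMatch(current::equals);
    }

    public static void main(String[] args) {
        System.out.println(matches("W/\"v5\", \"v4\"", "\"v5\"")); // tag list containing a weak match
        System.out.println(matches("\"v4\"", "\"v5\""));           // outdated tag
    }
}
```

Note that the quotes are part of the tag: an unquoted v5 does not match "v5", which is a common source of conditional requests that never return 304.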

Spring's ShallowEtagHeaderFilter generates ETags automatically by hashing the response body:

@Bean
public FilterRegistrationBean<ShallowEtagHeaderFilter> shallowEtagHeaderFilter() {
    FilterRegistrationBean<ShallowEtagHeaderFilter> filterRegistrationBean =
        new FilterRegistrationBean<>(new ShallowEtagHeaderFilter());
    filterRegistrationBean.addUrlPatterns("/api/*");
    filterRegistrationBean.setName("etagFilter");
    return filterRegistrationBean;
}

ShallowEtagHeaderFilter intercepts the response, computes an MD5 hash of the body, adds it as an ETag header, and returns 304 if the client's If-None-Match matches the computed hash.

The limitation: ShallowEtagHeaderFilter still executes the full handler and serializes the response to compute the hash — it only avoids sending the body. The database query, application logic, and serialization all run. The saving is network bandwidth, not application load.
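The hashing step can be approximated with the JDK alone. This is a sketch of the idea, not Spring's implementation: ShallowEtagSketch and etagFor are hypothetical names, and the exact tag format Spring emits may differ. It makes the limitation visible: the full body bytes must exist before the tag can be computed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: derives a quoted ETag from the serialized response body. */
class ShallowEtagSketch {

    static String etagFor(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(body);
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : digest) {
                tag.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform is required to provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "{\"id\":1,\"name\":\"Widget\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println("ETag: " + etagFor(body));
    }
}
```

Identical bodies always produce the same tag, so the comparison is reliable; the cost is that the handler and serializer have already run by the time the tag exists.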

Deep ETags — computing the ETag from the resource state (version number, last-modified timestamp) without executing the full request:

@GetMapping("/products/{id}")
public ResponseEntity<ProductResponse> getProduct(
        @PathVariable Long id,
        WebRequest webRequest) {

    // Check ETag from a lightweight metadata query, no full fetch
    String currentETag = productService.getETag(id);

    // checkNotModified compares against If-None-Match and handles quoting,
    // weak (W/) validators, and multiple values; a plain equals() misses
    // these because clients send the ETag back in its quoted form
    if (webRequest.checkNotModified(currentETag)) {
        return null; // Spring completes the 304 response
    }

    Product product = productService.findProduct(id);
    return ResponseEntity.ok()
        .eTag(currentETag)
        .cacheControl(CacheControl.maxAge(Duration.ofMinutes(10)))
        .body(ProductResponse.from(product));
}

productService.getETag(id) runs a lightweight query for just the version or hash — SELECT version FROM products WHERE id = ?. Only on a mismatch does the full product load execute. This reduces database load for frequently-checked resources.

The ETag value should be deterministic for the same content and change when content changes. A resource's version number, a hash of key fields, or a timestamp of last modification all work.
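Two of those options can be sketched as follows. ResourceEtags and its methods are hypothetical names, and the tag layouts (id plus version, id plus hex timestamp) are only one possible convention; any format works as long as it is stable for unchanged content.

```java
import java.time.Instant;

/** Hypothetical sketch: builds deterministic, quoted ETags from resource metadata. */
class ResourceEtags {

    /** Version-based tag: changes exactly when the stored version number changes. */
    static String fromVersion(long id, long version) {
        return "\"" + id + "-v" + version + "\"";
    }

    /** Timestamp-based tag: keeps millisecond precision, unlike the Last-Modified header. */
    static String fromLastModified(long id, Instant updatedAt) {
        return "\"" + id + "-" + Long.toHexString(updatedAt.toEpochMilli()) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(fromVersion(42, 7));
        System.out.println(fromLastModified(42, Instant.now()));
    }
}
```

A version column maintained by optimistic locking is a convenient source for the first variant, since it already increments on every update.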

Last-Modified and conditional GET

Last-Modified / If-Modified-Since is the timestamp-based equivalent of ETag / If-None-Match. Less precise (second granularity) but simpler to implement:

@GetMapping("/products/{id}")
public ResponseEntity<ProductResponse> getProduct(
        @PathVariable Long id,
        WebRequest webRequest) {

    Product product = productService.findProduct(id);
    long lastModifiedMs = product.getUpdatedAt().toEpochMilli();

    // Returns 304 if client's If-Modified-Since >= lastModifiedMs
    if (webRequest.checkNotModified(lastModifiedMs)) {
        return null; // Spring handles 304 response
    }

    return ResponseEntity.ok()
        .lastModified(product.getUpdatedAt())
        .cacheControl(CacheControl.maxAge(Duration.ofMinutes(10)))
        .body(ProductResponse.from(product));
}

webRequest.checkNotModified() handles the comparison and sets the response status to 304 if the content hasn't changed. Returning null from the handler when checkNotModified returns true is the Spring MVC convention; the framework completes the 304 response. Because this example loads the product before the check, a 304 saves serialization and bandwidth but not the database read.

Use ETag when content changes can't be captured by timestamp alone (sub-second changes, same-second multiple updates). Use Last-Modified for simpler cases where modification time is sufficient.
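The granularity problem is easy to reproduce with the JDK. In this sketch (LastModifiedSketch is a hypothetical name, not a Spring class), two updates within the same second produce the same HTTP date, so a client holding the first version is told nothing changed.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical sketch: shows why Last-Modified cannot see sub-second changes. */
class LastModifiedSketch {

    // HTTP dates carry whole seconds only and are always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** True when the resource is not newer than the client's If-Modified-Since value. */
    static boolean notModified(Instant lastModified, Instant ifModifiedSince) {
        return !lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(ifModifiedSince);
    }

    public static void main(String[] args) {
        Instant firstUpdate = Instant.parse("2024-01-02T03:04:05.100Z");
        Instant secondUpdate = Instant.parse("2024-01-02T03:04:05.900Z");
        // Both updates format to the same header value.
        System.out.println(httpDate(firstUpdate));
        System.out.println(httpDate(secondUpdate));
    }
}
```

An ETag derived from a version number does not have this blind spot, which is the practical reason to prefer it for frequently updated resources.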

CDN integration

CDNs (CloudFront, Fastly, Cloudflare) sit in front of your application and cache responses per URL. They honor Cache-Control headers — a response with Cache-Control: max-age=600, public is cached at CDN edge nodes for 10 minutes, serving all subsequent requests within that window without hitting your origin.

The CDN-specific Cache-Control directives:

s-maxage sets the CDN TTL independently of the browser TTL:

CacheControl.maxAge(Duration.ofMinutes(1))  // browser: 1 minute
    .sMaxAge(Duration.ofHours(1))            // CDN: 1 hour
    .cachePublic()

The browser revalidates frequently (protecting users from stale data). The CDN holds the response for an hour (protecting the origin from load). Combined with stale-while-revalidate, CDN nodes can serve slightly stale content while refreshing in the background.
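The three states an edge node moves through can be sketched as a small classifier. EdgeFreshness and its State enum are hypothetical names, and the boundary handling is a simplification of what a real CDN does.

```java
import java.time.Duration;

/** Hypothetical sketch: classifies a cached entry by age, TTL, and stale-while-revalidate window. */
class EdgeFreshness {

    enum State { FRESH, STALE_SERVE_AND_REVALIDATE, STALE_MUST_REVALIDATE }

    static State classify(Duration age, Duration ttl, Duration staleWhileRevalidate) {
        if (age.compareTo(ttl) < 0) {
            return State.FRESH; // serve from cache, no origin contact
        }
        if (age.compareTo(ttl.plus(staleWhileRevalidate)) < 0) {
            return State.STALE_SERVE_AND_REVALIDATE; // serve stale copy, refresh in background
        }
        return State.STALE_MUST_REVALIDATE; // the request waits for the origin
    }

    public static void main(String[] args) {
        Duration ttl = Duration.ofHours(1);
        Duration swr = Duration.ofMinutes(2);
        System.out.println(classify(Duration.ofMinutes(30), ttl, swr));
        System.out.println(classify(Duration.ofMinutes(61), ttl, swr));
        System.out.println(classify(Duration.ofMinutes(63), ttl, swr));
    }
}
```

Only the last state adds origin latency to a user's request, which is why a short stale-while-revalidate window smooths out expiry on busy endpoints.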

Surrogate keys / cache tags (Fastly, Cloudflare) — tag responses with logical identifiers and purge by tag when data changes:

@GetMapping("/products/{id}")
public ResponseEntity<ProductResponse> getProduct(@PathVariable Long id) {
    Product product = productService.findProduct(id);

    return ResponseEntity.ok()
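        // Fastly reads space-separated keys from Surrogate-Key; Cloudflare expects comma-separated tags in Cache-Tag
        .header("Surrogate-Key", "product-" + id + " category-" + product.getCategoryId())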
        .cacheControl(CacheControl.maxAge(Duration.ofHours(1)).cachePublic())
        .body(ProductResponse.from(product));
}

// When a product changes:
@EventListener
public void onProductUpdated(ProductUpdatedEvent event) {
    fastlyClient.purgeByTag("product-" + event.getProductId());
}

Purging by tag invalidates every cached response that carries that tag, without knowing the exact URLs. Product pages, category listings that contain the product, and search results are all invalidated by the product tag when the product changes, provided each of those responses was tagged with it when it was served.

CloudFront has no tag-based purge; the closest equivalent is invalidation by path pattern (/products/123*). Fastly reads space-separated keys from the Surrogate-Key header, while Cloudflare reads comma-separated tags from the Cache-Tag header.
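What a tag-aware cache does internally can be modeled with two maps. This is an in-memory illustration only: TaggedCache is a hypothetical class, not any CDN's API, and it ignores TTLs entirely.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of a cache that supports purge-by-tag. */
class TaggedCache {
    private final Map<String, String> bodiesByUrl = new HashMap<>();
    private final Map<String, Set<String>> urlsByTag = new HashMap<>();

    /** Stores a response and indexes its URL under every tag it was served with. */
    void put(String url, String body, String... tags) {
        bodiesByUrl.put(url, body);
        for (String tag : tags) {
            urlsByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(url);
        }
    }

    String get(String url) {
        return bodiesByUrl.get(url);
    }

    /** Evicts every cached URL carrying the tag; returns how many entries were removed. */
    int purgeByTag(String tag) {
        Set<String> urls = urlsByTag.remove(tag);
        if (urls == null) return 0;
        int purged = 0;
        for (String url : urls) {
            if (bodiesByUrl.remove(url) != null) purged++;
        }
        return purged;
    }

    public static void main(String[] args) {
        TaggedCache cache = new TaggedCache();
        cache.put("/products/123", "product 123", "product-123", "category-9");
        cache.put("/categories/9", "listing", "category-9", "product-123", "product-124");
        cache.put("/products/124", "product 124", "product-124", "category-9");
        System.out.println("purged: " + cache.purgeByTag("product-123"));
    }
}
```

The category listing is evicted only because it was tagged with product-123 when it was stored; an untagged response would survive the purge and stay stale until its TTL expired.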

Vary header — correct CDN caching for content negotiation

Vary tells CDNs that the response varies by certain request headers — the CDN should cache separate responses for different values of those headers:

@GetMapping(value = "/products/{id}", produces = {
    MediaType.APPLICATION_JSON_VALUE,
    "application/vnd.api+json"
})
public ResponseEntity<ProductResponse> getProduct(@PathVariable Long id) {
    return ResponseEntity.ok()
        .varyBy("Accept")  // different Accept headers get different cache entries
        .cacheControl(CacheControl.maxAge(Duration.ofMinutes(10)).cachePublic())
        .body(ProductResponse.from(productService.findProduct(id)));
}

Without Vary: Accept, a CDN might serve a JSON response to a client requesting application/vnd.api+json. With it, separate cache entries exist per Accept value.

Vary: Authorization is a trap: it effectively disables CDN caching because every user sends a different Authorization header, producing unique cache entries that are never reused. For user-specific endpoints, use Cache-Control: private instead of Vary: Authorization.
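Both effects follow from how a cache builds its lookup key. A minimal sketch, assuming a hypothetical VaryCacheKey helper and a key format invented for illustration: the key is the URL plus the request's value for each header named in Vary.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: derives a cache key from the URL and the headers listed in Vary. */
class VaryCacheKey {

    static String keyFor(String url, List<String> varyHeaders, Map<String, String> requestHeaders) {
        // Header names are case-insensitive, so look them up through a case-insensitive map.
        Map<String, String> normalised = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        normalised.putAll(requestHeaders);

        StringBuilder key = new StringBuilder(url);
        for (String name : varyHeaders) {
            key.append('|')
               .append(name.toLowerCase(Locale.ROOT))
               .append('=')
               .append(normalised.getOrDefault(name, ""));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        List<String> vary = List.of("Accept");
        System.out.println(keyFor("/products/1", vary, Map.of("Accept", "application/json")));
        System.out.println(keyFor("/products/1", vary, Map.of("Accept", "application/vnd.api+json")));
    }
}
```

With Vary: Accept the key space stays small (one entry per media type). With Vary: Authorization every distinct token yields a distinct key, so the hit rate approaches zero.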

What to cache and what not to

Cache at the HTTP layer:

  • Static product catalogs, configuration endpoints, public content
  • Read-heavy endpoints with infrequent updates
  • Responses that are identical for all users (or identifiable user segments)

Don't cache at the HTTP layer:

  • User-specific data — use Cache-Control: private if at all, not CDN caching
  • Endpoints that reflect real-time state (inventory counts, live prices)
  • Financial and transactional responses — no-store
  • Endpoints whose freshness requirements are tighter than the minimum CDN TTL you'd set

The question for each endpoint: what is the acceptable staleness window? If the answer is zero ("users must always see the latest data"), freshness-based caching with max-age is wrong for this endpoint; at most, use no-cache with an ETag so every request is revalidated. If the answer is "5 minutes is fine," max-age=300 is appropriate.
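That decision can be written down as a small function. CachePolicyChooser is a hypothetical helper, and the mapping is one reasonable reading of the guidance in this article rather than a universal rule: sensitive data maps to no-store, a zero staleness window maps to no-cache (any stored copy is revalidated on every use), and everything else gets a max-age scoped as private or public.

```java
import java.time.Duration;

/** Hypothetical sketch: maps an endpoint's requirements to a Cache-Control value. */
class CachePolicyChooser {

    static String cacheControlFor(boolean sensitive, boolean userSpecific, Duration acceptableStaleness) {
        if (sensitive) {
            return "no-store"; // never written to any cache
        }
        String scope = userSpecific ? "private" : "public";
        if (acceptableStaleness.isZero() || acceptableStaleness.isNegative()) {
            // No reuse without revalidation; user-specific responses also stay out of shared caches.
            return userSpecific ? "no-cache, private" : "no-cache";
        }
        return "max-age=" + acceptableStaleness.getSeconds() + ", " + scope;
    }

    public static void main(String[] args) {
        System.out.println(cacheControlFor(false, false, Duration.ofMinutes(5)));
        System.out.println(cacheControlFor(false, true, Duration.ofMinutes(5)));
        System.out.println(cacheControlFor(true, true, Duration.ofMinutes(5)));
    }
}
```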
