Spring Boot Request Processing Overhead — Filter Chains, Serialization, and What's Worth Measuring
by Eric Hanson, Backend Developer at Clean Systems Consulting
The layers a request passes through
Before your controller method runs and after it returns, every Spring Boot request passes through:
- Tomcat connector — thread selection, socket I/O, HTTP parsing
- Servlet filter chain — Spring Security, CORS, request logging, tracing, compression, custom filters
- DispatcherServlet — handler mapping, content negotiation
- HandlerInterceptors — pre/post processing hooks
- Argument resolution — @RequestBody deserialization, @PathVariable extraction, @AuthenticationPrincipal resolution
- Controller execution — your code
- Return value handling — response serialization, @ResponseBody processing
- Filter chain post-processing — response headers, tracing completion
For a service where requests take 50ms on average, 48ms of that is probably the database and 2ms is everything above. Optimizing the filter chain saves fractions of a millisecond. For a service where requests are inherently fast (sub-millisecond computations, mostly cached data), the 2ms overhead is the dominant cost.
Profile before optimizing. async-profiler with -e cpu on a service under load shows which of these layers consumes meaningful CPU. Without measurement, you're guessing.
Filter chain overhead
Each servlet filter in the chain processes every request — pre-processing before the controller and post-processing after the response. The number of filters and their individual cost determines the chain overhead.
Inspect the filter chain:
@Component
public class FilterChainLogger implements ApplicationContextAware {
@Override
public void setApplicationContext(ApplicationContext ctx) {
FilterChainProxy securityFilterChain = ctx.getBean(FilterChainProxy.class);
securityFilterChain.getFilterChains().forEach(chain -> {
log.info("Security filter chain: {}", chain.getFilters().stream()
.map(f -> f.getClass().getSimpleName())
.collect(Collectors.joining(" -> ")));
});
}
}
Or inspect via Actuator:
curl http://localhost:8080/actuator/mappings | jq '.contexts.application.mappings.dispatcherServlets'
A default Spring Boot application with Spring Security has 15–20 security filters, each executing on every request. Most are lightweight. A few warrant attention:
SessionManagementFilter — for stateless REST APIs, Spring Security's session management adds overhead maintaining session state that's never used. Disable it explicitly for stateless APIs:
@Configuration
public class SecurityConfig {
@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
return http
.sessionManagement(session ->
session.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
.csrf(csrf -> csrf.disable()) // stateless APIs don't need CSRF
.build();
}
}
The STATELESS policy prevents session creation and disables session-related filters. For a REST API that uses JWT or API keys, this is the correct configuration regardless of performance.
CsrfFilter — computes and validates CSRF tokens. Disabled for stateless APIs (above). For server-rendered apps that need CSRF, the overhead is inherent.
Custom filters that do I/O. A request logging filter that writes to a database or a filter that validates API keys against Redis adds I/O latency to every request before the controller runs. Evaluate whether the validation belongs in the filter chain or in a Spring Security AuthenticationProvider where it can be cached.
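The caching idea can be sketched independently of the servlet API. Everything here is hypothetical — the class name, the TTL, and the backing lookup (a stand-in for the Redis call) are illustration, not a real library API; in a real service this logic would live in an AuthenticationProvider, likely backed by Caffeine for proper eviction:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical cached validator: the expensive lookup (e.g. a Redis call)
// runs once per key per TTL window; subsequent requests hit the in-memory map.
class CachedApiKeyValidator {
    private record Entry(boolean valid, long expiresAtNanos) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Predicate<String> backingLookup; // stand-in for the I/O call
    private final long ttlNanos;

    CachedApiKeyValidator(Predicate<String> backingLookup, long ttlNanos) {
        this.backingLookup = backingLookup;
        this.ttlNanos = ttlNanos;
    }

    boolean isValid(String apiKey) {
        Entry e = cache.get(apiKey);
        if (e == null || System.nanoTime() > e.expiresAtNanos) {
            // Cache miss or expired entry: pay the I/O cost once, cache the result
            boolean valid = backingLookup.test(apiKey);
            cache.put(apiKey, new Entry(valid, System.nanoTime() + ttlNanos));
            return valid;
        }
        return e.valid;
    }
}
```

The point of the sketch is the shape of the fix: the per-request cost drops from one network round trip to one ConcurrentHashMap read, and the backing store is consulted only when an entry is missing or stale.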
Measuring filter chain cost
Add timing to identify which filters are expensive:
@Component
@Order(Ordered.HIGHEST_PRECEDENCE) // runs first
public class RequestTimingFilter extends OncePerRequestFilter {
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response, FilterChain chain)
throws ServletException, IOException {
long start = System.nanoTime();
try {
chain.doFilter(request, response);
} finally {
long elapsed = System.nanoTime() - start;
// Entire filter chain + controller + response writing
log.debug("Request {} {} completed in {}ms",
request.getMethod(), request.getRequestURI(),
TimeUnit.NANOSECONDS.toMillis(elapsed));
}
}
}
Place this filter at the highest precedence (runs before all others) — it measures total request processing time including all other filters. Compare this against the time recorded closer to the controller to isolate filter chain overhead.
Micrometer's http.server.requests metric is recorded inside the application — depending on the Spring Boot version, by a high-precedence metrics filter or closer to DispatcherServlet entry — so it can miss part of the servlet filter overhead and always excludes connector and network time. If the p99 latency in your APM is much higher than the http.server.requests p99, the delta is the overhead those layers add.
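To run that comparison you need percentiles published for http.server.requests. A configuration sketch using Spring Boot's standard management.metrics.distribution properties (the bracketed key is how Spring Boot escapes dots in map keys in YAML):

```yaml
management:
  metrics:
    distribution:
      percentiles-histogram:
        "[http.server.requests]": true        # histogram buckets, for server-side p99 queries
      percentiles:
        "[http.server.requests]": 0.5, 0.95, 0.99  # precomputed client-side percentiles
```

With the histogram enabled, the backend (e.g. Prometheus) can compute the same p99 your APM or load balancer reports, making the two numbers directly comparable.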
Argument resolution — @RequestBody deserialization
@RequestBody deserializes the request body using Jackson. For large request bodies or complex object graphs, this is measurable overhead.
Jackson's deserialization is fast for simple types — a few microseconds for a small JSON object. It becomes significant for:
Large arrays. A request body with 10,000 items being deserialized into List<Order> allocates one Order object per item, plus the intermediate JSON tokens. For bulk API endpoints, streaming deserialization (reading the JSON stream without building the full list in memory) is more efficient:
@PostMapping("/orders/bulk")
public ResponseEntity<BulkResult> bulkCreateOrders(HttpServletRequest request)
throws IOException {
try (JsonParser parser = objectMapper.getFactory()
.createParser(request.getInputStream())) {
// Read and process each order as it's parsed — no full list in memory
MappingIterator<CreateOrderRequest> orders =
objectMapper.readValues(parser, CreateOrderRequest.class);
BulkResult result = new BulkResult();
while (orders.hasNext()) {
CreateOrderRequest order = orders.next();
result.add(orderService.createOrder(order));
}
return ResponseEntity.ok(result);
}
}
Unknown field scanning. With DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES enabled (not the Spring Boot default), Jackson inspects every field in the JSON against the target type's known properties. For JSON with many unknown fields, this adds overhead. Spring Boot disables this by default — verify it's not re-enabled somewhere.
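One way to guard against re-enablement: state the setting explicitly in configuration, where it overrides anything a stray ObjectMapper customizer might flip. A sketch using Spring Boot's spring.jackson properties (relaxed binding maps the kebab-case key to Jackson's FAIL_ON_UNKNOWN_PROPERTIES feature):

```yaml
spring:
  jackson:
    deserialization:
      fail-on-unknown-properties: false  # the Spring Boot default, stated explicitly
```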
Response serialization overhead
@ResponseBody serializes the return value to JSON using Jackson. For endpoints returning large responses, serialization is a significant share of request time.
Measure serialization cost in isolation:
@GetMapping("/orders")
public List<OrderSummary> listOrders() {
long queryStart = System.nanoTime();
List<OrderSummary> orders = orderService.findOrders();
long queryEnd = System.nanoTime();
// Serialization happens after this method returns in return value handling
// Use @ResponseBody with a ResponseBodyAdvice to measure serialization separately
log.debug("Query: {}ms, count: {}",
TimeUnit.NANOSECONDS.toMillis(queryEnd - queryStart), orders.size());
return orders;
}
Streaming for large responses. StreamingResponseBody writes the response incrementally, releasing the Tomcat thread while writing. For very large responses, this prevents thread pool exhaustion during slow network writes:
@GetMapping(value = "/orders/export", produces = MediaType.APPLICATION_JSON_VALUE)
public StreamingResponseBody exportOrders() {
return outputStream -> {
try (JsonGenerator generator = objectMapper.getFactory()
.createGenerator(outputStream)) {
generator.writeStartArray();
orderRepository.streamAll().forEach(order -> {
try {
objectMapper.writeValue(generator, OrderExportRow.from(order));
} catch (IOException e) {
throw new UncheckedIOException(e);
}
});
generator.writeEndArray();
}
};
}
StreamingResponseBody runs on a different thread pool — the Tomcat request thread is released immediately, allowing new requests to be accepted while the response is being written.
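Because the write happens asynchronously, the MVC async request timeout applies; the default may be too short for a large export to a slow client. A configuration sketch (the value is an assumption — size it to your slowest acceptable export; the executor itself can be replaced via WebMvcConfigurer's configureAsyncSupport):

```yaml
spring:
  mvc:
    async:
      request-timeout: 120000  # ms; large exports need more than the default
```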
Response compression
HTTP response compression (gzip/Brotli) reduces network transfer at the cost of CPU. Spring Boot's embedded Tomcat compresses responses when configured:
server:
compression:
enabled: true
mime-types: application/json, application/xml, text/html, text/plain
min-response-size: 2048 # only compress responses larger than 2KB
Compression is worth enabling for JSON API responses over 2KB — JSON compresses well (60–80% size reduction typical). For small responses (< 1KB), compression overhead exceeds the network savings. The min-response-size threshold handles this automatically.
CPU cost of gzip: roughly 1–3ms per response for a 10KB JSON payload on modern hardware. For a service handling 10,000 requests per second, that is 10–30 seconds of CPU time per wall-clock second — the equivalent of 10 to 30 cores dedicated to compression. Measure whether you have that CPU headroom at your traffic levels before enabling it.
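The size-reduction side of the trade-off is easy to sandbox with the JDK's own gzip implementation. A sketch — the payload is synthetic, repetitive JSON, so the ratio comes out better than a typical real payload, and the timing is a rough single-shot number, not a benchmark:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipRatio {
    // Compress a byte array with gzip and return the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Synthetic ~10KB JSON-ish payload: repeated field names compress well
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < 200; i++) {
            sb.append("{\"orderId\":").append(i)
              .append(",\"status\":\"SHIPPED\",\"total\":42.00},");
        }
        sb.setLength(sb.length() - 1);
        byte[] raw = sb.append("]").toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] compressed = gzip(raw);
        long micros = (System.nanoTime() - start) / 1_000;

        System.out.printf("raw=%d bytes, gzip=%d bytes (%.0f%% smaller), %d us%n",
            raw.length, compressed.length,
            100.0 * (raw.length - compressed.length) / raw.length, micros);
    }
}
```

Running this on your actual response payloads (dump a few from production) gives a realistic compression ratio for your data before you commit CPU to it.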
HTTP/2 and connection multiplexing
HTTP/2 is worth enabling for APIs consumed by browsers or clients that make multiple concurrent requests to the same host. Multiplexing multiple requests over a single TCP connection reduces connection establishment overhead:
server:
http2:
enabled: true
For most backend-to-backend API calls, HTTP/2's benefit is minimal — the client typically makes sequential requests or maintains a persistent connection pool anyway. For public APIs consumed by many different clients, HTTP/2 reduces connection overhead significantly.
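One caveat the two-line snippet above hides: browsers only negotiate HTTP/2 over TLS (via ALPN), so on embedded Tomcat enabling h2 for browser clients requires an SSL configuration alongside it. A sketch — the keystore path and password are placeholders, not working values:

```yaml
server:
  http2:
    enabled: true
  ssl:
    key-store: classpath:keystore.p12   # placeholder keystore
    key-store-password: changeit        # placeholder
    key-store-type: PKCS12
```

If TLS terminates at a load balancer in front of the service, HTTP/2 is typically negotiated there instead, and the application-level setting matters less.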
What's actually worth optimizing
The filter chain, argument resolution, and serialization together typically add 1–5ms to requests where the database takes 20–100ms. Optimizing them saves a small percentage of total latency.
The cases where these layers dominate:
- Endpoints that return cached data (database time is near zero — framework overhead is all that's left)
- Validation-heavy endpoints where the request body is large and complex
- High-frequency lightweight operations (heartbeat endpoints, metrics endpoints, health checks)
For these cases, the optimizations above — stateless session policy, streaming for large bodies, response compression, HTTP/2 — are worth applying. For endpoints that are database-bound, the same effort applied to query optimization returns more.
The measurement that tells you where to invest: compare http.server.requests duration (recorded inside the application) against total request duration in your load balancer or APM. A large gap indicates filter chain, connector, or network overhead worth investigating. A small gap confirms that the framework layers are not the bottleneck.