Spring Boot Logging in Production — Structured Logs, Correlation IDs, and What to Alert On
by Eric Hanson, Backend Developer at Clean Systems Consulting
Why unstructured logs fail at scale
A log line like 2026-04-17 14:30:45 ERROR OrderService - Failed to process order 123 for user alice@example.com contains useful information but is practically unqueryable at scale. Extracting it requires regex parsing that's fragile to any change in the log format. Alerting on error rate requires counting lines that match a pattern — slow and brittle.
Structured logging emits each event as a JSON object in which every field is a named key-value pair (pretty-printed here for readability; in production each event is a single line):
{
  "timestamp": "2026-04-17T14:30:45.123Z",
  "level": "ERROR",
  "logger": "com.example.OrderService",
  "message": "Failed to process order",
  "orderId": "123",
  "userEmail": "alice@example.com",
  "errorType": "PaymentDeclinedException",
  "traceId": "abc123def456",
  "spanId": "789xyz",
  "environment": "production",
  "service": "order-service"
}
Every field is indexable. Querying "ERROR logs for order 123 in the last hour" is a log aggregator query, not a regex. Alerting on error rate is count(level=ERROR) / count(*) — precise and fast.
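As an illustration, that same question expressed as a CloudWatch Logs Insights query (assuming the JSON fields shown above have been ingested as discovered fields):

```
fields @timestamp, message, errorType
| filter level = "ERROR" and orderId = "123"
| sort @timestamp desc
```

The equivalent in Datadog, Splunk, or Kibana is a similar handful of field filters, with no regex anywhere.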
Logback with JSON output
Spring Boot uses Logback by default. Configure structured JSON output for production:
<!-- src/main/resources/logback-spring.xml -->
<configuration>

    <!-- Development profile: human-readable -->
    <springProfile name="!production">
        <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
            </encoder>
        </appender>
        <root level="DEBUG">
            <appender-ref ref="CONSOLE"/>
        </root>
    </springProfile>

    <!-- Production profile: structured JSON -->
    <springProfile name="production">
        <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
            <encoder class="net.logstash.logback.encoder.LogstashEncoder">
                <includeCallerData>false</includeCallerData>
                <!-- Listing keys switches the encoder to a whitelist:
                     only these MDC entries appear in the output -->
                <includeMdcKeyName>traceId</includeMdcKeyName>
                <includeMdcKeyName>spanId</includeMdcKeyName>
                <includeMdcKeyName>requestId</includeMdcKeyName>
                <includeMdcKeyName>userId</includeMdcKeyName>
                <includeMdcKeyName>tenantId</includeMdcKeyName>
                <includeMdcKeyName>method</includeMdcKeyName>
                <includeMdcKeyName>path</includeMdcKeyName>
                <customFields>{"service":"order-service"}</customFields>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="JSON"/>
        </root>

        <!-- Reduce noisy framework logs -->
        <logger name="org.hibernate.SQL" level="WARN"/>
        <logger name="com.zaxxer.hikari" level="WARN"/>
        <logger name="org.springframework.web" level="WARN"/>
    </springProfile>
</configuration>
The dependency:
<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>7.4</version>
</dependency>
LogstashEncoder outputs each log event as a single-line JSON object — one line per event, newline-delimited. Log aggregators (Datadog, Splunk, ELK, CloudWatch Logs Insights) parse these directly without regex.
Setting includeCallerData to false disables caller class and method name resolution, which is expensive for high-throughput services. Enable it only when actively debugging.
MDC — the context that travels with every log line
MDC (Mapped Diagnostic Context) is a per-thread map of key-value pairs automatically included in every log line. Set contextual values at the request boundary; they appear in all downstream log lines:
@Component
public class RequestLoggingFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {
        String requestId = Optional.ofNullable(request.getHeader("X-Request-ID"))
                .orElseGet(() -> UUID.randomUUID().toString());
        try {
            MDC.put("requestId", requestId);
            MDC.put("method", request.getMethod());
            MDC.put("path", request.getRequestURI());

            // Set user context after the security filter has run
            Authentication auth = SecurityContextHolder.getContext().getAuthentication();
            if (auth != null && auth.isAuthenticated() &&
                    !(auth instanceof AnonymousAuthenticationToken)) {
                MDC.put("userId", auth.getName());
            }

            response.addHeader("X-Request-ID", requestId);
            chain.doFilter(request, response);
        } finally {
            MDC.clear(); // mandatory — threads are reused
        }
    }
}
With MDC set, every log line from any class in the request thread includes requestId, userId, and path without any of those classes needing to pass them explicitly. A log line from PaymentGatewayClient deep in the call stack includes the same requestId as the controller log line — allowing reconstruction of the full request flow.
Correlation IDs across services
MDC is thread-local. When work crosses service boundaries (HTTP calls, message queue processing), the correlation ID must be propagated explicitly.
Outgoing HTTP calls — propagate the trace ID:
@Bean
public WebClient orderServiceClient(ObservationRegistry observationRegistry) {
    return WebClient.builder()
            .baseUrl(orderServiceUrl)
            .filter(propagateCorrelationId())
            .build();
}

private ExchangeFilterFunction propagateCorrelationId() {
    return ExchangeFilterFunction.ofRequestProcessor(request -> {
        String traceId = MDC.get("traceId");
        String requestId = MDC.get("requestId");
        ClientRequest.Builder builder = ClientRequest.from(request);
        if (traceId != null) builder.header("X-B3-TraceId", traceId);
        if (requestId != null) builder.header("X-Request-ID", requestId);
        return Mono.just(builder.build());
    });
}
Incoming HTTP calls — extract and restore the trace ID:
// In RequestLoggingFilter, check for incoming correlation headers
String incomingTraceId = request.getHeader("X-B3-TraceId");
if (incomingTraceId != null) {
    MDC.put("traceId", incomingTraceId);
} else {
    MDC.put("traceId", UUID.randomUUID().toString());
}
Micrometer Tracing (Spring Boot 3.x) automates this. With spring-boot-starter-actuator and a Micrometer Tracing bridge configured, trace IDs propagate automatically through WebClient, RestTemplate, and message listeners. The MDC is populated from the current span automatically — traceId and spanId appear in logs without manual propagation.
management:
  tracing:
    sampling:
      probability: 1.0  # sample 100% in development, 0.1 in production
With Micrometer Tracing, the manual propagation above is replaced by the framework. The tracing filter, MDC population, and header propagation all happen automatically.
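Wiring that up is mostly a dependency choice. A typical Maven setup with the OpenTelemetry bridge is sketched below (the Brave bridge, micrometer-tracing-bridge-brave, is the alternative; Spring Boot's dependency management supplies the versions):

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
```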
Propagating MDC through async boundaries
MDC is thread-local — when work moves to a different thread, the MDC is not automatically copied. With @Async, Kafka consumers, and virtual threads, this causes MDC to be empty in log lines from the new thread:
@Async
public void processAsync(String orderId) {
    // MDC is empty here — different thread
    log.info("Processing order {}", orderId); // no traceId in this log line
}
Fix: copy MDC values at the point of thread handoff:
@Configuration
public class AsyncConfig {

    @Bean
    public TaskDecorator mdcTaskDecorator() {
        return runnable -> {
            Map<String, String> mdcCopy = MDC.getCopyOfContextMap();
            return () -> {
                try {
                    if (mdcCopy != null) MDC.setContextMap(mdcCopy);
                    runnable.run();
                } finally {
                    MDC.clear();
                }
            };
        };
    }

    @Bean
    public Executor asyncExecutor(TaskDecorator mdcTaskDecorator) {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setTaskDecorator(mdcTaskDecorator);
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(20);
        executor.initialize();
        return executor;
    }
}
MDC.getCopyOfContextMap() captures the current MDC at submission time. The TaskDecorator restores it before the runnable executes on the pool thread.
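The capture-and-restore mechanics are easy to see outside Spring. Below is a minimal stand-in using only java.util.concurrent, where a ThreadLocal map plays the role of the MDC; MdcSketch and its names are invented for this illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified stand-in for SLF4J's MDC: a per-thread map (illustration only).
public class MdcSketch {
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Same shape as the TaskDecorator above: capture the submitting thread's
    // context, restore it on the worker thread, clean up afterwards.
    static Runnable decorated(Runnable task) {
        Map<String, String> copy = new HashMap<>(CONTEXT.get()); // capture at submission
        return () -> {
            try {
                CONTEXT.set(copy);   // restore on the pool thread
                task.run();
            } finally {
                CONTEXT.remove();    // pool threads are reused, so always clean up
            }
        };
    }

    public static String runDecorated() {
        CONTEXT.get().put("requestId", "req-42");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> seen = new CompletableFuture<>();
            pool.submit(decorated(() -> seen.complete(CONTEXT.get().get("requestId"))));
            return seen.join();      // "req-42": the copy travelled with the task
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static String runUndecorated() {
        CONTEXT.get().put("requestId", "req-42");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> seen = new CompletableFuture<>();
            pool.submit(() -> seen.complete(CONTEXT.get().get("requestId")));
            return seen.join();      // null: the thread-local did not cross the handoff
        } finally {
            pool.shutdown();
            CONTEXT.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("decorated:   " + runDecorated());
        System.out.println("undecorated: " + runUndecorated());
    }
}
```

The undecorated variant shows exactly the failure mode from the @Async example: the worker thread starts with an empty context.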
For Kafka consumers, restore MDC from the message headers:
@KafkaListener(topics = "orders.placed")
public void handleOrderPlaced(ConsumerRecord<String, OrderPlacedEvent> record) {
    String traceId = extractHeader(record, "X-Trace-ID");
    try {
        MDC.put("traceId", traceId != null ? traceId : UUID.randomUUID().toString());
        MDC.put("topic", record.topic());
        MDC.put("partition", String.valueOf(record.partition()));
        MDC.put("offset", String.valueOf(record.offset()));
        processEvent(record.value());
    } finally {
        MDC.clear();
    }
}
Log levels — the discipline that reduces noise
ERROR: a condition that requires immediate human attention. A database is down. An uncaught exception reached the top of the call stack. A critical business operation failed permanently.
WARN: a condition that is unexpected but handled. A retry succeeded after initial failure. A configuration value is missing but a default was used. A deprecated code path was called.
INFO: key business events and state transitions. Order created. Payment processed. User logged in. Service started successfully. These should be auditable.
DEBUG: detailed technical information useful during development. SQL queries, HTTP request/response bodies, cache hit/miss decisions. Should be disabled in production.
TRACE: extremely verbose — method entry/exit, loop iterations. Almost never appropriate in production.
The common failure: everything at DEBUG in production because "we want to see what's happening." The result is gigabytes of log volume per hour, thousands of dollars in log storage costs, and real errors buried in noise. Use DEBUG only while actively diagnosing a specific issue, and switch it off immediately afterwards.
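Spring Boot's Actuator makes the switch-off practical: log levels can be changed at runtime through the loggers endpoint, with no redeploy (assuming the endpoint is exposed via management.endpoints.web.exposure.include and properly secured; the logger name and host below are placeholders):

```shell
# Raise one logger to DEBUG while diagnosing
curl -X POST http://localhost:8080/actuator/loggers/com.example.OrderService \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "DEBUG"}'

# Revert to the level inherited from configuration
curl -X POST http://localhost:8080/actuator/loggers/com.example.OrderService \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": null}'
```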
// Expensive argument — the guard prevents calling generateKey() when DEBUG is off
if (log.isDebugEnabled()) {
    log.debug("Cache miss for key {}, loading from database", generateKey(params));
}

// SLF4J parameterized logging — formats the message only if the level is enabled
// (arguments are still evaluated, hence the guard above for expensive ones)
log.debug("Request completed: method={}, path={}, status={}, duration={}ms",
        method, path, status, duration);

// WRONG — always performs the concatenation regardless of log level
log.debug("Request: " + method + " " + path + " " + status);
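The cost difference is easy to verify with a plain-Java stand-in (no SLF4J involved; LogCostSketch, expensiveKey, and the DEBUG_ENABLED flag are all invented for this illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Counts how often the "expensive" argument is computed when DEBUG is disabled.
public class LogCostSketch {
    static final boolean DEBUG_ENABLED = false;     // production setting
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    static String expensiveKey() {                  // stands in for generateKey(params)
        expensiveCalls.incrementAndGet();
        return "user:42:orders";
    }

    // Concatenation computes the key before any level check happens
    public static void concatStyle() {
        String line = "Cache miss for key " + expensiveKey();
        if (DEBUG_ENABLED) System.out.println(line);
    }

    // The guard skips argument evaluation entirely when DEBUG is off
    public static void guardedStyle() {
        if (DEBUG_ENABLED) {
            System.out.println("Cache miss for key " + expensiveKey());
        }
    }

    public static int callsAfter(Runnable style) {
        expensiveCalls.set(0);
        style.run();
        return expensiveCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("concat:  " + callsAfter(LogCostSketch::concatStyle));
        System.out.println("guarded: " + callsAfter(LogCostSketch::guardedStyle));
    }
}
```

With DEBUG disabled, the concatenation version still pays for the key computation on every call; the guarded version pays nothing.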
What to alert on
Logs generate too much data to alert on every event. Structure alerts around signals, not log lines:
Immediately page:
- ERROR log rate above baseline (e.g., > 1% of requests produce ERROR logs)
- Specific error types that indicate security incidents: authentication failures exceeding threshold, authorization failures for admin endpoints
- Any OutOfMemoryError or StackOverflowError in logs
Alert but don't page immediately:
- WARN log rate increase (2x baseline sustained for 5 minutes)
- Specific known degradation signals: circuit breaker WARN logs, retry WARN logs
- Slow query WARN logs from Hibernate/HikariCP
Don't alert on:
- Individual ERROR log lines — too noisy, alert on rates
- DEBUG or INFO level — not alert-worthy by definition
- Known expected errors (404 not found, 401 unauthorized) without rate anomaly
Configure your log aggregator to create metrics from logs, then alert on metrics:
# Datadog log-based metric example
count of events where level=ERROR and service=order-service
# Alert condition
sum(last 5m):count:log_lines{level:error,service:order-service} > 50
Alerting on a count rather than individual lines means a single ERROR during an otherwise quiet period doesn't page anyone — only sustained elevated error rates do. This is the distinction between signal and noise.
The logging checklist for a new service
Before going to production:
- JSON structured output configured for the production profile
- MDC populated at request entry with requestId, userId, and trace context
- MDC cleared in finally at every thread boundary
- Micrometer Tracing configured if using distributed tracing
- TaskDecorator wrapping async executors to propagate MDC
- Log level at INFO for application code, WARN for framework code
- ERROR logs trigger an alert in monitoring
- X-Request-ID returned in response headers for client-side debugging
- No DEBUG logging left permanently enabled from a debugging session