Production-Ready Spring Boot — The Observability Setup That Catches Problems Before Users Do
by Eric Hanson, Backend Developer at Clean Systems Consulting
The gap between running and observable
An application that starts and responds to requests is running. An application where you can answer "is it healthy?", "what is it doing right now?", "what happened five minutes ago when that error spiked?", and "which service caused that slow request?" — that's observable.
Most Spring Boot setups have the first. Getting to the second requires deliberate configuration of four things: health indicators, structured logs, metrics, and distributed traces. Spring Boot's ecosystem covers all four; the defaults cover only some of them.
Health checks — what to expose and what to hide
Spring Boot Actuator provides /actuator/health out of the box. The default configuration exposes an aggregate status — UP, DOWN, OUT_OF_SERVICE, UNKNOWN — and hides the detail.
Configure what to expose at each endpoint:
management:
  endpoints:
    web:
      exposure:
        include: health, info, metrics, prometheus
  endpoint:
    health:
      show-details: when-authorized   # or 'always' for internal services
      show-components: when-authorized
  health:
    db:
      enabled: true
    redis:
      enabled: true
    diskspace:
      enabled: true
      threshold: 524288000   # 500MB minimum free space
Never expose all actuator endpoints publicly. /actuator/env exposes environment variables including secrets. /actuator/loggers allows changing log levels at runtime — useful in production but only for authorized users. /actuator/heapdump triggers a heap dump — a denial-of-service vector if exposed publicly. Expose health, info, metrics, and prometheus publicly; gate everything else behind authentication.
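As a sketch, that gating could look like the following Spring Security configuration. This assumes spring-boot-starter-security is on the classpath; the role name ACTUATOR_ADMIN is illustrative, not a Spring convention:

```java
import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.boot.actuate.health.HealthEndpoint;
import org.springframework.boot.actuate.info.InfoEndpoint;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class ActuatorSecurityConfig {

    @Bean
    SecurityFilterChain actuatorFilterChain(HttpSecurity http) throws Exception {
        http
            // This chain applies only to actuator endpoints
            .securityMatcher(EndpointRequest.toAnyEndpoint())
            .authorizeHttpRequests(auth -> auth
                // Health and info are safe for probes and load balancers
                .requestMatchers(EndpointRequest.to(HealthEndpoint.class, InfoEndpoint.class)).permitAll()
                // Metrics scraping; restrict by network policy where possible
                .requestMatchers(EndpointRequest.to("prometheus", "metrics")).permitAll()
                // Everything else (env, loggers, heapdump, ...) requires authentication
                .anyRequest().hasRole("ACTUATOR_ADMIN"))
            .httpBasic(Customizer.withDefaults());
        return http.build();
    }
}
```

The EndpointRequest matchers track the actuator base path automatically, so the rules keep working if management.endpoints.web.base-path changes.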
Custom health indicators for critical dependencies your application needs to function:
@Component
public class PaymentGatewayHealthIndicator implements HealthIndicator {

    private final PaymentGatewayClient client;

    public PaymentGatewayHealthIndicator(PaymentGatewayClient client) {
        this.client = client;
    }

    @Override
    public Health health() {
        try {
            boolean reachable = client.ping();
            if (reachable) {
                return Health.up()
                        .withDetail("gateway", "stripe")
                        .withDetail("latency_ms", client.lastPingLatencyMs())
                        .build();
            }
            return Health.down()
                    .withDetail("gateway", "stripe")
                    .withDetail("reason", "ping failed")
                    .build();
        } catch (Exception e) {
            return Health.down(e).build();
        }
    }
}
The health endpoint is the contract your load balancer and orchestration platform uses. Kubernetes liveness and readiness probes should target separate endpoints:
management:
  endpoint:
    health:
      probes:
        enabled: true
# Exposes /actuator/health/liveness and /actuator/health/readiness
Liveness: is the application alive? If not, restart it. Should only fail for unrecoverable states — deadlock, corrupted state — not for external dependency failures.
Readiness: is the application ready to accept traffic? Should fail if critical dependencies (database, message queue) are unavailable. External dependency health indicators belong in the readiness group:
Spring Boot assigns indicators to probes through health groups in configuration (there is no @Readiness annotation). An indicator is referenced by its contributor name, which is the bean name with the HealthIndicator suffix stripped: the auto-configured DataSource check is db, and the PaymentGatewayHealthIndicator above is paymentGateway:
management:
  endpoint:
    health:
      group:
        readiness:
          include: readinessState, db
        liveness:
          include: livenessState
readinessState and livenessState are Spring Boot's built-in application-state indicators (Spring Boot 2.3+). Adding db to the readiness group means a database outage fails readiness, so traffic is routed away, without failing liveness and putting the pod in a restart loop while the database recovers.
Structured logging
Unstructured log lines — 2024-01-15 ERROR OrderService: Failed to process order 123 — require regex parsing to extract fields for alerting and querying. Structured logs emit JSON or key-value pairs that log aggregators can index directly.
Configure Logback for JSON output with Logstash encoder:
<!-- logback-spring.xml -->
<configuration>
    <springProfile name="production">
        <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
            <encoder class="net.logstash.logback.encoder.LogstashEncoder">
                <includeMdcKeyName>traceId</includeMdcKeyName>
                <includeMdcKeyName>spanId</includeMdcKeyName>
                <includeMdcKeyName>userId</includeMdcKeyName>
                <includeMdcKeyName>requestId</includeMdcKeyName>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="JSON" />
        </root>
    </springProfile>
</configuration>
MDC (Mapped Diagnostic Context) fields are injected per-request and appear in every log line for that request. Populate MDC at request entry:
@Component
public class LoggingFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {
        try {
            MDC.put("requestId", UUID.randomUUID().toString());
            MDC.put("method", request.getMethod());
            MDC.put("path", request.getRequestURI());
            chain.doFilter(request, response);
        } finally {
            MDC.clear(); // always clear: pooled threads are reused across requests
        }
    }
}
MDC.clear() in finally is critical. In thread pool environments, threads are reused — MDC from a previous request will leak into the next request on the same thread if not cleared.
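The leak is easy to reproduce with a plain ThreadLocal, which is essentially what MDC uses for per-thread storage. This is a self-contained sketch, not SLF4J itself:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MdcLeakDemo {

    // MDC is backed by a per-thread map, modeled here with a ThreadLocal
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    public static void main(String[] args) throws Exception {
        // Single-thread pool: every task is guaranteed to reuse the same thread
        ExecutorService pool = Executors.newFixedThreadPool(1);

        // "Request 1" sets a requestId but forgets to clear it
        pool.submit(() -> CONTEXT.get().put("requestId", "req-1")).get();

        // "Request 2" on the same pooled thread sees the stale value
        String leaked = pool.submit(() -> CONTEXT.get().get("requestId")).get();
        System.out.println("leaked requestId: " + leaked);

        // After a finally-style clear, the next task starts clean
        pool.submit(() -> CONTEXT.get().clear()).get();
        String clean = pool.submit(() -> CONTEXT.get().get("requestId")).get();
        System.out.println("after clear: " + clean);

        pool.shutdown();
    }
}
```

Running this prints "leaked requestId: req-1" then "after clear: null", which is exactly the cross-request contamination the finally block prevents.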
Log level discipline. Production logs at INFO. DEBUG logs are noise in production and CPU overhead in high-throughput services: with string concatenation, the message is built before the level check, whereas SLF4J's {} placeholders defer formatting until the level is confirmed. Set specific packages to DEBUG via Actuator when diagnosing a live issue — don't leave them there:
curl -X POST localhost:8080/actuator/loggers/com.example.payments \
-H "Content-Type: application/json" \
-d '{"configuredLevel": "DEBUG"}'
This changes the log level at runtime without restart. Revert after diagnosis.
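The cost of eager message construction is easy to see in plain Java, without SLF4J on the classpath. This sketch contrasts a concatenated call with a deferred one (a Supplier stands in for SLF4J's {} placeholders or its fluent log.atDebug() API):

```java
import java.util.function.Supplier;

public class LazyLoggingDemo {

    static boolean debugEnabled = false; // production: DEBUG is off
    static int expensiveCalls = 0;

    // Stand-in for an expensive toString()/serialization of a large object
    static String expensiveDump() {
        expensiveCalls++;
        return "big-object-dump";
    }

    // Eager: the argument is fully built before the level check,
    // like log.debug("state: " + expensiveDump())
    static void debugEager(String message) {
        if (debugEnabled) System.out.println(message);
    }

    // Lazy: the supplier only runs if the level is enabled,
    // like log.debug("state: {}", obj) deferring obj.toString()
    static void debugLazy(Supplier<String> message) {
        if (debugEnabled) System.out.println(message.get());
    }

    public static void main(String[] args) {
        debugEager("state: " + expensiveDump()); // expensiveDump() runs anyway
        System.out.println("eager calls: " + expensiveCalls);

        debugLazy(() -> "state: " + expensiveDump()); // skipped entirely
        System.out.println("after lazy: " + expensiveCalls);
    }
}
```

The eager call pays for the dump even though nothing is logged ("eager calls: 1"); the lazy call costs nothing while DEBUG is off ("after lazy: 1").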
Metrics with Micrometer
Micrometer is Spring Boot's metrics facade — it exposes metrics in any format (Prometheus, Datadog, CloudWatch, InfluxDB) via a pluggable registry. Spring Boot auto-configures JVM, HTTP, and database metrics:
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
management:
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
    distribution:
      percentiles-histogram:
        http.server.requests: true   # enables histogram for percentile calculation
      percentiles:
        http.server.requests: 0.5, 0.95, 0.99
Adding application and environment tags to all metrics makes filtering in dashboards trivial.
The metrics that warrant alerts:
# HTTP
http.server.requests{status="5xx"} — error rate, alert on > 1% of requests
http.server.requests{outcome="SUCCESS", quantile="0.99"} — p99 latency, alert on > SLA
# JVM
jvm.memory.used{area="heap"} / jvm.memory.max{area="heap"} — heap usage ratio, alert on > 80%
jvm.gc.pause (max) — worst GC pause, alert on > 200ms
jvm.threads.states{state="blocked"} — blocked threads, alert on sustained count > 0
# Database (HikariCP)
hikaricp.connections.pending — connection pool wait, alert on > 0 sustained
hikaricp.connections.timeout — pool timeout events, alert on any
hikaricp.connections.active / hikaricp.connections.max — pool utilization
# Application-specific
business.orders.processed.total — throughput, alert on sudden drop
business.payment.failures.total — payment failures, alert on rate increase
Custom metrics:
@Service
public class OrderService {

    private final Counter orderCounter;
    private final Timer processingTimer;

    public OrderService(MeterRegistry registry) {
        this.orderCounter = Counter.builder("business.orders.processed")
                .description("Total orders processed")
                .tag("environment", "production")
                .register(registry);
        this.processingTimer = Timer.builder("business.order.processing.duration")
                .description("Order processing duration")
                .publishPercentileHistogram()
                .register(registry);
    }

    public void processOrder(Order order) {
        processingTimer.record(() -> {
            doProcess(order);
            orderCounter.increment();
        });
    }
}
Timers with publishPercentileHistogram() enable server-side percentile calculation in Prometheus/Grafana. Without the histogram, only mean and max are computable.
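If constructor-injected meters feel heavyweight for simple cases, Micrometer also offers an annotation-driven path via @Timed. A sketch, assuming micrometer-core and Spring AOP are on the classpath:

```java
import io.micrometer.core.aop.TimedAspect;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MetricsAspectConfig {

    // TimedAspect intercepts @Timed methods on Spring beans
    // and records a Timer per annotated method
    @Bean
    public TimedAspect timedAspect(MeterRegistry registry) {
        return new TimedAspect(registry);
    }
}

// Usage on any Spring-managed bean method:
// @Timed(value = "business.order.processing.duration", histogram = true)
// public void processOrder(Order order) { ... }
```

The histogram = true attribute enables the same percentile histogram as publishPercentileHistogram() in the builder API. Note that, like any Spring AOP advice, @Timed only fires on calls that go through the proxy, not on self-invocation.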
Distributed tracing with Micrometer Tracing
Micrometer Tracing (Spring Boot 3.x, formerly Spring Cloud Sleuth) automatically instruments Spring MVC, Spring WebFlux, Spring Data, and messaging. Add the dependency for your tracing backend:
<!-- For Zipkin -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

<!-- For OpenTelemetry (preferred for modern setups) -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
management:
  tracing:
    sampling:
      probability: 0.1   # sample 10% of requests — adjust based on volume
With tracing configured, every HTTP request gets a traceId that propagates through all downstream service calls. The traceId appears in logs (via MDC injection), in metrics tags, and in the trace viewer (Zipkin, Jaeger, Grafana Tempo). A single traceId from a user report lets you reconstruct the entire request path across services.
Sampling rate. Use 100% sampling for low-volume services and 1–10% for high-volume ones. Store the traceId in error responses so users can report it. Note that head-based sampling decides at the start of a trace, before the application knows whether the request will fail, so "always sample errors" cannot be implemented by an in-process sampler alone. To keep every error trace, either sample at 100% and filter downstream, or use tail-based sampling in a collector (the OpenTelemetry Collector's tail_sampling processor buffers spans and decides after the trace completes). What an in-process sampler can do is cap volume; with the Brave bridge, defining a Sampler bean overrides the probability property:
@Bean
public Sampler sampler() {
    // Cap tracing at a fixed number of traces per second,
    // regardless of traffic volume (brave.sampler.RateLimitingSampler)
    return RateLimitingSampler.create(100);
}
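Returning the traceId to the caller can be done with a small exception handler. This sketch uses Micrometer Tracing's Tracer API; the response shape is illustrative, not a Spring convention:

```java
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
public class TraceIdErrorAdvice {

    private final Tracer tracer;

    public TraceIdErrorAdvice(Tracer tracer) {
        this.tracer = tracer;
    }

    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, String>> handleUnexpected(Exception e) {
        // Attach the current traceId so users can quote it in bug reports
        Span span = tracer.currentSpan();
        String traceId = (span != null) ? span.context().traceId() : "unavailable";
        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(Map.of("error", "Internal server error", "traceId", traceId));
    }
}
```

A user-reported traceId from this payload is the entry point into the trace viewer, even when the user can't tell you anything else about the failure.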
The startup verification checklist
Before declaring a service production-ready:
- /actuator/health returns UP and shows component detail for authorized requests
- /actuator/health/liveness and /actuator/health/readiness exist and return correct status
- Custom health indicators cover all critical external dependencies
- Logs are structured JSON in production profile, unstructured in development
- MDC includes requestId, traceId, and spanId in every log line
- /actuator/prometheus exposes metrics and is scraped by the metrics system
- HTTP error rate and p99 latency alerts are configured
- HikariCP connection pool metrics are being collected
- Distributed trace IDs appear in error log lines and error responses
- Log level can be changed at runtime via Actuator without restart
Each item on this list represents a question you'll need to answer during an incident. Missing any of them means that question goes unanswered — at the worst possible time.