Configuring Spring Boot for Docker and Kubernetes — Health Probes, Graceful Shutdown, and Resource Limits

by Eric Hanson, Backend Developer at Clean Systems Consulting

The gap between "runs in Docker" and "works in Kubernetes"

Running a Spring Boot application in Docker is straightforward. Running it correctly in Kubernetes requires specific configuration for each of these scenarios:

  • Pod startup: Kubernetes must know when the application is ready to receive traffic
  • Pod shutdown: Kubernetes must allow in-flight requests to complete before termination
  • Pod replacement: new pods must become ready before old ones are removed
  • Resource pressure: the JVM must stay within container memory limits
  • Configuration: secrets and config maps must override application properties

Each scenario has a specific Spring Boot or JVM configuration that determines correct behavior.

Dockerfile — the container image

A production-appropriate Spring Boot Dockerfile:

# Build stage
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src/ src/
RUN ./mvnw package -DskipTests -q
# Extract the layered JAR so each layer maps to its own Docker layer
RUN java -Djarmode=layertools -jar target/*.jar extract

# Runtime stage
FROM eclipse-temurin:21-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app

# Copy layers individually for better Docker layer caching
COPY --from=builder /app/dependencies/ ./
COPY --from=builder /app/spring-boot-loader/ ./
COPY --from=builder /app/snapshot-dependencies/ ./
COPY --from=builder /app/application/ ./

USER appuser
EXPOSE 8080 8081

ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-Djava.security.egd=file:/dev/./urandom", \
  "org.springframework.boot.loader.launch.JarLauncher"]

Key decisions:

Multi-stage build. The build stage uses the JDK; the runtime stage uses the JRE — significantly smaller final image. Builder dependencies (Maven, test libraries) don't end up in the production image.

Non-root user. The addgroup/adduser pair creates an unprivileged account, and USER appuser switches to it before the application starts — required by most Kubernetes security policies and a security best practice.

Layered JARs. Spring Boot's layered JAR format separates the JAR into layers ordered by how frequently they change: dependencies (infrequently), spring-boot-loader (rarely), snapshot dependencies (occasionally), application code (frequently). Docker caches each layer. A code change only rebuilds the application layer — dependency layers are reused from the previous build. Build time drops significantly for large applications.

Layered JARs are enabled by default since Spring Boot 2.4. On 2.3, enable them explicitly in pom.xml:

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <layers>
            <enabled>true</enabled>
        </layers>
    </configuration>
</plugin>

Extract the layers with java -Djarmode=layertools -jar app.jar extract, which produces the dependencies/, spring-boot-loader/, snapshot-dependencies/, and application/ directories. Alternatively, ./mvnw spring-boot:build-image builds a layered OCI image via Cloud Native Buildpacks with no Dockerfile at all.

-XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0. Without container support, the JVM reads the host's total RAM for heap sizing — a pod in a container with 1GB memory limit on a 64GB node sizes the heap based on 64GB. UseContainerSupport (default in Java 11+) makes the JVM read from cgroup memory limits. MaxRAMPercentage=75.0 sets the heap to 75% of the container's memory limit — leaving 25% for Metaspace, thread stacks, native memory, and OS buffers.

/dev/./urandom prevents the JVM from blocking on /dev/random for entropy during startup — common in containers where hardware entropy is limited.

Health probes — startup, liveness, and readiness

Spring Boot 2.3+ supports three distinct health probe endpoints:

management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState, db, redis
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

This creates three endpoints:

  • /actuator/health/liveness — is the application running?
  • /actuator/health/readiness — is the application ready for traffic?
  • /actuator/health — overall health (for monitoring)

Startup probe prevents liveness failures during slow startup:

# Kubernetes deployment.yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  failureThreshold: 30      # 30 attempts × 10s = 5 minutes for startup
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  initialDelaySeconds: 0    # startup probe handles initial delay
  periodSeconds: 10
  failureThreshold: 3       # restart after 3 consecutive failures

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 3       # stop routing traffic after 3 consecutive failures

The three-probe combination handles slow startup (startup probe), unhealthy running application (liveness probe), and temporarily unavailable dependencies (readiness probe) as separate concerns. Without the startup probe, a slow-starting application fails the liveness probe and is restarted in a loop — the classic Spring Boot on Kubernetes failure mode.

Liveness vs readiness — the critical distinction:

Liveness failure → Kubernetes restarts the pod. Use only for unrecoverable states: the application is deadlocked, the JVM has run out of memory, the application is in a permanently broken state. A database being unavailable should never fail liveness — the pod doesn't need a restart, it needs the database to recover.

Readiness failure → Kubernetes stops routing traffic to the pod. Use for any dependency that's temporarily unavailable: database connection pool exhausted, Redis unreachable, an upstream service timing out. When the dependency recovers, the readiness probe returns to success and traffic resumes.
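The split can be wired with a custom indicator: register the dependency check as a regular HealthIndicator and include its name only in the readiness group. A sketch — PaymentGatewayHealthIndicator and its client are hypothetical, not part of the configuration above:

```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Registered under the name "paymentGateway" (bean name minus the
// "HealthIndicator" suffix). Add "paymentGateway" to
// management.endpoint.health.group.readiness.include so its failures mark
// the pod not-ready instead of triggering a restart.
@Component
public class PaymentGatewayHealthIndicator implements HealthIndicator {

    private final PaymentGatewayClient client;  // hypothetical client

    public PaymentGatewayHealthIndicator(PaymentGatewayClient client) {
        this.client = client;
    }

    @Override
    public Health health() {
        try {
            client.ping();  // assumed lightweight connectivity check
            return Health.up().build();
        } catch (Exception e) {
            // Failing readiness stops traffic routing; keep this indicator
            // OUT of the liveness group so it never causes a restart
            return Health.down(e).build();
        }
    }
}
```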

Programmatic control of readiness state:

import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationContext;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MaintenanceController {

    private final ApplicationContext applicationContext;

    public MaintenanceController(ApplicationContext applicationContext) {
        this.applicationContext = applicationContext;
    }

    @PostMapping("/admin/maintenance/start")
    public void startMaintenance() {
        // Mark the pod as not ready — Kubernetes stops routing traffic
        AvailabilityChangeEvent.publish(applicationContext,
            ReadinessState.REFUSING_TRAFFIC);
    }

    @PostMapping("/admin/maintenance/end")
    public void endMaintenance() {
        AvailabilityChangeEvent.publish(applicationContext,
            ReadinessState.ACCEPTING_TRAFFIC);
    }
}

This enables zero-traffic maintenance windows without pod restarts.

Graceful shutdown

Without graceful shutdown, Kubernetes terminates a pod and all in-flight requests fail with a connection reset. Spring Boot 2.3+ provides built-in graceful shutdown:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

When Kubernetes sends SIGTERM (the termination signal), Spring Boot:

  1. Stops accepting new requests
  2. Waits for in-flight requests to complete (up to timeout-per-shutdown-phase)
  3. Executes @PreDestroy and DisposableBean.destroy() callbacks
  4. Shuts down thread pools, closes database connections
  5. Exits
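The first two steps are, at heart, a drain gate: reject new work once shutdown starts, then wait for in-flight work to finish. A plain-Java sketch of that pattern (an illustration of the mechanism, not Spring Boot's actual implementation):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal drain gate: tryBegin() fails once draining has started;
// awaitDrained() blocks until every in-flight request has called end().
class ShutdownGate {
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    boolean tryBegin() {
        if (draining.get()) return false;   // step 1: stop accepting new requests
        inFlight.incrementAndGet();
        if (draining.get()) {               // re-check to close the race with startDrain()
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    void end() {
        inFlight.decrementAndGet();
    }

    void startDrain() {
        draining.set(true);
    }

    // Step 2: wait (up to timeoutMillis) for in-flight requests to complete
    boolean awaitDrained(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.get() > 0) {
            if (System.currentTimeMillis() > deadline) return false;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In Spring Boot the servlet container performs this drain; the sketch only shows why the timeout exists — some in-flight work may never finish, and the shutdown must eventually proceed anyway.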

The Kubernetes termination grace period must be longer than the Spring Boot shutdown timeout:

# deployment.yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Kubernetes waits up to 60s for pod to exit
# application.yaml
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # Spring Boot completes in-flight requests within 30s

The sequence: Kubernetes sends SIGTERM → Spring Boot stops accepting requests and starts draining → Spring Boot exits within 30s → Kubernetes confirms exit. If Spring Boot exceeds terminationGracePeriodSeconds, Kubernetes sends SIGKILL — the application is killed immediately with no cleanup.

The preStop hook gap. Kubernetes simultaneously sends SIGTERM and removes the pod from the service endpoints — but the removal propagates asynchronously through kube-proxy and load balancers. For a brief window (typically 5–15 seconds), requests may still arrive at a pod that has stopped accepting connections.

The fix: a preStop hook that pauses before the pod begins shutting down:

# deployment.yaml
spec:
  template:
    spec:
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]  # wait for endpoint removal to propagate

The preStop hook runs before SIGTERM is sent. The pod waits 15 seconds, then receives SIGTERM and begins graceful shutdown. This 15-second buffer allows load balancer endpoint removal to propagate before the pod stops accepting connections.

Adjust terminationGracePeriodSeconds to account for the preStop duration:

terminationGracePeriodSeconds: 60  # 15s preStop + 30s shutdown + 15s buffer

JVM resource configuration for containers

Memory. MaxRAMPercentage=75.0 is the starting point. Verify the actual memory breakdown:

# Inside the container
java -XX:+PrintFlagsFinal -version 2>&1 | grep -E "MaxHeapSize|InitialHeapSize"

Monitor container memory usage against the limit in production. If the container regularly approaches the limit, either increase the limit or reduce MaxRAMPercentage. OOM kills (container memory limit exceeded) show up in kubectl describe pod as reason OOMKilled with exit code 137.

CPU. The JVM calibrates thread pool sizes and JIT compilation threads based on available CPUs. In containers with fractional CPU limits (resources.limits.cpu: 0.5), the JVM sees 1 CPU from Runtime.getRuntime().availableProcessors() in Java 11+ with container support. For very low CPU limits (< 1 CPU), explicit thread configuration may be needed:

-Djdk.virtualThreadScheduler.parallelism=2   # for virtual threads
-XX:ActiveProcessorCount=2                   # explicit override
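To verify what the JVM actually derived from the cgroup limits, a dependency-free diagnostic class can be run inside the container (the class name is an illustration):

```java
// Prints the CPU count and max heap size the JVM computed — inside a
// container with UseContainerSupport these reflect the cgroup limits,
// not the host's hardware. Run with: java ContainerFacts.java
class ContainerFacts {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("availableProcessors = " + cpus);
        System.out.println("maxMemory (MB) = " + maxHeapMb);
    }
}
```

With a 1Gi memory limit and MaxRAMPercentage=75.0, the reported max heap should land near 768 MB.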

Virtual threads (Java 21+) in Kubernetes. Virtual threads need no container-specific configuration — enabling them is the only step:

spring.threads.virtual.enabled=true

The JVM scheduler maps virtual threads to the available carrier threads. In containers with limited CPU, carrier thread count adjusts automatically.
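A dependency-free way to see that scheduling at work — this sketch (plain Java 21+, not Spring) pushes 1000 tasks through a virtual-thread-per-task executor, which multiplexes them onto a carrier pool sized from the available CPUs:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// 1000 virtual threads, however few carrier threads the container allows.
class VirtualThreadDemo {
    static int countCompleted() {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1000; i++) {
                executor.submit(completed::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + countCompleted());
    }
}
```

Even with resources.limits.cpu: 0.5 (one carrier thread), all 1000 tasks complete — the scheduler, not the CPU count, bounds concurrency.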

Externalized configuration

Kubernetes provides configuration through ConfigMaps and Secrets. Spring Boot's externalized configuration picks these up via environment variables and mounted files:

Environment variables from ConfigMap:

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-config
data:
  SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE: "20"
  SPRING_JPA_HIBERNATE_DDL_AUTO: "validate"
  MANAGEMENT_SERVER_PORT: "8081"

---
# deployment.yaml
spec:
  containers:
    - env:
        - name: SPRING_PROFILES_ACTIVE
          value: production
      envFrom:
        - configMapRef:
            name: order-service-config

Spring Boot converts environment variables to properties using relaxed binding: SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE maps to spring.datasource.hikari.maximum-pool-size.

Secrets for credentials:

# secret.yaml (created via kubectl create secret or sealed-secrets)
apiVersion: v1
kind: Secret
metadata:
  name: order-service-secrets
type: Opaque
data:
  DATABASE_URL: <base64-encoded>
  STRIPE_API_KEY: <base64-encoded>

---
# deployment.yaml
spec:
  containers:
    - envFrom:
        - secretRef:
            name: order-service-secrets

Secrets injected as environment variables are accessible to the application as DATABASE_URL and STRIPE_API_KEY. Reference in application.yaml:

spring:
  datasource:
    url: ${DATABASE_URL}

Mounted secret files for large secrets or certificates:

# deployment.yaml
spec:
  volumes:
    - name: tls-certs
      secret:
        secretName: order-service-tls
  containers:
    - volumeMounts:
        - name: tls-certs
          mountPath: /etc/ssl/certs/app
          readOnly: true

Spring Boot can reference the mounted file path in configuration:

server:
  ssl:
    key-store: /etc/ssl/certs/app/keystore.p12
    key-store-password: ${TLS_KEYSTORE_PASSWORD}

The complete Kubernetes deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0      # never reduce capacity during rollout
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: order-service
          image: order-service:1.4.2
          ports:
            - containerPort: 8080   # application
            - containerPort: 8081   # management
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: production
            - name: JAVA_TOOL_OPTIONS  # read by the JVM itself; an exec-form ENTRYPOINT does not expand JAVA_OPTS
              value: "-XX:MaxRAMPercentage=75.0"
          envFrom:
            - configMapRef:
                name: order-service-config
            - secretRef:
                name: order-service-secrets
          resources:
            requests:
              memory: 512Mi
              cpu: 250m
            limits:
              memory: 1Gi
              cpu: 1000m
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8081
            periodSeconds: 5
            failureThreshold: 3

maxUnavailable: 0 with maxSurge: 1 means rolling updates add one new pod before removing any old ones — capacity never decreases during rollout. Combined with graceful shutdown and the preStop hook, requests during pod replacement are handled without errors.

resources.requests tells the Kubernetes scheduler how much to reserve when placing the pod. resources.limits caps what the pod can use. The manifest above, with requests below limits, yields the Burstable QoS class. Setting requests equal to limits yields the Guaranteed QoS class, whose pods are the last to be evicted under node memory pressure.
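A Guaranteed-QoS variant of the resources block above would set requests equal to limits (values illustrative):

```yaml
resources:
  requests:
    memory: 1Gi
    cpu: 1000m
  limits:
    memory: 1Gi
    cpu: 1000m
```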
