Configuring Spring Boot for Docker and Kubernetes — Health Probes, Graceful Shutdown, and Resource Limits

by Eric Hanson, Backend Developer at Clean Systems Consulting

The gap between "runs in Docker" and "works in Kubernetes"

Running a Spring Boot application in Docker is straightforward. Running it correctly in Kubernetes requires specific configuration for each of these scenarios:

  • Pod startup: Kubernetes must know when the application is ready to receive traffic
  • Pod shutdown: Kubernetes must allow in-flight requests to complete before termination
  • Pod replacement: new pods must become ready before old ones are removed
  • Resource pressure: the JVM must stay within container memory limits
  • Configuration: secrets and config maps must override application properties

Each scenario has a specific Spring Boot or JVM configuration that determines correct behavior.

Dockerfile — the container image

A production-appropriate Spring Boot Dockerfile:

# Build stage
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src/ src/
RUN ./mvnw package -DskipTests -q
# Extract the layered JAR so each layer maps to its own Docker layer
RUN java -Djarmode=layertools -jar target/*.jar extract

# Runtime stage
FROM eclipse-temurin:21-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app

# Copy layers individually for better Docker layer caching
COPY --from=builder /app/dependencies/ ./
COPY --from=builder /app/spring-boot-loader/ ./
COPY --from=builder /app/snapshot-dependencies/ ./
COPY --from=builder /app/application/ ./

USER appuser
EXPOSE 8080 8081

ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-Djava.security.egd=file:/dev/./urandom", \
  "org.springframework.boot.loader.launch.JarLauncher"]

Key decisions:

Multi-stage build. The build stage uses the JDK; the runtime stage uses the JRE — significantly smaller final image. Builder dependencies (Maven, test libraries) don't end up in the production image.

Non-root user. The addgroup/adduser pair creates an unprivileged account, and USER appuser switches to it before the application starts — required by most Kubernetes security policies and a security best practice.

Layered JARs. Spring Boot's layered JAR format separates the JAR into layers ordered by how frequently they change: dependencies (infrequently), spring-boot-loader (rarely), snapshot dependencies (occasionally), application code (frequently). Docker caches each layer. A code change only rebuilds the application layer — dependency layers are reused from the previous build. Build time drops significantly for large applications.

Layered JARs are enabled by default since Spring Boot 2.4. On 2.3, enable them explicitly in pom.xml:

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <configuration>
        <layers>
            <enabled>true</enabled>
        </layers>
    </configuration>
</plugin>

Extract the layers with java -Djarmode=layertools -jar app.jar extract, which produces the dependencies/, spring-boot-loader/, snapshot-dependencies/, and application/ directories. Alternatively, ./mvnw spring-boot:build-image builds a layered OCI image via Cloud Native Buildpacks with no Dockerfile at all.

-XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0. Without container support, the JVM reads the host's total RAM for heap sizing — a pod in a container with 1GB memory limit on a 64GB node sizes the heap based on 64GB. UseContainerSupport (default in Java 11+) makes the JVM read from cgroup memory limits. MaxRAMPercentage=75.0 sets the heap to 75% of the container's memory limit — leaving 25% for Metaspace, thread stacks, native memory, and OS buffers.

/dev/./urandom prevents the JVM from blocking on /dev/random for entropy during startup — common in containers where hardware entropy is limited.

Health probes — startup, liveness, and readiness

Spring Boot 2.3+ supports three distinct health probe endpoints:

management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState, db, redis
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true

This creates three endpoints:

  • /actuator/health/liveness — is the application running?
  • /actuator/health/readiness — is the application ready for traffic?
  • /actuator/health — overall health (for monitoring)

Startup probe prevents liveness failures during slow startup:

# Kubernetes deployment.yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  failureThreshold: 30      # 30 attempts × 10s = 5 minutes for startup
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  initialDelaySeconds: 0    # startup probe handles initial delay
  periodSeconds: 10
  failureThreshold: 3       # restart after 3 consecutive failures

readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 3       # stop routing traffic after 3 consecutive failures

The three-probe combination handles slow startup (startup probe), unhealthy running application (liveness probe), and temporarily unavailable dependencies (readiness probe) as separate concerns. Without the startup probe, a slow-starting application fails the liveness probe and is restarted in a loop — the classic Spring Boot on Kubernetes failure mode.

Liveness vs readiness — the critical distinction:

Liveness failure → Kubernetes restarts the pod. Use only for unrecoverable states: the application is deadlocked, the JVM has run out of memory, the application is in a permanently broken state. A database being unavailable should never fail liveness — the pod doesn't need a restart, it needs the database to recover.

Readiness failure → Kubernetes stops routing traffic to the pod. Use for any dependency that's temporarily unavailable: database connection pool exhausted, Redis unreachable, an upstream service timing out. When the dependency recovers, the readiness probe returns to success and traffic resumes.
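The split can be wired with a custom indicator: register the dependency check as a regular HealthIndicator and include its name only in the readiness group. A sketch — PaymentGatewayHealthIndicator and its client are hypothetical, not part of the configuration above:

```java
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Registered under the name "paymentGateway" (bean name minus the
// "HealthIndicator" suffix). Add "paymentGateway" to
// management.endpoint.health.group.readiness.include so its failures mark
// the pod not-ready instead of triggering a restart.
@Component
public class PaymentGatewayHealthIndicator implements HealthIndicator {

    private final PaymentGatewayClient client;  // hypothetical client

    public PaymentGatewayHealthIndicator(PaymentGatewayClient client) {
        this.client = client;
    }

    @Override
    public Health health() {
        try {
            client.ping();  // assumed lightweight connectivity check
            return Health.up().build();
        } catch (Exception e) {
            // Failing readiness stops traffic routing; keep this indicator
            // OUT of the liveness group so it never causes a restart
            return Health.down(e).build();
        }
    }
}
```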

Programmatic control of readiness state:

import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationContext;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MaintenanceController {

    private final ApplicationContext applicationContext;

    public MaintenanceController(ApplicationContext applicationContext) {
        this.applicationContext = applicationContext;
    }

    @PostMapping("/admin/maintenance/start")
    public void startMaintenance() {
        // Mark the pod as not ready — Kubernetes stops routing traffic
        AvailabilityChangeEvent.publish(applicationContext,
            ReadinessState.REFUSING_TRAFFIC);
    }

    @PostMapping("/admin/maintenance/end")
    public void endMaintenance() {
        AvailabilityChangeEvent.publish(applicationContext,
            ReadinessState.ACCEPTING_TRAFFIC);
    }
}

This enables zero-traffic maintenance windows without pod restarts.

Graceful shutdown

Without graceful shutdown, Kubernetes terminates a pod and all in-flight requests fail with a connection reset. Spring Boot 2.3+ provides built-in graceful shutdown:

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

When Kubernetes sends SIGTERM (the termination signal), Spring Boot:

  1. Stops accepting new requests
  2. Waits for in-flight requests to complete (up to timeout-per-shutdown-phase)
  3. Executes @PreDestroy and DisposableBean.destroy() callbacks
  4. Shuts down thread pools, closes database connections
  5. Exits
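The first two steps are, at heart, a drain gate: reject new work once shutdown starts, then wait for in-flight work to finish. A plain-Java sketch of that pattern (an illustration of the mechanism, not Spring Boot's actual implementation):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal drain gate: tryBegin() fails once draining has started;
// awaitDrained() blocks until every in-flight request has called end().
class ShutdownGate {
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    boolean tryBegin() {
        if (draining.get()) return false;   // step 1: stop accepting new requests
        inFlight.incrementAndGet();
        if (draining.get()) {               // re-check to close the race with startDrain()
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    void end() {
        inFlight.decrementAndGet();
    }

    void startDrain() {
        draining.set(true);
    }

    // Step 2: wait (up to timeoutMillis) for in-flight requests to complete
    boolean awaitDrained(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (inFlight.get() > 0) {
            if (System.currentTimeMillis() > deadline) return false;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In Spring Boot the servlet container performs this drain; the sketch only shows why the timeout exists — some in-flight work may never finish, and the shutdown must eventually proceed anyway.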

The Kubernetes termination grace period must be longer than the Spring Boot shutdown timeout:

# deployment.yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Kubernetes waits up to 60s for pod to exit
# application.yaml
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # Spring Boot completes in-flight requests within 30s

The sequence: Kubernetes sends SIGTERM → Spring Boot stops accepting requests and starts draining → Spring Boot exits within 30s → Kubernetes confirms exit. If Spring Boot exceeds terminationGracePeriodSeconds, Kubernetes sends SIGKILL — the application is killed immediately with no cleanup.

The preStop hook gap. Kubernetes simultaneously sends SIGTERM and removes the pod from the service endpoints — but the removal propagates asynchronously through kube-proxy and load balancers. For a brief window (typically 5–15 seconds), requests may still arrive at a pod that has stopped accepting connections.

The fix: a preStop hook that pauses before the pod begins shutting down:

# deployment.yaml
spec:
  template:
    spec:
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]  # wait for endpoint removal to propagate

The preStop hook runs before SIGTERM is sent. The pod waits 15 seconds, then receives SIGTERM and begins graceful shutdown. This 15-second buffer allows load balancer endpoint removal to propagate before the pod stops accepting connections.

Adjust terminationGracePeriodSeconds to account for the preStop duration:

terminationGracePeriodSeconds: 60  # 15s preStop + 30s shutdown + 15s buffer

JVM resource configuration for containers

Memory. MaxRAMPercentage=75.0 is the starting point. Verify the actual memory breakdown:

# Inside the container
java -XX:+PrintFlagsFinal -version 2>&1 | grep -E "MaxHeapSize|InitialHeapSize"

Monitor container memory usage against the limit in production. If the container regularly approaches the limit, either increase the limit or reduce MaxRAMPercentage. OOM kills (container memory limit exceeded) show up in kubectl describe pod as reason OOMKilled with exit code 137.

CPU. The JVM calibrates thread pool sizes and JIT compilation threads based on available CPUs. In containers with fractional CPU limits (resources.limits.cpu: 0.5), the JVM sees 1 CPU from Runtime.getRuntime().availableProcessors() in Java 11+ with container support. For very low CPU limits (< 1 CPU), explicit thread configuration may be needed:

-Djdk.virtualThreadScheduler.parallelism=2   # for virtual threads
-XX:ActiveProcessorCount=2                   # explicit override
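To verify what the JVM actually derived from the cgroup limits, a dependency-free diagnostic class can be run inside the container (the class name is an illustration):

```java
// Prints the CPU count and max heap size the JVM computed — inside a
// container with UseContainerSupport these reflect the cgroup limits,
// not the host's hardware. Run with: java ContainerFacts.java
class ContainerFacts {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("availableProcessors = " + cpus);
        System.out.println("maxMemory (MB) = " + maxHeapMb);
    }
}
```

With a 1Gi memory limit and MaxRAMPercentage=75.0, the reported max heap should land near 768 MB.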

Virtual threads (Java 21+) in Kubernetes. Virtual threads need no container-specific configuration — enabling them is the only step:

spring.threads.virtual.enabled=true

The JVM scheduler maps virtual threads to the available carrier threads. In containers with limited CPU, carrier thread count adjusts automatically.
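A dependency-free way to see that scheduling at work — this sketch (plain Java 21+, not Spring) pushes 1000 tasks through a virtual-thread-per-task executor, which multiplexes them onto a carrier pool sized from the available CPUs:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// 1000 virtual threads, however few carrier threads the container allows.
class VirtualThreadDemo {
    static int countCompleted() {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1000; i++) {
                executor.submit(completed::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed = " + countCompleted());
    }
}
```

Even with resources.limits.cpu: 0.5 (one carrier thread), all 1000 tasks complete — the scheduler, not the CPU count, bounds concurrency.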

Externalized configuration

Kubernetes provides configuration through ConfigMaps and Secrets. Spring Boot's externalized configuration picks these up via environment variables and mounted files:

Environment variables from ConfigMap:

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-config
data:
  SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE: "20"
  SPRING_JPA_HIBERNATE_DDL_AUTO: "validate"
  MANAGEMENT_SERVER_PORT: "8081"

---
# deployment.yaml
spec:
  containers:
    - env:
        - name: SPRING_PROFILES_ACTIVE
          value: production
      envFrom:
        - configMapRef:
            name: order-service-config

Spring Boot converts environment variables to properties using relaxed binding: SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE maps to spring.datasource.hikari.maximum-pool-size.

Secrets for credentials:

# secret.yaml (created via kubectl create secret or sealed-secrets)
apiVersion: v1
kind: Secret
metadata:
  name: order-service-secrets
type: Opaque
data:
  DATABASE_URL: <base64-encoded>
  STRIPE_API_KEY: <base64-encoded>

---
# deployment.yaml
spec:
  containers:
    - envFrom:
        - secretRef:
            name: order-service-secrets

Secrets injected as environment variables are accessible to the application as DATABASE_URL and STRIPE_API_KEY. Reference in application.yaml:

spring:
  datasource:
    url: ${DATABASE_URL}

Mounted secret files for large secrets or certificates:

# deployment.yaml
spec:
  volumes:
    - name: tls-certs
      secret:
        secretName: order-service-tls
  containers:
    - volumeMounts:
        - name: tls-certs
          mountPath: /etc/ssl/certs/app
          readOnly: true

Spring Boot can reference the mounted file path in configuration:

server:
  ssl:
    key-store: /etc/ssl/certs/app/keystore.p12
    key-store-password: ${TLS_KEYSTORE_PASSWORD}

The complete Kubernetes deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0      # never reduce capacity during rollout
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: order-service
          image: order-service:1.4.2
          ports:
            - containerPort: 8080   # application
            - containerPort: 8081   # management
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: production
            - name: JAVA_TOOL_OPTIONS  # read by the JVM itself; an exec-form ENTRYPOINT does not expand JAVA_OPTS
              value: "-XX:MaxRAMPercentage=75.0"
          envFrom:
            - configMapRef:
                name: order-service-config
            - secretRef:
                name: order-service-secrets
          resources:
            requests:
              memory: 512Mi
              cpu: 250m
            limits:
              memory: 1Gi
              cpu: 1000m
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8081
            periodSeconds: 5
            failureThreshold: 3

maxUnavailable: 0 with maxSurge: 1 means rolling updates add one new pod before removing any old ones — capacity never decreases during rollout. Combined with graceful shutdown and the preStop hook, requests during pod replacement are handled without errors.

resources.requests tells the Kubernetes scheduler how much to reserve when placing the pod. resources.limits caps what the pod can use. The manifest above, with requests below limits, yields the Burstable QoS class. Setting requests equal to limits yields the Guaranteed QoS class, whose pods are the last to be evicted under node memory pressure.
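A Guaranteed-QoS variant of the resources block above would set requests equal to limits (values illustrative):

```yaml
resources:
  requests:
    memory: 1Gi
    cpu: 1000m
  limits:
    memory: 1Gi
    cpu: 1000m
```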
