Configuring Spring Boot for Docker and Kubernetes — Health Probes, Graceful Shutdown, and Resource Limits
by Eric Hanson, Backend Developer at Clean Systems Consulting
The gap between "runs in Docker" and "works in Kubernetes"
Running a Spring Boot application in Docker is straightforward. Running it correctly in Kubernetes requires specific configuration for each of these scenarios:
- Pod startup: Kubernetes must know when the application is ready to receive traffic
- Pod shutdown: Kubernetes must allow in-flight requests to complete before termination
- Pod replacement: new pods must become ready before old ones are removed
- Resource pressure: the JVM must stay within container memory limits
- Configuration: secrets and config maps must override application properties
Each scenario has a specific Spring Boot or JVM configuration that determines correct behavior.
Dockerfile — the container image
A production-appropriate Spring Boot Dockerfile:
# Build stage
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY .mvn/ .mvn/
COPY mvnw pom.xml ./
RUN ./mvnw dependency:go-offline -q
COPY src/ src/
RUN ./mvnw package -DskipTests -q
# Extract the layered JAR so each layer can become its own Docker layer
RUN java -Djarmode=layertools -jar target/*.jar extract --destination extracted
# Runtime stage
FROM eclipse-temurin:21-jre-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
# Copy layered JAR contents for better Docker layer caching
COPY --from=builder /app/extracted/dependencies/ ./
COPY --from=builder /app/extracted/spring-boot-loader/ ./
COPY --from=builder /app/extracted/snapshot-dependencies/ ./
COPY --from=builder /app/extracted/application/ ./
USER appuser
EXPOSE 8080 8081
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-Djava.security.egd=file:/dev/./urandom", \
  "org.springframework.boot.loader.launch.JarLauncher"]
Key decisions:
Multi-stage build. The build stage uses the JDK; the runtime stage uses the JRE — significantly smaller final image. Builder dependencies (Maven, test libraries) don't end up in the production image.
Non-root user. The addgroup/adduser step creates a dedicated appuser account, and USER appuser runs the application without root privileges — required by most Kubernetes security policies and a security best practice.
Layered JARs. Spring Boot's layered JAR format separates the JAR into layers ordered by how frequently they change: dependencies (infrequently), snapshot dependencies (occasionally), loader (rarely), application code (frequently). Docker caches each layer. A code change only rebuilds the application layer — dependency layers are cached from the previous build. Build time drops significantly for large applications.
Layered JARs are enabled by default since Spring Boot 2.4. To configure them explicitly in pom.xml:
<plugin>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-maven-plugin</artifactId>
  <configuration>
    <layers>
      <enabled>true</enabled>
    </layers>
  </configuration>
</plugin>
Extract the layers from the built JAR with java -Djarmode=layertools -jar app.jar extract. Alternatively, ./mvnw spring-boot:build-image builds a layered OCI image with Cloud Native Buildpacks, no Dockerfile required.
-XX:+UseContainerSupport and -XX:MaxRAMPercentage=75.0. Without container support, the JVM reads the host's total RAM for heap sizing — a pod in a container with 1GB memory limit on a 64GB node sizes the heap based on 64GB. UseContainerSupport (default in Java 11+) makes the JVM read from cgroup memory limits. MaxRAMPercentage=75.0 sets the heap to 75% of the container's memory limit — leaving 25% for Metaspace, thread stacks, native memory, and OS buffers.
/dev/./urandom prevents the JVM from blocking on /dev/random for entropy during startup — common in containers where hardware entropy is limited.
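To confirm from inside the container what the JVM actually sees, a minimal plain-Java sketch (the class name HeapCheck is illustrative) prints the effective max heap and CPU count — with the flags above and a 1 GiB memory limit, the heap should come out near 768 MiB:

```java
// HeapCheck.java — prints the JVM's view of container resources.
public class HeapCheck {
    public static void main(String[] args) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.printf("Max heap: %d MiB%n", maxHeapBytes / (1024 * 1024));
        System.out.printf("Available processors: %d%n", cpus);
        // With -XX:MaxRAMPercentage=75.0 and a 1 GiB container limit,
        // the max heap should be roughly 768 MiB.
    }
}
```

Run it with `kubectl exec` into a pod (or as a one-off container) to verify the cgroup limits are actually being honored.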
Health probes — startup, liveness, and readiness
Spring Boot 2.3+ supports three distinct health probe endpoints:
management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        liveness:
          include: livenessState
        readiness:
          include: readinessState, db, redis
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
This creates three endpoints:
- /actuator/health/liveness — is the application running?
- /actuator/health/readiness — is the application ready for traffic?
- /actuator/health — overall health (for monitoring)
Startup probe prevents liveness failures during slow startup:
# Kubernetes deployment.yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  failureThreshold: 30      # 30 attempts × 10s = 5 minutes for startup
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8081
  initialDelaySeconds: 0    # startup probe handles initial delay
  periodSeconds: 10
  failureThreshold: 3       # restart after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8081
  initialDelaySeconds: 0
  periodSeconds: 5
  failureThreshold: 3       # stop routing traffic after 3 consecutive failures
The three-probe combination handles slow startup (startup probe), unhealthy running application (liveness probe), and temporarily unavailable dependencies (readiness probe) as separate concerns. Without the startup probe, a slow-starting application fails the liveness probe and is restarted in a loop — the classic Spring Boot on Kubernetes failure mode.
Liveness vs readiness — the critical distinction:
Liveness failure → Kubernetes restarts the pod. Use only for unrecoverable states: the application is deadlocked, the JVM has run out of memory, the application is in a permanently broken state. A database being unavailable should never fail liveness — the pod doesn't need a restart, it needs the database to recover.
Readiness failure → Kubernetes stops routing traffic to the pod. Use for any dependency that's temporarily unavailable: database connection pool exhausted, Redis unreachable, an upstream service timing out. When the dependency recovers, the readiness probe returns to success and traffic resumes.
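The decision rule can be made concrete with a plain-Java sketch (not Spring's API — ProbePolicy and its parameters are illustrative): liveness considers only unrecoverable internal state, readiness also considers dependencies.

```java
// ProbePolicy.java — models the liveness/readiness distinction.
public class ProbePolicy {
    enum Probe { LIVENESS, READINESS }

    static boolean healthy(Probe probe, boolean deadlocked, boolean dbReachable) {
        return switch (probe) {
            // A restart only fixes unrecoverable internal state.
            case LIVENESS -> !deadlocked;
            // Traffic should pause while any dependency is down.
            case READINESS -> !deadlocked && dbReachable;
        };
    }

    public static void main(String[] args) {
        // Database down: pod stays alive (no restart) but stops taking traffic.
        System.out.println(healthy(Probe.LIVENESS, false, false));  // true
        System.out.println(healthy(Probe.READINESS, false, false)); // false
    }
}
```

In Spring Boot terms, the readiness group's extra members (db, redis in the configuration above) are exactly the `dbReachable`-style checks that must never leak into the liveness group.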
Programmatic control of readiness state:
@RestController
public class MaintenanceController {

    private final ApplicationContext applicationContext;

    public MaintenanceController(ApplicationContext applicationContext) {
        this.applicationContext = applicationContext;
    }

    @PostMapping("/admin/maintenance/start")
    public void startMaintenance() {
        // Mark the pod as not ready — Kubernetes stops routing traffic
        AvailabilityChangeEvent.publish(applicationContext,
                ReadinessState.REFUSING_TRAFFIC);
    }

    @PostMapping("/admin/maintenance/end")
    public void endMaintenance() {
        AvailabilityChangeEvent.publish(applicationContext,
                ReadinessState.ACCEPTING_TRAFFIC);
    }
}
This enables zero-traffic maintenance windows without pod restarts.
Graceful shutdown
Without graceful shutdown, Kubernetes terminates a pod and all in-flight requests fail with a connection reset. Spring Boot 2.3+ provides built-in graceful shutdown:
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s
When Kubernetes sends SIGTERM (the termination signal), Spring Boot:
- Stops accepting new requests
- Waits for in-flight requests to complete (up to timeout-per-shutdown-phase)
- Executes @PreDestroy and DisposableBean.destroy() callbacks
- Shuts down thread pools, closes database connections
- Exits
The Kubernetes termination grace period must be longer than the Spring Boot shutdown timeout:
# deployment.yaml
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60  # Kubernetes waits up to 60s for pod to exit
# application.yaml
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s  # Spring Boot completes in-flight requests within 30s
The sequence: Kubernetes sends SIGTERM → Spring Boot stops accepting requests and starts draining → Spring Boot exits within 30s → Kubernetes confirms exit. If Spring Boot exceeds terminationGracePeriodSeconds, Kubernetes sends SIGKILL — the application is killed immediately with no cleanup.
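The JVM mechanism underneath this is the shutdown hook: SIGTERM triggers the hooks registered with the runtime (Spring Boot registers its own SpringApplicationShutdownHook), and SIGKILL bypasses them entirely. A plain-Java sketch:

```java
// ShutdownHookDemo.java — the JVM primitive Spring Boot's graceful
// shutdown is built on. SIGTERM runs registered hooks; SIGKILL does not.
public class ShutdownHookDemo {
    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // Spring Boot's equivalent hook stops accepting requests,
            // drains in-flight work, and runs @PreDestroy callbacks here.
            System.out.println("draining before exit");
        }));
        System.out.println("running; send SIGTERM (kill <pid>) to trigger the hook");
    }
}
```

This is why exceeding terminationGracePeriodSeconds is so damaging: SIGKILL terminates the process without ever entering the hook.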
The preStop hook gap. Kubernetes simultaneously sends SIGTERM and removes the pod from the service endpoints — but the removal propagates asynchronously through kube-proxy and load balancers. For a brief window (typically 5–15 seconds), requests may still arrive at a pod that has stopped accepting connections.
The fix: a preStop hook that pauses before the pod begins shutting down:
# deployment.yaml
spec:
  template:
    spec:
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]  # wait for endpoint removal to propagate
The preStop hook runs before SIGTERM is sent. The pod waits 15 seconds, then receives SIGTERM and begins graceful shutdown. This 15-second buffer allows load balancer endpoint removal to propagate before the pod stops accepting connections.
Adjust terminationGracePeriodSeconds to account for the preStop duration:
terminationGracePeriodSeconds: 60 # 15s preStop + 30s shutdown + 15s buffer
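The timing relationship is worth checking whenever any of the three values changes; a trivial sketch (class and method names are illustrative) encodes the rule:

```java
// TerminationBudget.java — the grace period must cover preStop plus the
// Spring shutdown timeout, with headroom for the JVM to actually exit.
public class TerminationBudget {
    static boolean fits(int preStopSeconds, int shutdownTimeoutSeconds,
                        int bufferSeconds, int graceSeconds) {
        return preStopSeconds + shutdownTimeoutSeconds + bufferSeconds <= graceSeconds;
    }

    public static void main(String[] args) {
        System.out.println(fits(15, 30, 15, 60)); // true — the values used here
    }
}
```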
JVM resource configuration for containers
Memory. MaxRAMPercentage=75.0 is the starting point. Verify the actual memory breakdown:
# Inside the container
java -XX:+PrintFlagsFinal -version 2>&1 | grep -E "MaxHeapSize|InitialHeapSize"
Monitor container memory usage vs the limit in production. If the container regularly approaches the limit, either increase the limit or reduce MaxRAMPercentage. OOM kills (container memory limit exceeded) appear in kubectl describe pod as OOMKilled exit code 137.
CPU. The JVM calibrates thread pool sizes and JIT compilation threads based on available CPUs. In containers with fractional CPU limits (resources.limits.cpu: 0.5), the JVM sees 1 CPU from Runtime.getRuntime().availableProcessors() in Java 11+ with container support. For very low CPU limits (< 1 CPU), explicit thread configuration may be needed:
-Djdk.virtualThreadScheduler.parallelism=2 # for virtual threads
-XX:ActiveProcessorCount=2 # explicit override
Virtual threads (Java 21+) in Kubernetes. Virtual threads need no Kubernetes-specific tuning; enabling them is a single property, as covered in the virtual threads article:
spring.threads.virtual.enabled=true
The JVM scheduler maps virtual threads to the available carrier threads. In containers with limited CPU, carrier thread count adjusts automatically.
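Outside Spring, the same scheduler can be exercised directly; a minimal Java 21 sketch:

```java
// VirtualThreadDemo.java — starts a task on a virtual thread (Java 21+).
// The JVM schedules it onto a carrier thread pool sized from the
// container's available processors.
public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().name("demo-vt").start(() ->
                System.out.println("running on: " + Thread.currentThread()));
        vt.join();
    }
}
```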
Externalized configuration
Kubernetes provides configuration through ConfigMaps and Secrets. Spring Boot's externalized configuration picks these up via environment variables and mounted files:
Environment variables from ConfigMap:
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-config
data:
  SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE: "20"
  SPRING_JPA_HIBERNATE_DDL_AUTO: "validate"
  MANAGEMENT_SERVER_PORT: "8081"
---
# deployment.yaml
spec:
  containers:
    - env:
        - name: SPRING_PROFILES_ACTIVE
          value: production
      envFrom:
        - configMapRef:
            name: order-service-config
Spring Boot converts environment variables to properties using relaxed binding: SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE maps to spring.datasource.hikari.maximum-pool-size.
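The first step of that conversion can be approximated in a few lines of plain Java (this is an illustration, not Spring's exact algorithm — Spring's relaxed binding additionally treats dot-, dash-, and camelCase-separated forms as equivalent):

```java
import java.util.Locale;

// RelaxedBindingDemo.java — approximates how Spring Boot's
// SystemEnvironmentPropertySource derives a property name from an
// environment variable: lowercase, underscores become dots.
public class RelaxedBindingDemo {
    static String toPropertyName(String envVar) {
        return envVar.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    public static void main(String[] args) {
        System.out.println(toPropertyName("SPRING_DATASOURCE_HIKARI_MAXIMUM_POOL_SIZE"));
        // spring.datasource.hikari.maximum.pool.size — which relaxed
        // binding then matches to spring.datasource.hikari.maximum-pool-size
    }
}
```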
Secrets for credentials:
# secret.yaml (created via kubectl create secret or sealed-secrets)
apiVersion: v1
kind: Secret
metadata:
  name: order-service-secrets
type: Opaque
data:
  DATABASE_URL: <base64-encoded>
  STRIPE_API_KEY: <base64-encoded>
---
# deployment.yaml
spec:
  containers:
    - envFrom:
        - secretRef:
            name: order-service-secrets
Secrets injected as environment variables are accessible to the application as DATABASE_URL and STRIPE_API_KEY. Reference in application.yaml:
spring:
  datasource:
    url: ${DATABASE_URL}
Mounted secret files for large secrets or certificates:
# deployment.yaml
spec:
  volumes:
    - name: tls-certs
      secret:
        secretName: order-service-tls
  containers:
    - volumeMounts:
        - name: tls-certs
          mountPath: /etc/ssl/certs/app
          readOnly: true
Spring Boot can reference the mounted file path in configuration:
server:
  ssl:
    key-store: /etc/ssl/certs/app/keystore.p12
    key-store-password: ${TLS_KEYSTORE_PASSWORD}
The complete Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # never reduce capacity during rollout
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: order-service
          image: order-service:1.4.2
          ports:
            - containerPort: 8080   # application
            - containerPort: 8081   # management
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: production
            # JDK_JAVA_OPTIONS is read by the java launcher; a plain
            # JAVA_OPTS variable is ignored by an exec-form ENTRYPOINT
            - name: JDK_JAVA_OPTIONS
              value: "-XX:MaxRAMPercentage=75.0"
          envFrom:
            - configMapRef:
                name: order-service-config
            - secretRef:
                name: order-service-secrets
          resources:
            requests:
              memory: 512Mi
              cpu: 250m
            limits:
              memory: 1Gi
              cpu: 1000m
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]
          startupProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8081
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8081
            periodSeconds: 5
            failureThreshold: 3
maxUnavailable: 0 with maxSurge: 1 means rolling updates add one new pod before removing any old ones — capacity never decreases during rollout. Combined with graceful shutdown and the preStop hook, requests during pod replacement are handled without errors.
resources.requests tells Kubernetes how much to reserve when scheduling the pod. resources.limits caps what the pod can use. This example sets requests below limits, which places the pod in the Burstable QoS class; setting requests equal to limits for every container yields the Guaranteed QoS class, making the pod the last candidate for eviction under node memory pressure.
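If Guaranteed QoS is the goal for this service, the resources block would change to a sketch like this (reusing the limit values from the deployment above):

```yaml
# Guaranteed QoS: requests equal to limits for every container in the pod
resources:
  requests:
    memory: 1Gi
    cpu: 1000m
  limits:
    memory: 1Gi
    cpu: 1000m
```

The trade-off is scheduling cost: the scheduler must now find a full 1Gi/1 CPU of unreserved capacity per replica, so fewer pods fit per node.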