Spring Cloud Vault in Production — Configuration, Failover, and the Secrets You Shouldn't Store
by Eric Hanson, Backend Developer at Clean Systems Consulting
The production configuration that matters
Spring Cloud Vault's defaults are reasonable for development but need tuning for production. The full production-appropriate configuration:
spring:
  cloud:
    vault:
      uri: https://vault.internal:8200   # uri takes precedence over separate scheme/host/port
      # Connection resilience
      connection-timeout: 5000   # 5s to establish a connection
      read-timeout: 15000        # 15s to wait for a response
      # Authentication
      authentication: kubernetes
      kubernetes:
        role: ${SERVICE_NAME}
        kubernetes-path: kubernetes   # auth method mount path, without the auth/ prefix
        service-account-token-file: /var/run/secrets/kubernetes.io/serviceaccount/token
      # Lease lifecycle — the most important production setting
      config:
        lifecycle:
          enabled: true
          min-renewal: 10s        # lower bound: never schedule a renewal sooner than 10s out
          expiry-threshold: 1m    # renew leases 1m before expiry
          lease-endpoints: SysLeases   # use /sys/leases/* endpoints (Vault 0.8+)
      # KV secret loading
      kv:
        enabled: true
        backend: secret
        default-context: ${SERVICE_NAME}/${ENVIRONMENT}
        profile-separator: /
      # Database dynamic credentials
      database:
        enabled: true
        role: ${SERVICE_NAME}-role
        backend: database
      # Startup behavior
      fail-fast: true   # fail startup if Vault is unreachable
  config:
    import: "vault://"   # load secrets from Vault into the Environment
The lease lifecycle settings. expiry-threshold: 1m means Spring Cloud Vault schedules lease renewal one minute before the lease expires. min-renewal: 10s sets a floor on that schedule: a renewal is never scheduled sooner than 10 seconds out, which keeps very short TTLs from turning into renewal loops against Vault.
For a 1-hour credential TTL:
- Credential issued at T+0, expires at T+3600
- Renewal is scheduled at T+3540 (1 minute before expiry)
- If renewal succeeds, the lease TTL resets and the next renewal is scheduled
- If renewal fails, the lease expires at T+3600; new credentials must then be requested, and new connection attempts fail until they arrive
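The scheduling arithmetic can be sketched in a few lines. This is a model of the behavior described above, assuming renewal is scheduled at ttl minus expiry-threshold with min-renewal as a floor; the function name is mine, not a Spring Cloud Vault API:

```python
def next_renewal_seconds(ttl: int, expiry_threshold: int, min_renewal: int) -> int:
    """Seconds from lease issuance until the renewal attempt is scheduled.

    Renewal runs expiry_threshold seconds before expiry, but never sooner
    than min_renewal seconds out (the floor guards against renewal loops
    when TTLs are very short).
    """
    return max(min_renewal, ttl - expiry_threshold)

# 1-hour database credential with the production settings above
print(next_renewal_seconds(3600, 60, 10))   # renewal scheduled at T+3540

# A very short 30s lease: the 10s floor applies
print(next_renewal_seconds(30, 60, 10))     # renewal scheduled at T+10
```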
The connection timeouts. A 5-second connection timeout and 15-second read timeout balance responsiveness with allowing Vault time to respond under load. On startup, if Vault doesn't respond within these windows, Spring Cloud Vault retries based on retry configuration.
Startup failure modes and how to handle them
Spring Cloud Vault interacts with Vault at two distinct points: startup (load secrets) and runtime (renew leases). Each has different failure handling.
Startup failure with fail-fast: true (set explicitly above; the property defaults to false):
Application context load
→ Spring Cloud Vault connects to Vault
→ Authenticates (Kubernetes service account token)
→ Fetches secrets and populates Environment
→ If any step fails: BeanCreationException → context load fails → pod CrashLoopBackOff
This is the correct behavior for required secrets. An application that starts without its database password or API keys is either broken or a security risk.
For services with optional Vault secrets, override per-path:
spring:
  config:
    import:
      - "vault://"                               # required — fail if unavailable
      - "optional:vault://secret/feature-flags"  # optional — proceed without
Startup retry configuration:
spring:
  cloud:
    vault:
      fail-fast: true
      retry:
        max-attempts: 6
        initial-interval: 1000   # 1s initial backoff
        max-interval: 10000      # 10s max backoff
        multiplier: 1.5          # exponential backoff
Six attempts with exponential backoff starting at 1 second gives five waits of 1s, 1.5s, 2.25s, ~3.4s, and ~5.1s, roughly 13 seconds of backoff; add the 5-second connection timeout for each failed attempt and startup can wait around 40 seconds for Vault to become available. Note that fail-fast retry requires Spring Retry and Spring AOP on the classpath. In Kubernetes, this covers scenarios where Vault is briefly unavailable during pod startup due to network initialization.
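To check the backoff arithmetic (backoff waits only, not counting per-attempt connection timeouts), a quick sketch:

```python
def backoff_intervals(attempts: int, initial_ms: float,
                      multiplier: float, max_ms: float) -> list[float]:
    """Exponential backoff waits between attempts, each capped at max_ms."""
    waits = []
    interval = initial_ms
    for _ in range(attempts - 1):   # n attempts -> n - 1 waits between them
        waits.append(min(interval, max_ms))
        interval *= multiplier
    return waits

waits = backoff_intervals(6, 1000, 1.5, 10000)
print(waits)              # [1000, 1500.0, 2250.0, 3375.0, 5062.5]
print(sum(waits) / 1000)  # ~13.2 seconds of backoff between attempts
```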
Runtime failure — lease renewal:
When Vault is unavailable during lease renewal, Spring Cloud Vault logs errors and retries. Existing credentials continue to work for active database connections — PostgreSQL doesn't invalidate connections when the credential's renewal fails. The application degrades when new connections are needed (pool growth, connection replacement after error).
The Vault HA cluster handles most runtime availability concerns — a 3-node cluster remains available through single-node failures and rolling upgrades without application interruption.
Vault high availability in Kubernetes
A single Vault node is a single point of failure. For production, deploy at minimum 3 Vault nodes with Raft integrated storage:
# vault-values.yaml for the HashiCorp Vault Helm chart
global:
  enabled: true
  tlsDisable: false
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }
        storage "raft" {
          path = "/vault/data"
        }
        service_registration "kubernetes" {}
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m
  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: fast-ssd   # SSD-backed storage for the Raft WAL
  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: vault
          topologyKey: kubernetes.io/hostname   # one Vault pod per node
podAntiAffinity with topologyKey: kubernetes.io/hostname prevents multiple Vault nodes from running on the same Kubernetes node — a single node failure doesn't take down the quorum.
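The quorum arithmetic behind these sizing choices can be sketched as follows (standard Raft majority math, not anything Vault-specific):

```python
def raft_fault_tolerance(nodes: int) -> tuple[int, int]:
    """Return (quorum size, tolerated node failures) for a Raft cluster.

    Raft needs a strict majority of nodes to elect a leader and commit
    writes, so tolerance only improves at odd cluster sizes.
    """
    quorum = nodes // 2 + 1
    return quorum, nodes - quorum

for n in (1, 3, 5):
    quorum, tolerated = raft_fault_tolerance(n)
    print(f"{n} nodes: quorum {quorum}, tolerates {tolerated} failure(s)")
```

A single node tolerates zero failures; three nodes tolerate one, which is why three is the production minimum; five tolerate two.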
Auto-unseal. Vault seals itself on restart — an operator must provide unseal keys. In production, this is automated using a cloud KMS:
# AWS KMS auto-unseal
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "arn:aws:kms:us-east-1:123456789:key/abc-def-ghi"
}
With auto-unseal, restarted Vault nodes unseal automatically using the KMS key — no manual intervention required during rolling restarts or node failures.
Secret path organization
How secrets are organized in Vault's KV store determines access control granularity and operational clarity. A path structure that works:
secret/
  {service-name}/
    {environment}/
      application      # service-specific secrets
  shared/
    {environment}/
      application      # shared secrets (internal CA cert, shared API keys)
  infrastructure/
    databases/         # database credentials (usually dynamic, not KV)
    tls/               # TLS certificates
With Spring Cloud Vault's default context configuration:
spring:
  cloud:
    vault:
      kv:
        default-context: order-service/production
        application-name: order-service
Spring Cloud Vault loads from:
- secret/order-service/production (service- and environment-specific)
- secret/order-service (service-specific, all environments)
- secret/application (shared across all services)
More specific paths take precedence: environment-specific secrets override service-level defaults, which in turn override shared defaults. This mirrors how Spring profiles work.
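The precedence rule behaves like an ordered map merge, least specific first. A sketch with made-up keys and values for illustration:

```python
# Property sources, least specific first (keys are hypothetical)
shared  = {"internal-ca": "ca-pem", "log-api-key": "shared-key"}
service = {"db-password": "svc-default", "log-api-key": "svc-key"}
env     = {"db-password": "prod-password"}

# Merging later (more specific) sources over earlier ones
effective = {**shared, **service, **env}

print(effective["db-password"])   # prod-password (environment wins)
print(effective["log-api-key"])   # svc-key (service overrides shared)
print(effective["internal-ca"])   # ca-pem (shared fallback survives)
```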
Vault policies per service:
# order-service-policy.hcl — minimal required access
path "secret/data/order-service/production/*" {
  capabilities = ["read"]
}
path "secret/data/shared/production/*" {
  capabilities = ["read"]
}
path "database/creds/order-service-role" {
  capabilities = ["read"]
}
path "sys/leases/renew" {
  capabilities = ["update"]
}
path "sys/leases/revoke" {
  capabilities = ["update"]
}
Each service gets its own policy with minimal required access. No service can read another service's secrets. Shared secrets are in a dedicated path that all services can read.
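The isolation property can be illustrated with a rough sketch. Python's fnmatch is only an approximation of Vault's own path matching (Vault supports a trailing * glob and + path segments), but it shows the shape of the check, using the paths from the policy above:

```python
from fnmatch import fnmatch

# Paths granted by order-service-policy.hcl
policy_paths = [
    "secret/data/order-service/production/*",
    "secret/data/shared/production/*",
    "database/creds/order-service-role",
]

def allowed(request_path: str) -> bool:
    """Approximate Vault path matching: exact match or trailing-* glob."""
    return any(fnmatch(request_path, p) for p in policy_paths)

print(allowed("secret/data/order-service/production/application"))   # True
print(allowed("secret/data/billing-service/production/application")) # False: another service's secrets
print(allowed("database/creds/order-service-role"))                  # True
```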
Secrets that belong in Vault and secrets that don't
Well-suited for Vault:
- Database credentials — dynamic secrets are Vault's highest-value feature. Per-instance credentials with automatic expiry.
- Third-party API keys — Stripe, SendGrid, Twilio, etc. Auditable access, centralized rotation.
- Internal service credentials — credentials for calling internal services that require authentication.
- TLS certificates — Vault's PKI secrets engine issues certificates with automatic expiry and renewal.
- Encryption keys — Vault transit engine manages keys without exposing them to applications.
Less suited for Vault:
- Feature flags — frequent reads, low security value. Use a dedicated feature flag service (LaunchDarkly, Unleash) or a config map.
- Application configuration — log levels, pool sizes, timeouts. These are configuration, not secrets. Environment variables or Kubernetes ConfigMaps are appropriate.
- Public data — API base URLs, public keys for signature verification. No reason to protect these in a secrets manager.
- Development credentials — local development with Vault adds setup overhead without meaningful security benefit. Use Docker Compose with plain environment variables for development.
The test: does unauthorized access to this value cause a security or compliance incident? If yes, it's a secret and belongs in Vault. If no, it's configuration and belongs in environment variables or ConfigMaps.
Audit logging — the value you don't see
Vault logs every secret access to an audit backend. Enable it immediately in production:
vault audit enable file file_path=/vault/logs/audit.log
Or to syslog for aggregation into your log platform:
vault audit enable syslog tag="vault" facility="AUTH"
Each audit log entry records:
{
  "time": "2026-04-17T14:30:00Z",
  "type": "response",
  "auth": {
    "client_token": "hmac-sha256:...",
    "accessor": "hmac-sha256:...",
    "display_name": "kubernetes-order-service",
    "policies": ["order-service-policy"],
    "entity_id": "abc-123"
  },
  "request": {
    "id": "xyz-789",
    "operation": "read",
    "path": "database/creds/order-service-role"
  },
  "response": {
    "data": {
      "username": "hmac-sha256:...",   // sensitive fields are hashed
      "password": "hmac-sha256:..."
    }
  }
}
Sensitive values are HMAC-hashed in the audit log — the log records that a credential was accessed, not what the credential was. This satisfies audit requirements without creating a second secret exposure vector.
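The idea can be sketched with the standard library. The key here is made up for illustration; Vault derives and manages its own per-audit-device salt:

```python
import hashlib
import hmac

audit_key = b"per-audit-device-salt"   # hypothetical; Vault manages the real one
password = b"s3cr3t-db-password"

# The audit log records this digest, never the plaintext
digest = hmac.new(audit_key, password, hashlib.sha256).hexdigest()
print(f"hmac-sha256:{digest}")
```

Because the digest is deterministic for a given key, repeated accesses to the same secret produce the same hash, so auditors can correlate accesses, while the plaintext itself is not recoverable from the log.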
Audit logging is so important that Vault refuses to service requests it cannot audit: if every enabled audit device fails to record a request (the audit log disk fills up, the syslog destination is unreachable), Vault returns an error rather than proceed unaudited. This is an intentional safety mechanism. Size the audit log storage appropriately and alert on disk utilization.
The token renewal gap
Spring Cloud Vault's lease renewal renews secret leases (database credentials, PKI certificates). The Vault token itself — the credential used to authenticate with Vault — also expires and must be renewed separately.
With Kubernetes auth, the Vault token TTL is set in the Kubernetes auth role:
# ttl = Vault token TTL; max_ttl = maximum TTL even with renewal
vault write auth/kubernetes/role/order-service \
    bound_service_account_names=order-service \
    bound_service_account_namespaces=production \
    policies=order-service-policy \
    ttl=1h \
    max_ttl=24h
Spring Cloud Vault renews the Vault token automatically as part of the lifecycle management. If token renewal fails (Vault unavailable, token TTL exceeded max_ttl), the application must re-authenticate — which means re-reading the Kubernetes service account token and making a new auth request.
Spring Cloud Vault handles re-authentication automatically when the token expires — the lifecycle manager detects the expired token and re-runs the auth flow. This is transparent to the application as long as Vault is available when re-authentication is needed.
The max_ttl trap. If max_ttl is set to 24 hours and the application runs continuously, the token expires at 24 hours regardless of renewal attempts. After 24 hours, re-authentication is required. Ensure the auth flow works reliably — test it in staging by running the application for longer than max_ttl and verifying it re-authenticates successfully rather than entering a broken state.
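The token timeline arithmetic can be sketched as follows. This models the behavior described above (each renewal extends the token by ttl, capped at issue time plus max_ttl), with times in seconds:

```python
def renewal_timeline(ttl: int, max_ttl: int) -> list[int]:
    """Expiry times (seconds since issue) after each successful renewal.

    Each renewal at the current expiry extends it by ttl, but never past
    max_ttl. The list ends at the cap; the next step is re-authentication.
    """
    expiries = [min(ttl, max_ttl)]
    while expiries[-1] < max_ttl:
        expiries.append(min(expiries[-1] + ttl, max_ttl))
    return expiries

timeline = renewal_timeline(ttl=3600, max_ttl=24 * 3600)
print(len(timeline) - 1)   # 23 renewals before the cap is reached
print(timeline[-1])        # 86400: hard stop, re-authentication required
```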
Monitoring Vault integration health
Spring Boot Actuator exposes a Vault health indicator when Spring Cloud Vault is on the classpath:
GET /actuator/health
{
  "components": {
    "vault": {
      "status": "UP",
      "details": {
        "version": "1.15.2",
        "sealed": false,
        "initialized": true
      }
    }
  }
}
Include Vault in the readiness health group — if Vault is unreachable, the pod should stop receiving traffic until connectivity is restored:
management:
  endpoint:
    health:
      group:
        readiness:
          include: db, vault, redis
Alert on:
- Vault health status transitioning from UP to DOWN
- Lease renewal failures in application logs (VaultException: Status 503)
- Token renewal failures (PermissionDenied errors indicate a policy or TTL issue)
- Audit log disk utilization above 80%
The combination of health probes, lease renewal monitoring, and audit log alerting provides the operational visibility needed to run Vault reliably in production.
The runbook every team needs
Before going to production with Vault:
□ Vault cluster is 3+ nodes with Raft consensus
□ Auto-unseal is configured (AWS KMS, GCP Cloud KMS, or Azure Key Vault)
□ Audit logging is enabled and monitored
□ Audit log storage is sized and alerted on
□ Each service has its own policy with minimal required access
□ Kubernetes auth roles bound to specific namespaces
□ Spring Cloud Vault lifecycle management is enabled and tested
□ Re-authentication after max_ttl tested in staging
□ Vault unavailability at startup tested (fail-fast behavior verified)
□ Runbook exists for Vault seal/unseal in emergency scenarios
□ Backup strategy exists for Vault data (Raft snapshots)
□ Vault upgrade path tested (rolling upgrade without service disruption)
Vault in production is a significant operational commitment. The security benefits are real — dynamic credentials, audit trails, centralized rotation. The operational costs are equally real — a cluster to maintain, a backup strategy to implement, and failure modes that don't exist with environment variables. Make the decision with both sides of the ledger in view.