Spring Cloud Vault in Production — Configuration, Failover, and the Secrets You Shouldn't Store

by Eric Hanson, Backend Developer at Clean Systems Consulting

The production configuration that matters

Spring Cloud Vault's defaults are reasonable for development but need tuning for production. The full production-appropriate configuration:

spring:
  cloud:
    vault:
      uri: https://vault.internal:8200

      # Connection resilience
      connection-timeout: 5000    # 5s to establish connection
      read-timeout: 15000         # 15s for a response

      # Authentication
      authentication: kubernetes
      kubernetes:
        role: ${SERVICE_NAME}
        kubernetes-path: auth/kubernetes
        service-account-token-file: /var/run/secrets/kubernetes.io/serviceaccount/token

      # Lease lifecycle — the most important production setting
      config:
        lifecycle:
          enabled: true
          min-renewal: 10s        # skip renewal if remaining TTL is already below 10s
          expiry-threshold: 1m    # start the renewal process 1m before expiry
          lease-endpoints: leases # use the sys/leases/* endpoints (Vault 0.8+)

      # KV secret loading
      kv:
        enabled: true
        backend: secret
        default-context: ${SERVICE_NAME}/${ENVIRONMENT}
        profile-separator: /

      # Database dynamic credentials
      database:
        enabled: true
        role: ${SERVICE_NAME}-role
        backend: database

      # Startup behavior
      fail-fast: true             # fail startup if Vault unreachable

  config:
    import: "vault:"              # load secrets from Vault into Environment

The lease lifecycle settings. expiry-threshold: 1m means Spring Cloud Vault starts the renewal process when the lease has 1 minute remaining. min-renewal: 10s means it won't attempt renewal if the remaining TTL is already below 10 seconds — at that point, the credential is effectively expired and a new one should be requested. These values create a renewal window between 10 seconds and 1 minute before expiry.

For a 1-hour credential TTL:

  • Credential issued at T+0, expires at T+3600
  • Renewal window starts at T+3540 (1 minute before expiry)
  • If renewal fails, retry until T+3590 (10 seconds before expiry)
  • If all retries fail, the credential expires and new connection attempts fail
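The window arithmetic above is simple but worth getting right, since it determines how much retry headroom renewal has. A standalone calculation (hypothetical helper, not part of Spring Cloud Vault):

```java
import java.time.Duration;

public class RenewalWindow {

    // Renewal starts expiry-threshold before the lease expires.
    static long windowStartSeconds(long ttlSeconds, Duration expiryThreshold) {
        return ttlSeconds - expiryThreshold.toSeconds();
    }

    // Renewal is no longer attempted once the remaining TTL
    // drops below min-renewal.
    static long windowEndSeconds(long ttlSeconds, Duration minRenewal) {
        return ttlSeconds - minRenewal.toSeconds();
    }

    public static void main(String[] args) {
        long ttl = 3600; // 1-hour credential TTL
        System.out.println(windowStartSeconds(ttl, Duration.ofMinutes(1)));  // 3540
        System.out.println(windowEndSeconds(ttl, Duration.ofSeconds(10)));   // 3590
    }
}
```

With a 1-hour TTL this leaves a 50-second retry window; shorter TTLs shrink the window proportionally, which is a reason to keep expiry-threshold well above min-renewal.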

The connection timeouts. A 5-second connection timeout and 15-second read timeout balance responsiveness with allowing Vault time to respond under load. On startup, if Vault doesn't respond within these windows, Spring Cloud Vault retries based on retry configuration.

Startup failure modes and how to handle them

Spring Cloud Vault interacts with Vault at two distinct points: startup (load secrets) and runtime (renew leases). Each has different failure handling.

Startup failure with fail-fast: true (set explicitly above — the default is false):

Application context load
  → Spring Cloud Vault connects to Vault
  → Authenticates (Kubernetes service account token)
  → Fetches secrets and populates Environment
  → If any step fails: startup exception → context load fails → pod CrashLoopBackOff

This is the correct behavior for required secrets. An application that starts without its database password or API keys is either broken or a security risk.

For services with optional Vault secrets, override per-path:

spring:
  config:
    import:
      - "vault:"                           # required — fail if unavailable
      - "optional:vault:secret/feature-flags"  # optional — proceed without

Startup retry configuration (retries require spring-retry and spring-boot-starter-aop on the classpath):

spring:
  cloud:
    vault:
      fail-fast: true
      retry:
        max-attempts: 6
        initial-interval: 1000    # 1s initial backoff
        max-interval: 10000       # 10s max backoff
        multiplier: 1.5           # exponential backoff

Six attempts with exponential backoff starting at 1 second add up to roughly 13 seconds of cumulative backoff; with up to 5 seconds of connection timeout per attempt on top, the application waits around 30 to 40 seconds in the worst case for Vault to become available. In Kubernetes, this covers scenarios where Vault is briefly unavailable during pod startup due to network initialization.
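The cumulative backoff for these settings can be verified with a short calculation (hypothetical helper, not Spring code — values taken from the retry config above):

```java
import java.util.ArrayList;
import java.util.List;

public class VaultRetryBackoff {

    // Backoff waits between attempts: start at `initial` ms,
    // multiply by `multiplier` each time, cap at `max` ms.
    static List<Double> backoffIntervals(double initial, double multiplier,
                                         double max, int attempts) {
        List<Double> waits = new ArrayList<>();
        double interval = initial;
        for (int i = 0; i < attempts - 1; i++) { // n attempts -> n-1 waits
            waits.add(interval);
            interval = Math.min(interval * multiplier, max);
        }
        return waits;
    }

    public static void main(String[] args) {
        double total = backoffIntervals(1000, 1.5, 10000, 6)
                .stream().mapToDouble(Double::doubleValue).sum();
        System.out.println(total); // 13187.5 ms of pure backoff
    }
}
```

The 10-second max-interval never triggers with only six attempts; it matters if max-attempts is raised.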

Runtime failure — lease renewal:

When Vault is unavailable during lease renewal, Spring Cloud Vault logs errors and retries. Existing credentials continue to work for active database connections — PostgreSQL doesn't invalidate connections when the credential's renewal fails. The application degrades when new connections are needed (pool growth, connection replacement after error).
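Renewal failures are worth surfacing explicitly rather than mining out of log output. Spring Vault's SecretLeaseContainer publishes lease lifecycle events; a sketch of a listener, assuming spring-vault-core on the classpath (class name and wiring are illustrative):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.vault.core.lease.SecretLeaseContainer;
import org.springframework.vault.core.lease.event.SecretLeaseExpiredEvent;

@Configuration
class LeaseEventMonitoring {

    LeaseEventMonitoring(SecretLeaseContainer container) {
        // Fires for created, renewed, and expired leases.
        container.addLeaseListener(event -> {
            if (event instanceof SecretLeaseExpiredEvent) {
                // Credential expired without a successful renewal —
                // alert here, before new connections start failing.
                System.err.printf("Lease expired for %s%n",
                        event.getSource().getPath());
            }
        });
        // Renewal errors (e.g. Vault returning 503) surface here.
        container.addErrorListener((event, exception) ->
                System.err.printf("Lease error for %s: %s%n",
                        event.getSource().getPath(), exception.getMessage()));
    }
}
```

Wiring these events into a metrics counter gives you something concrete to alert on instead of grepping for stack traces.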

The Vault HA cluster handles most runtime availability concerns — a 3-node cluster remains available through single-node failures and rolling upgrades without application interruption.

Vault high availability in Kubernetes

A single Vault node is a single point of failure. For production, deploy at minimum 3 Vault nodes with Raft integrated storage:

# vault-values.yaml for the HashiCorp Vault Helm chart
global:
  enabled: true
  tlsDisable: false

server:
  replicas: 3
  ha:
    enabled: true
    raft:
      enabled: true
      setNodeId: true
      config: |
        ui = true
        listener "tcp" {
          tls_disable = 0
          address = "[::]:8200"
          tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
          tls_key_file  = "/vault/userconfig/vault-server-tls/vault.key"
          tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
        }
        storage "raft" {
          path = "/vault/data"
        }
        service_registration "kubernetes" {}

  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: fast-ssd   # SSD-backed storage for Raft WAL

  affinity: |
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: vault
          topologyKey: kubernetes.io/hostname  # one pod per node

podAntiAffinity with topologyKey: kubernetes.io/hostname prevents multiple Vault nodes from running on the same Kubernetes node — a single node failure doesn't take down the quorum.

Auto-unseal. Vault seals itself on restart — an operator must provide unseal keys. In production, this is automated using a cloud KMS:

# AWS KMS auto-unseal
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "arn:aws:kms:us-east-1:123456789:key/abc-def-ghi"
}

With auto-unseal, restarted Vault nodes unseal automatically using the KMS key — no manual intervention required during rolling restarts or node failures.
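Auto-unseal is easy to verify after a rolling restart. Run the status command against a freshly restarted node; with a KMS seal it should report unsealed without any `vault operator unseal` step:

```shell
# Sealed should read false shortly after the pod starts;
# the Seal Type field reflects the configured KMS seal.
vault status
```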

Secret path organization

How secrets are organized in Vault's KV store determines access control granularity and operational clarity. A path structure that works:

secret/
  {service-name}/
    {environment}/
      application          # service-specific secrets
  shared/
    {environment}/
      application          # shared secrets (internal CA cert, shared API keys)
  infrastructure/
    databases/             # database credentials (usually dynamic, not KV)
    tls/                   # TLS certificates

With Spring Cloud Vault's default context configuration:

spring:
  cloud:
    vault:
      kv:
        default-context: order-service/production
        application-name: order-service

Spring Cloud Vault loads from:

  1. secret/order-service/production (service + environment specific)
  2. secret/order-service (service-specific, all environments)
  3. secret/application (shared across all services)

More specific paths override more general ones — environment-specific secrets override service-level defaults, which override shared defaults. This mirrors how Spring profiles work.
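Note that the shared path from the layout above (secret/shared/{environment}) is not one of the default contexts, so it has to be imported explicitly. A sketch using spring.config.import, assuming the production environment:

```yaml
spring:
  config:
    import:
      - "vault:"                          # default contexts (service + application)
      - "vault:secret/shared/production"  # shared path from the layout above
```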

Vault policies per service:

# order-service-policy.hcl — minimal required access
path "secret/data/order-service/production/*" {
  capabilities = ["read"]
}

path "secret/data/shared/production/*" {
  capabilities = ["read"]
}

path "database/creds/order-service-role" {
  capabilities = ["read"]
}

path "sys/leases/renew" {
  capabilities = ["update"]
}

path "sys/leases/revoke" {
  capabilities = ["update"]
}

Each service gets its own policy with minimal required access. No service can read another service's secrets. Shared secrets are in a dedicated path that all services can read.
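The policy file is registered under a name, which the Kubernetes auth role then references:

```shell
# Register (or update) the policy from the HCL file above
vault policy write order-service-policy order-service-policy.hcl

# Verify what was stored
vault policy read order-service-policy
```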

Secrets that belong in Vault and secrets that don't

Well-suited for Vault:

  • Database credentials — dynamic secrets are Vault's highest-value feature. Per-instance credentials with automatic expiry.
  • Third-party API keys — Stripe, SendGrid, Twilio, etc. Auditable access, centralized rotation.
  • Internal service credentials — credentials for calling internal services that require authentication.
  • TLS certificates — Vault's PKI secrets engine issues certificates with automatic expiry and renewal.
  • Encryption keys — Vault transit engine manages keys without exposing them to applications.

Less suited for Vault:

  • Feature flags — frequent reads, low security value. Use a dedicated feature flag service (LaunchDarkly, Unleash) or a config map.
  • Application configuration — log levels, pool sizes, timeouts. These are configuration, not secrets. Environment variables or Kubernetes ConfigMaps are appropriate.
  • Public data — API base URLs, public keys for signature verification. No reason to protect these in a secrets manager.
  • Development credentials — local development with Vault adds setup overhead without meaningful security benefit. Use Docker Compose with plain environment variables for development.

The test: does unauthorized access to this value cause a security or compliance incident? If yes, it's a secret and belongs in Vault. If no, it's configuration and belongs in environment variables or ConfigMaps.

Audit logging — the value you don't see

Vault logs every secret access to an audit backend. Enable it immediately in production:

vault audit enable file file_path=/vault/logs/audit.log

Or to syslog for aggregation into your log platform:

vault audit enable syslog tag="vault" facility="AUTH"

Each audit log entry records:

{
  "time": "2026-04-17T14:30:00Z",
  "type": "response",
  "auth": {
    "client_token": "hmac-sha256:...",
    "accessor": "hmac-sha256:...",
    "display_name": "kubernetes-order-service",
    "policies": ["order-service-policy"],
    "entity_id": "abc-123"
  },
  "request": {
    "id": "xyz-789",
    "operation": "read",
    "path": "database/creds/order-service-role"
  },
  "response": {
    "data": {
      "username": "hmac-sha256:...",  // sensitive fields are hashed
      "password": "hmac-sha256:..."
    }
  }
}

Sensitive values are HMAC-hashed in the audit log — the log records that a credential was accessed, not what the credential was. This satisfies audit requirements without creating a second secret exposure vector.
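The hashing is deterministic per audit device key, which is what makes the log useful: the same secret value always produces the same HMAC, so accesses can be correlated across entries without revealing the value. A standalone illustration of the scheme (not Vault's actual implementation or key):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;

public class AuditHmacDemo {

    // Keyed hash in the style of Vault's audit log: the log stores
    // hmac-sha256:<digest>, never the value itself.
    static String hmacSha256(String key, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(
                    key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return "hmac-sha256:" + HexFormat.of().formatHex(
                    mac.doFinal(value.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = hmacSha256("audit-device-key", "s3cr3t-password");
        String b = hmacSha256("audit-device-key", "s3cr3t-password");
        // Same key + same value -> same digest: correlatable, not reversible.
        System.out.println(a.equals(b)); // true
    }
}
```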

Audit logging is so important that Vault refuses to serve requests it cannot audit. If no enabled audit device can record the entry (the audit log disk fills up, or the syslog destination is unreachable), Vault stops responding to requests. This is an intentional safety mechanism. Enabling a second audit device provides redundancy, since only one device needs to succeed. Size the audit log storage appropriately and alert on disk utilization.

The token renewal gap

Spring Cloud Vault's lease renewal renews secret leases (database credentials, PKI certificates). The Vault token itself — the credential used to authenticate with Vault — also expires and must be renewed separately.

With Kubernetes auth, the Vault token TTL is set in the Kubernetes auth role:

# ttl caps each issued Vault token at 1 hour; max_ttl caps the
# token's total lifetime at 24 hours, even with successful renewals
vault write auth/kubernetes/role/order-service \
  bound_service_account_names=order-service \
  bound_service_account_namespaces=production \
  policies=order-service-policy \
  ttl=1h \
  max_ttl=24h

Spring Cloud Vault renews the Vault token automatically as part of the lifecycle management. If token renewal fails (Vault unavailable, token TTL exceeded max_ttl), the application must re-authenticate — which means re-reading the Kubernetes service account token and making a new auth request.

Spring Cloud Vault handles re-authentication automatically when the token expires — the lifecycle manager detects the expired token and re-runs the auth flow. This is transparent to the application as long as Vault is available when re-authentication is needed.

The max_ttl trap. If max_ttl is set to 24 hours and the application runs continuously, the token expires at 24 hours regardless of renewal attempts. After 24 hours, re-authentication is required. Ensure the auth flow works reliably — test it in staging by running the application for longer than max_ttl and verifying it re-authenticates successfully rather than entering a broken state.

Monitoring Vault integration health

Spring Boot Actuator exposes a Vault health indicator when Spring Cloud Vault is on the classpath:

GET /actuator/health
{
  "components": {
    "vault": {
      "status": "UP",
      "details": {
        "version": "1.15.2",
        "sealed": false,
        "initialized": true
      }
    }
  }
}

Include Vault in the readiness health group — if Vault is unreachable, the pod should stop receiving traffic until connectivity is restored:

management:
  endpoint:
    health:
      group:
        readiness:
          include: db, vault, redis
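Health groups are served as sub-paths of the health endpoint, so the readiness group above can be checked directly (port 8080 assumed):

```shell
# Returns aggregate status for the db, vault, and redis indicators
curl -s http://localhost:8080/actuator/health/readiness
```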

Alert on:

  • Vault health status transitioning from UP to DOWN
  • Lease renewal failures in application logs (VaultException: Status 503)
  • Token renewal failures (PermissionDenied errors indicate a policy or TTL issue)
  • Audit log disk utilization above 80%

The combination of health probes, lease renewal monitoring, and audit log alerting provides the operational visibility needed to run Vault reliably in production.

The runbook every team needs

Before going to production with Vault:

□ Vault cluster is 3+ nodes with Raft consensus
□ Auto-unseal is configured (AWS KMS, GCP Cloud KMS, or Azure Key Vault)
□ Audit logging is enabled and monitored
□ Audit log storage is sized and alerted on
□ Each service has its own policy with minimal required access
□ Kubernetes auth roles bound to specific namespaces
□ Spring Cloud Vault lifecycle management is enabled and tested
□ Re-authentication after max_ttl tested in staging
□ Vault unavailability at startup tested (fail-fast behavior verified)
□ Runbook exists for Vault seal/unseal in emergency scenarios
□ Backup strategy exists for Vault data (Raft snapshots)
□ Vault upgrade path tested (rolling upgrade without service disruption)
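For the backup item on the list, Raft snapshots are taken with the operator CLI, typically from a scheduled job such as a Kubernetes CronJob:

```shell
# Point-in-time snapshot of the integrated Raft storage
vault operator raft snapshot save vault-backup.snap

# Disaster recovery: restore into a new cluster
# vault operator raft snapshot restore vault-backup.snap
```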

Vault in production is a significant operational commitment. The security benefits are real — dynamic credentials, audit trails, centralized rotation. The operational costs are equally real — a cluster to maintain, a backup strategy to implement, and failure modes that don't exist with environment variables. Make the decision with both sides of the ledger in view.
