The Pipeline Step Nobody Wants to Optimize Until It Hurts
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Step Everyone Treats as Solved
Your pipeline runs tests, builds an image, maybe scans it for vulnerabilities. Then, somewhere in the deployment process, there's a migration step. It runs flyway migrate or liquibase update against the target database, and if it works, the app starts. If it doesn't, you're debugging at 2pm on a Thursday while traffic is routing to a service that can't start.
Migration handling is the step most teams don't examine because it "just works" right up until it doesn't. When it fails, the failure modes are severe: a blocking lock on a large table, a migration that ran partially before timing out, a schema change that's incompatible with the currently deployed application version, or a migration that succeeded in staging (MySQL 8.0.28 with a 30-second lock timeout) but times out in production (MySQL 8.0.32 with different timeout settings).
Why Migrations Are Structurally Different From Other Pipeline Steps
Most pipeline steps are idempotent and isolated. Running unit tests twice doesn't change anything. Building the Docker image twice produces the same artifact. Migrations are neither: they mutate shared state (the database schema), they're often not safely re-runnable, and their effects are visible immediately to any currently-running instance of the application.
This means migration failures don't just fail the deployment — they can leave the database in a state where neither the old nor new version of the application can run correctly. That is a production incident, not a failed pipeline run.
What Actually Goes Wrong
Non-backward-compatible changes. Adding a NOT NULL column without a default causes existing application instances (if you're doing a rolling deploy) to fail when inserting rows that don't include the new column. The migration succeeds; the currently-running pods start failing.
-- Dangerous: breaks running instances during rolling deploy
ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64) NOT NULL;
-- Safe: add nullable first, backfill, then add constraint in next release
ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64) NULL;
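For completeness, here's a sketch of those later steps, assuming PostgreSQL syntax and a known backfill value (in practice you'd run the UPDATE in batches to avoid holding row locks across 50 million rows):
-- Next release, once all running code writes reference_id:
-- backfill existing rows, then enforce the constraint
UPDATE payments SET reference_id = '' WHERE reference_id IS NULL;  -- '' is a placeholder value
ALTER TABLE payments ALTER COLUMN reference_id SET NOT NULL;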
Lock acquisition on large tables. A simple ALTER TABLE on a table with 50 million rows needs an exclusive metadata lock. In MySQL (InnoDB), the DDL can wait indefinitely behind open transactions that still touch the table, and while it waits, every subsequent query against that table queues behind it. In PostgreSQL, ADD COLUMN with a constant DEFAULT has been a metadata-only change since version 11, but it still takes a brief ACCESS EXCLUSIVE lock, and adding NOT NULL to an existing column still scans the whole table under that lock.
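A defensive habit that helps here is capping how long the DDL will wait for its lock, so a blocked migration fails fast instead of stalling every query queued behind it. A sketch; both settings are standard server variables, and the five-second value is an arbitrary choice:
-- PostgreSQL: give up on the lock after 5 seconds instead of queueing
SET lock_timeout = '5s';
ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64);

-- MySQL 8.0: same idea, and request a non-blocking online DDL explicitly
SET SESSION lock_wait_timeout = 5;
ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64), ALGORITHM=INPLACE, LOCK=NONE;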
Timeouts that leave partial state. If your migration runner has a 30-second timeout and a migration takes 35 seconds in production, you get a partial migration (depending on whether the statement was atomic) or a failed migration that Flyway records as failed in its schema history table, blocking all future migrations until the entry is cleaned up.
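Flyway's own CLI has the recovery path for that state. A sketch of the sequence (flyway info and flyway repair are standard commands; the connection flags mirror the deploy script later in this article):
# Inspect the schema history to find the failed version
flyway -url="$DB_URL" -user="$DB_USER" -password="$DB_PASSWORD" info
# After manually rolling back any partially applied DDL,
# clear the failed entry so future migrations can run
flyway -url="$DB_URL" -user="$DB_USER" -password="$DB_PASSWORD" repair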
Migration Testing in the Pipeline
The place to catch these issues is CI, not production. But most pipelines run migrations against a fresh empty database on every run, which means they never catch migration problems that only appear at scale or against real data distributions.
A more useful approach:
# Integration test job: run migrations against a snapshot of production schema
integration-tests:
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_DB: testdb
        POSTGRES_PASSWORD: testpass
      ports:
        - 5432:5432
      options: --health-cmd pg_isready --health-interval 5s --health-timeout 5s --health-retries 5
  env:
    DATABASE_URL: postgres://postgres:testpass@localhost:5432/testdb
  steps:
    - uses: actions/checkout@v4
    # Restore a schema dump (not data) from production
    - name: Restore baseline schema
      run: psql "$DATABASE_URL" < ./db/baseline-schema.sql
    # Run pending migrations against the baseline
    - name: Run migrations
      run: ./gradlew flywayMigrate
    # Then run application tests
    - name: Run integration tests
      run: ./gradlew integrationTest
The baseline schema should be a recent dump of the production schema structure (not data), updated monthly or when significant schema changes land. This gives you migration tests that run against a realistic starting point rather than an empty database.
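One way to produce that baseline is pg_dump's --schema-only flag, which dumps DDL without rows. A sketch, assuming you can reach a production replica; PROD_DATABASE_URL is a placeholder, and the output path matches the CI job above:
# Refresh the baseline schema from a production replica (structure only, no data)
pg_dump --schema-only --no-owner --no-privileges "$PROD_DATABASE_URL" > db/baseline-schema.sql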
Separating Migration from Deployment
The most resilient pattern is running migrations separately from application deployment, with an explicit validation step between:
- Pre-deployment migration: run the migration before deploying new application code
- Validation: confirm the database is in the expected state (see the sketch after this list)
- Application deployment: deploy new code that is compatible with both old and new schema
- Post-deployment cleanup (next release): remove backward-compatibility shims
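The validation step can be lighter than it sounds: flyway validate checks that applied migrations match the files on disk, and a sanity query confirms the shape you expect. A sketch, reusing the column from the earlier example:
-- Works in both MySQL and PostgreSQL via information_schema
SELECT is_nullable
FROM information_schema.columns
WHERE table_name = 'payments' AND column_name = 'reference_id';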
This requires that every migration be backward-compatible with the current application version. It's more design work per migration, but it eliminates the class of incidents where a bad migration causes the entire fleet of application instances to fail simultaneously.
#!/bin/bash
# deploy.sh: migrations first, deploy second, validate between
set -euo pipefail  # abort immediately if any step fails

echo "Running database migrations..."
if ! flyway -url="$DB_URL" -user="$DB_USER" -password="$DB_PASSWORD" migrate; then
    echo "Migration failed. Aborting deployment."
    exit 1
fi

echo "Validating schema..."
flyway -url="$DB_URL" -user="$DB_USER" -password="$DB_PASSWORD" validate

echo "Deploying application..."
kubectl set image deployment/myapp myapp="$IMAGE_TAG"
kubectl rollout status deployment/myapp --timeout=5m
The step nobody wants to optimize carries an outsized share of your deployment incident risk. Treat it with the care it deserves.