The Pipeline Step Nobody Wants to Optimize Until It Hurts

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Step Everyone Treats as Solved

Your pipeline runs tests, builds an image, maybe scans it for vulnerabilities. Then, somewhere in the deployment process, there's a migration step. It runs flyway migrate or liquibase update against the target database, and if it works, the app starts. If it doesn't, you're debugging at 2pm on a Thursday while traffic is routing to a service that can't start.

Migration handling is the step most teams don't examine because it "just works", until it doesn't. When it fails, the failure modes are severe: a blocking lock on a large table, a migration that ran partially before timing out, a schema change that's incompatible with the currently deployed application version, or a migration that worked in staging (MySQL 8.0.28 with a 30-second lock timeout) but times out in production (MySQL 8.0.32, configured with a shorter one).

Why Migrations Are Structurally Different From Other Pipeline Steps

Most pipeline steps are idempotent and isolated. Running unit tests twice doesn't change anything. Building the Docker image twice produces an equivalent artifact. Migrations are neither: they mutate shared state (the database schema), they're often not safely re-runnable, and their effects are visible immediately to every running instance of the application.

This means migration failures don't just fail the deployment — they can leave the database in a state where neither the old nor new version of the application can run correctly. That is a production incident, not a failed pipeline run.
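To make "not safely re-runnable" concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the target database. The payments table is the one used later in this article; everything else is illustrative. The second run of the same ALTER TABLE fails because the first run already changed the schema:

```python
import sqlite3

# The same schema change a migration file would contain
MIGRATION = "ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64)"


def apply_twice(conn: sqlite3.Connection) -> list[str]:
    """Apply the same migration twice and report what happened each time."""
    results = []
    for _ in range(2):
        try:
            conn.execute(MIGRATION)
            results.append("applied")
        except sqlite3.OperationalError as exc:
            # The first run already changed shared state, so the retry fails.
            results.append(f"failed: {exc}")
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)")
print(apply_twice(conn))
```

This is why tools like Flyway record each applied version in a history table and skip it on later runs instead of re-executing scripts.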

What Actually Goes Wrong

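Non-backward-compatible changes. Adding a NOT NULL column without a default breaks existing application instances during a rolling deploy, because the old code's INSERT statements don't include the new column. In MySQL the migration succeeds (existing rows get the type's implicit default), and then every insert from the currently running pods fails in strict mode. PostgreSQL rejects the ALTER outright on a non-empty table, which at least fails the pipeline instead of production traffic.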

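-- Dangerous: breaks running instances during rolling deploy
ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64) NOT NULL;

-- Safe, release N: add the column as nullable, then backfill existing rows
ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64) NULL;

-- Release N+1, once every writer sets reference_id and the backfill is done:
ALTER TABLE payments MODIFY COLUMN reference_id VARCHAR(64) NOT NULL;  -- MySQL
ALTER TABLE payments ALTER COLUMN reference_id SET NOT NULL;           -- PostgreSQL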
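Large tables make the backfill step risky. A single UPDATE over 50 million rows holds its row locks for the whole statement. A common approach is to update in small batches, each in its own short transaction. The sketch below uses sqlite3 as a stand-in for the real database. The batch size and the `'legacy-' || id` backfill value are hypothetical placeholders for whatever your data actually needs. The exact batching query also differs by database: MySQL, for instance, does not allow LIMIT inside an IN subquery, so there you would put the LIMIT directly on the UPDATE instead.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill payments.reference_id for existing rows, one short transaction per batch.

    Returns the total number of rows updated.
    """
    total = 0
    while True:
        with conn:  # commits this batch's transaction, or rolls it back on error
            cur = conn.execute(
                """
                UPDATE payments
                SET reference_id = 'legacy-' || id  -- hypothetical backfill value
                WHERE id IN (
                    SELECT id FROM payments
                    WHERE reference_id IS NULL
                    LIMIT ?
                )
                """,
                (batch_size,),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        # In production you might pause briefly here to let replicas catch up.


# Usage with an in-memory stand-in for the real payments table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, reference_id VARCHAR(64))")
conn.executemany("INSERT INTO payments (id) VALUES (?)", [(i,) for i in range(1, 2501)])
conn.commit()
print(backfill_in_batches(conn, batch_size=1000))  # 2500, in three batches
```

Short batches keep each lock brief, so the running application's writes interleave with the backfill instead of waiting for all of it.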

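Lock acquisition on large tables. Even a simple ALTER TABLE on a table with 50 million rows has to take an exclusive lock, at least briefly. In MySQL (InnoDB), online DDL still needs an exclusive metadata lock. It waits behind any open transaction that has touched the table, for up to lock_wait_timeout (one year by default), and while it waits, every new query against the table queues behind it. In PostgreSQL 11 and later, ALTER TABLE ADD COLUMN with a constant default no longer rewrites the table, but it still takes an ACCESS EXCLUSIVE lock. ALTER COLUMN ... SET NOT NULL scans the whole table while holding that lock.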
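One common mitigation is to give DDL a short lock timeout and retry. A blocked ALTER then fails fast instead of stalling every write queued behind it. The sketch below assumes a DB-API-style `execute` callable that runs each statement in autocommit mode. `LockTimeout` and `run_ddl_with_retry` are hypothetical names. Translating your driver's error into `LockTimeout` is left to the caller: that is SQLSTATE 55P03 in PostgreSQL and error 1205 in MySQL.

```python
import time
from typing import Callable


class LockTimeout(Exception):
    """Hypothetical: raise this from execute() when the driver reports a lock timeout."""


def lock_timeout_statement(dialect: str, seconds: int) -> str:
    """Session setting that makes DDL give up on a lock instead of queueing for it."""
    if dialect == "postgresql":
        return f"SET lock_timeout = '{seconds}s'"
    if dialect == "mysql":
        # Covers metadata locks, which is what ALTER TABLE waits on.
        return f"SET SESSION lock_wait_timeout = {seconds}"
    raise ValueError(f"unsupported dialect: {dialect}")


def run_ddl_with_retry(
    execute: Callable[[str], None],
    dialect: str,
    ddl: str,
    attempts: int = 5,
    timeout_s: int = 5,
    backoff_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run ddl under a short lock timeout, retrying when the lock isn't granted.

    Returns the attempt number that succeeded; re-raises LockTimeout after the last one.
    """
    attempt = 1
    while True:
        try:
            execute(lock_timeout_statement(dialect, timeout_s))
            execute(ddl)
            return attempt
        except LockTimeout:
            if attempt >= attempts:
                raise
            sleep(backoff_s * attempt)  # wait a little longer after each failure
            attempt += 1


# Usage with a fake execute() whose ALTER is blocked twice, then succeeds
calls: list[str] = []
blocked = [2]


def fake_execute(sql: str) -> None:
    calls.append(sql)
    if sql.startswith("ALTER") and blocked[0] > 0:
        blocked[0] -= 1
        raise LockTimeout(sql)


print(run_ddl_with_retry(fake_execute, "postgresql",
                         "ALTER TABLE payments ADD COLUMN reference_id VARCHAR(64)",
                         sleep=lambda s: None))  # 3
```

The point of the short timeout is that a failed attempt costs the application a few seconds of queued writes, not an open-ended stall.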

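Timeouts that leave partial state. If your migration runner has a 30-second timeout and a migration takes 35 seconds in production, what you are left with depends on the database. MySQL DDL statements commit implicitly, so in a multi-statement migration the statements that finished before the timeout stay applied. If Flyway catches the error, it records the migration as failed in its schema history table. It then refuses to run further migrations until someone repairs the schema by hand and runs flyway repair. PostgreSQL runs DDL inside the migration's transaction, so an aborted migration rolls back cleanly.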
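A failed entry blocks every later deploy, so it can be worth checking for one explicitly before the pipeline tries to migrate. That way the job fails with a clear message rather than a Flyway error mid-deploy. The sketch below assumes Flyway's default schema history table (flyway_schema_history) and its version, description, success and installed_rank columns. sqlite3 stands in for the real database, and the stand-in table carries only those columns:

```python
import sqlite3

# Flyway's default history table name; change it if you set the "table" option.
FAILED_MIGRATIONS_SQL = """
    SELECT version, description
    FROM flyway_schema_history
    WHERE NOT success
    ORDER BY installed_rank
"""


def failed_migrations(conn) -> list[tuple]:
    """Return (version, description) for every migration Flyway recorded as failed."""
    cur = conn.cursor()  # plain DB-API calls, so any driver's connection works
    cur.execute(FAILED_MIGRATIONS_SQL)
    return [tuple(row) for row in cur.fetchall()]


# Usage: a sqlite3 stand-in with only the history columns the query needs
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flyway_schema_history "
    "(installed_rank INTEGER, version TEXT, description TEXT, success BOOLEAN)"
)
conn.executemany(
    "INSERT INTO flyway_schema_history VALUES (?, ?, ?, ?)",
    [(1, "1", "create payments", True), (2, "2", "add reference id", False)],
)
print(failed_migrations(conn))
```

In a pipeline step, a non-empty result would be the signal to exit non-zero and page a human rather than attempt the deploy.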

Migration Testing in the Pipeline

The place to catch these issues is CI, not production. But most pipelines run migrations against a fresh empty database on every run, which means they never catch migration problems that only appear at scale or against real data distributions.

A more useful approach:

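# Integration test job: run migrations against a snapshot of production schema
integration-tests:
  runs-on: ubuntu-latest
  services:
    postgres:
      image: postgres:16
      env:
        POSTGRES_DB: testdb
        POSTGRES_PASSWORD: postgres  # the image will not start without one
      ports:
        - 5432:5432
      # Wait until Postgres accepts connections before the steps run
      options: >-
        --health-cmd pg_isready
        --health-interval 5s
        --health-timeout 5s
        --health-retries 10
  env:
    DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
    # Picked up by the Flyway Gradle plugin via its FLYWAY_* environment variables
    FLYWAY_URL: jdbc:postgresql://localhost:5432/testdb
    FLYWAY_USER: postgres
    FLYWAY_PASSWORD: postgres

  steps:
    - uses: actions/checkout@v4

    # Restore a schema dump (not data) from production; stop on the first error
    - name: Restore baseline schema
      run: |
        psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f ./db/baseline-schema.sql

    # Run pending migrations against the baseline
    - name: Run migrations
      run: ./gradlew flywayMigrate

    # Then run application tests
    - name: Run integration tests
      run: ./gradlew integrationTest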

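The baseline schema should be a recent dump of the production schema structure (not data), updated monthly or when significant schema changes land. The one exception is the flyway_schema_history table, whose rows must be included. Without them, Flyway sees no applied migrations and tries to replay every one of them against tables that already exist. This gives you migration tests that run against a realistic starting point rather than an empty database. They catch migrations that depend on objects production doesn't have, or that collide with objects it does. Because the tables are empty, they won't reproduce lock waits or run times on a 50-million-row table; that still needs a staging database with production-scale data.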
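Producing that baseline can be scripted. The sketch below assumes PostgreSQL (matching the job above), pg_dump on the PATH, and a read replica or other copy of production that you are allowed to dump. The function names and the replica URL in the usage line are hypothetical. It dumps the structure of every table plus only the rows of flyway_schema_history:

```python
import subprocess
from pathlib import Path


def baseline_dump_commands(database_url: str) -> list[list[str]]:
    """pg_dump invocations for a structure-only baseline plus Flyway's history rows."""
    return [
        # Every table, index and constraint, but no table data. --no-owner and
        # --no-privileges avoid errors for production roles that CI doesn't have.
        ["pg_dump", "--schema-only", "--no-owner", "--no-privileges", database_url],
        # Only the rows of the history table, so Flyway knows what already ran.
        ["pg_dump", "--data-only", "--table=flyway_schema_history", database_url],
    ]


def write_baseline(database_url: str, out: Path) -> None:
    """Run both dumps and concatenate them, structure first, into one file."""
    parts = [
        subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
        for cmd in baseline_dump_commands(database_url)
    ]
    out.write_text("".join(parts))


# Usage (hypothetical replica URL):
# write_baseline("postgresql://readonly@replica.example/app", Path("db/baseline-schema.sql"))
```

Keeping the structure dump first matters: the data-only section loads rows into a table that the first section creates.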

Separating Migration from Deployment

The most resilient pattern is running migrations separately from application deployment, with an explicit validation step between:

  1. Pre-deployment migration: run the migration before deploying new application code
  2. Validation: confirm the database is in the expected state
  3. Application deployment: deploy new code that is compatible with both old and new schema
  4. Post-deployment cleanup (next release): remove backward-compatibility shims

This requires that every migration be backward-compatible with the current application version. It's more design work per migration, but it eliminates the class of incidents where a bad migration causes the entire fleet of application instances to fail simultaneously.

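#!/bin/bash
# deploy.sh: migrate, validate, then deploy
set -euo pipefail

# Same connection settings for every Flyway command
FLYWAY_ARGS=(-url="$DB_URL" -user="$DB_USER" -password="$DB_PASSWORD")

echo "Running database migrations..."
if ! flyway "${FLYWAY_ARGS[@]}" migrate; then
  echo "Migration failed. Aborting deployment." >&2
  exit 1
fi

echo "Validating schema..."
if ! flyway "${FLYWAY_ARGS[@]}" validate; then
  echo "Schema validation failed. Aborting deployment." >&2
  exit 1
fi

echo "Deploying application..."
# IMAGE_TAG must be a full image reference (registry/repository:tag)
kubectl set image deployment/myapp myapp="$IMAGE_TAG"
kubectl rollout status deployment/myapp --timeout=5m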
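The deploy script trusts that every migration is backward-compatible with the release that is still running. Part of that contract can be checked in CI after the migrations run, by asking whether the old release's INSERTs would still succeed. The sketch below uses SQLite's pragma_table_info on a sqlite3 stand-in; on PostgreSQL or MySQL, the same facts come from information_schema.columns. The function name and the hand-maintained set of columns the old release writes are assumptions for illustration. It only covers the NOT NULL case described earlier, not dropped or renamed columns:

```python
import sqlite3


def columns_breaking_old_writers(
    conn: sqlite3.Connection, table: str, columns_old_code_writes: set[str]
) -> list[str]:
    """Return NOT NULL columns without a default that the old release never writes.

    Any name in the result means old instances will fail to insert into `table`
    once the migration has run.
    """
    breaking = []
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute("SELECT * FROM pragma_table_info(?)", (table,)).fetchall()
    for _cid, name, _type, notnull, default, pk in rows:
        # Primary keys are assumed to be generated by the database.
        if notnull and default is None and not pk and name not in columns_old_code_writes:
            breaking.append(name)
    return breaking


# Usage: the schema after the "dangerous" migration from earlier
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL, "
    "reference_id VARCHAR(64) NOT NULL)"
)
print(columns_breaking_old_writers(conn, "payments", {"amount"}))
```

An empty result doesn't prove compatibility, but a non-empty one is a reliable reason to stop the release before it reaches production.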

The step nobody wants to optimize is the one that can turn a failed deploy into a production outage. Treat it with the care it deserves.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across the US, EU, and APAC time zones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.


Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia
