How to Write Rails Migrations Without Causing Downtime

by Eric Hanson, Backend Developer at Clean Systems Consulting

Why migrations that work locally fail in production

In development, schema changes run against an empty or small database. Locks are acquired and released in milliseconds. In production, an ALTER TABLE that takes an ACCESS EXCLUSIVE lock on a table with 10 million rows and 200 concurrent connections means those 200 connections queue behind it. If the migration takes 30 seconds, every request touching that table waits 30 seconds. The queue depth grows faster than it drains. The database melts.

PostgreSQL's locking behavior is the underlying model you need. Most DDL operations — ADD COLUMN, DROP COLUMN, ADD CONSTRAINT — acquire an ACCESS EXCLUSIVE lock that blocks reads and writes for as long as they hold it; CREATE INDEX takes a SHARE lock that blocks writes for the duration of the build. A few operations are safe without blocking traffic. The rest require workarounds.
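You can watch this queueing in real time. A diagnostic sketch (assumes only PostgreSQL 9.6+ and an ActiveRecord connection) that lists sessions stuck waiting on locks and the PIDs blocking them:

```ruby
# Run from a Rails console while a migration is in flight.
# pg_blocking_pids() reports which sessions hold the locks a PID is waiting for.
blocked = ActiveRecord::Base.connection.select_all(<<~SQL)
  SELECT pid,
         pg_blocking_pids(pid) AS blocked_by,
         state,
         left(query, 60)       AS query
  FROM pg_stat_activity
  WHERE wait_event_type = 'Lock'
SQL

blocked.each { |row| puts row.inspect }
```

If the migration's PID shows up in many blocked_by arrays, it is the lock everyone is queueing behind.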

The safe operations — no lock required

A small set of DDL operations in PostgreSQL are safe on live tables:

  • ADD COLUMN with no default value and no NOT NULL constraint — a catalog-only change in every PostgreSQL version; no table rewrite
  • ADD COLUMN with a non-volatile default and NOT NULL — safe on PostgreSQL 11+ (the default is stored in the catalog, not written into every row)
  • DROP COLUMN — marks the column invisible in the catalog; brief lock, no rewrite
  • CREATE INDEX CONCURRENTLY — builds the index without blocking reads or writes
  • DROP INDEX CONCURRENTLY — same
  • ALTER TABLE ... VALIDATE CONSTRAINT (run separately from adding the constraint) — verifies existing rows without blocking reads or writes

Before PostgreSQL 11, ADD COLUMN with a default value triggered a full table rewrite because PostgreSQL had to write the default into every existing row. If you're on PostgreSQL 10 or earlier, every column addition with a default is a table rewrite and a lock.

Adding a column safely

Adding a nullable column with no default is safe in all PostgreSQL versions:

def change
  add_column :orders, :notes, :text
end

Adding a column with a default: safe in PostgreSQL 11+ for immutable defaults (integers, strings, booleans). Unsafe for volatile defaults (NOW(), gen_random_uuid()):

# Safe on PostgreSQL 11+
add_column :orders, :priority, :integer, default: 0, null: false

# Unsafe — volatile default triggers rewrite on all versions
add_column :orders, :token, :uuid, default: -> { "gen_random_uuid()" }, null: false

For the volatile default case or for pre-PostgreSQL 11 databases, use the three-step pattern:

# Migration 1: add nullable with no default
add_column :orders, :token, :uuid

# Background job or migration 2: backfill existing rows
Order.in_batches(of: 1000).update_all("token = gen_random_uuid()")

# Migration 3: enforce NOT NULL via a check constraint (Rails 6.1+ helpers)
add_check_constraint :orders, "token IS NOT NULL", name: "orders_token_not_null", validate: false
validate_check_constraint :orders, name: "orders_token_not_null"
# change_column_null :orders, :token, false validates inline — risky on large tables

The backfill must happen between migrations, in batches, so no single transaction holds row locks for long while updating. in_batches with a reasonable batch size and an occasional sleep avoids saturating the database during peak traffic:

Order.in_batches(of: 1000).each_with_index do |batch, i|
  batch.update_all("token = gen_random_uuid()")
  sleep(0.1) if i % 10 == 0
end

Adding a NOT NULL constraint safely

change_column_null :table, :column, false acquires an ACCESS EXCLUSIVE lock and scans the entire table to verify no nulls exist. On a large table, this takes long enough to matter.

The safe alternative is a two-step approach using PostgreSQL's NOT VALID constraint option:

# Step 1: Add constraint but don't validate existing rows — fast, minimal lock
execute "ALTER TABLE orders ADD CONSTRAINT orders_token_not_null CHECK (token IS NOT NULL) NOT VALID"

# Step 2: Validate existing rows — acquires SHARE UPDATE EXCLUSIVE, doesn't block reads/writes
execute "ALTER TABLE orders VALIDATE CONSTRAINT orders_token_not_null"

NOT VALID tells PostgreSQL to enforce the constraint only for new and updated rows. Existing rows are not scanned, so the lock ADD CONSTRAINT takes is held only for an instant. VALIDATE CONSTRAINT then verifies existing rows under a SHARE UPDATE EXCLUSIVE lock that allows reads and writes to proceed concurrently — it only blocks other DDL on the table.
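On PostgreSQL 12+, the validated check constraint buys one more step: SET NOT NULL skips its full-table scan when such a constraint already proves no nulls exist. A sketch continuing the example above, using the same constraint name:

```ruby
# Step 3 (PostgreSQL 12+): promote to a real NOT NULL without a table scan;
# PostgreSQL trusts the validated CHECK (token IS NOT NULL) constraint
execute "ALTER TABLE orders ALTER COLUMN token SET NOT NULL"

# Step 4: the check constraint is now redundant
execute "ALTER TABLE orders DROP CONSTRAINT orders_token_not_null"
```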

The strong_migrations gem will refuse to run an unsafe change_column_null in production and surfaces this check-constraint pattern in its error message; recent Rails versions ship the add_check_constraint(validate: false) and validate_check_constraint helpers that express the same two steps without raw SQL.

Adding a foreign key safely

add_foreign_key validates all existing rows by default, holding locks on both tables while it scans. The safe pattern mirrors the NOT NULL approach:

# Add without validation — minimal lock, instant
add_foreign_key :orders, :users, validate: false

# Validate separately — SHARE UPDATE EXCLUSIVE lock, non-blocking
validate_foreign_key :orders, :users

These two steps can be in separate migrations deployed separately. The constraint is enforced for new rows immediately after the first step; validation of existing rows happens without blocking.

Creating indexes without locking

add_index runs CREATE INDEX, which locks the table for writes for the duration. On a large table, this means no inserts or updates while the index builds — potentially minutes.

CREATE INDEX CONCURRENTLY builds the index without blocking writes — inserts and updates proceed while it runs. The index becomes usable once the build completes:

# add_index uses plain CREATE INDEX — blocks writes for the duration
add_index :orders, :user_id

# algorithm: :concurrently uses CREATE INDEX CONCURRENTLY — no write lock
add_index :orders, :status, algorithm: :concurrently

Two constraints on CONCURRENTLY:

It cannot run inside a transaction. Rails wraps migrations in transactions by default. You must disable the transaction wrapper for any migration that uses CONCURRENTLY:

class AddStatusIndexToOrders < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :orders, :status, algorithm: :concurrently
  end
end

It takes longer to build than a regular index (it does two passes over the table data). It also leaves behind an INVALID index entry if the build fails — you must DROP INDEX CONCURRENTLY the invalid index before retrying.
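If a concurrent build fails, the invalid index has to be cleaned up by hand before retrying. A sketch (the index and class names are illustrative; note that DROP INDEX CONCURRENTLY needs disable_ddl_transaction! too):

```ruby
class DropInvalidOrdersStatusIndex < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def up
    # Invalid leftovers are the rows where pg_index.indisvalid is false:
    #   SELECT c.relname FROM pg_index i
    #   JOIN pg_class c ON c.oid = i.indexrelid
    #   WHERE NOT i.indisvalid;
    execute "DROP INDEX CONCURRENTLY IF EXISTS index_orders_on_status"
  end
end
```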

Renaming columns — the safe multi-step process

rename_column in a single migration deploys a schema change that breaks any running instance of the application still using the old column name. With zero-downtime deploys and rolling restarts, you'll have old and new code running simultaneously.

The safe rename requires four deployments:

Deploy 1: Add new column, write to both old and new column simultaneously
Deploy 2: Backfill old column data into new column for rows written before Deploy 1
Deploy 3: Read from new column, continue writing to both
Deploy 4: Stop writing to old column, drop it
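Deploy 1's dual write can live entirely in the model, so no caller has to know about it. A sketch, assuming a hypothetical rename of orders.notes to orders.comments:

```ruby
class Order < ApplicationRecord
  # Deploy 1: mirror every write to the old column into the new one.
  # Reads still come from the old column until Deploy 3.
  before_save :mirror_notes_to_comments

  private

  def mirror_notes_to_comments
    self.comments = notes if will_save_change_to_notes?
  end
end
```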

In practice, column renames are so expensive to do safely that it's worth questioning whether the rename is necessary. Additive changes (new column, old column deprecated but not removed) are safe. Renames are not.

If you must rename, Rails' built-in alias_attribute (or a hand-rolled virtual attribute) can expose the new name on top of the old column at the model level, eliminating the need for a schema rename entirely.
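As a sketch of that model-level aliasing (the model and column names are hypothetical):

```ruby
class User < ApplicationRecord
  # Code can use full_name everywhere; storage stays in the old
  # user_name column, so no schema rename and no multi-deploy dance.
  alias_attribute :full_name, :user_name
end

User.new(full_name: "Ada").user_name  # => "Ada"
```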

Removing columns — tell ActiveRecord first

ActiveRecord caches column information. If you drop a column in a migration while the old application code is still running (common in rolling deploys), ActiveRecord tries to read a column that no longer exists and raises:

ActiveModel::MissingAttributeError: missing attribute: old_column

The fix: ignore the column in the model before dropping it in the schema. This must be deployed before the migration:

# Deploy 1: Tell ActiveRecord to ignore the column
class Order < ApplicationRecord
  self.ignored_columns += ["old_column"]
end

# Deploy 2: Drop the column in a migration
def change
  remove_column :orders, :old_column
end

# Deploy 3: Remove the ignored_columns declaration

Three deployments to safely drop a column. It's tedious but necessary for zero-downtime operation.

The strong_migrations gem

strong_migrations is worth adding to any production Rails application. It intercepts unsafe migrations before they run and raises an error with a description of the problem and the safe alternative:

ActiveRecord::Migration: Dangerous operation detected!
Adding a column with a non-null default will cause the entire table to be rewritten.
Instead, add the column without the default, deploy, then backfill the column.

It enforces safe patterns automatically — CREATE INDEX CONCURRENTLY, NOT VALID constraints, batched backfills — and blocks the dangerous ones. Configurable with an allow-list for cases where you've verified safety through another means:

# config/initializers/strong_migrations.rb
StrongMigrations.start_after = 20240101000000  # only enforce for new migrations
StrongMigrations.target_postgresql_version = "14"

This is the lowest-friction way to ensure your team doesn't accidentally write a locking migration under deadline pressure.
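For the cases the allow-list covers, strong_migrations also provides safety_assured, an explicit escape hatch: the dangerous call runs only because someone wrote down that it was verified (table and column names here are illustrative):

```ruby
class RemoveLegacyFlagFromOrders < ActiveRecord::Migration[7.1]
  def change
    # Verified: orders.legacy_flag has no remaining readers and the
    # column drop is a fast catalog change, so an inline drop is fine.
    safety_assured { remove_column :orders, :legacy_flag }
  end
end
```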

The deployment order that makes it all work

Safe migrations require thinking about deployments in three phases:

Before the migration: Deploy only code that works with both the old and the new schema. Code that needs the new schema — reading a new column, relying on a new index — cannot deploy until the migration has run.

The migration: Schema changes that are backward compatible with the old code. Adding columns, adding indexes. Not removing columns, not renaming columns the old code reads.

After the migration: Code that stops reading old columns, cleanup migrations that remove scaffolding from the safe rename/remove process.

Every migration should answer one question: if the old version of the app is running while this migration executes, does anything break? If yes, the migration is not safe to deploy without downtime.
