How to Write Rails Migrations Without Causing Downtime
by Eric Hanson, Backend Developer at Clean Systems Consulting
Why migrations that work locally fail in production
In development, schema changes run against an empty or small database. Locks are acquired and released in milliseconds. In production, ALTER TABLE on a table with 10 million rows and 200 concurrent connections acquiring an ACCESS EXCLUSIVE lock means those 200 connections queue behind it. If the migration takes 30 seconds, every request touching that table waits 30 seconds. The queue depth grows faster than it drains. The database melts.
PostgreSQL's locking behavior is the underlying model you need. Most DDL operations — ADD COLUMN, DROP COLUMN, ADD CONSTRAINT — acquire an ACCESS EXCLUSIVE lock that blocks reads and writes for the duration; a plain CREATE INDEX takes a SHARE lock, which permits reads but blocks writes. A few operations are safe without blocking locks. The rest require workarounds.
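A useful safety net on top of this model — independent of which operation you run — is a lock timeout, so a migration that cannot acquire its lock quickly fails fast instead of queueing every connection behind it. A sketch; the 5-second value and the migration itself are illustrative, not from this article:

class AddNotesToOrders < ActiveRecord::Migration[7.1]
  def change
    # Fail the migration if the lock isn't granted within 5 seconds,
    # rather than letting application queries pile up behind it
    execute "SET lock_timeout = '5s'"
    add_column :orders, :notes, :text
  end
end

A failed migration can simply be retried off-peak; a lock pile-up during peak traffic cannot be undone.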
The safe operations — no lock required
A small set of DDL operations in PostgreSQL are safe on live tables:
- ADD COLUMN with no default value and no NOT NULL constraint — no table rewrite in any version
- ADD COLUMN with a non-volatile default and NOT NULL — PostgreSQL 11+ handles this safely (it stores the default in the catalog, not in every row)
- DROP COLUMN — marks the column invisible in the catalog; no rewrite
- CREATE INDEX CONCURRENTLY — builds the index without holding a write lock on the table
- DROP INDEX CONCURRENTLY — same
- ALTER TABLE ... VALIDATE CONSTRAINT (separate from adding the constraint) — validates without blocking reads or writes
Before PostgreSQL 11, ADD COLUMN with a default value triggered a full table rewrite because PostgreSQL had to write the default into every existing row. If you're on PostgreSQL 10 or earlier, every column addition with a default is a table rewrite and a lock.
Adding a column safely
Adding a nullable column with no default is safe in all PostgreSQL versions:
def change
add_column :orders, :notes, :text
end
Adding a column with a default: safe in PostgreSQL 11+ for non-volatile defaults (integers, strings, booleans). Unsafe for volatile defaults (NOW(), gen_random_uuid()):
# Safe on PostgreSQL 11+
add_column :orders, :priority, :integer, default: 0, null: false
# Unsafe — volatile default triggers rewrite on all versions
add_column :orders, :token, :uuid, default: -> { "gen_random_uuid()" }, null: false
For the volatile default case or for pre-PostgreSQL 11 databases, use the three-step pattern:
# Migration 1: add nullable with no default
add_column :orders, :token, :uuid
# Background job or migration 2: backfill existing rows
Order.in_batches(of: 1000).update_all("token = gen_random_uuid()")
# Migration 3: add not-null constraint
add_check_constraint :orders, "token IS NOT NULL", name: "orders_token_not_null", validate: false # Rails 6.1+
# or: change_column_null :orders, :token, false (validates inline — risky on large tables)
The backfill must happen between migrations, in batches, so rows are updated without holding a long lock on the table. in_batches with a reasonable batch size and a short sleep between batches avoids saturating the database during peak traffic:
Order.in_batches(of: 1000).each_with_index do |batch, i|
  batch.update_all("token = gen_random_uuid()")
  sleep(0.1) if i % 10 == 0
end
Adding a NOT NULL constraint safely
change_column_null :table, :column, false acquires an ACCESS EXCLUSIVE lock and scans the entire table to verify no nulls exist. On a large table, this takes long enough to matter.
The safe alternative is a two-step approach using PostgreSQL's NOT VALID constraint option:
# Step 1: Add constraint but don't validate existing rows — fast, minimal lock
execute "ALTER TABLE orders ADD CONSTRAINT orders_token_not_null CHECK (token IS NOT NULL) NOT VALID"
# Step 2: Validate existing rows — acquires SHARE UPDATE EXCLUSIVE, doesn't block reads/writes
execute "ALTER TABLE orders VALIDATE CONSTRAINT orders_token_not_null"
NOT VALID marks the constraint as applying only to new rows. Existing rows are not scanned. The lock acquired is minimal and brief. VALIDATE CONSTRAINT then verifies existing rows with a lock that allows reads and writes to proceed concurrently — it only blocks other DDL operations. On PostgreSQL 12+, a subsequent SET NOT NULL can use the validated constraint to skip its own table scan, so you can convert to a real NOT NULL column and drop the check constraint cheaply.
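On Rails 6.1+, the same two steps can be written with built-in migration methods instead of raw SQL:

# Migration 1: add the constraint without scanning existing rows
add_check_constraint :orders, "token IS NOT NULL",
  name: "orders_token_not_null", validate: false

# Migration 2: validate existing rows under SHARE UPDATE EXCLUSIVE
validate_check_constraint :orders, name: "orders_token_not_null"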
The strong_migrations gem enforces this pattern: it refuses to run the unsafe form and surfaces the safe NOT VALID / VALIDATE alternative in its error message. (Helper gems such as online_migrations go further, wrapping the two steps in add_not_null_constraint / validate_not_null_constraint.)
Adding a foreign key safely
add_foreign_key validates all existing rows by default, holding a lock. The safe pattern mirrors the NOT NULL approach:
# Add without validation — minimal lock, instant
add_foreign_key :orders, :users, validate: false
# Validate separately — SHARE UPDATE EXCLUSIVE lock, non-blocking
validate_foreign_key :orders, :users
These two steps can be in separate migrations deployed separately. The constraint is enforced for new rows immediately after the first step; validation of existing rows happens without blocking.
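Split across two migration files, the pattern looks like this (the class names are illustrative):

class AddUserFkToOrders < ActiveRecord::Migration[7.1]
  def change
    # Enforced for new rows immediately; existing rows untouched
    add_foreign_key :orders, :users, validate: false
  end
end

class ValidateUserFkOnOrders < ActiveRecord::Migration[7.1]
  def change
    # Scans existing rows under SHARE UPDATE EXCLUSIVE — reads and writes proceed
    validate_foreign_key :orders, :users
  end
end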
Creating indexes without locking
add_index runs CREATE INDEX, which locks the table for writes for the duration. On a large table, this means no inserts or updates while the index builds — potentially minutes.
CREATE INDEX CONCURRENTLY builds the index in the background with no write locks. The index is available after the build completes:
def change
# add_index uses CREATE INDEX — locks on write
add_index :orders, :user_id
# add_index with algorithm: :concurrently — no write lock
add_index :orders, :status, algorithm: :concurrently
end
Two constraints on CONCURRENTLY:
It cannot run inside a transaction. Rails wraps migrations in transactions by default. You must disable the transaction wrapper for any migration that uses CONCURRENTLY:
class AddStatusIndexToOrders < ActiveRecord::Migration[7.1]
disable_ddl_transaction!
def change
add_index :orders, :status, algorithm: :concurrently
end
end
It takes longer to build than a regular index (it does two passes over the table data). It also leaves behind an INVALID index entry if the build fails — you must DROP INDEX CONCURRENTLY the invalid index before retrying.
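A failed concurrent build can be cleaned up and retried in one migration — a sketch, reusing the orders/status index from above:

class RetryStatusIndexOnOrders < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def up
    # Drop the INVALID leftover from a failed concurrent build, if present
    remove_index :orders, :status, algorithm: :concurrently, if_exists: true
    add_index :orders, :status, algorithm: :concurrently
  end

  def down
    remove_index :orders, :status, algorithm: :concurrently
  end
end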
Renaming columns — the safe multi-step process
rename_column in a single migration deploys a schema change that breaks any running instance of the application still using the old column name. With zero-downtime deploys and rolling restarts, you'll have old and new code running simultaneously.
The safe rename requires four deployments:
Deploy 1: Add new column, write to both old and new column simultaneously
Deploy 2: Backfill old column data into new column for rows written before Deploy 1
Deploy 3: Read from new column, continue writing to both
Deploy 4: Stop writing to old column, drop it
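Deploy 1's dual write can live in the model — a sketch, assuming a hypothetical rename of notes to customer_notes:

class Order < ApplicationRecord
  before_save :copy_notes_to_new_column

  private

  # Keep the new column in sync while old code still writes the old one
  def copy_notes_to_new_column
    self.customer_notes = notes if will_save_change_to_notes?
  end
end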
In practice, column renames are so expensive to do safely that it's worth questioning whether the rename is necessary. Additive changes (new column, old column deprecated but not removed) are safe. Renames are not.
If you must rename, ActiveRecord's alias_attribute can expose the new name as an alias for the existing column at the model level, eliminating the need for a schema rename entirely.
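A model-level alias avoids the schema change entirely — a sketch, assuming the code wants the name customer_notes while the column stays notes:

class Order < ApplicationRecord
  # New code reads and writes customer_notes; the column remains notes
  alias_attribute :customer_notes, :notes
end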
Removing columns — tell ActiveRecord first
ActiveRecord caches column information. If you drop a column in a migration while the old application code is still running (common in rolling deploys), ActiveRecord tries to read a column that no longer exists and raises:
ActiveModel::MissingAttributeError: missing attribute: old_column
The fix: ignore the column in the model before dropping it in the schema. This must be deployed before the migration:
# Deploy 1: Tell ActiveRecord to ignore the column
class Order < ApplicationRecord
self.ignored_columns += ["old_column"]
end
# Deploy 2: Drop the column in a migration
def change
remove_column :orders, :old_column
end
# Deploy 3: Remove the ignored_columns declaration
Three deployments to safely drop a column. It's tedious but necessary for zero-downtime operation.
The strong_migrations gem
strong_migrations is worth adding to any production Rails application. It intercepts unsafe migrations before they run and raises an error with a description of the problem and the safe alternative:
ActiveRecord::Migration: Dangerous operation detected!
Adding a column with a non-null default will cause the entire table to be rewritten.
Instead, add the column without the default, deploy, then backfill the column.
It enforces safe patterns automatically — CREATE INDEX CONCURRENTLY, NOT VALID constraints, batched backfills — and blocks the dangerous ones. Configurable with an allow-list for cases where you've verified safety through another means:
# config/initializers/strong_migrations.rb
StrongMigrations.start_after = 20240101000000 # only enforce for new migrations
StrongMigrations.target_postgresql_version = "14"
This is the lowest-friction way to ensure your team doesn't accidentally write a locking migration under deadline pressure.
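When you have verified an operation is safe through another means — a tiny table, a maintenance window — strong_migrations' safety_assured bypasses the check for just that operation (the table and column here are hypothetical):

class RemoveLegacyFlag < ActiveRecord::Migration[7.1]
  def change
    # feature_flags is a small table; lock duration verified acceptable
    safety_assured { remove_column :feature_flags, :legacy }
  end
end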
The deployment order that makes it all work
Safe migrations require thinking about deployments in three phases:
Before the migration: Any code changes that need the new schema — reading a new column, using a new index — cannot deploy until the migration runs.
The migration: Schema changes that are backward compatible with the old code. Adding columns, adding indexes. Not removing columns, not renaming columns the old code reads.
After the migration: Code that stops reading old columns, cleanup migrations that remove scaffolding from the safe rename/remove process.
Every migration should answer one question: if the old version of the app is running while this migration executes, does anything break? If yes, the migration is not safe to deploy without downtime.