You're Missing Indexes Where It Matters Most
by Eric Hanson, Backend Developer at Clean Systems Consulting
The index you forgot to add
Your application has been in production for eight months. The orders table has 12 million rows. A customer support dashboard query that runs every time an agent opens a ticket is doing a full table scan because no one indexed customer_id. The query takes 4 seconds. There are 30 support agents. You have a problem that has existed since day one and was entirely preventable.
Missing indexes on foreign keys and frequently filtered columns is one of the most predictable and avoidable performance problems in web applications. Predictable because the access patterns are known before deployment. Avoidable because adding the index at migration time costs nothing.
The indexes that almost always need to exist
Every foreign key column
PostgreSQL does not automatically index foreign key columns. MySQL's InnoDB does. If you're on PostgreSQL, every FK column that will be used in joins or lookups needs an explicit index:
CREATE TABLE orders (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
user_id BIGINT NOT NULL REFERENCES users(id),
status VARCHAR(50) NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- These need to be added explicitly in PostgreSQL
CREATE INDEX idx_orders_user_id ON orders(user_id);
Without idx_orders_user_id, the query SELECT * FROM orders WHERE user_id = 42 scans every row. More critically, DELETE FROM users WHERE id = 42 will scan the entire orders table to check for referential integrity violations before the delete executes.
Columns in WHERE clauses on large tables
Any column that appears in a WHERE clause on a table expected to grow should have an index evaluated at schema design time, not after the first slow-query report. Common misses:
-- Frequently filtered, rarely indexed by default
WHERE status = 'pending'
WHERE email = 'user@example.com'
WHERE created_at > NOW() - INTERVAL '7 days'
WHERE is_deleted = false
The caveat: low-cardinality columns (status with 3 values, booleans) have poor selectivity — the index is only useful if the filtered value is rare. An index on status = 'pending' is worthless if 90% of rows are pending.
Composite indexes: column order is not arbitrary
A composite index on (a, b) supports queries filtering on a alone or on (a, b) together. It does not support queries filtering on b alone. This is the "leftmost prefix" rule.
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Uses the index (leftmost prefix)
WHERE user_id = 42
WHERE user_id = 42 AND status = 'completed'
-- Does NOT use the index
WHERE status = 'completed'
Design composite indexes based on your actual query patterns:
-- If you frequently query:
-- "all pending orders for a user, sorted by date"
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at DESC);
-- This supports:
WHERE user_id = 42 AND status = 'pending' ORDER BY created_at DESC
The column order should put the most selective equality filter first, then range filters, then sort columns. A common mistake is putting a low-cardinality column first (status, then user_id) — this produces a less selective first level and forces the database to scan more index entries.
Covering indexes: eliminating heap fetches
If your query only needs a few columns, a covering index (one that includes all needed columns) allows an index-only scan — no table access required:
-- Query: get order IDs and totals for a user, sorted by date
SELECT id, total, created_at FROM orders WHERE user_id = 42 ORDER BY created_at DESC;
-- Standard index: needs heap fetch for 'total' and 'created_at'
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Covering index: all needed columns in the index
CREATE INDEX idx_orders_user_covering ON orders(user_id, created_at DESC) INCLUDE (id, total);
The INCLUDE clause (PostgreSQL 11+, SQL Server) adds non-key columns to the leaf level of the index without affecting the sort order. MySQL uses a different syntax — you include the columns directly in the index definition.
Partial indexes: indexes over a subset of rows
If you only query a specific subset of rows, a partial index covers only those rows — smaller, faster, and more selective:
-- If 95% of orders are 'completed' and you only query 'pending'
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- The index only contains pending orders — much smaller and highly selective
Partial indexes are underused. They're particularly valuable for:
- Soft-deleted records (
WHERE deleted_at IS NULL) - Active subscriptions (
WHERE status = 'active') - Unprocessed queue items (
WHERE processed = false)
Finding missing indexes in production
PostgreSQL tracks sequential scans and index usage in pg_stat_user_tables and pg_stat_user_indexes:
-- Tables with high sequential scan counts — candidates for new indexes
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > 1000
AND n_live_tup > 10000
ORDER BY seq_tup_read DESC;
A table with high seq_scan and large n_live_tup is being fully scanned repeatedly. Look at the queries hitting that table (via pg_stat_statements if enabled) and find the filter columns without indexes.
The standard you should hold yourself to
Before any table migration goes to production, write out the five most common query patterns against that table. Verify that each has a usable index. This takes ten minutes at migration time. It prevents the firefighting that happens six months later when the table has grown past the point where adding an index without disruption requires careful planning around CREATE INDEX CONCURRENTLY.