Adding Too Many Indexes Is Also a Problem
by Eric Hanson, Backend Developer at Clean Systems Consulting
The schema that has too many indexes
You take over a legacy codebase. The orders table has 22 indexes. Every column that ever appeared in a WHERE clause, in any query, by any developer, has been indexed. Some indexes overlap. Some haven't been used in months. Every INSERT into orders writes to 22 index B-trees. Your write throughput is constrained not by disk speed or transaction overhead but by index maintenance.
Over-indexing is real, it's common, and it's harder to diagnose than missing indexes because the cost is distributed across every write operation rather than manifesting as a single slow query.
The cost of each index
Every index you create:
Slows every INSERT: each index requires a B-tree insertion. At 22 indexes, an INSERT touches 23 data structures. This is write amplification.
Slows UPDATE on indexed columns: an UPDATE to an indexed column must delete the old index entry and insert a new one. In PostgreSQL's MVCC model every UPDATE creates a new row version, and unless that update qualifies as a HOT update, the new version needs a fresh entry in every index on the table, including indexes on columns the UPDATE never touched. HOT updates avoid this, but only when no indexed column changes and there is free space on the same heap page, so every additional index shrinks the set of updates that can be HOT.
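You can see how often the HOT optimization actually applies on your tables from the standard statistics view. A quick diagnostic sketch (column names are from pg_stat_user_tables):

```sql
-- What fraction of updates are HOT (heap-only tuples)?
-- A low percentage on a write-heavy table suggests index
-- maintenance is amplifying UPDATE cost.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 20;
```

If a table with heavy UPDATE traffic shows a low HOT percentage, check whether rarely-used indexes on frequently-updated columns are the cause.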
Consumes storage: a B-tree index on a large table can be comparable in size to the table itself. 22 indexes can mean your orders table's index storage exceeds the table storage by 3-5x.
Bloats backups and increases replication lag: physical backups include every index's data, and replicas, whether replaying WAL or applying logical changes, must perform the same index maintenance the primary did.
Confuses the optimizer: with many overlapping indexes, the optimizer spends more time evaluating plan alternatives. On extremely complex queries, this planning overhead is measurable.
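One way to put a number on planning overhead: EXPLAIN with the SUMMARY option (available since PostgreSQL 10) reports planning time without executing the query. The table and columns below are the article's running example; compare planning time before and after pruning overlapping indexes:

```sql
-- Reports "Planning Time" without running the query.
EXPLAIN (SUMMARY)
SELECT * FROM orders
WHERE user_id = 42 AND status = 'shipped'
ORDER BY created_at DESC;
```

For most queries the difference is microseconds, but on hot paths executed thousands of times per second it is worth measuring rather than guessing.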
What over-indexing looks like
-- A real example of index sprawl on a single table
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_user_id_status ON orders(user_id, status); -- Makes the first redundant
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at); -- Partial overlap with idx_orders_user_id_status
CREATE INDEX idx_orders_status_created ON orders(status, created_at); -- When was this ever used?
CREATE INDEX idx_orders_total ON orders(total); -- Added for one report query
The index on user_id alone is made redundant by (user_id, status) for any query that filters on user_id: under the leftmost-prefix rule, the composite index serves as a superset. The standalone user_id index consumes space and adds write overhead without adding query coverage.
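You can confirm the superset relationship with EXPLAIN before dropping anything for real. Since DDL is transactional in PostgreSQL, a sketch like this works on a staging copy (DROP INDEX takes an exclusive lock on the table, so avoid running it against production mid-day):

```sql
-- With the narrow index gone, a user_id-only filter should still get
-- an Index Scan (or Bitmap Index Scan) on idx_orders_user_id_status
-- via its leftmost column.
BEGIN;
DROP INDEX idx_orders_user_id;
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
ROLLBACK;  -- undo the drop after inspecting the plan
```

If the plan falls back to a sequential scan instead, the indexes were not as interchangeable as they looked, and you have learned that cheaply.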
Finding unused indexes
PostgreSQL tracks index usage statistics. An index with zero scans since the server last restarted is a candidate for removal:
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
      -- Exclude unique indexes (used for constraint enforcement, not just scans)
      SELECT indexrelid FROM pg_index WHERE indisunique
  )
ORDER BY pg_relation_size(indexrelid) DESC;
Be cautious: these counters are cumulative since statistics were last reset. They survive a clean restart but are zeroed by a crash, an immediate shutdown, or a call to pg_stat_reset(), so idx_scan = 0 only means "unused since the last reset." Run this analysis over a period that covers your full workload cycle, including month-end reports, batch jobs, and seasonal traffic patterns. An index used once per month for a critical report should not be dropped.
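To cover a full workload cycle, snapshot the counters periodically and compare deltas instead of trusting a single point-in-time reading. A minimal sketch; the idx_scan_snapshots table is a hypothetical name, and the INSERT would run on a schedule (e.g. weekly via cron):

```sql
-- Hypothetical snapshot table for idx_scan readings over time.
CREATE TABLE IF NOT EXISTS idx_scan_snapshots (
    captured_at timestamptz NOT NULL DEFAULT now(),
    indexrelid  oid         NOT NULL,
    indexname   text        NOT NULL,
    idx_scan    bigint      NOT NULL
);

-- Run this on a schedule across a full workload cycle.
INSERT INTO idx_scan_snapshots (indexrelid, indexname, idx_scan)
SELECT indexrelid, indexrelname, idx_scan
FROM pg_stat_user_indexes;

-- Indexes whose scan count stayed at zero across every snapshot.
SELECT indexname
FROM idx_scan_snapshots
GROUP BY indexname
HAVING max(idx_scan) = 0;
```

An index that shows zero scans across a quarter of snapshots, including the month-end reporting window, is a far safer drop candidate than one that shows zero scans today.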
Finding redundant indexes
An index is redundant when another index provides a superset of its query coverage:
-- Index A: (user_id)
-- Index B: (user_id, status)
-- Any query that could use A can also use B — A is redundant
-- PostgreSQL: find indexes whose key columns are a leading prefix of
-- another index on the same table, using the pg_index catalog
-- (indkey::text renders the key columns as a space-separated list)
SELECT
    a.indexrelid::regclass AS redundant_index,
    b.indexrelid::regclass AS covering_index,
    a.indrelid::regclass AS tablename
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
    AND a.indexrelid <> b.indexrelid
WHERE (b.indkey::text || ' ') LIKE (a.indkey::text || ' %')
  AND NOT a.indisunique; -- keep unique indexes: they enforce constraints
-- (Still simplified: ignores index type, expression indexes, and partial-index predicates)
Tools like pgIndexAdvisor (PostgreSQL) or Percona's pt-duplicate-key-checker (MySQL) automate redundant index detection.
The write throughput tradeoff in numbers
On a table receiving 10,000 inserts/second, going from 10 indexes to 20 adds roughly 100,000 extra B-tree insertions per second (10 additional indexes times 10,000 rows). On modern NVMe storage with good buffer pool management this may be imperceptible. On a heavily loaded server with many concurrent writers, it compounds. The point is not that every extra index costs you measurably; it's that you should know the cost and make the decision deliberately.
The right standard for adding indexes
Before adding an index, answer:
- What specific query does this index accelerate?
- How frequently does that query run?
- Does an existing index already cover this query (check the leftmost prefix rule)?
- What is the write volume on this table, and can it absorb another index?
If you can't answer question 1 with a specific query, don't add the index. Add it when you have a slow query that needs it, not preemptively for columns that "might be filtered on someday."
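For question 1, you can check whether an index would actually be picked before paying its write cost using the hypopg extension (if it is installed), which creates hypothetical indexes visible only to the planner in the current session. The orders query below reuses the article's report-query example:

```sql
-- Requires: CREATE EXTENSION hypopg;
-- The hypothetical index exists only in this session's planner;
-- nothing is written to disk and no lock is taken.
SELECT * FROM hypopg_create_index(
    'CREATE INDEX ON orders (total)'
);

-- If this plan uses the hypothetical index, the real index would
-- accelerate the query; if not, don't create it.
EXPLAIN SELECT * FROM orders WHERE total > 1000;

SELECT hypopg_reset();  -- discard hypothetical indexes
```

This turns "might be filtered on someday" into a testable claim: either the planner would use the index for a real query, or it wouldn't.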
The audit cadence
Run the unused index query against your production database once per quarter. Drop indexes that have been unused for a full workload cycle. Monitor write latency before and after to validate the improvement. Index hygiene is maintenance work — schedule it like you schedule vacuuming and statistics updates.
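When you do drop an index, DROP INDEX CONCURRENTLY avoids holding an exclusive lock on the table while the drop completes. It cannot run inside a transaction block, so there is no rollback; save the index definition (from pg_indexes.indexdef) first in case the drop turns out to be a mistake:

```sql
-- Drops without blocking concurrent reads and writes on orders.
-- Not transactional: keep the original CREATE INDEX statement
-- handy so the index can be recreated if needed.
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status_created;
```

On a busy table this is the difference between an invisible maintenance task and a visible stall for every writer.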