Adding Too Many Indexes Is Also a Problem

by Eric Hanson, Backend Developer at Clean Systems Consulting

The schema that has too many indexes

You take over a legacy codebase. The orders table has 22 indexes. Every column that ever appeared in a WHERE clause, in any query, by any developer, has been indexed. Some indexes overlap. Some haven't been used in months. Every INSERT into orders writes to 22 index B-trees. Your write throughput is constrained not by disk speed or transaction overhead but by index maintenance.

Over-indexing is real, it's common, and it's harder to diagnose than missing indexes because the cost is distributed across every write operation rather than manifesting as a single slow query.

The cost of each index

Every index you create:

Slows every INSERT: each index requires a B-tree insertion. At 22 indexes, a single-row INSERT touches 23 data structures: the heap plus 22 index B-trees. This is write amplification.

Slows UPDATE on indexed columns: an UPDATE to an indexed column requires deleting the old index entry and inserting a new one. In PostgreSQL's MVCC model, every UPDATE creates a new row version — even updates to non-indexed columns may cause index updates (HOT updates, which avoid this, are only possible when no indexed columns change and there's space on the same heap page).

Consumes storage: a B-tree index on a large table can be comparable in size to the table itself. 22 indexes can mean your orders table's index storage exceeds the table storage by 3-5x.

Bloats backups and adds replication lag: physical backups and WAL include index data, and a logical replica must maintain its own indexes as it applies each row change.

Confuses the optimizer: with many overlapping indexes, the optimizer spends more time evaluating plan alternatives. On extremely complex queries, this planning overhead is measurable.
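The HOT-update caveat above is directly observable: pg_stat_user_tables counts how many updates qualified as HOT. A persistently low HOT ratio on a write-heavy table is a common symptom of over-indexing. A sketch:

```sql
-- Compare total updates vs HOT updates per table.
-- A low HOT ratio suggests indexed columns are being updated,
-- or heap pages are too full for HOT to apply (see fillfactor).
SELECT
  relname,
  n_tup_upd,
  n_tup_hot_upd,
  round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 20;
```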

What over-indexing looks like

-- A real example of index sprawl on a single table
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_user_id_status ON orders(user_id, status);  -- Makes the first redundant
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);  -- Partial overlap with idx_orders_user_id_status
CREATE INDEX idx_orders_status_created ON orders(status, created_at); -- When was this ever used?
CREATE INDEX idx_orders_total ON orders(total);                        -- Added for one report query

The index on user_id alone is made redundant by (user_id, status) for any query that filters on user_id — the composite index serves as a superset. The standalone user_id index wastes space and write overhead without adding query coverage.
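Once an index is confirmed redundant, drop it without blocking writes. Using the index name from the example above:

```sql
-- DROP INDEX CONCURRENTLY avoids a lock that would block writes,
-- at the cost of not being allowed inside a transaction block.
DROP INDEX CONCURRENTLY idx_orders_user_id;

-- If a query regresses, the index can be rebuilt the same way,
-- again without blocking writes:
-- CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
```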

Finding unused indexes

PostgreSQL tracks index usage statistics. An index with zero scans since the server last restarted is a candidate for removal:

SELECT
  schemaname,
  relname AS tablename,
  indexrelname AS indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelid NOT IN (
    -- Exclude unique indexes (used for constraint enforcement, not just scans)
    SELECT indexrelid FROM pg_index WHERE indisunique
  )
ORDER BY pg_relation_size(indexrelid) DESC;

Be cautious: statistics reset on server restart, so idx_scan = 0 only means "unused since last restart." Run this analysis over a period that covers your full workload cycle — including month-end reports, batch jobs, and seasonal traffic patterns. An index used once per month for a critical report should not be dropped.
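To know how much history the counters actually cover, check when statistics were last reset. One way, using the database-level reset timestamp as a proxy:

```sql
-- How long have the pg_stat_* counters been accumulating
-- for the current database?
SELECT
  stats_reset,
  now() - stats_reset AS observation_window
FROM pg_stat_database
WHERE datname = current_database();
```

If the window is shorter than your full workload cycle, wait before drawing conclusions from idx_scan = 0.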

Finding redundant indexes

An index is redundant when another index provides a superset of its query coverage:

-- Index A: (user_id)
-- Index B: (user_id, status)
-- Any query that could use A can also use B — A is redundant

-- PostgreSQL: find indexes whose column list is a leading prefix of another's
SELECT
  a.indexname AS redundant_index,
  b.indexname AS covering_index,
  a.tablename
FROM pg_indexes a
JOIN pg_indexes b ON a.tablename = b.tablename
  AND a.schemaname = b.schemaname
  AND a.indexname != b.indexname
  AND b.indexdef LIKE '%(' || rtrim(split_part(a.indexdef, '(', 2), ')') || ',%';
-- (Simplified; use the pg_index system catalog for production-grade analysis)
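A more robust version of the same idea compares the column-number arrays in pg_index directly, which sidesteps the fragility of string-matching indexdef. A sketch that deliberately ignores expression indexes, opclass, and collation differences:

```sql
-- An index is a removal candidate when another index on the same
-- table has its column list as a strict leading prefix.
SELECT
  a.indexrelid::regclass AS redundant_index,
  b.indexrelid::regclass AS covering_index
FROM pg_index a
JOIN pg_index b
  ON a.indrelid = b.indrelid
 AND a.indexrelid <> b.indexrelid
WHERE NOT a.indisunique                           -- keep constraint indexes
  AND a.indpred IS NULL AND b.indpred IS NULL     -- skip partial indexes
  AND b.indkey::text LIKE a.indkey::text || ' %'; -- strict column prefix
```

The indkey cast produces a space-separated list of column numbers (e.g. '2' vs '2 5'), so the LIKE test matches exactly the strict-prefix case.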

Tools like pgIndexAdvisor (PostgreSQL) or Percona's pt-duplicate-key-checker (MySQL) automate redundant index detection.

The write throughput tradeoff in numbers

On a table receiving 10,000 inserts/second, the difference between 10 indexes and 20 indexes is roughly 100,000 additional B-tree insertions per second (10 extra indexes times 10,000 rows). On modern NVMe storage with good buffer pool management this may be imperceptible. On a heavily loaded server with many concurrent writers, it compounds. The point is not that every extra index costs you measurably; it's that you should know the cost and make the decision deliberately.
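For your own hardware, a throwaway benchmark is more convincing than any rule of thumb. A psql sketch (table and index names are made up; \timing is a psql meta-command):

```sql
-- Baseline: bulk insert into an unindexed table.
CREATE TABLE bench (id bigint, a int, b int, c int);
\timing on
INSERT INTO bench SELECT g, g % 10, g % 100, g % 1000
FROM generate_series(1, 1000000) g;

-- Add a few indexes and repeat; the timing difference is the
-- index maintenance cost per batch on this machine.
CREATE INDEX bench_a ON bench(a);
CREATE INDEX bench_b ON bench(b);
CREATE INDEX bench_c ON bench(c);
TRUNCATE bench;
INSERT INTO bench SELECT g, g % 10, g % 100, g % 1000
FROM generate_series(1, 1000000) g;
```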

The right standard for adding indexes

Before adding an index, answer:

  1. What specific query does this index accelerate?
  2. How frequently does that query run?
  3. Does an existing index already cover this query (check the leftmost prefix rule)?
  4. What is the write volume on this table, and can it absorb another index?

If you can't answer question 1 with a specific query, don't add the index. Add it when you have a slow query that needs it, not preemptively for columns that "might be filtered on someday."
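Question 3 turns on the leftmost prefix rule: a composite index serves any query that filters on a leading prefix of its columns, but not one that skips the first column. Using the example table:

```sql
-- Assuming an existing index on orders(user_id, created_at):

-- Served: user_id is the leading column.
EXPLAIN SELECT * FROM orders WHERE user_id = 42;

-- Served: both columns form a leading prefix.
EXPLAIN SELECT * FROM orders
WHERE user_id = 42 AND created_at > now() - interval '7 days';

-- NOT served by that index: created_at alone skips the leading
-- column, so this needs its own index (or gets a seq scan).
EXPLAIN SELECT * FROM orders WHERE created_at > now() - interval '7 days';
```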

The audit cadence

Run the unused index query against your production database once per quarter. Drop indexes that have been unused for a full workload cycle. Monitor write latency before and after to validate the improvement. Index hygiene is maintenance work — schedule it like you schedule vacuuming and statistics updates.
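A useful companion to the unused-index query during the quarterly audit is a per-table summary of index count and total index size, which makes the worst offenders obvious:

```sql
-- Tables ranked by total index footprint.
SELECT
  relname AS tablename,
  count(*) AS index_count,
  pg_size_pretty(sum(pg_relation_size(indexrelid))) AS total_index_size
FROM pg_stat_user_indexes
GROUP BY relname
ORDER BY sum(pg_relation_size(indexrelid)) DESC
LIMIT 20;
```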
