What an Index Actually Does to Your Database
by Eric Hanson, Backend Developer at Clean Systems Consulting
The abstraction you've been taking for granted
You add an index, the query gets faster, you move on. But when a query is slow despite having an index, or when you're trying to decide whether to add one, or when you're diagnosing why writes are slowing down — at that point, treating indexes as a black box stops working. You need the mental model.
An index is a separate data structure maintained alongside your table, updated on every insert, update, and delete, organized to make specific lookup patterns fast. It has a cost (storage, write overhead) and a benefit (faster reads). Understanding the structure tells you when that tradeoff pays off.
B-tree: the default index type
Most indexes you create are B-tree indexes (Balanced Tree). The structure:
- A tree of nodes, each containing sorted key values and pointers
- The leaf nodes form a doubly linked list sorted by indexed value
- Lookup traverses from root to leaf: O(log n) comparisons
- Range scans follow the linked leaf nodes: O(log n + k) where k is result size
For a table with 10 million rows, the tree has approximately log₁₆(10,000,000) ≈ 6 levels. Finding any single row requires at most 6 page reads. At 100 million rows, it's ~7 levels. The logarithmic scaling is why indexes stay fast as tables grow.
-- Standard B-tree index
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Supports these operations (uses the index):
WHERE created_at = '2024-01-15'
WHERE created_at > '2024-01-01'
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY created_at ASC/DESC
-- Does NOT support (falls back to seq scan):
WHERE YEAR(created_at) = 2024 -- function applied to column
WHERE created_at::TEXT LIKE '2024%' -- type cast
The key constraint: the index only helps when you access the column in its stored form, without wrapping functions or implicit casts.
How the heap fetch works
An index lookup has two parts:
- Index scan: traverse the B-tree to find matching leaf entries (fast, few page reads)
- Heap fetch: follow the row pointer from each leaf entry to read the actual row from the table storage (potentially expensive if rows are scattered across many pages)
Index (sorted by email): Heap (unordered rows):
┌─────────────────────────────┐ ┌──────────────────────────┐
│ 'adam@ex.com' → Page 47, Row 3 ──────→ │ Page 47: [row3: Adam...] │
│ 'beth@ex.com' → Page 12, Row 7 ──────→ │ Page 12: [row7: Beth...] │
│ 'carl@ex.com' → Page 89, Row 1 ──────→ │ Page 89: [row1: Carl...] │
└─────────────────────────────┘ └──────────────────────────┘
If your range scan matches 1,000 rows scattered across 1,000 different heap pages, you do 1,000 random page reads — which may be worse than a sequential scan of the whole table. This is why the optimizer sometimes ignores your index: it calculates that a sequential scan is cheaper for high-selectivity cases.
Index-only scans: the fast path
If your query only needs columns that are in the index, the database can skip the heap fetch entirely:
-- Standard index on (user_id)
-- This query needs user_id (in index) AND status (not in index) → heap fetch required
SELECT status FROM orders WHERE user_id = 42;
-- Covering index on (user_id, status)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Now this query is an index-only scan — no heap fetch
SELECT status FROM orders WHERE user_id = 42;
In PostgreSQL, index-only scans also require the visibility map to indicate that heap pages don't need to be checked for row visibility (MVCC). Run VACUUM regularly to keep the visibility map current on heavily updated tables.
Write overhead: the cost side
Every index you add slows down writes:
- INSERT: add a new entry to each index B-tree
- UPDATE: if an indexed column changes, delete the old entry and insert a new one
- DELETE: mark the index entry as dead (it's cleaned up by vacuum/autovacuum)
On a table with 5 indexes, an INSERT writes to 6 places (the table + 5 indexes). On high-write tables, index overhead is a real cost. PostgreSQL's EXPLAIN (ANALYZE, BUFFERS) shows buffer writes, which lets you see the write amplification.
Index types beyond B-tree
Hash index: O(1) equality lookup, no range support. Useful for pure equality on high-cardinality columns. In PostgreSQL, hash indexes are now WAL-logged (since PG 10) and safe for production.
GIN (Generalized Inverted Index): for indexing multi-valued types — arrays, JSONB, tsvector. Used for full-text search and JSONB @> / ? operators.
GiST (Generalized Search Tree): for geometric types, range types, and full-text search. Also used by pg_trgm for LIKE/ILIKE pattern matching.
BRIN (Block Range INdex): stores min/max values per block range. Tiny size, useful only for columns that correlate strongly with physical storage order (e.g., auto-increment IDs, monotonically increasing timestamps). Excellent for time-series append tables.
-- BRIN on a time-series table where rows are inserted in timestamp order
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);
-- This index is ~100x smaller than a B-tree index on the same column
-- Only useful for range queries on naturally ordered data
The index mental model applied
When a query is slow, your first question should be: is the predicate in my WHERE clause covered by an index? Check with EXPLAIN. If it's doing a Seq Scan, check whether a usable index exists and whether the optimizer's selectivity estimate is accurate. If the index exists but isn't used, investigate whether a function or cast is defeating it, or whether the optimizer correctly determined the seq scan is cheaper.