What an Index Actually Does to Your Database

February 23, 2026

by Arif Ikhsanudin, Backend Developer

The abstraction you've been taking for granted

You add an index, the query gets faster, you move on. But when a query is slow despite having an index, or when you're trying to decide whether to add one, or when you're diagnosing why writes are slowing down — at that point, treating indexes as a black box stops working. You need the mental model.

An index is a separate data structure maintained alongside your table, updated on every insert, update, and delete, organized to make specific lookup patterns fast. It has a cost (storage, write overhead) and a benefit (faster reads). Understanding the structure tells you when that tradeoff pays off.

B-tree: the default index type

Most indexes you create are B-tree indexes (Balanced Tree). The structure:

A tree of nodes, each containing sorted key values and pointers
The leaf nodes form a doubly linked list sorted by indexed value
Lookup traverses from root to leaf: O(log n) comparisons
Range scans follow the linked leaf nodes: O(log n + k) where k is result size

For a table with 10 million rows, the tree has approximately log₁₆(10,000,000) ≈ 6 levels. Finding any single row requires at most 6 page reads. At 100 million rows, it's ~7 levels. The logarithmic scaling is why indexes stay fast as tables grow.

-- Standard B-tree index
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Supports these operations (uses the index):
WHERE created_at = '2024-01-15'
WHERE created_at > '2024-01-01'
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY created_at ASC/DESC

-- Does NOT support (falls back to seq scan):
WHERE YEAR(created_at) = 2024   -- function applied to column
WHERE created_at::TEXT LIKE '2024%'  -- type cast

The key constraint: the index only helps when you access the column in its stored form, without wrapping functions or implicit casts.

How the heap fetch works

An index lookup has two parts:

Index scan: traverse the B-tree to find matching leaf entries (fast, few page reads)
Heap fetch: follow the row pointer from each leaf entry to read the actual row from the table storage (potentially expensive if rows are scattered across many pages)

Index (sorted by email):                   Heap (unordered rows):
┌─────────────────────────────┐            ┌──────────────────────────┐
│ 'adam@ex.com' → Page 47, Row 3 ──────→  │ Page 47: [row3: Adam...] │
│ 'beth@ex.com' → Page 12, Row 7 ──────→  │ Page 12: [row7: Beth...] │
│ 'carl@ex.com' → Page 89, Row 1 ──────→  │ Page 89: [row1: Carl...] │
└─────────────────────────────┘            └──────────────────────────┘

If your range scan matches 1,000 rows scattered across 1,000 different heap pages, you do 1,000 random page reads — which may be worse than a sequential scan of the whole table. This is why the optimizer sometimes ignores your index: it calculates that a sequential scan is cheaper for high-selectivity cases.

Index-only scans: the fast path

If your query only needs columns that are in the index, the database can skip the heap fetch entirely:

-- Standard index on (user_id)
-- This query needs user_id (in index) AND status (not in index) → heap fetch required
SELECT status FROM orders WHERE user_id = 42;

-- Covering index on (user_id, status)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Now this query is an index-only scan — no heap fetch
SELECT status FROM orders WHERE user_id = 42;

In PostgreSQL, index-only scans also require the visibility map to indicate that heap pages don't need to be checked for row visibility (MVCC). Run VACUUM regularly to keep the visibility map current on heavily updated tables.

Write overhead: the cost side

Every index you add slows down writes:

INSERT: add a new entry to each index B-tree
UPDATE: if an indexed column changes, delete the old entry and insert a new one
DELETE: mark the index entry as dead (it's cleaned up by vacuum/autovacuum)

On a table with 5 indexes, an INSERT writes to 6 places (the table + 5 indexes). On high-write tables, index overhead is a real cost. PostgreSQL's EXPLAIN (ANALYZE, BUFFERS) shows buffer writes, which lets you see the write amplification.

Index types beyond B-tree

Hash index: O(1) equality lookup, no range support. Useful for pure equality on high-cardinality columns. In PostgreSQL, hash indexes are now WAL-logged (since PG 10) and safe for production.

GIN (Generalized Inverted Index): for indexing multi-valued types — arrays, JSONB, tsvector. Used for full-text search and JSONB @> / ? operators.

GiST (Generalized Search Tree): for geometric types, range types, and full-text search. Also used by pg_trgm for LIKE/ILIKE pattern matching.

BRIN (Block Range INdex): stores min/max values per block range. Tiny size, useful only for columns that correlate strongly with physical storage order (e.g., auto-increment IDs, monotonically increasing timestamps). Excellent for time-series append tables.

-- BRIN on a time-series table where rows are inserted in timestamp order
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);
-- This index is ~100x smaller than a B-tree index on the same column
-- Only useful for range queries on naturally ordered data

The index mental model applied

When a query is slow, your first question should be: is the predicate in my WHERE clause covered by an index? Check with EXPLAIN. If it's doing a Seq Scan, check whether a usable index exists and whether the optimizer's selectivity estimate is accurate. If the index exists but isn't used, investigate whether a function or cast is defeating it, or whether the optimizer correctly determined the seq scan is cheaper.

Our offices

Follow us

What an Index Actually Does to Your Database

The abstraction you've been taking for granted

B-tree: the default index type

How the heap fetch works

Index-only scans: the fast path

Write overhead: the cost side

Index types beyond B-tree

The index mental model applied

Scale Your Backend - Need an Experienced Backend Developer?

Tell us about your project

Our offices

More articles

Why Backend Engineers Often Become the Most Overloaded People in a Team

Sharding Your Database Sounds Exciting Until You Actually Have to Do It

When One Developer Knows Everything About the System

Concurrency in Databases Is Tricky Until You Understand This