What an Index Actually Does to Your Database

by Eric Hanson, Backend Developer at Clean Systems Consulting

The abstraction you've been taking for granted

You add an index, the query gets faster, you move on. But when a query is slow despite having an index, or when you're trying to decide whether to add one, or when you're diagnosing why writes are slowing down — at that point, treating indexes as a black box stops working. You need the mental model.

An index is a separate data structure maintained alongside your table, updated on every insert, update, and delete, organized to make specific lookup patterns fast. It has a cost (storage, write overhead) and a benefit (faster reads). Understanding the structure tells you when that tradeoff pays off.

B-tree: the default index type

Most indexes you create are B-tree indexes (Balanced Tree). The structure:

  • A tree of nodes, each containing sorted key values and pointers
  • The leaf nodes form a doubly linked list sorted by indexed value
  • Lookup traverses from root to leaf: O(log n) comparisons
  • Range scans follow the linked leaf nodes: O(log n + k) where k is result size

For a table with 10 million rows, the tree has approximately log₁₆(10,000,000) ≈ 6 levels. Finding any single row requires at most 6 page reads. At 100 million rows, it's ~7 levels. The logarithmic scaling is why indexes stay fast as tables grow.

-- Standard B-tree index
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Supports these operations (uses the index):
WHERE created_at = '2024-01-15'
WHERE created_at > '2024-01-01'
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY created_at ASC/DESC

-- Does NOT support (falls back to seq scan):
WHERE YEAR(created_at) = 2024   -- function applied to column
WHERE created_at::TEXT LIKE '2024%'  -- type cast

The key constraint: the index only helps when you access the column in its stored form, without wrapping functions or implicit casts.

How the heap fetch works

An index lookup has two parts:

  1. Index scan: traverse the B-tree to find matching leaf entries (fast, few page reads)
  2. Heap fetch: follow the row pointer from each leaf entry to read the actual row from the table storage (potentially expensive if rows are scattered across many pages)
Index (sorted by email):                   Heap (unordered rows):
┌─────────────────────────────┐            ┌──────────────────────────┐
│ 'adam@ex.com' → Page 47, Row 3 ──────→  │ Page 47: [row3: Adam...] │
│ 'beth@ex.com' → Page 12, Row 7 ──────→  │ Page 12: [row7: Beth...] │
│ 'carl@ex.com' → Page 89, Row 1 ──────→  │ Page 89: [row1: Carl...] │
└─────────────────────────────┘            └──────────────────────────┘

If your range scan matches 1,000 rows scattered across 1,000 different heap pages, you do 1,000 random page reads — which may be worse than a sequential scan of the whole table. This is why the optimizer sometimes ignores your index: it calculates that a sequential scan is cheaper for high-selectivity cases.

Index-only scans: the fast path

If your query only needs columns that are in the index, the database can skip the heap fetch entirely:

-- Standard index on (user_id)
-- This query needs user_id (in index) AND status (not in index) → heap fetch required
SELECT status FROM orders WHERE user_id = 42;

-- Covering index on (user_id, status)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Now this query is an index-only scan — no heap fetch
SELECT status FROM orders WHERE user_id = 42;

In PostgreSQL, index-only scans also require the visibility map to indicate that heap pages don't need to be checked for row visibility (MVCC). Run VACUUM regularly to keep the visibility map current on heavily updated tables.

Write overhead: the cost side

Every index you add slows down writes:

  • INSERT: add a new entry to each index B-tree
  • UPDATE: if an indexed column changes, delete the old entry and insert a new one
  • DELETE: mark the index entry as dead (it's cleaned up by vacuum/autovacuum)

On a table with 5 indexes, an INSERT writes to 6 places (the table + 5 indexes). On high-write tables, index overhead is a real cost. PostgreSQL's EXPLAIN (ANALYZE, BUFFERS) shows buffer writes, which lets you see the write amplification.

Index types beyond B-tree

Hash index: O(1) equality lookup, no range support. Useful for pure equality on high-cardinality columns. In PostgreSQL, hash indexes are now WAL-logged (since PG 10) and safe for production.

GIN (Generalized Inverted Index): for indexing multi-valued types — arrays, JSONB, tsvector. Used for full-text search and JSONB @> / ? operators.

GiST (Generalized Search Tree): for geometric types, range types, and full-text search. Also used by pg_trgm for LIKE/ILIKE pattern matching.

BRIN (Block Range INdex): stores min/max values per block range. Tiny size, useful only for columns that correlate strongly with physical storage order (e.g., auto-increment IDs, monotonically increasing timestamps). Excellent for time-series append tables.

-- BRIN on a time-series table where rows are inserted in timestamp order
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);
-- This index is ~100x smaller than a B-tree index on the same column
-- Only useful for range queries on naturally ordered data

The index mental model applied

When a query is slow, your first question should be: is the predicate in my WHERE clause covered by an index? Check with EXPLAIN. If it's doing a Seq Scan, check whether a usable index exists and whether the optimizer's selectivity estimate is accurate. If the index exists but isn't used, investigate whether a function or cast is defeating it, or whether the optimizer correctly determined the seq scan is cheaper.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

CI/CD Is Not a Tool. It Is a Practice.

Most teams install a CI/CD tool and call it done. But running a pipeline is not the same as practicing continuous delivery — and confusing the two is why deployments are still painful.

Read more

Forced In-Person Work: When Contractors Are Treated Unfairly

“We require all contractors to be onsite five days a week.” That sentence often signals a deeper misunderstanding of what contracting actually is.

Read more

Why Your First Week With a New Client Sets the Tone for Everything

The impressions formed in the first week of a new engagement are surprisingly durable. The contractors who front-load professionalism reap benefits for the duration of the project.

Read more

The Red Green Refactor Cycle Is Simpler Than Most TDD Articles Make It Look

Most TDD explanations spend so long on philosophy that the mechanics get lost. The red-green-refactor cycle is three steps that take two to five minutes per iteration. Here is what each step actually means in practice.

Read more