Why Your Database Gets Slower as Your Table Gets Bigger

by Arif Ikhsanudin, Backend Developer

Growth doesn't slow everything equally

You're at a planning meeting. Engineering is debating whether to archive old data or add more database capacity. The question underneath is: why does the database slow down as tables grow, and which queries are affected?

The answer isn't "more rows = slower queries" uniformly. Some queries on large tables are just as fast as on small ones. Others degrade linearly. A few collapse exponentially. Understanding the difference tells you what to fix and where investment pays off.

What the storage engine is actually doing

Modern relational databases (PostgreSQL's heap storage, InnoDB in MySQL, SQL Server's B-tree pages) organize data in fixed-size pages, typically 8KB or 16KB. A table with 10 million rows might occupy 80,000 pages.

When you run a query, the database reads pages from disk (or buffer pool if cached). The number of pages read is the primary cost driver — not the number of rows directly.

Sequential scans read all pages. Cost scales linearly with table size. A table that is 10x bigger takes 10x longer to scan.

Index lookups read a logarithmic number of index pages plus the heap pages for matching rows. Cost scales very slowly with table size — a B-tree index over 10 million rows has depth ~log₁₆(10,000,000) ≈ 6, meaning at most 6 index page reads to find any row. This doesn't change much even at 100 million rows.

The implication: queries doing index lookups stay fast as tables grow. Queries doing full scans get proportionally slower.

Why indexes degrade with table growth (sometimes)

A healthy index keeps queries fast indefinitely. But two things cause index effectiveness to degrade:

1. Index bloat from updates and deletes

In PostgreSQL, updated rows create new versions (MVCC), and old versions accumulate until VACUUM reclaims them. A heavily updated table can have an index pointing to many dead row versions, bloating the index and causing extra page reads. Monitor index bloat with:

-- PostgreSQL: check index vs table size ratio
SELECT
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

An index larger than the table it covers is a signal worth investigating.

2. Index selectivity decreasing with data distribution shifts

If you have an index on status with values active, inactive, and deleted, and over time 95% of rows become deleted, the index on status = 'deleted' is low-selectivity — the database may prefer a sequential scan. The query plan that was good when the table was small might change as data distribution shifts.

Track this with pg_stats (PostgreSQL) or information_schema.COLUMN_STATISTICS (MySQL 8):

-- PostgreSQL: check MCV (most common values) distribution
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

Join performance and table size interaction

Join cost is driven by the cardinality of the inputs. When a small table joins to a large table, most strategies work well. When both tables are large, the join algorithm choice becomes critical:

  • Hash join: builds a hash table from the smaller relation in memory. Degrades if the hash table spills to disk (controlled by work_mem in PostgreSQL). Fine for large tables if you have enough memory allocated.
  • Merge join: requires both inputs to be sorted. Fast when both sides have covering sorted indexes. Problematic when sort spills to disk.
  • Nested loop: good when the inner side is an index lookup. Catastrophic when the inner side requires a full scan (O(n²) complexity).

As tables grow, joins that were using nested loops (because the inner side was small enough that a scan was tolerable) shift to hash joins. This is usually automatically handled by the optimizer — unless statistics are stale, in which case the optimizer may stick with a suboptimal plan.

The buffer pool effect

Most databases maintain an in-memory buffer pool (PostgreSQL's shared_buffers, InnoDB's innodb_buffer_pool_size). When a table fits entirely in the buffer pool, queries run from memory. When it doesn't, queries incur disk I/O.

A table that grows past the buffer pool size will experience a step-change in performance — queries that ran fast when the data was cached start waiting on disk reads. This often manifests as sudden slowdowns rather than gradual degradation.

On a server with 32GB RAM, a PostgreSQL shared_buffers of 8GB means any table over ~8GB will have frequent cache misses. The fix is either more memory, a smarter caching strategy, or partitioning to keep hot data in cache.

Table bloat and vacuum lag

In PostgreSQL, AUTOVACUUM reclaims dead row versions but can fall behind on high-write tables. A bloated table has many dead rows interspersed with live rows, forcing full scans to read more pages than necessary to find live data.

Check table bloat:

SELECT
  schemaname,
  tablename,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC;

Tables with >20% dead rows and high query frequency should have their autovacuum settings tuned (autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold).

The practical response

When a table hits a performance threshold, work through this checklist in order: check query plans for sequential scans that should be index scans, check index bloat and dead tuple ratios, check data distribution for low-selectivity columns, and check whether the hot working set still fits in the buffer pool. Most growth-related slowdowns are explained by one of these four causes.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

LA's Tech Scene Is Everywhere and Nowhere — Why Backend Hiring Is Harder Than It Looks

Los Angeles has more tech activity than most people realize. That doesn't make finding a solid backend developer any easier.

Read more

Estimation Is Hard. Here Is Why You Keep Getting It Wrong.

Software estimation is consistently optimistic in predictable ways, for predictable reasons. Understanding those reasons doesn't make estimation easy, but it makes it less wrong.

Read more

How Remote Engineering Teams Work Across Time Zones

Managing a team spread across the globe sounds chaotic. In practice, it’s all about structure, communication, and respect for time.

Read more

Java Optional — What It's For, What It's Not For, and How to Use It Well

Optional is a return type that signals absence explicitly. It's not a null replacement, not a container to store in fields, and not a way to avoid NullPointerException everywhere. Used correctly, it improves API clarity. Used incorrectly, it adds allocation and verbosity without benefit.

Read more