Your SQL Query Works. But It Won't When Your Data Grows.

by Eric Hanson, Backend Developer at Clean Systems Consulting

The query that passes review and kills production

You ship a feature. The query runs in under 100ms in staging. Six months later, ops is paging you at 2am because that same query is holding a full table lock on 40 million rows. This is not a hypothetical. It is one of the most predictable failure patterns in backend development, and it keeps happening because developers test with small datasets and reason about correctness instead of cost.

The problem isn't that your SQL is wrong. It's that SQL's cost structure is non-linear. A query that does a full table scan on 10,000 rows costs almost nothing. The same query on 10,000,000 rows is a disaster. And the relationship between row count and query time is not always proportional: once you introduce joins, sorts, and missing indexes, it can turn superlinear, even quadratic, because an unindexed nested loop join multiplies the row counts of both tables.

What actually changes when data grows

Three things kill query performance at scale, and they compound each other:

1. Full table scans become unacceptable

-- Looks innocent. Runs fine at 50k rows. Brutal at 50M.
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2024-01-01';

Without a composite index on (status, created_at), the database reads every row. At 50 million rows on a table with 20 columns, that's potentially gigabytes of I/O for a query that returns 200 rows.
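You can watch this flip in any engine's plan output. A minimal sketch using SQLite's EXPLAIN QUERY PLAN (the choice of SQLite and the index name are illustrative; in PostgreSQL you'd run EXPLAIN instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); join the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT id FROM orders WHERE status = 'pending' AND created_at > '2024-01-01'"
before = plan(q)  # e.g. "SCAN orders": every row is read

conn.execute("CREATE INDEX idx_orders_status_created ON orders (status, created_at)")
after = plan(q)   # e.g. "SEARCH orders USING ... idx_orders_status_created": an index seek

print("SCAN" in before, "idx_orders_status_created" in after)  # True True
```

The plan text varies by SQLite version, but the shift from a scan to a search on the composite index is the signal to look for.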

2. JOIN cardinality blows up

SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id;

If users has 1M rows and orders has 50M, the database has to assemble a 50M-row intermediate result before grouping. The query plan matters enormously here — a hash join behaves differently than a nested loop join, and the optimizer won't always pick correctly without proper statistics and indexes.
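SQLite, for instance, only has nested loop joins, which makes the effect easy to demonstrate: without an index on the join key, the inner table has no cheap lookup path for each outer row. A sketch (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
""")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = """SELECT u.name, COUNT(o.id) AS order_count
       FROM users u LEFT JOIN orders o ON u.id = o.user_id
       GROUP BY u.id"""

before = plan(q)  # no persistent index on user_id is available to the inner loop
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = plan(q)   # inner loop becomes an index seek on user_id

print("idx_orders_user_id" in after)  # True
```

Hash-join engines like PostgreSQL have more options, but they still lean on the same index and on accurate statistics to pick the right plan.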

3. Implicit sorts destroy throughput

ORDER BY, DISTINCT, and window functions require a sort, and GROUP BY requires a sort or a hash aggregate, unless the data already arrives in order (via a clustered index or a covering index on the sort key). Sorting 10M rows in an in-memory buffer is slow. A sort that spills to disk is catastrophic.
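The plan output tells you when this is happening. In SQLite the marker is USE TEMP B-TREE FOR ORDER BY; in PostgreSQL it is a Sort node (with an external merge when it spills). A minimal sketch, with illustrative names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts TEXT)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT id, ts FROM events ORDER BY ts"
before = plan(q)  # contains "USE TEMP B-TREE FOR ORDER BY": an explicit sort step
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
after = plan(q)   # the index already supplies the order, so the sort disappears

print("TEMP B-TREE" in before, "TEMP B-TREE" in after)  # True False
```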

The patterns that age poorly

SELECT * in application queries

-- Don't do this in application code
SELECT * FROM products WHERE category_id = 42;

You're pulling every column — including description TEXT, metadata JSONB, and image_data BYTEA — for a list view that needs only id, name, and price. At 10 rows this is fine. At 10,000 rows per page load, you're moving megabytes of data you immediately discard.
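The waste is easy to measure. A rough sketch comparing payload size (the schema and the 10 KB description are made up for illustration, and summing stringified cell lengths is only a proxy for wire traffic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products
    (id INTEGER PRIMARY KEY, name TEXT, price REAL, category_id INTEGER, description TEXT)""")
conn.executemany(
    "INSERT INTO products (name, price, category_id, description) VALUES (?, ?, ?, ?)",
    [(f"item-{i}", 9.99, 42, "x" * 10_000) for i in range(100)])

def payload_bytes(sql):
    # Crude proxy for data transferred: total length of stringified cells.
    return sum(len(str(cell)) for row in conn.execute(sql) for cell in row)

wide = payload_bytes("SELECT * FROM products WHERE category_id = 42")
narrow = payload_bytes("SELECT id, name, price FROM products WHERE category_id = 42")
print(wide > 100 * narrow)  # True: the list view was discarding almost all of it
```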

LIKE with a leading wildcard

-- Index on `email` is useless here
SELECT * FROM users WHERE email LIKE '%@example.com';

Any index on email is bypassed entirely. If you need suffix search, store and index a reversed copy of the column, or use full-text search (in PostgreSQL, a tsvector column with a GIN index).
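The reverse-index trick can be sketched in a few lines: keep a reversed copy of the column, index it, and turn the suffix match into an indexable prefix range. Column and index names here are made up; in PostgreSQL you would typically use an expression index on reverse(email) instead of a separate column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_rev TEXT)")
conn.execute("CREATE INDEX idx_users_email_rev ON users (email_rev)")
conn.executemany(
    "INSERT INTO users (email, email_rev) VALUES (?, ?)",
    [(e, e[::-1]) for e in ("alice@example.com", "bob@example.com", "carol@other.org")])

suffix = "@example.com"
prefix = suffix[::-1]  # "moc.elpmaxe@": the suffix becomes a prefix of the reversed column

q = "SELECT email FROM users WHERE email_rev >= ? AND email_rev < ?"
rows = conn.execute(q, (prefix, prefix + "\uffff")).fetchall()  # half-open prefix range
print(sorted(r[0] for r in rows))  # ['alice@example.com', 'bob@example.com']
```

The half-open range on the bare column is exactly the index-friendly shape: the planner can seek to the first reversed email starting with the prefix and stop at the end of the range.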

Correlated subqueries in SELECT

SELECT
  o.id,
  (SELECT COUNT(*) FROM order_items WHERE order_id = o.id) AS item_count
FROM orders o;

This executes the subquery once per row in orders unless the optimizer decorrelates it, which you cannot count on. At 1M orders, that's up to 1M extra lookups. Rewrite it as a JOIN with aggregation or use a window function.
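The equivalence of the rewrite is easy to check on a toy dataset. A sketch of the JOIN-with-aggregation form (schema and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders (id) VALUES (1), (2), (3);
    INSERT INTO order_items (order_id) VALUES (1), (1), (3);
""")

correlated = conn.execute("""
    SELECT o.id, (SELECT COUNT(*) FROM order_items WHERE order_id = o.id)
    FROM orders o ORDER BY o.id
""").fetchall()

joined = conn.execute("""
    SELECT o.id, COUNT(oi.id)
    FROM orders o LEFT JOIN order_items oi ON oi.order_id = o.id
    GROUP BY o.id ORDER BY o.id
""").fetchall()

print(correlated == joined)  # True: same result, one pass instead of N subqueries
```

Note the LEFT JOIN and COUNT(oi.id) rather than COUNT(*): orders with no items must still appear, with a count of zero.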

How to write for scale from the start

Qualify your WHERE clauses to use indexes

Write your query, then ask: does each filter column have an index, and does the filter allow the database to use it? Leading wildcards, function calls on indexed columns (WHERE YEAR(created_at) = 2024), and implicit type coercions all defeat indexes.

-- Index-hostile
WHERE YEAR(created_at) = 2024

-- Index-friendly
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
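The same flip shows up in SQLite's plan output (strftime stands in for MySQL's YEAR(); the index name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the column in a function hides it from the index.
hostile = plan("SELECT id FROM orders WHERE strftime('%Y', created_at) = '2024'")
# An open range on the bare column lets the index do the work.
friendly = plan(
    "SELECT id FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'")

print("SCAN" in hostile, "idx_orders_created_at" in friendly)  # True True
```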

Select only what you need

Enumerate columns explicitly in application queries. Not just for performance — it also protects you when columns are added or removed.

Paginate aggressively

Never return unbounded result sets from application queries. Use LIMIT and OFFSET, or, better, keyset pagination (also called cursor-based pagination), which doesn't degrade at deep offsets:

-- Offset pagination degrades at large offsets (reads and discards N rows)
SELECT * FROM events ORDER BY id LIMIT 100 OFFSET 50000;

-- Keyset pagination stays cheap at any depth: the index seeks straight to the page start
SELECT * FROM events WHERE id > :last_seen_id ORDER BY id LIMIT 100;
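Both strategies return the same page; the difference is how much work it takes to reach it. A sketch with 1,000 seeded rows (table name and sizes are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)",
                 [(i, f"e{i}") for i in range(1, 1001)])

# Offset pagination: the engine walks past 500 rows just to throw them away.
offset_page = conn.execute(
    "SELECT id FROM events ORDER BY id LIMIT 5 OFFSET 500").fetchall()

# Keyset pagination: the primary-key index seeks straight past the last seen id.
last_seen_id = 500  # carried over from the previous page
keyset_page = conn.execute(
    "SELECT id FROM events WHERE id > ? ORDER BY id LIMIT 5",
    (last_seen_id,)).fetchall()

print(offset_page == keyset_page)  # True: identical page, very different cost curve
```

The catch is that keyset pagination needs a stable, unique sort key to cursor on (here the primary key), and it can't jump to an arbitrary page number.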

Test with production-scale data

This is the only real fix. Seed your staging environment with realistic data volumes. Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN FORMAT=JSON (MySQL) against that data, not against a 5,000-row dev fixture.

What to do today

Pick one query in your application that touches a table expected to grow — orders, events, logs, user activity. Run EXPLAIN ANALYZE on it against your largest dataset. Look at the rows estimate vs actual, and check whether it's doing a Seq Scan where you'd expect an Index Scan. If the estimates are way off, your table statistics are stale — run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) and check again. That single habit, applied regularly, will catch the majority of scale problems before they reach production.
