The Query That Works Fine Until It Doesn't

by Eric Hanson, Backend Developer at Clean Systems Consulting

The query that graduates from slow to incident

You've seen this timeline: a query ships, works fine, gets forgotten. Then at some threshold — 1 million rows, a traffic spike, a data import that triples the table size — it becomes the bottleneck behind a P0 incident. You look at the query and nothing about it obviously screams "slow." It looks like any other query. The problem is structural.

Certain query patterns are inherent time bombs. They're not wrong. They're not buggy. They just depend on data volume staying small, and that assumption is never documented. Here's how to recognize and defuse them before they detonate.

Pattern 1: The unbounded aggregation

SELECT user_id, SUM(amount) AS lifetime_value
FROM transactions
GROUP BY user_id;

This aggregates every row in transactions every time it runs. At 100k rows, that's 10ms. At 100M rows, with a dashboard auto-refreshing it every 30 seconds, that's a production outage.

The fix is pre-computation. Maintain a user_lifetime_value table updated incrementally via triggers or an event-driven pipeline. The query becomes:

SELECT user_id, lifetime_value FROM user_lifetime_value WHERE user_id = :id;
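
Keeping that table current is the hard part. Here's a minimal sketch of the trigger-based path, assuming PostgreSQL; the table, function, and trigger names are illustrative, and a real pipeline would also need to handle updated and deleted transactions:

CREATE TABLE user_lifetime_value (
    user_id        BIGINT PRIMARY KEY,
    lifetime_value NUMERIC NOT NULL DEFAULT 0
);

-- Fold each new transaction into the running total
CREATE FUNCTION apply_transaction_to_ltv() RETURNS trigger AS $$
BEGIN
    INSERT INTO user_lifetime_value (user_id, lifetime_value)
    VALUES (NEW.user_id, NEW.amount)
    ON CONFLICT (user_id) DO UPDATE
        SET lifetime_value = user_lifetime_value.lifetime_value + EXCLUDED.lifetime_value;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_ltv
AFTER INSERT ON transactions
FOR EACH ROW EXECUTE FUNCTION apply_transaction_to_ltv();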

Alternatively, use materialized views (PostgreSQL, Oracle) with a refresh schedule that matches your tolerance for staleness:

CREATE MATERIALIZED VIEW user_ltv AS
SELECT user_id, SUM(amount) AS lifetime_value
FROM transactions
GROUP BY user_id;

-- Refresh on a schedule, not per request
REFRESH MATERIALIZED VIEW CONCURRENTLY user_ltv;

The CONCURRENTLY flag in PostgreSQL allows reads during refresh — without it, the view is locked during the refresh operation.
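
Note that CONCURRENTLY requires a unique index on the materialized view (the index name here is illustrative):

CREATE UNIQUE INDEX user_ltv_user_id_idx ON user_ltv (user_id);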

Pattern 2: The offset that gets deeper over time

-- Page 1: fast
SELECT * FROM events ORDER BY created_at DESC LIMIT 50 OFFSET 0;

-- Page 10,001: slow; reads and discards 500,000 rows first
SELECT * FROM events ORDER BY created_at DESC LIMIT 50 OFFSET 500000;

OFFSET tells the database to skip N rows, and to skip them it has to read them first. At deep offsets the database reads and discards nearly everything before your page, whether from the table or an index, regardless of your LIMIT. This degrades predictably as the table grows and as users paginate deeper.

Keyset pagination eliminates this:

-- First page
SELECT id, title, created_at FROM events ORDER BY created_at DESC, id DESC LIMIT 50;

-- Next page (pass last seen values from previous page)
SELECT id, title, created_at FROM events
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;

Requires a composite index on (created_at DESC, id DESC). The database seeks directly to the last-seen position instead of reading everything before it, so performance no longer depends on page depth.
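
In PostgreSQL, that index looks like this (the name is illustrative):

CREATE INDEX events_created_at_id_idx ON events (created_at DESC, id DESC);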

Pattern 3: The filter that relies on function application

SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2024;

Wrapping a column in a function defeats any index on that column. The database has to compute EXTRACT(YEAR FROM created_at) for every single row before it can apply the filter.

-- Index-friendly equivalent
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

If you have an index on created_at, this uses it. The first form does not. This pattern shows up constantly with date functions, type casts, and string operations:

-- Index-hostile
WHERE LOWER(email) = 'user@example.com'
WHERE CAST(user_id AS TEXT) = '42'
WHERE DATE(created_at) = CURRENT_DATE

-- Index-friendly alternatives
WHERE email = LOWER('User@Example.com')  -- normalize at write time
WHERE user_id = 42                        -- fix the type mismatch upstream
WHERE created_at >= CURRENT_DATE AND created_at < CURRENT_DATE + INTERVAL '1 day'
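
When normalizing at write time isn't an option, PostgreSQL also supports expression indexes, which index the computed value so a function-wrapped filter becomes index-friendly (the index name is illustrative):

-- Makes WHERE LOWER(email) = ... an index lookup
CREATE INDEX users_email_lower_idx ON users (LOWER(email));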

Pattern 4: The JOIN that multiplies rows unexpectedly

SELECT u.id, u.name, t.tag_name
FROM users u
JOIN user_tags ut ON u.id = ut.user_id
JOIN tags t ON ut.tag_id = t.id
WHERE u.status = 'active';

If a user has 50 tags, they appear 50 times in the result. If the calling code treats each row as a distinct user, you now have a bug that silently corrupts aggregations. And if you add a GROUP BY to fix it without understanding the join, you might accidentally aggregate across users.

Always trace the cardinality through your join chain before writing the query. Ask: for each row in the outer table, how many rows will the inner join produce? If the answer is "potentially many," decide upfront whether you want the fan-out or need to aggregate it away.
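
If you want one row per user, aggregate the fan-out away explicitly. A sketch using ARRAY_AGG, which is PostgreSQL syntax (STRING_AGG or GROUP_CONCAT are the equivalents elsewhere):

SELECT u.id, u.name, ARRAY_AGG(t.tag_name) AS tags
FROM users u
JOIN user_tags ut ON u.id = ut.user_id
JOIN tags t ON ut.tag_id = t.id
WHERE u.status = 'active'
GROUP BY u.id, u.name;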

Pattern 5: The query with no LIMIT in a loop

# Called once per API request, returns all matching records
# (db.query here stands in for your application's query helper)
def get_user_events(user_id):
    return db.query("SELECT * FROM events WHERE user_id = ?", user_id)

For most users, this returns 20 rows. For power users with years of history, it returns 50,000. Your application wasn't designed to handle 50,000 rows per request, your web server's memory wasn't allocated for it, and your API client certainly didn't expect it.

Always LIMIT queries that run in request contexts. Even if you "know" the user will have few records, protect against the outlier with an explicit ceiling.
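
A bounded version of the query above; the 500-row cap is an illustrative number, not a recommendation:

-- Explicit ceiling protects the request path from outlier users
SELECT * FROM events
WHERE user_id = :user_id
ORDER BY created_at DESC
LIMIT 500;

If a caller legitimately needs more than the cap, paginate (see Pattern 2) instead of raising the ceiling.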

The discipline

Before shipping any query that touches a table that grows over time, ask: what happens to this query when the table is 100x larger? If the answer is "it gets 100x slower," that's a structural problem you need to solve now, not at incident time. The cost of thinking through this at review time is zero. The cost of an unplanned scale incident is not.
