What Actually Happens When SQL Executes Your Query

by Arif Ikhsanudin, Backend Developer

Why you can't predict query performance from SQL alone

Two developers argue about whether to rewrite a query. One insists the JOIN version is faster. The other has benchmarks showing the subquery version is equally fast on their machine. Both are right in different contexts, which means neither can explain why. The root cause is the same in both cases: neither developer has a mental model of what the database actually does when it executes SQL.

SQL is declarative. You say "give me all users who placed an order this month," and the database decides the execution strategy — which tables to read first, which indexes to use, how to combine results. This decoupling is powerful, but it means you can't reason about performance from the SQL text alone. You have to understand the pipeline.

The five stages of SQL execution

1. Parsing and validation

The database first parses your SQL into an abstract syntax tree and validates it: are the table names real? Do the columns exist? Does the user have permission? This is fast and happens before any data is touched.

2. Query rewriting

Before the optimizer runs, many databases apply rule-based rewrites. PostgreSQL, for example, will rewrite a view reference into the underlying query, expand * into explicit column names, and sometimes transform IN (subquery) into a semi-join internally. These rewrites happen invisibly and can significantly change what the optimizer sees.

3. Query optimization — the hard part

The optimizer's job is to find the lowest-cost execution plan from the space of all possible plans. For a query joining five tables, the number of possible join orderings is 5! = 120, and that's before considering different join algorithms (hash join, merge join, nested loop join) and different access methods (sequential scan, index scan, index-only scan, bitmap heap scan).

The optimizer estimates the cost of each plan using statistics — specifically, the distribution of data in each column. In PostgreSQL, this is controlled by pg_statistic and updated by the ANALYZE command. In MySQL, it's maintained by InnoDB's internal statistics, updated via ANALYZE TABLE.

When statistics are stale or misleading, the optimizer makes bad choices. This is the source of a huge proportion of "mysterious" slow query reports.

4. Plan execution

The selected plan is a tree of operators. Each operator reads from one or more inputs and produces a stream of rows. Common operators:

  • Seq Scan: reads every row in a table. O(n) in rows.
  • Index Scan: follows index entries to fetch individual heap rows. Good for selective queries returning few rows.
  • Index Only Scan: returns data directly from the index without hitting the heap. Requires the index to cover all needed columns.
  • Hash Join: builds a hash table from the smaller relation, then probes it with each row from the larger. Good for large unsorted inputs.
  • Merge Join: merges two sorted inputs. Requires sorted data, but is very efficient when that's available.
  • Nested Loop Join: for each row in the outer relation, scan the inner relation. Extremely fast when the inner side is an index lookup; catastrophic when it's a full scan.
-- A simplified representation of what EXPLAIN shows you
Hash Join
  Hash Cond: (o.user_id = u.id)
  -> Seq Scan on orders o
       Filter: (created_at > '2024-01-01')
  -> Hash
       -> Index Scan on users u
            Index Cond: (status = 'active')

Reading this plan: PostgreSQL will scan all orders, filter by date, build a hash table from active users (using an index), then probe the hash table for each order row.

5. Result delivery

The database streams the result set back to the client. For large result sets with ORDER BY, this requires sorting before delivery, which may spill to disk if the work_mem (PostgreSQL) or sort_buffer_size (MySQL) is too small.

What this means in practice

Row estimates drive plan choice. If the optimizer thinks a filter returns 100 rows but it actually returns 100,000, it may choose a nested loop join that's catastrophic at the real cardinality. This is why EXPLAIN ANALYZE — which shows both estimated and actual row counts — is more useful than EXPLAIN alone.

-- PostgreSQL: see estimated vs actual rows
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.name;

Look for large discrepancies between rows=X (estimate) and actual rows=Y. A 10x or greater difference is a signal that statistics need updating or that the optimizer needs help via query restructuring.

The optimizer is not magic. It makes locally greedy choices within a plan search space. Highly complex queries with many joins can exceed the optimizer's search budget (join_collapse_limit in PostgreSQL defaults to 8), causing it to evaluate only a subset of possible plans.

You can influence the plan. You can't force a specific plan in standard SQL (unlike Oracle's hints), but you can: create covering indexes, update statistics with ANALYZE, adjust work_mem per session for sort-heavy queries, or rewrite the query to expose a simpler structure to the optimizer.

Start here

Run EXPLAIN (ANALYZE, BUFFERS) on any slow query in PostgreSQL, or EXPLAIN FORMAT=JSON in MySQL. Before you tune anything, read the plan. Find the node with the highest actual time or the worst estimate-vs-actual row ratio. That node is your bottleneck. Everything else is secondary.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Your Transactions Are Bigger Than They Need to Be

Oversized transactions are one of the most common sources of lock contention, replication lag, and autovacuum interference in production databases — and they are almost always fixable without changing business logic.

Read more

Why Silent Meetings With Cameras On Are a Bad Idea

Staring at a screen full of colleagues who aren’t saying a word is surprisingly stressful. Even with cameras off, the pressure to be “noticed” lingers.

Read more

Spring Boot Testing Strategy — Unit Tests, Slice Tests, and When to Use @SpringBootTest

Spring Boot offers multiple testing approaches, each loading a different subset of the application context. The choice determines test speed, test scope, and how much infrastructure is required. Here is how to use each correctly.

Read more

Deadlocks in SQL: Why They Happen and How to Avoid Them

Deadlocks are deterministic — given the same access patterns and lock ordering, they will occur reliably — which means they are preventable once you understand the conditions that create them.

Read more