Stop Writing Subqueries When a JOIN Will Do

by Eric Hanson, Backend Developer at Clean Systems Consulting

When the intuitive approach is the slow one

You need to find all users who have placed an order in the last 30 days. You write:

SELECT * FROM users
WHERE id IN (
  SELECT user_id FROM orders WHERE created_at > NOW() - INTERVAL '30 days'
);

It returns correct results. Code review passes. Six months later, with 5 million users and 20 million orders, it's running for 8 seconds. A JOIN rewrites it to run in 80ms. The logic is identical. The execution plan is not.

This pattern — reaching for a subquery when a JOIN is the right tool — is one of the most common SQL performance mistakes I see in production codebases. It's not universal; subqueries have legitimate uses. But developers who learned SQL through intuition rather than execution model tend to overuse them.

What the database actually does

A subquery in a WHERE ... IN (...) clause can be executed in two ways by the query optimizer:

  1. Execute once, hash the results, probe the hash for each outer row — essentially a hash join. Modern optimizers (PostgreSQL 7.4+, MySQL 8.0+) will often do this automatically.
  2. Execute the subquery once per outer row — a correlated execution model. This is what happens with correlated subqueries and some uncorrelated ones on older planners.

The second model is catastrophic at scale. If your outer query returns 1 million rows, the subquery runs 1 million times.

The JOIN approach:

SELECT DISTINCT u.*
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > NOW() - INTERVAL '30 days';

Modern optimizers will execute this as a hash join or merge join — fundamentally the same execution as the optimized subquery form. But the JOIN makes the relationship explicit and gives the optimizer more freedom to choose the best strategy.

WHERE to use subqueries vs JOINs

Use a JOIN when:

  • You're filtering based on existence of related rows
  • You need columns from the related table in the SELECT
  • You're doing aggregations that combine data from multiple tables

Use a subquery (specifically a correlated subquery) when:

  • You need a value that depends on each outer row individually
  • You're using EXISTS with an early-exit condition
  • The subquery result can't be expressed as a flat join
-- EXISTS with correlated subquery is appropriate and efficient
-- The database stops scanning inner rows as soon as one match is found
SELECT u.id, u.name
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id
    AND o.status = 'completed'
    AND o.created_at > NOW() - INTERVAL '30 days'
);

EXISTS is often faster than IN with a subquery for this pattern because it short-circuits on the first match. A JOIN with DISTINCT achieves the same result but reads all matching rows before deduplicating.

The correlated subquery trap

This is the form that truly destroys performance:

-- Executes the subquery once per row in orders
SELECT
  o.id,
  o.total,
  (SELECT name FROM users WHERE id = o.user_id) AS user_name
FROM orders o;

At 1 million orders, this is 1 million SELECT statements against the users table. Even with an index on users.id, the overhead of 1 million round trips to the index is significant. The JOIN version:

SELECT
  o.id,
  o.total,
  u.name AS user_name
FROM orders o
JOIN users u ON o.user_id = u.id;

This does one join operation across both tables, not N separate lookups. On a dataset of 1 million orders with an indexed user_id, the JOIN version will outperform the correlated subquery by an order of magnitude or more in most engines.

When subqueries are legitimately better

Scalar subqueries for derived values:

SELECT
  p.name,
  p.price,
  (SELECT AVG(price) FROM products WHERE category_id = p.category_id) AS category_avg_price
FROM products p;

This is a correlated subquery, so it has performance implications. But if you need the category average relative to each row, a window function is actually the better tool:

SELECT
  name,
  price,
  AVG(price) OVER (PARTITION BY category_id) AS category_avg_price
FROM products;

Subqueries to limit what gets joined:

-- Pre-filter orders before joining to reduce join size
SELECT u.name, recent.total
FROM users u
JOIN (
  SELECT user_id, SUM(total) AS total
  FROM orders
  WHERE created_at > NOW() - INTERVAL '90 days'
  GROUP BY user_id
) recent ON u.id = recent.user_id;

This is a derived table (inline view), not a correlated subquery. It aggregates before joining, which can significantly reduce the number of rows the join has to process.

The practical test

When you write a subquery, ask two questions: Is this correlated (does the subquery reference the outer query)? And does the subquery return a set of IDs you're filtering by? If yes to either, check whether a JOIN or EXISTS produces the same result. Then run EXPLAIN ANALYZE on both and compare the actual row counts and execution time. Don't trust the optimizer to always make the right choice — verify it.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Naming Your API Endpoints Is Harder Than It Looks

Endpoint naming seems trivial until it becomes inconsistent, ambiguous, and hard to evolve. Good naming requires treating APIs as long-lived contracts, not quick implementations.

Read more

Oslo Backend Engineers Cost NOK 850K+ Per Year — Here Is What Startups Do Instead

You posted a senior backend role three months ago. The only candidates within budget were junior. The ones with experience wanted NOK 900K and a signing bonus.

Read more

Why Every Engineering Team Needs a Tech Lead

At first, skipping a tech lead feels like saving money. Then decisions pile up, and nobody knows who should make them.

Read more

When a Developer Writes Code Nobody Else Is Allowed to Touch

At some point, someone says: “Don’t touch that part of the code.” And just like that, a normal system turns into a fragile one.

Read more