GROUP BY Is More Powerful Than Most Developers Use It For

by Eric Hanson, Backend Developer at Clean Systems Consulting

The reporting query that should be one statement

You need a revenue report: total revenue by region, by product category, and a grand total — all in one result set. Your current solution is three queries, a JOIN in the application layer, and a Python loop to compute the grand total. The entire thing runs in 4 seconds. It could be one SQL query running in 400ms.

GROUP BY combined with SQL's grouping extensions (ROLLUP, CUBE, GROUPING SETS) can produce multi-dimensional aggregations in a single pass over the data. Most developers don't know these exist because they're not in the basic SQL tutorials.

The HAVING clause: filtering on aggregated values

WHERE filters rows before aggregation. HAVING filters on the result of aggregation — a distinction that confuses developers who try to put aggregate functions in WHERE and get an error.

-- Wrong: aggregate function in WHERE
SELECT user_id, SUM(total) AS revenue
FROM orders
WHERE SUM(total) > 1000   -- ERROR: aggregate functions not allowed in WHERE
GROUP BY user_id;

-- Right: aggregate condition in HAVING
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id
HAVING SUM(total) > 1000;

HAVING runs after aggregation, so it can reference aggregate expressions. You can combine both:

-- Only high-value completed orders, grouped by user, showing only users with > $5k revenue
SELECT
  user_id,
  COUNT(*) AS order_count,
  SUM(total) AS total_revenue
FROM orders
WHERE status = 'completed'        -- Filters rows before grouping
GROUP BY user_id
HAVING SUM(total) > 5000          -- Filters groups after aggregation
ORDER BY total_revenue DESC;

ROLLUP: subtotals and grand totals

GROUP BY ROLLUP(a, b) generates all subtotal combinations: grouped by (a, b), grouped by (a) alone, and a grand total row. This replaces multiple queries with UNION ALL.

SELECT
  region,
  category,
  SUM(revenue) AS total_revenue,
  GROUPING(region) AS is_region_subtotal,    -- 1 if this row is a subtotal
  GROUPING(category) AS is_category_subtotal
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region NULLS LAST, category NULLS LAST;

Result includes:

  • One row per (region, category) combination
  • One row per region (category = NULL), summing all categories in that region
  • One grand total row (region = NULL, category = NULL)

GROUPING() returns 1 when the column is rolled up (NULL due to subtotal), 0 otherwise. This lets you distinguish a row where category is genuinely NULL from a subtotal row.

CUBE: all combinations

GROUP BY CUBE(a, b, c) generates subtotals for all 2ⁿ combinations of the grouped columns. For three columns, that's 8 grouping sets.

SELECT
  year,
  quarter,
  region,
  SUM(revenue)
FROM sales
GROUP BY CUBE(year, quarter, region);
-- Generates: (year,quarter,region), (year,quarter), (year,region),
--            (quarter,region), (year), (quarter), (region), ()

CUBE is expensive on large datasets — it's doing multiple aggregation passes. Use it for fixed-dimension OLAP-style reports, not for high-traffic OLTP queries.

GROUPING SETS: explicit control

GROUPING SETS lets you specify exactly which combinations you want, without generating all possible combinations:

-- Only want: by region, by category, and grand total — not by (region, category)
SELECT
  region,
  category,
  SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
  (region),
  (category),
  ()         -- grand total
);

This replaces three separate queries with UNION ALL:

-- Equivalent (less efficient — three scans)
SELECT region, NULL as category, SUM(revenue) FROM sales GROUP BY region
UNION ALL
SELECT NULL, category, SUM(revenue) FROM sales GROUP BY category
UNION ALL
SELECT NULL, NULL, SUM(revenue) FROM sales;

The GROUPING SETS version reads the table once. The UNION ALL version reads it three times.

Conditional aggregation: pivoting rows to columns

A classic reporting requirement: show each month's revenue as a separate column for the current year.

SELECT
  product_id,
  SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 1  THEN total ELSE 0 END) AS jan,
  SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 2  THEN total ELSE 0 END) AS feb,
  SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 3  THEN total ELSE 0 END) AS mar,
  -- ... through December
  SUM(total) AS annual_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY product_id;

One scan, one result set, one row per product. The alternative — 12 queries or a complex pivot in application code — is slower and harder to maintain.

The efficiency principle

All these features — ROLLUP, CUBE, GROUPING SETS, conditional aggregation — exploit the same underlying advantage: one table scan to produce multiple aggregation levels. The database processes data in blocks from storage; every additional query adds another full scan. Expressing multi-dimensional requirements in a single SQL statement is not just more elegant — it's faster by a constant factor equal to the number of queries it replaces.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Why Employee Monitoring Tools Are Not Necessary for Remote Teams

Trust beats tracking. Remote teams thrive on autonomy, not constant surveillance.

Read more

Event-Driven Design in Spring Boot — ApplicationEvents, Spring Integration, and When to Use a Message Broker

Events decouple producers from consumers within and across services. Spring Boot offers three tiers: in-process ApplicationEvents for same-JVM decoupling, Spring Integration for lightweight messaging patterns, and external brokers for durability and cross-service communication.

Read more

How to Write Maintainable Backend Code

Ever opened a backend project and immediately thought, “I have no idea what’s going on”? That’s not bad luck—that’s bad maintainability.

Read more

Circuit Breakers in Microservices: Stop Letting One Failure Break Everything

Circuit breakers are the difference between a contained service degradation and a cascading system failure. Implementing them correctly requires more than adding a dependency — it requires understanding states, thresholds, and fallback design.

Read more