GROUP BY Is More Powerful Than Most Developers Use It For
by Eric Hanson, Backend Developer at Clean Systems Consulting
The reporting query that should be one statement
You need a revenue report: total revenue by region, by product category, and a grand total — all in one result set. Your current solution is three queries, a JOIN in the application layer, and a Python loop to compute the grand total. The entire thing runs in 4 seconds. It could be one SQL query running in 400ms.
GROUP BY combined with SQL's grouping extensions (ROLLUP, CUBE, GROUPING SETS) can produce multi-dimensional aggregations in a single pass over the data. Most developers don't know these exist because they're not in the basic SQL tutorials.
The HAVING clause: filtering on aggregated values
WHERE filters rows before aggregation. HAVING filters on the result of aggregation — a distinction that confuses developers who try to put aggregate functions in WHERE and get an error.
-- Wrong: aggregate function in WHERE
SELECT user_id, SUM(total) AS revenue
FROM orders
WHERE SUM(total) > 1000 -- ERROR: aggregate functions not allowed in WHERE
GROUP BY user_id;
-- Right: aggregate condition in HAVING
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id
HAVING SUM(total) > 1000;
HAVING runs after aggregation, so it can reference aggregate expressions. You can combine both:
-- Only high-value completed orders, grouped by user, showing only users with > $5k revenue
SELECT
user_id,
COUNT(*) AS order_count,
SUM(total) AS total_revenue
FROM orders
WHERE status = 'completed' -- Filters rows before grouping
GROUP BY user_id
HAVING SUM(total) > 5000 -- Filters groups after aggregation
ORDER BY total_revenue DESC;
ROLLUP: subtotals and grand totals
GROUP BY ROLLUP(a, b) generates all subtotal combinations: grouped by (a, b), grouped by (a) alone, and a grand total row. This replaces multiple queries with UNION ALL.
SELECT
region,
category,
SUM(revenue) AS total_revenue,
GROUPING(region) AS is_region_subtotal, -- 1 if this row is a subtotal
GROUPING(category) AS is_category_subtotal
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region NULLS LAST, category NULLS LAST;
Result includes:
- One row per (region, category) combination
- One row per region (category = NULL), summing all categories in that region
- One grand total row (region = NULL, category = NULL)
GROUPING() returns 1 when the column is rolled up (NULL due to subtotal), 0 otherwise. This lets you distinguish a row where category is genuinely NULL from a subtotal row.
CUBE: all combinations
GROUP BY CUBE(a, b, c) generates subtotals for all 2ⁿ combinations of the grouped columns. For three columns, that's 8 grouping sets.
SELECT
year,
quarter,
region,
SUM(revenue)
FROM sales
GROUP BY CUBE(year, quarter, region);
-- Generates: (year,quarter,region), (year,quarter), (year,region),
-- (quarter,region), (year), (quarter), (region), ()
CUBE is expensive on large datasets — it's doing multiple aggregation passes. Use it for fixed-dimension OLAP-style reports, not for high-traffic OLTP queries.
GROUPING SETS: explicit control
GROUPING SETS lets you specify exactly which combinations you want, without generating all possible combinations:
-- Only want: by region, by category, and grand total — not by (region, category)
SELECT
region,
category,
SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region),
(category),
() -- grand total
);
This replaces three separate queries with UNION ALL:
-- Equivalent (less efficient — three scans)
SELECT region, NULL as category, SUM(revenue) FROM sales GROUP BY region
UNION ALL
SELECT NULL, category, SUM(revenue) FROM sales GROUP BY category
UNION ALL
SELECT NULL, NULL, SUM(revenue) FROM sales;
The GROUPING SETS version reads the table once. The UNION ALL version reads it three times.
Conditional aggregation: pivoting rows to columns
A classic reporting requirement: show each month's revenue as a separate column for the current year.
SELECT
product_id,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 1 THEN total ELSE 0 END) AS jan,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 2 THEN total ELSE 0 END) AS feb,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 3 THEN total ELSE 0 END) AS mar,
-- ... through December
SUM(total) AS annual_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
GROUP BY product_id;
One scan, one result set, one row per product. The alternative — 12 queries or a complex pivot in application code — is slower and harder to maintain.
The efficiency principle
All these features — ROLLUP, CUBE, GROUPING SETS, conditional aggregation — exploit the same underlying advantage: one table scan to produce multiple aggregation levels. The database processes data in blocks from storage; every additional query adds another full scan. Expressing multi-dimensional requirements in a single SQL statement is not just more elegant — it's faster by a constant factor equal to the number of queries it replaces.