Window Functions: The SQL Feature That Changes How You Think About Data
by Eric Hanson, Backend Developer at Clean Systems Consulting
The query that requires a self-join until it doesn't
You need to rank customers by revenue within each region. Without window functions, you write a self-join: join the orders table to itself, count how many customers in the same region have higher revenue, add one. The query is convoluted, slow, and hard to maintain. Someone adds a window function and the query becomes three lines. This is the moment window functions click for most developers.
Window functions compute a value for each row based on a set of related rows — the "window" — without aggregating the rows into one. The output has the same number of rows as the input. Each row keeps its own data plus the computed window value.
The OVER clause: defining the window
The OVER clause is what makes a function a window function:
-- Regular aggregate: one row per group
SELECT department_id, AVG(salary) FROM employees GROUP BY department_id;
-- Window function: one row per employee, with department average alongside
SELECT
employee_id,
name,
salary,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary,
salary - AVG(salary) OVER (PARTITION BY department_id) AS diff_from_avg
FROM employees;
PARTITION BY divides rows into groups (like GROUP BY) but without collapsing them. ORDER BY within the OVER clause defines row order within the partition — critical for ranking and cumulative functions.
Ranking functions
SELECT
user_id,
revenue,
-- ROW_NUMBER: unique rank, no ties (arbitrary tiebreak)
ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
-- RANK: same rank for ties, gaps after ties (1, 1, 3, 4)
RANK() OVER (ORDER BY revenue DESC) AS rank,
-- DENSE_RANK: same rank for ties, no gaps (1, 1, 2, 3)
DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rank,
-- Percentile rank (0 to 1)
PERCENT_RANK() OVER (ORDER BY revenue DESC) AS pct_rank
FROM user_revenue;
Ranking within a partition (e.g., top customer per region):
WITH ranked AS (
SELECT
user_id,
region,
revenue,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY revenue DESC) AS rank_in_region
FROM user_revenue
)
SELECT user_id, region, revenue
FROM ranked
WHERE rank_in_region = 1;
This replaces a self-join or a correlated subquery. It's readable, and it runs in one pass over the data.
Running totals and cumulative aggregations
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7-day moving average
) AS seven_day_avg
FROM daily_revenue_summary
ORDER BY order_date;
The ROWS BETWEEN clause defines the frame — which rows around the current row to include in the window. Without it, the default frame is "all rows from the partition start to the current row" for ordered windows, which is what gives you the running total.
Frame specifications:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: from partition start to current row (running total)ROWS BETWEEN 6 PRECEDING AND CURRENT ROW: current row and 6 preceding (7-row sliding window)ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: entire partition (same as partition aggregate)
LAG and LEAD: comparing to adjacent rows
Without window functions, comparing a row to the previous row requires a self-join on offset row numbers — awkward and slow. LAG and LEAD do this directly:
SELECT
order_date,
daily_revenue,
LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS prev_day_revenue,
daily_revenue - LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS day_over_day_change,
LEAD(daily_revenue, 7) OVER (ORDER BY order_date) AS revenue_next_week
FROM daily_revenue_summary;
LAG(expr, n) returns the value of expr from n rows before the current row in the ordered window. LEAD(expr, n) returns n rows after. Both accept a default value for when the offset is out of bounds.
NTILE: bucketing rows into groups
-- Divide customers into 4 revenue quartiles
SELECT
user_id,
revenue,
NTILE(4) OVER (ORDER BY revenue DESC) AS quartile
FROM user_revenue;
-- Quartile 1: top 25%, Quartile 4: bottom 25%
FIRST_VALUE and LAST_VALUE: anchored window values
SELECT
user_id,
order_date,
total,
FIRST_VALUE(total) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_order_total,
LAST_VALUE(total) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS latest_order_total
FROM orders;
Note: LAST_VALUE requires the explicit ROWS BETWEEN ... UNBOUNDED FOLLOWING frame — the default frame stops at the current row, so without it, LAST_VALUE returns the current row's value.
Performance characteristics
Window functions are executed after the WHERE and GROUP BY clauses but before the final ORDER BY and LIMIT. They're applied in a single pass over the result set. For large result sets with multiple OVER clauses using different PARTITION BY combinations, the database may need multiple sort operations. Covering indexes that match the PARTITION BY and ORDER BY columns can eliminate these sorts.
The practical improvement over self-joins is substantial. A self-join on a 1M-row table for row-comparison purposes requires a full join product before filtering — often O(n²) in the worst case. A window function processes rows in a single ordered pass — O(n log n) for the sort, O(n) for the computation.