Aggregations Are Not Just COUNT and SUM. There Is More.

April 22, 2026

by Arif Ikhsanudin, Backend Developer

The Python script that SQL could replace

Your data team has a Python script that pulls revenue data, computes percentiles, builds histograms, calculates correlation coefficients, and aggregates tags into comma-separated strings. It runs for 8 minutes because it loads 10 million rows into pandas. Most of what this script does can be done directly in PostgreSQL, in a fraction of the time, with no data transfer.

Developers default to COUNT, SUM, AVG, MIN, MAX because those are the functions in every SQL tutorial. Modern databases — particularly PostgreSQL — have a much richer aggregation library.

Statistical aggregations

PostgreSQL provides a full suite of statistical functions usable as aggregates:

SELECT
  product_category,
  COUNT(*)                          AS order_count,
  AVG(order_value)                  AS mean_order_value,
  STDDEV(order_value)               AS stddev,
  VARIANCE(order_value)             AS variance,
  -- Skewness and excess kurtosis (PostgreSQL-specific)
  -- via extension or custom aggregates

  -- Correlation between order value and quantity
  CORR(order_value, quantity)       AS value_qty_correlation,

  -- Covariance
  COVAR_POP(order_value, quantity)  AS covariance,

  -- Regression: slope and intercept of order_value vs quantity
  REGR_SLOPE(order_value, quantity) AS regression_slope,
  REGR_INTERCEPT(order_value, quantity) AS regression_intercept,
  REGR_R2(order_value, quantity)    AS r_squared
FROM orders
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY product_category;

CORR, COVAR_POP, and REGR_* functions implement standard statistical calculations over grouped data. They're computed in a single pass over the table — no need to load data into application memory.

Percentile and ordered-set aggregates

Standard AVG doesn't tell you about distribution. Median and percentiles do:

SELECT
  product_category,
  -- Exact percentiles (sorts all values — can be memory-intensive)
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY order_value) AS p25,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY order_value) AS median,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY order_value) AS p75,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY order_value) AS p95,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY order_value) AS p99,

  -- Discrete percentile (returns an actual data value, not interpolated)
  PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY order_value) AS median_actual
FROM orders
GROUP BY product_category;

PERCENTILE_CONT interpolates between values. PERCENTILE_DISC returns the nearest actual data value. Both use WITHIN GROUP (ORDER BY ...) — the "ordered-set aggregate" syntax that specifies how to sort the values for percentile calculation.

For approximate percentiles on very large datasets (hundreds of millions of rows), PostgreSQL's percentile_cont requires sorting all values, which is expensive. Extensions like pg_tdigest provide approximate percentile calculation in O(n) time and constant memory.

MODE: most frequent value

SELECT
  product_category,
  MODE() WITHIN GROUP (ORDER BY payment_method) AS most_common_payment
FROM orders
GROUP BY product_category;

MODE() returns the most frequently occurring value in the ordered set. Equivalent to a GROUP BY subquery with ORDER BY count DESC LIMIT 1 per group, but expressed as a single aggregate.

String aggregation

Aggregating strings (collecting values from multiple rows into a single delimited string) is one of the most common requirements that developers implement in application code unnecessarily:

-- PostgreSQL: STRING_AGG
SELECT
  order_id,
  STRING_AGG(product_name, ', ' ORDER BY product_name) AS products_ordered
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY order_id;

-- MySQL: GROUP_CONCAT (with ORDER BY and SEPARATOR)
SELECT
  order_id,
  GROUP_CONCAT(product_name ORDER BY product_name SEPARATOR ', ') AS products_ordered
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY order_id;

This returns one row per order with a comma-separated list of product names. Without STRING_AGG, you'd fetch all order items and loop in application code.

Array aggregation (PostgreSQL)

For a more structured result that the application can process without parsing delimited strings:

SELECT
  order_id,
  ARRAY_AGG(product_id ORDER BY product_id) AS product_ids,
  ARRAY_AGG(product_name ORDER BY product_name) AS product_names
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY order_id;

Returns PostgreSQL arrays, which ORM drivers typically map to native arrays in the application language.

JSON aggregation

When the result needs to be structured data:

SELECT
  order_id,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'product_id', oi.product_id,
      'product_name', p.name,
      'quantity', oi.quantity,
      'unit_price', oi.unit_price
    )
    ORDER BY p.name
  ) AS line_items
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY order_id;

Returns one row per order with a JSON array of line items. This is an effective pattern for fetching nested data in one query rather than N+1 queries.

The practical test

Look at any post-processing code in your application that: computes percentiles or medians, builds comma-separated or JSON-serialized strings from multiple rows, calculates correlation or regression, or assembles nested structures from flat query results. Each of these is a candidate for a SQL aggregate. Test it with EXPLAIN ANALYZE to verify the query plan is sane, then benchmark against the application-side equivalent.

Our offices

Follow us

Aggregations Are Not Just COUNT and SUM. There Is More.

The Python script that SQL could replace

Statistical aggregations

Percentile and ordered-set aggregates

MODE: most frequent value

String aggregation

Array aggregation (PostgreSQL)

JSON aggregation

The practical test

Scale Your Backend - Need an Experienced Backend Developer?

Tell us about your project

Our offices

More articles

How to Explain a Technical Problem to Someone Who Is Not Technical

How to Stay Visible to Clients Even When You Are Not Working With Them

Stop Writing Subqueries When a JOIN Will Do

The Difference Between a Revision and a New Requirement