Aggregations Are Not Just COUNT and SUM. There Is More.

by Arif Ikhsanudin, Backend Developer

The Python script that SQL could replace

Your data team has a Python script that pulls revenue data, computes percentiles, builds histograms, calculates correlation coefficients, and aggregates tags into comma-separated strings. It runs for 8 minutes because it loads 10 million rows into pandas. Most of what this script does can be done directly in PostgreSQL, in a fraction of the time, with no data transfer.

Developers default to COUNT, SUM, AVG, MIN, MAX because those are the functions in every SQL tutorial. Modern databases — particularly PostgreSQL — have a much richer aggregation library.

Statistical aggregations

PostgreSQL provides a full suite of statistical functions usable as aggregates:

SELECT
  product_category,
  COUNT(*)                          AS order_count,
  AVG(order_value)                  AS mean_order_value,
  STDDEV(order_value)               AS stddev,
  VARIANCE(order_value)             AS variance,
  -- Skewness and excess kurtosis (PostgreSQL-specific)
  -- via extension or custom aggregates

  -- Correlation between order value and quantity
  CORR(order_value, quantity)       AS value_qty_correlation,

  -- Covariance
  COVAR_POP(order_value, quantity)  AS covariance,

  -- Regression: slope and intercept of order_value vs quantity
  REGR_SLOPE(order_value, quantity) AS regression_slope,
  REGR_INTERCEPT(order_value, quantity) AS regression_intercept,
  REGR_R2(order_value, quantity)    AS r_squared
FROM orders
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY product_category;

CORR, COVAR_POP, and REGR_* functions implement standard statistical calculations over grouped data. They're computed in a single pass over the table — no need to load data into application memory.

Percentile and ordered-set aggregates

Standard AVG doesn't tell you about distribution. Median and percentiles do:

SELECT
  product_category,
  -- Exact percentiles (sorts all values — can be memory-intensive)
  PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY order_value) AS p25,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY order_value) AS median,
  PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY order_value) AS p75,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY order_value) AS p95,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY order_value) AS p99,

  -- Discrete percentile (returns an actual data value, not interpolated)
  PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY order_value) AS median_actual
FROM orders
GROUP BY product_category;

PERCENTILE_CONT interpolates between values. PERCENTILE_DISC returns the nearest actual data value. Both use WITHIN GROUP (ORDER BY ...) — the "ordered-set aggregate" syntax that specifies how to sort the values for percentile calculation.

For approximate percentiles on very large datasets (hundreds of millions of rows), PostgreSQL's percentile_cont requires sorting all values, which is expensive. Extensions like pg_tdigest provide approximate percentile calculation in O(n) time and constant memory.

MODE: most frequent value

SELECT
  product_category,
  MODE() WITHIN GROUP (ORDER BY payment_method) AS most_common_payment
FROM orders
GROUP BY product_category;

MODE() returns the most frequently occurring value in the ordered set. Equivalent to a GROUP BY subquery with ORDER BY count DESC LIMIT 1 per group, but expressed as a single aggregate.

String aggregation

Aggregating strings (collecting values from multiple rows into a single delimited string) is one of the most common requirements that developers implement in application code unnecessarily:

-- PostgreSQL: STRING_AGG
SELECT
  order_id,
  STRING_AGG(product_name, ', ' ORDER BY product_name) AS products_ordered
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY order_id;

-- MySQL: GROUP_CONCAT (with ORDER BY and SEPARATOR)
SELECT
  order_id,
  GROUP_CONCAT(product_name ORDER BY product_name SEPARATOR ', ') AS products_ordered
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY order_id;

This returns one row per order with a comma-separated list of product names. Without STRING_AGG, you'd fetch all order items and loop in application code.

Array aggregation (PostgreSQL)

For a more structured result that the application can process without parsing delimited strings:

SELECT
  order_id,
  ARRAY_AGG(product_id ORDER BY product_id) AS product_ids,
  ARRAY_AGG(product_name ORDER BY product_name) AS product_names
FROM order_items
JOIN products ON order_items.product_id = products.id
GROUP BY order_id;

Returns PostgreSQL arrays, which ORM drivers typically map to native arrays in the application language.

JSON aggregation

When the result needs to be structured data:

SELECT
  order_id,
  JSON_AGG(
    JSON_BUILD_OBJECT(
      'product_id', oi.product_id,
      'product_name', p.name,
      'quantity', oi.quantity,
      'unit_price', oi.unit_price
    )
    ORDER BY p.name
  ) AS line_items
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY order_id;

Returns one row per order with a JSON array of line items. This is an effective pattern for fetching nested data in one query rather than N+1 queries.

The practical test

Look at any post-processing code in your application that: computes percentiles or medians, builds comma-separated or JSON-serialized strings from multiple rows, calculates correlation or regression, or assembles nested structures from flat query results. Each of these is a candidate for a SQL aggregate. Test it with EXPLAIN ANALYZE to verify the query plan is sane, then benchmark against the application-side equivalent.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

When Banks Set the Salary Bar — How Zürich Startups Compete for Backend Talent

UBS offered your candidate CHF 160K base plus a bonus structure your startup can't even model. He took the meeting with you as a courtesy.

Read more

Why Silent Meetings With Cameras On Are a Bad Idea

Staring at a screen full of colleagues who aren’t saying a word is surprisingly stressful. Even with cameras off, the pressure to be “noticed” lingers.

Read more

Testing Rails APIs with RSpec — My Practical Approach

Request specs in Rails test the full stack efficiently, but most teams either over-test at the wrong layer or under-test the cases that matter. Here is the structure that finds real bugs without slowing the suite down.

Read more

The Difference Between Code That Works and Code That Lasts

Getting to green on your test suite is not the same thing as building something that will survive the next two years of requirements changes, team turnover, and production surprises.

Read more