Skip to main content

SQL Aggregations: Summarizing Data

So far you've been retrieving rows — individual records from the ShopFlow database. But some of the most important questions an application needs to answer aren't about individual rows at all:

  • How many orders are in each status?
  • What's the total revenue from shipped orders?
  • Which product category has the highest average price?

These questions need aggregations — SQL's ability to collapse many rows into a single summary value. Let's discover how this works.


Quick Reference

Aggregate functions:

SELECT
COUNT(*) AS total_rows,
COUNT(column) AS non_null_count,
SUM(price) AS total,
AVG(price) AS average,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products;

Grouping:

SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

Filtering groups (not rows):

SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 2;

Gotchas:

  • ⚠️ Every column in SELECT must either be inside an aggregate function or listed in GROUP BY
  • ⚠️ Use HAVING to filter on aggregated values, WHERE to filter on raw row values
  • ⚠️ COUNT(column) skips NULL values; COUNT(*) counts all rows

Version Information

Tested with:

  • PostgreSQL: 15.x, 16.x
  • Node.js: v18.x, v20.x, v22.x

Last verified: May 2025


What You Need to Know First

Required reading:

What We'll Cover in This Article

By the end of this guide, you'll understand:

  • The five core aggregate functions: COUNT, SUM, AVG, MIN, MAX
  • How GROUP BY splits rows into groups before aggregating
  • Why HAVING exists and when to use it instead of WHERE
  • How DISTINCT removes duplicates
  • How to combine aggregations with ORDER BY and LIMIT

What We'll Explain Along the Way

We'll introduce these concepts with full explanations:

  • The split-apply-combine mental model for GROUP BY
  • Why NULL values behave differently in aggregate functions
  • The order SQL processes clauses (which explains the WHERE vs HAVING distinction)

Part 1: Aggregate Functions — Collapsing Many Rows Into One

An aggregate function takes a set of rows and returns a single value. Think of it as compression: you feed in 8 product rows, and the function hands back one number.

COUNT — How Many Rows?

The most commonly used aggregate. COUNT(*) counts every row:

-- How many products are in the ShopFlow catalog?
SELECT COUNT(*) AS total_products
FROM products;
total_products
8

COUNT(column_name) counts only rows where that column is not NULL:

-- How many products have a description?
SELECT
COUNT(*) AS total_rows,
COUNT(description) AS rows_with_description
FROM products;
total_rowsrows_with_description
88

In our dataset, all products have descriptions. But if any had NULL in that column, COUNT(description) would skip them while COUNT(*) would still count them. This is a subtle but important distinction.

SUM — What's the Total?

SUM adds up the values in a numeric column:

-- What's the total value of all inventory?
SELECT SUM(price * stock_quantity) AS total_inventory_value
FROM products;
total_inventory_value
18333.35

You can use expressions inside aggregate functions — here, price * stock_quantity computes the value per product before summing.

-- Total revenue from all order items
SELECT SUM(price_at_purchase * quantity) AS total_revenue
FROM order_items;
total_revenue
958.82

AVG — What's the Average?

-- Average product price
SELECT
AVG(price) AS avg_price,
ROUND(AVG(price), 2) AS avg_price_rounded
FROM products;
avg_priceavg_price_rounded
64.986250000000000064.99

AVG returns full precision by default. ROUND(value, decimal_places) trims it to something readable.

Important: AVG ignores NULL values — it averages only the rows where the column has a value. This is almost always what you want, but be aware of it.

MIN and MAX — The Extremes

-- Cheapest and most expensive products
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products;
cheapestmost_expensive
19.99129.99

MIN and MAX work on text columns too — they return the alphabetically first and last values:

SELECT
MIN(name) AS first_alphabetically,
MAX(name) AS last_alphabetically
FROM products;
first_alphabeticallylast_alphabetically
Cable ManagementWireless Keyboard

Part 2: DISTINCT — Removing Duplicates

Before we get to GROUP BY, let's look at DISTINCT — a simpler tool for a common problem.

-- What categories exist in the product catalog?
SELECT DISTINCT category
FROM products;
category
Electronics
Accessories

Without DISTINCT, you'd get one row per product (8 rows, with "Electronics" and "Accessories" repeated). DISTINCT collapses duplicates so each unique value appears once.

DISTINCT can apply to multiple columns — in that case, it deduplicates on the combination:

-- What unique status values appear in orders?
SELECT DISTINCT status FROM orders;
status
shipped
pending
cancelled

COUNT(DISTINCT column) combines both — count unique values:

-- How many distinct categories are there?
SELECT COUNT(DISTINCT category) AS category_count
FROM products;
category_count
2

Part 3: GROUP BY — The Heart of Aggregation

DISTINCT shows what unique values exist. GROUP BY lets you compute something for each group. This is where aggregations become genuinely powerful.

The Mental Model: Split → Apply → Combine

When you write GROUP BY category, think of the database doing three steps:

  1. Split the rows into groups — one group per unique value of category
  2. Apply the aggregate function to each group separately
  3. Combine the results into one row per group

Let's see it in action:

-- Count products per category
SELECT
category,
COUNT(*) AS product_count
FROM products
GROUP BY category;
categoryproduct_count
Accessories3
Electronics5

The database split the 8 product rows into two groups (Electronics and Accessories), counted each group, and returned one summary row per group.

-- Average price per category
SELECT
category,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products
GROUP BY category;
categoryavg_pricecheapestmost_expensive
Accessories33.3219.9949.99
Electronics76.9939.99129.99

With one query, you've computed three different summary statistics for each product category.

The GROUP BY Rule: Every Column Must Be Grouped or Aggregated

Here's a rule PostgreSQL will enforce with an error if you break it:

Every column in the SELECT list must either appear in GROUP BY, or be wrapped in an aggregate function.

Why? Because GROUP BY collapses many rows into one. If you ask for a column that isn't grouped or aggregated, the database doesn't know which row's value to show for that group.

-- ❌ Error: "name" is not in GROUP BY and not aggregated
SELECT name, category, COUNT(*)
FROM products
GROUP BY category;

-- PostgreSQL error:
-- ERROR: column "products.name" must appear in the GROUP BY
-- clause or be used in an aggregate function

There are 5 Electronics products — if you asked for name without grouping by it, which of the 5 names should appear in the "Electronics" row? The database refuses to guess.

-- ✅ Include name in GROUP BY (now you get one row per category+name combination)
SELECT name, category, COUNT(*)
FROM products
GROUP BY category, name;

-- ✅ Or aggregate it (though MIN(name) is rarely useful in practice)
SELECT category, MIN(name) AS sample_name, COUNT(*)
FROM products
GROUP BY category;

Grouping by Multiple Columns

You can group by several columns — each unique combination becomes its own group:

-- Count orders by status, broken down by... (we'll need order data)
-- For now: count products by category and stock availability
SELECT
category,
CASE WHEN stock_quantity > 0 THEN 'in_stock' ELSE 'out_of_stock' END AS availability,
COUNT(*) AS product_count
FROM products
GROUP BY category, availability
ORDER BY category, availability;
categoryavailabilityproduct_count
Accessoriesin_stock2
Accessoriesout_of_stock1
Electronicsin_stock4
Electronicsout_of_stock1

Notice the CASE WHEN ... END expression — it creates a derived column on the fly. We can group by it just like any real column. We'll use CASE WHEN more in later articles; for now, read it as "if stock_quantity > 0, label it 'in_stock', otherwise 'out_of_stock'".


Part 4: HAVING — Filtering Groups

You already know WHERE filters rows. But what if you want to filter based on an aggregated value — like "only show me categories with more than 3 products"?

You can't use WHERE for this. Here's why, and what to use instead.

Why WHERE Doesn't Work on Aggregated Values

SQL processes clauses in a specific order:

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

Notice that WHERE happens before GROUP BY. That means WHERE runs on individual rows before any grouping or aggregation has occurred. At WHERE time, there's no COUNT(*) yet — that number doesn't exist until after GROUP BY runs.

If you try to use an aggregate in WHERE, PostgreSQL tells you clearly:

-- ❌ Error: aggregate functions not allowed in WHERE
SELECT category, COUNT(*) AS product_count
FROM products
WHERE COUNT(*) > 3
GROUP BY category;

-- ERROR: aggregate functions are not allowed in WHERE

HAVING: WHERE for Groups

HAVING runs after GROUP BY — after the aggregation has happened. This is when you can filter on computed values:

-- ✅ Only show categories with more than 3 products
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 3;
categoryproduct_count
Electronics5

Accessories (3 products) doesn't appear — it failed the HAVING condition.

-- Categories where average price exceeds $50
SELECT
category,
ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;
categoryavg_price
Electronics76.99

WHERE and HAVING Together

They serve different purposes and can both appear in the same query:

-- Among in-stock products only,
-- find categories where average price exceeds $50
SELECT
category,
ROUND(AVG(price), 2) AS avg_price,
COUNT(*) AS product_count
FROM products
WHERE stock_quantity > 0 -- filter individual rows FIRST
GROUP BY category
HAVING AVG(price) > 50 -- then filter groups
ORDER BY avg_price DESC;

Here's what happens step by step:

  1. WHERE stock_quantity > 0 — removes out-of-stock products (Laptop Stand, Cable Management) from consideration
  2. GROUP BY category — groups the remaining 6 rows by category
  3. HAVING AVG(price) > 50 — keeps only groups where the average price of in-stock items exceeds $50
  4. ORDER BY sorts the result

This is a meaningful business query: "among products you can actually sell, which categories have a high average price point?"


Part 5: Combining Aggregations with ORDER BY and LIMIT

Aggregate queries work naturally with ORDER BY and LIMIT — you just use the aggregate value in ORDER BY:

-- Top 3 most-stocked product categories by total units
SELECT
category,
SUM(stock_quantity) AS total_units
FROM products
GROUP BY category
ORDER BY total_units DESC
LIMIT 3;
categorytotal_units
Electronics221
Accessories75

You can reference the alias (total_units) in ORDER BY — PostgreSQL resolves it. Some databases don't allow alias references in ORDER BY, requiring you to repeat the expression: ORDER BY SUM(stock_quantity) DESC.


Part 6: Practical Aggregation Queries

Let's apply what we've learned to realistic ShopFlow business questions.

Order Summary by Status

-- How many orders are in each status?
SELECT
status,
COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;
statusorder_count
shipped3
pending2
cancelled1

Revenue from Order Items

-- Total revenue per order
SELECT
order_id,
SUM(price_at_purchase * quantity) AS order_total,
SUM(quantity) AS total_items
FROM order_items
GROUP BY order_id
ORDER BY order_total DESC;
order_idorder_totaltotal_items
6119.973
1169.973
5139.982
394.982
259.991
4129.991

Finding Orders Above a Threshold

-- Orders where the total exceeds $100
SELECT
order_id,
SUM(price_at_purchase * quantity) AS order_total
FROM order_items
GROUP BY order_id
HAVING SUM(price_at_purchase * quantity) > 100
ORDER BY order_total DESC;
order_idorder_total
1169.97
5139.98
4129.99
6119.97

Common Misconceptions

❌ Misconception: Use WHERE to filter on aggregated values

Reality: WHERE filters individual rows before grouping. Aggregate functions like COUNT(*) don't exist at WHERE time. Use HAVING for filtering on aggregate results.

Why this matters: The query will fail with an error — or worse, if you construct the condition in a way that doesn't use the aggregate, it silently returns wrong results.

Example:

-- ❌ Error: can't use COUNT(*) in WHERE
SELECT category, COUNT(*) FROM products
WHERE COUNT(*) > 2
GROUP BY category;

-- ✅ Correct: HAVING filters after grouping
SELECT category, COUNT(*) FROM products
GROUP BY category
HAVING COUNT(*) > 2;

❌ Misconception: COUNT(*) and COUNT(column) are the same

Reality: COUNT(*) counts every row. COUNT(column) skips rows where that column is NULL.

Why this matters: If you're counting an optional column (like description in products), you'll get a lower count than expected and might not notice.

Example:

-- Imagine some products have NULL descriptions
-- COUNT(*) = 8 (all rows)
-- COUNT(description) = 6 (only rows with non-NULL description)
SELECT COUNT(*), COUNT(description) FROM products;

❌ Misconception: You can SELECT non-grouped columns freely

Reality: PostgreSQL enforces the rule strictly — every SELECT column must be in GROUP BY or inside an aggregate. MySQL (with default settings) allows this but returns a random row's value, which is a silent bug.

Why this matters: If you're used to MySQL's lenient behavior, switching to PostgreSQL will surface these as errors. The PostgreSQL behavior is correct — MySQL's permissive behavior masks real ambiguity.


Troubleshooting Common Issues

Problem: ERROR: column must appear in GROUP BY clause

Symptoms: PostgreSQL refuses to run your query with this error.

Cause: You're selecting a column that isn't in GROUP BY and isn't inside an aggregate function.

Solution:

-- ❌ Causes error
SELECT name, category, COUNT(*) FROM products GROUP BY category;

-- ✅ Option 1: Add the column to GROUP BY
SELECT name, category, COUNT(*) FROM products GROUP BY category, name;

-- ✅ Option 2: Aggregate the column
SELECT MAX(name) AS sample_name, category, COUNT(*) FROM products GROUP BY category;

-- ✅ Option 3: Remove the column if you don't need it
SELECT category, COUNT(*) FROM products GROUP BY category;

Problem: HAVING condition produces unexpected results

Symptoms: Your HAVING filter seems too aggressive or too lenient.

Diagnostic steps:

-- Step 1: Run without HAVING to see all groups
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category;

-- Step 2: Add HAVING and compare
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
HAVING COUNT(*) > 2;

-- Step 3: Verify the threshold is what you meant
-- (is it > 2, or >= 2, or > 3?)

Problem: AVG returns NULL

Symptoms: AVG(column) returns NULL instead of a number.

Cause: The column has NULL values for every row in the group, or the table has no rows.

Solution:

-- Use COALESCE to provide a fallback value
SELECT COALESCE(AVG(price), 0) AS avg_price FROM products;

-- Or check if there are rows before aggregating
SELECT COUNT(*), AVG(price) FROM products;

Check Your Understanding

Quick Quiz

  1. What's the difference between WHERE and HAVING?

    Show Answer

    WHERE filters individual rows before GROUP BY runs. HAVING filters groups after GROUP BY runs. Use WHERE for conditions on raw column values; use HAVING for conditions on aggregate values like COUNT(*), SUM(), or AVG().

  2. What will this query return?

    SELECT COUNT(*), COUNT(description) FROM products
    WHERE category = 'Electronics';
    Show Answer

    Both return 5 in our dataset, because all 5 Electronics products have non-NULL descriptions. If any Electronics product had a NULL description, COUNT(description) would be lower than COUNT(*).

  3. Why does this query fail?

    SELECT category, name, COUNT(*)
    FROM products
    GROUP BY category;
    Show Answer

    name is in the SELECT list but not in GROUP BY and not wrapped in an aggregate function. PostgreSQL doesn't know which product's name to show for each category group. Fix by adding name to GROUP BY, removing it from SELECT, or wrapping it in an aggregate like MIN(name).

Hands-On Challenge

Task: Write a query that shows each order status with its order count and the percentage of total orders it represents. Sort by order count descending.

Expected result:

statusorder_countpercentage
shipped350.00
pending233.33
cancelled116.67
Show Solution
SELECT
status,
COUNT(*) AS order_count,
ROUND(
COUNT(*) * 100.0 / (SELECT COUNT(*) FROM orders),
2
) AS percentage
FROM orders
GROUP BY status
ORDER BY order_count DESC;

Key techniques:

  • COUNT(*) * 100.0 / (SELECT COUNT(*) FROM orders) — the inner SELECT COUNT(*) FROM orders is a subquery that returns the total order count. Dividing by it gives a fraction; multiplying by 100.0 converts to a percentage. The .0 ensures decimal division instead of integer division.
  • ROUND(..., 2) — rounds to 2 decimal places
  • We'll cover subqueries in more depth in a later article; for now, know that a SELECT statement can be embedded inside another one

Summary: Key Takeaways

  • Aggregate functions collapse many rows into a single summary value: COUNT, SUM, AVG, MIN, MAX
  • COUNT(*) counts all rows; COUNT(column) skips NULL values
  • DISTINCT removes duplicate values before aggregating or displaying
  • GROUP BY splits rows into groups, applies the aggregate to each group, and returns one row per group
  • The GROUP BY rule: every column in SELECT must be in GROUP BY or inside an aggregate — PostgreSQL enforces this strictly
  • WHERE filters rows before grouping; HAVING filters groups after grouping
  • Use HAVING for any condition that involves an aggregate value
  • WHERE and HAVING can both appear in the same query — they serve different purposes

What's Next?

You can now summarize data from a single table using aggregate functions and grouping. But ShopFlow's most interesting questions involve multiple tables: "which users placed the most orders?" or "what's the total revenue broken down by product?"

The next step is SQL Joins: Combining Tables — where you'll learn to combine data from multiple tables in a single query using JOIN. The aggregation skills from this article pair directly with joins, letting you ask cross-table questions like "total revenue per user" with a single query.