SQL Aggregations: Summarizing Data

So far you've been retrieving rows — individual records from the ShopFlow database. But some of the most important questions an application needs to answer aren't about individual rows at all:

How many orders are in each status?
What's the total revenue from shipped orders?
Which product category has the highest average price?

These questions need aggregations — SQL's ability to collapse many rows into a single summary value. Let's discover how this works.

Quick Reference

Aggregate functions:

SELECT
  COUNT(*)           AS total_rows,
  COUNT(column)      AS non_null_count,
  SUM(price)         AS total,
  AVG(price)         AS average,
  MIN(price)         AS cheapest,
  MAX(price)         AS most_expensive
FROM products;

Grouping:

SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;

Filtering groups (not rows):

SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 2;

Gotchas:

⚠️ Every column in SELECT must either be inside an aggregate function or listed in GROUP BY
⚠️ Use HAVING to filter on aggregated values, WHERE to filter on raw row values
⚠️ COUNT(column) skips NULL values; COUNT(*) counts all rows

Version Information

Tested with:

PostgreSQL: 15.x, 16.x
Node.js: v18.x, v20.x, v22.x

What You Need to Know First

Required reading:

Relational Databases: How Structured Data Works
SQL Basics: Reading Data with SELECT — WHERE, ORDER BY, and LIMIT are used throughout this article

What We'll Cover in This Article

By the end of this guide, you'll understand:

The five core aggregate functions: COUNT, SUM, AVG, MIN, MAX
How GROUP BY splits rows into groups before aggregating
Why HAVING exists and when to use it instead of WHERE
How DISTINCT removes duplicates
How to combine aggregations with ORDER BY and LIMIT

What We'll Explain Along the Way

We'll introduce these concepts with full explanations:

The split-apply-combine mental model for GROUP BY
Why NULL values behave differently in aggregate functions
The order SQL processes clauses (which explains the WHERE vs HAVING distinction)

Part 1: Aggregate Functions — Collapsing Many Rows Into One

An aggregate function takes a set of rows and returns a single value. Think of it as compression: you feed in 8 product rows, and the function hands back one number.

COUNT — How Many Rows?

The most commonly used aggregate. COUNT(*) counts every row:

-- How many products are in the ShopFlow catalog?
SELECT COUNT(*) AS total_products
FROM products;

total_products
8

COUNT(column_name) counts only rows where that column is not NULL:

-- How many products have a description?
SELECT
  COUNT(*)           AS total_rows,
  COUNT(description) AS rows_with_description
FROM products;

total_rows	rows_with_description
8	8

In our dataset, all products have descriptions. But if any had NULL in that column, COUNT(description) would skip them while COUNT(*) would still count them. This is a subtle but important distinction.

SUM — What's the Total?

SUM adds up the values in a numeric column:

-- What's the total value of all inventory?
SELECT SUM(price * stock_quantity) AS total_inventory_value
FROM products;

total_inventory_value
18333.35

You can use expressions inside aggregate functions — here, price * stock_quantity computes the value per product before summing.

-- Total revenue from all order items
SELECT SUM(price_at_purchase * quantity) AS total_revenue
FROM order_items;

total_revenue
958.82

AVG — What's the Average?

-- Average product price
SELECT
  AVG(price)              AS avg_price,
  ROUND(AVG(price), 2)    AS avg_price_rounded
FROM products;

avg_price	avg_price_rounded
64.9862500000000000	64.99

AVG returns full precision by default. ROUND(value, decimal_places) trims it to something readable.

Important: AVG ignores NULL values — it averages only the rows where the column has a value. This is almost always what you want, but be aware of it.

MIN and MAX — The Extremes

-- Cheapest and most expensive products
SELECT
  MIN(price) AS cheapest,
  MAX(price) AS most_expensive
FROM products;

cheapest	most_expensive
19.99	129.99

MIN and MAX work on text columns too — they return the alphabetically first and last values:

SELECT
  MIN(name) AS first_alphabetically,
  MAX(name) AS last_alphabetically
FROM products;

first_alphabetically	last_alphabetically
Cable Management	Wireless Keyboard

Part 2: DISTINCT — Removing Duplicates

Before we get to GROUP BY, let's look at DISTINCT — a simpler tool for a common problem.

-- What categories exist in the product catalog?
SELECT DISTINCT category
FROM products;

category
Electronics
Accessories

Without DISTINCT, you'd get one row per product (8 rows, with "Electronics" and "Accessories" repeated). DISTINCT collapses duplicates so each unique value appears once.

DISTINCT can apply to multiple columns — in that case, it deduplicates on the combination:

-- What unique status values appear in orders?
SELECT DISTINCT status FROM orders;

status
shipped
pending
cancelled

COUNT(DISTINCT column) combines both — count unique values:

-- How many distinct categories are there?
SELECT COUNT(DISTINCT category) AS category_count
FROM products;

category_count
2

Part 3: GROUP BY — The Heart of Aggregation

DISTINCT shows what unique values exist. GROUP BY lets you compute something for each group. This is where aggregations become genuinely powerful.

The Mental Model: Split → Apply → Combine

When you write GROUP BY category, think of the database doing three steps:

Split the rows into groups — one group per unique value of category
Apply the aggregate function to each group separately
Combine the results into one row per group

Let's see it in action:

-- Count products per category
SELECT
  category,
  COUNT(*) AS product_count
FROM products
GROUP BY category;

category	product_count
Accessories	3
Electronics	5

The database split the 8 product rows into two groups (Electronics and Accessories), counted each group, and returned one summary row per group.

-- Average price per category
SELECT
  category,
  ROUND(AVG(price), 2) AS avg_price,
  MIN(price)           AS cheapest,
  MAX(price)           AS most_expensive
FROM products
GROUP BY category;

category	avg_price	cheapest	most_expensive
Accessories	33.32	19.99	49.99
Electronics	76.99	39.99	129.99

With one query, you've computed three different summary statistics for each product category.

The GROUP BY Rule: Every Column Must Be Grouped or Aggregated

Here's a rule PostgreSQL will enforce with an error if you break it:

Every column in the SELECT list must either appear in GROUP BY, or be wrapped in an aggregate function.

Why? Because GROUP BY collapses many rows into one. If you ask for a column that isn't grouped or aggregated, the database doesn't know which row's value to show for that group.

-- ❌ Error: "name" is not in GROUP BY and not aggregated
SELECT name, category, COUNT(*)
FROM products
GROUP BY category;

-- PostgreSQL error:
-- ERROR: column "products.name" must appear in the GROUP BY
-- clause or be used in an aggregate function

There are 5 Electronics products — if you asked for name without grouping by it, which of the 5 names should appear in the "Electronics" row? The database refuses to guess.

-- ✅ Include name in GROUP BY (now you get one row per category+name combination)
SELECT name, category, COUNT(*)
FROM products
GROUP BY category, name;

-- ✅ Or aggregate it (though MIN(name) is rarely useful in practice)
SELECT category, MIN(name) AS sample_name, COUNT(*)
FROM products
GROUP BY category;

Grouping by Multiple Columns

You can group by several columns — each unique combination becomes its own group:

-- Count orders by status, broken down by... (we'll need order data)
-- For now: count products by category and stock availability
SELECT
  category,
  CASE WHEN stock_quantity > 0 THEN 'in_stock' ELSE 'out_of_stock' END AS availability,
  COUNT(*) AS product_count
FROM products
GROUP BY category, availability
ORDER BY category, availability;

category	availability	product_count
Accessories	in_stock	2
Accessories	out_of_stock	1
Electronics	in_stock	4
Electronics	out_of_stock	1

Notice the CASE WHEN ... END expression — it creates a derived column on the fly. We can group by it just like any real column. We'll use CASE WHEN more in later articles; for now, read it as "if stock_quantity > 0, label it 'in_stock', otherwise 'out_of_stock'".

Part 4: HAVING — Filtering Groups

You already know WHERE filters rows. But what if you want to filter based on an aggregated value — like "only show me categories with more than 3 products"?

You can't use WHERE for this. Here's why, and what to use instead.

Why WHERE Doesn't Work on Aggregated Values

SQL processes clauses in a specific order:

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT

Notice that WHERE happens before GROUP BY. That means WHERE runs on individual rows before any grouping or aggregation has occurred. At WHERE time, there's no COUNT(*) yet — that number doesn't exist until after GROUP BY runs.

If you try to use an aggregate in WHERE, PostgreSQL tells you clearly:

-- ❌ Error: aggregate functions not allowed in WHERE
SELECT category, COUNT(*) AS product_count
FROM products
WHERE COUNT(*) > 3
GROUP BY category;

-- ERROR: aggregate functions are not allowed in WHERE

HAVING: WHERE for Groups

HAVING runs after GROUP BY — after the aggregation has happened. This is when you can filter on computed values:

-- ✅ Only show categories with more than 3 products
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > 3;

category	product_count
Electronics	5

Accessories (3 products) doesn't appear — it failed the HAVING condition.

-- Categories where average price exceeds $50
SELECT
  category,
  ROUND(AVG(price), 2) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;

category	avg_price
Electronics	76.99

WHERE and HAVING Together

They serve different purposes and can both appear in the same query:

-- Among in-stock products only,
-- find categories where average price exceeds $50
SELECT
  category,
  ROUND(AVG(price), 2) AS avg_price,
  COUNT(*) AS product_count
FROM products
WHERE stock_quantity > 0      -- filter individual rows FIRST
GROUP BY category
HAVING AVG(price) > 50        -- then filter groups
ORDER BY avg_price DESC;

Here's what happens step by step:

WHERE stock_quantity > 0 — removes out-of-stock products (Laptop Stand, Cable Management) from consideration
GROUP BY category — groups the remaining 6 rows by category
HAVING AVG(price) > 50 — keeps only groups where the average price of in-stock items exceeds $50
ORDER BY sorts the result

This is a meaningful business query: "among products you can actually sell, which categories have a high average price point?"

Part 5: Combining Aggregations with ORDER BY and LIMIT

Aggregate queries work naturally with ORDER BY and LIMIT — you just use the aggregate value in ORDER BY:

-- Top 3 most-stocked product categories by total units
SELECT
  category,
  SUM(stock_quantity) AS total_units
FROM products
GROUP BY category
ORDER BY total_units DESC
LIMIT 3;

category	total_units
Electronics	221
Accessories	75

You can reference the alias (total_units) in ORDER BY — PostgreSQL resolves it. Some databases don't allow alias references in ORDER BY, requiring you to repeat the expression: ORDER BY SUM(stock_quantity) DESC.

Part 6: Practical Aggregation Queries

Let's apply what we've learned to realistic ShopFlow business questions.

Order Summary by Status

-- How many orders are in each status?
SELECT
  status,
  COUNT(*)    AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;

status	order_count
shipped	3
pending	2
cancelled	1

Revenue from Order Items

-- Total revenue per order
SELECT
  order_id,
  SUM(price_at_purchase * quantity)  AS order_total,
  SUM(quantity)                      AS total_items
FROM order_items
GROUP BY order_id
ORDER BY order_total DESC;

order_id	order_total	total_items
6	119.97	3
1	169.97	3
5	139.98	2
3	94.98	2
2	59.99	1
4	129.99	1

Finding Orders Above a Threshold

-- Orders where the total exceeds $100
SELECT
  order_id,
  SUM(price_at_purchase * quantity) AS order_total
FROM order_items
GROUP BY order_id
HAVING SUM(price_at_purchase * quantity) > 100
ORDER BY order_total DESC;

order_id	order_total
1	169.97
5	139.98
4	129.99
6	119.97

Common Misconceptions

❌ Misconception: Use WHERE to filter on aggregated values

Reality: WHERE filters individual rows before grouping. Aggregate functions like COUNT(*) don't exist at WHERE time. Use HAVING for filtering on aggregate results.

Why this matters: The query will fail with an error — or worse, if you construct the condition in a way that doesn't use the aggregate, it silently returns wrong results.

Example:

-- ❌ Error: can't use COUNT(*) in WHERE
SELECT category, COUNT(*) FROM products
WHERE COUNT(*) > 2
GROUP BY category;

-- ✅ Correct: HAVING filters after grouping
SELECT category, COUNT(*) FROM products
GROUP BY category
HAVING COUNT(*) > 2;

❌ Misconception: COUNT(*) and COUNT(column) are the same

Reality: COUNT(*) counts every row. COUNT(column) skips rows where that column is NULL.

Why this matters: If you're counting an optional column (like description in products), you'll get a lower count than expected and might not notice.

Example:

-- Imagine some products have NULL descriptions
-- COUNT(*) = 8 (all rows)
-- COUNT(description) = 6 (only rows with non-NULL description)
SELECT COUNT(*), COUNT(description) FROM products;

❌ Misconception: You can SELECT non-grouped columns freely

Reality: PostgreSQL enforces the rule strictly — every SELECT column must be in GROUP BY or inside an aggregate. MySQL (with default settings) allows this but returns a random row's value, which is a silent bug.

Why this matters: If you're used to MySQL's lenient behavior, switching to PostgreSQL will surface these as errors. The PostgreSQL behavior is correct — MySQL's permissive behavior masks real ambiguity.

Troubleshooting Common Issues

Problem: `ERROR: column must appear in GROUP BY clause`

Symptoms: PostgreSQL refuses to run your query with this error.

Cause: You're selecting a column that isn't in GROUP BY and isn't inside an aggregate function.

Solution:

-- ❌ Causes error
SELECT name, category, COUNT(*) FROM products GROUP BY category;

-- ✅ Option 1: Add the column to GROUP BY
SELECT name, category, COUNT(*) FROM products GROUP BY category, name;

-- ✅ Option 2: Aggregate the column
SELECT MAX(name) AS sample_name, category, COUNT(*) FROM products GROUP BY category;

-- ✅ Option 3: Remove the column if you don't need it
SELECT category, COUNT(*) FROM products GROUP BY category;

Problem: `HAVING` condition produces unexpected results

Symptoms: Your HAVING filter seems too aggressive or too lenient.

Diagnostic steps:

-- Step 1: Run without HAVING to see all groups
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category;

-- Step 2: Add HAVING and compare
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category
HAVING COUNT(*) > 2;

-- Step 3: Verify the threshold is what you meant
-- (is it > 2, or >= 2, or > 3?)

Problem: `AVG` returns NULL

Symptoms: AVG(column) returns NULL instead of a number.

Cause: The column has NULL values for every row in the group, or the table has no rows.

Solution:

-- Use COALESCE to provide a fallback value
SELECT COALESCE(AVG(price), 0) AS avg_price FROM products;

-- Or check if there are rows before aggregating
SELECT COUNT(*), AVG(price) FROM products;

Check Your Understanding

Quick Quiz

What's the difference between WHERE and HAVING?

Show Answer
WHERE filters individual rows before GROUP BY runs. HAVING filters groups after GROUP BY runs. Use WHERE for conditions on raw column values; use HAVING for conditions on aggregate values like COUNT(*), SUM(), or AVG().
What will this query return?
```
SELECT COUNT(*), COUNT(description) FROM products
WHERE category = 'Electronics';
```
Show Answer
Both return 5 in our dataset, because all 5 Electronics products have non-NULL descriptions. If any Electronics product had a NULL description, COUNT(description) would be lower than COUNT(*).
Why does this query fail?
```
SELECT category, name, COUNT(*)
FROM products
GROUP BY category;
```
Show Answer
name is in the SELECT list but not in GROUP BY and not wrapped in an aggregate function. PostgreSQL doesn't know which product's name to show for each category group. Fix by adding name to GROUP BY, removing it from SELECT, or wrapping it in an aggregate like MIN(name).

Hands-On Challenge

Task: Write a query that shows each order status with its order count and the percentage of total orders it represents. Sort by order count descending.

Expected result:

status	order_count	percentage
shipped	3	50.00
pending	2	33.33
cancelled	1	16.67

Show Solution

SELECT
  status,
  COUNT(*) AS order_count,
  ROUND(
    COUNT(*) * 100.0 / (SELECT COUNT(*) FROM orders),
    2
  ) AS percentage
FROM orders
GROUP BY status
ORDER BY order_count DESC;

Key techniques:

COUNT(*) * 100.0 / (SELECT COUNT(*) FROM orders) — the inner SELECT COUNT(*) FROM orders is a subquery that returns the total order count. Dividing by it gives a fraction; multiplying by 100.0 converts to a percentage. The .0 ensures decimal division instead of integer division.
ROUND(..., 2) — rounds to 2 decimal places
We'll cover subqueries in more depth in a later article; for now, know that a SELECT statement can be embedded inside another one

Summary: Key Takeaways

Aggregate functions collapse many rows into a single summary value: COUNT, SUM, AVG, MIN, MAX
COUNT(*) counts all rows; COUNT(column) skips NULL values
DISTINCT removes duplicate values before aggregating or displaying
GROUP BY splits rows into groups, applies the aggregate to each group, and returns one row per group
The GROUP BY rule: every column in SELECT must be in GROUP BY or inside an aggregate — PostgreSQL enforces this strictly
WHERE filters rows before grouping; HAVING filters groups after grouping
Use HAVING for any condition that involves an aggregate value
WHERE and HAVING can both appear in the same query — they serve different purposes

What's Next?

You can now summarize data from a single table using aggregate functions and grouping. But ShopFlow's most interesting questions involve multiple tables: "which users placed the most orders?" or "what's the total revenue broken down by product?"

The next step is SQL Joins: Combining Tables — where you'll learn to combine data from multiple tables in a single query using JOIN. The aggregation skills from this article pair directly with joins, letting you ask cross-table questions like "total revenue per user" with a single query.

Quick Reference​

Version Information​

What You Need to Know First​

What We'll Cover in This Article​

What We'll Explain Along the Way​

Part 1: Aggregate Functions — Collapsing Many Rows Into One​

COUNT — How Many Rows?​

SUM — What's the Total?​

AVG — What's the Average?​

MIN and MAX — The Extremes​

Part 2: DISTINCT — Removing Duplicates​

Part 3: GROUP BY — The Heart of Aggregation​

The Mental Model: Split → Apply → Combine​

The GROUP BY Rule: Every Column Must Be Grouped or Aggregated​

Grouping by Multiple Columns​

Part 4: HAVING — Filtering Groups​

Why WHERE Doesn't Work on Aggregated Values​

HAVING: WHERE for Groups​

WHERE and HAVING Together​

Part 5: Combining Aggregations with ORDER BY and LIMIT​

Part 6: Practical Aggregation Queries​

Order Summary by Status​

Revenue from Order Items​

Finding Orders Above a Threshold​

Common Misconceptions​

❌ Misconception: Use WHERE to filter on aggregated values​

❌ Misconception: COUNT(*) and COUNT(column) are the same​

❌ Misconception: You can SELECT non-grouped columns freely​

Troubleshooting Common Issues​

Problem: ERROR: column must appear in GROUP BY clause​

Problem: HAVING condition produces unexpected results​

Problem: AVG returns NULL​

Check Your Understanding​

Quick Quiz​

Hands-On Challenge​

Summary: Key Takeaways​

What's Next?​