SQL Joins: Combining Tables
You've been querying one table at a time. But the ShopFlow schema has four tables — and the most valuable information lives in the connections between them. "Which users placed orders?" "What products are in each order?" "What's the total revenue per customer?" None of these can be answered from a single table.
This is the moment the relational model pays off. Let's discover joins.
Quick Reference
INNER JOIN — rows that match in both tables:
SELECT users.name, orders.status
FROM orders
INNER JOIN users ON orders.user_id = users.id;
LEFT JOIN — all rows from the left table, matched rows from the right (unmatched = NULL):
SELECT users.name, orders.id AS order_id
FROM users
LEFT JOIN orders ON orders.user_id = users.id;
Alias tables to keep queries readable:
SELECT u.name, o.status
FROM orders o
INNER JOIN users u ON o.user_id = u.id;
Gotchas:
- ⚠️ A join without
ONproduces a cartesian product — every row × every row - ⚠️ Ambiguous column names (e.g.
idexists in both tables) must be qualified:users.id - ⚠️
LEFT JOINreturnsNULLfor unmatched right-side columns — this is intentional and useful
Version Information
Tested with:
- PostgreSQL: 15.x, 16.x
- Node.js: v18.x, v20.x, v22.x
Last verified: May 2025
What You Need to Know First
Required reading:
- Relational Databases: How Structured Data Works — primary keys, foreign keys, and the ShopFlow schema
- SQL Basics: Reading Data with SELECT —
WHERE,ORDER BY,LIMIT - SQL Aggregations: Summarizing Data —
GROUP BYandHAVINGare used in the later examples
What We'll Cover in This Article
By the end of this guide, you'll understand:
- Why joins exist and what problem they solve
- How
INNER JOINworks and when to use it - How
LEFT JOINworks and whatNULLin the result means - Why
RIGHT JOINis almost never necessary - How self-joins work
- The cartesian product — what it is and how to avoid it
- How to handle ambiguous column names
- How to chain multiple joins to span several tables
What We'll Explain Along the Way
We'll introduce these concepts with full explanations:
- Table aliases — why
FROM orders ois different fromFROM orders - Row-level diagram of what a join actually does to the data
- Why duplicate rows after a join is a diagnostic signal, not a mystery
Part 1: Why Joins Exist
In the schema overview, we split ShopFlow data across four tables. A user's name is in users. Their orders are in orders. The products in those orders are in order_items and products.
This split is called normalization — and it's intentional. By storing each fact in exactly one place, you avoid the mess of keeping duplicates in sync. When Aisha changes her email, you update one row in users. You don't have to find and update every order record that mentioned her email.
But there's a cost: to answer questions that span multiple tables, you need to reassemble the data at query time. That's what a join does — it temporarily combines rows from multiple tables based on a matching condition, so you can query across them as if they were one.
Part 2: The Cartesian Product — What Happens Without ON
Before learning joins properly, let's see what happens without them. This is the mistake that produces bafflingly large result sets.
If you list two tables in FROM without specifying how they relate, the database returns every possible combination of rows — one row in the result for every pairing of a row from table A with a row from table B:
-- ❌ No join condition — produces a cartesian product
SELECT users.name, orders.id
FROM users, orders;
With 5 users and 6 orders, this returns 5 × 6 = 30 rows. Every user is paired with every order, regardless of who placed what. Almost none of these pairings are meaningful.
This is called a cartesian product (or cross join). It's almost never what you want, and it's a common cause of "why does my query return thousands of rows?" confusion. The fix is always to add the join condition that specifies which rows should match.
Part 3: INNER JOIN — Rows That Match in Both Tables
INNER JOIN returns only the rows where the join condition is satisfied in both tables. Rows that don't have a match are excluded entirely.
-- Which users placed orders? Show their name and order status.
SELECT
users.name,
orders.id AS order_id,
orders.status
FROM orders
INNER JOIN users ON orders.user_id = users.id;
| name | order_id | status |
|---|---|---|
| Aisha Mwangi | 1 | shipped |
| Aisha Mwangi | 2 | pending |
| Marco Rossi | 3 | shipped |
| Priya Sharma | 4 | cancelled |
| Priya Sharma | 5 | shipped |
| Lin Wei | 6 | pending |
Let's break down what's happening:
FROM orders— start with the orders tableINNER JOIN users— bring in the users tableON orders.user_id = users.id— match each order to the user whoseidequals the order'suser_id
Notice Carlos Rivera (user id 4) doesn't appear. He has no orders, so there's nothing in orders with user_id = 4. INNER JOIN excludes him because there's no match on the right side.
Visualizing the Match
Here's what the join condition does at the row level:
orders table users table
──────────── ───────────
id user_id status id name
1 1 shipped 1 Aisha ← match: orders.user_id=1 = users.id=1
2 1 pending 2 Marco
3 2 shipped ───► 3 Priya
4 3 cancelled 4 Carlos
5 3 shipped 5 Lin
6 5 pending
Result rows (only matched pairs):
order 1 + user 1 (Aisha) → row
order 2 + user 1 (Aisha) → row
order 3 + user 2 (Marco) → row
order 4 + user 3 (Priya) → row
order 5 + user 3 (Priya) → row
order 6 + user 5 (Lin) → row
Diagram: Each order is matched to the user whose id equals the order's user_id. Orders with no matching user are excluded (none in this dataset). Users with no orders are excluded (Carlos).
Handling Ambiguous Column Names
Both users and orders have an id column. If you try to select just id, PostgreSQL throws an error:
-- ❌ Error: ambiguous column name
SELECT id, name, status
FROM orders
INNER JOIN users ON orders.user_id = users.id;
-- ERROR: column reference "id" is ambiguous
Always qualify ambiguous column names with their table:
-- ✅ Qualify with table name
SELECT orders.id, users.name, orders.status
FROM orders
INNER JOIN users ON orders.user_id = users.id;
Table Aliases — Keeping Queries Readable
Typing the full table name repeatedly gets verbose. Table aliases give tables a short nickname for the duration of the query:
-- Assign aliases: o for orders, u for users
SELECT o.id, u.name, o.status
FROM orders o
INNER JOIN users u ON o.user_id = u.id;
The alias is defined right after the table name in FROM and JOIN clauses. Once defined, use the alias everywhere — you can't mix the full name and the alias in the same query.
Single-letter aliases (o, u) are fine for short queries. For complex queries with many joins, slightly longer aliases (ord, usr) can improve readability.
Part 4: LEFT JOIN — Keeping All Left-Side Rows
INNER JOIN only returns matched rows. But sometimes you want all rows from one table, whether or not they have a match in the other. That's LEFT JOIN.
-- All users and their orders (including users with no orders)
SELECT
u.name,
o.id AS order_id,
o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
ORDER BY u.name;
| name | order_id | status |
|---|---|---|
| Aisha Mwangi | 1 | shipped |
| Aisha Mwangi | 2 | pending |
| Carlos Rivera | NULL | NULL |
| Lin Wei | 6 | pending |
| Marco Rossi | 3 | shipped |
| Priya Sharma | 4 | cancelled |
| Priya Sharma | 5 | shipped |
Carlos Rivera now appears — with NULL in the order_id and status columns because he has no orders. The NULL is the database's way of saying "there was no matching row on the right side."
Reading the NULL as Information
The NULL from a LEFT JOIN is meaningful. It's not an error — it's the answer to "does this user have any orders?" The answer for Carlos is no.
You can use this to find users with no orders:
-- Users who have never placed an order
SELECT u.name, u.email
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;
| name | |
|---|---|
| Carlos Rivera | carlos@example.com |
This pattern — LEFT JOIN followed by WHERE right_side_column IS NULL — is the standard SQL way to find rows in table A with no corresponding rows in table B.
LEFT vs INNER: Which Side Drives the Result?
Think of it this way:
- INNER JOIN: the intersection — only rows that exist in both tables
- LEFT JOIN: all of the left table, plus matching rows from the right (with
NULLfor gaps)
In FROM users LEFT JOIN orders, users is the "left" table. Every user appears in the result. Orders that don't match any user would be excluded — but in a well-designed schema with foreign key constraints, every order must have a valid user, so this case shouldn't arise.
Part 5: RIGHT JOIN — And Why You Rarely Need It
RIGHT JOIN is the mirror of LEFT JOIN — it keeps all rows from the right table and matches from the left, with NULL for unmatched left rows.
-- All orders and their users (keep all orders, even if user is missing)
SELECT u.name, o.id AS order_id, o.status
FROM users u
RIGHT JOIN orders o ON o.user_id = u.id;
This is identical in result to:
-- LEFT JOIN with tables swapped — exactly the same result
SELECT u.name, o.id AS order_id, o.status
FROM orders o
LEFT JOIN users u ON o.user_id = u.id;
RIGHT JOIN exists for completeness, but most developers always use LEFT JOIN and swap the table order when needed. It keeps mental overhead lower — you only need to reason about one join direction.
Part 6: Chaining Multiple Joins
The real power of joins shows up when you chain several together to span multiple tables. ShopFlow's most interesting queries require all four tables.
Let's build the order summary — every order with its user, the products ordered, and the total:
-- Full order summary: user → order → items → products
SELECT
u.name AS customer,
o.id AS order_id,
o.status,
p.name AS product,
oi.quantity,
oi.price_at_purchase,
oi.quantity * oi.price_at_purchase AS line_total
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN order_items oi ON oi.order_id = o.id
INNER JOIN products p ON oi.product_id = p.id
ORDER BY o.id, p.name;
| customer | order_id | status | product | quantity | price_at_purchase | line_total |
|---|---|---|---|---|---|---|
| Aisha Mwangi | 1 | shipped | USB-C Hub | 2 | 39.99 | 79.98 |
| Aisha Mwangi | 1 | shipped | Wireless Keyboard | 1 | 89.99 | 89.99 |
| Aisha Mwangi | 2 | pending | Monitor Light | 1 | 59.99 | 59.99 |
| Marco Rossi | 3 | shipped | Desk Pad | 1 | 29.99 | 29.99 |
| Marco Rossi | 3 | shipped | Mechanical Mouse | 1 | 64.99 | 64.99 |
| Priya Sharma | 4 | cancelled | Webcam 4K | 1 | 129.99 | 129.99 |
| Priya Sharma | 5 | shipped | Laptop Stand | 1 | 49.99 | 49.99 |
| Priya Sharma | 5 | shipped | Wireless Keyboard | 1 | 89.99 | 89.99 |
| Lin Wei | 6 | pending | USB-C Hub | 3 | 39.99 | 119.97 |
Each join adds one table to the working set. The database processes them left to right, matching rows at each step. The final result is the fully assembled picture.
Combining Joins with Aggregations
Now let's collapse this into a per-order total — combining everything from the last three articles:
-- Total revenue per customer, sorted highest first
SELECT
u.name AS customer,
COUNT(DISTINCT o.id) AS order_count,
SUM(oi.quantity * oi.price_at_purchase) AS total_spent
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY total_spent DESC NULLS LAST;
| customer | order_count | total_spent |
|---|---|---|
| Aisha Mwangi | 2 | 229.96 |
| Priya Sharma | 2 | 269.97 |
| Lin Wei | 1 | 119.97 |
| Marco Rossi | 1 | 94.98 |
| Carlos Rivera | 0 | NULL |
We use LEFT JOIN here so Carlos Rivera appears with 0 orders and NULL spent — rather than being silently excluded by an INNER JOIN. Notice COUNT(DISTINCT o.id) — we use DISTINCT because without it, Carlos would show order_count = 0 but the raw COUNT would count the single NULL row from the left join.
Part 7: Self-Joins
A self-join joins a table to itself. This is useful when rows in a table have a relationship to other rows in the same table.
ShopFlow doesn't have a natural self-join use case, so here's a brief illustration with a hypothetical employees table where each employee has a manager_id that references another employee's id:
-- For illustration only — not in the ShopFlow schema
SELECT
emp.name AS employee,
mgr.name AS manager
FROM employees emp
LEFT JOIN employees mgr ON emp.manager_id = mgr.id;
The trick is aliasing the same table twice (emp and mgr) so you can reference it in two different roles in the same query. You'll encounter this pattern when working with hierarchical data — category trees, org charts, threaded comments.
Common Misconceptions
❌ Misconception: Joins multiply your data / something went wrong
Reality: Joins can produce more rows than either source table — and that's often correct. Each order item generates one row per product in the order. A customer with two orders and three items each legitimately produces six rows.
Why this matters: Seeing "more rows than expected" after a join is a diagnostic signal, not automatically a bug. The question to ask is: "Is each row a meaningful pairing of these two tables?"
Check for the actual bug — a missing ON condition:
-- ❌ Cartesian product: every user × every order = 30 rows
SELECT u.name, o.id FROM users u, orders o;
-- ✅ Correct join: only matching pairs = 6 rows
SELECT u.name, o.id FROM users u
INNER JOIN orders o ON o.user_id = u.id;
❌ Misconception: LEFT JOIN NULL means the data is missing or broken
Reality: NULL from a LEFT JOIN is intentional — it means there was no matching row on the right side. This is often the answer to a meaningful question ("which users have no orders?").
Why this matters: Filtering out NULL rows after a LEFT JOIN (with WHERE right_col IS NOT NULL) is equivalent to an INNER JOIN. Use LEFT JOIN deliberately when you want the unmatched rows.
❌ Misconception: You always need to join on the primary key
Reality: You join on whatever columns express the relationship. Almost always this is a foreign key to a primary key — but technically, you can join on any column with matching values. In practice, joining on non-key columns is rare and usually signals a schema design issue.
Troubleshooting Common Issues
Problem: Way more rows than expected
Symptoms: A query that should return ~10 rows returns thousands.
Most likely cause: Missing or incorrect ON condition, producing a partial or full cartesian product.
Diagnostic steps:
-- Step 1: Count rows from each table separately
SELECT COUNT(*) FROM orders; -- 6
SELECT COUNT(*) FROM users; -- 5
-- Step 2: Run the join without WHERE/GROUP BY
-- and check the row count
SELECT COUNT(*) FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- Should be 6 (one per order), not 30 (cartesian)
-- Step 3: If count is table_a × table_b, your ON condition is wrong or missing
Problem: ERROR: column reference "id" is ambiguous
Symptoms: PostgreSQL refuses to run a join query with an ambiguity error.
Cause: Multiple joined tables have a column with the same name, and your query doesn't specify which table's column you mean.
Solution:
-- ❌ Ambiguous
SELECT id, name FROM orders INNER JOIN users ON orders.user_id = users.id;
-- ✅ Qualify every ambiguous column
SELECT orders.id, users.name FROM orders
INNER JOIN users ON orders.user_id = users.id;
-- ✅ Use aliases for cleaner code
SELECT o.id, u.name FROM orders o
INNER JOIN users u ON o.user_id = u.id;
Problem: Expected rows missing from LEFT JOIN result
Symptoms: You used LEFT JOIN to keep all left-side rows, but some are still missing.
Common cause: A WHERE condition on the right-side table is filtering out unmatched rows (turning your LEFT JOIN into an effective INNER JOIN).
-- ❌ The WHERE clause filters out NULL rows from the LEFT JOIN
-- Carlos Rivera disappears again
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'shipped'; -- NULL != 'shipped', so Carlos is excluded
-- ✅ Move the condition into the ON clause if you still want left-side rows
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';
-- Carlos appears with NULL status, other users show only their shipped orders
Check Your Understanding
Quick Quiz
-
What's the difference between INNER JOIN and LEFT JOIN?
Show Answer
INNER JOINreturns only rows where the join condition is satisfied in both tables — unmatched rows from either side are excluded.LEFT JOINreturns all rows from the left table, with matching data from the right table where it exists, andNULLfor right-side columns where there's no match. -
This query returns 30 rows instead of 6. Why?
SELECT users.name, orders.id
FROM users, orders;Show Answer
No join condition was specified. The database produces a cartesian product — every user paired with every order. With 5 users and 6 orders: 5 × 6 = 30 rows. Fix by adding a proper join:
SELECT u.name, o.id
FROM orders o
INNER JOIN users u ON o.user_id = u.id; -
How would you find all products that have never appeared in any order?
Show Answer
Use a
LEFT JOINfromproductstoorder_items, then filter forNULLon the right side:SELECT p.name
FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.id
WHERE oi.id IS NULL;This works because products with no orders have no matching
order_itemsrows — the left join includes them withNULLinoi.id, andWHERE oi.id IS NULLisolates exactly those products.
Hands-On Challenge
Task: Write a query that returns each user's name, their total number of shipped orders, and their total amount spent on shipped orders only. Include users with no shipped orders (show 0 and NULL or 0.00 respectively). Sort by total spent descending.
Show Solution
SELECT
u.name,
COUNT(DISTINCT o.id) AS shipped_order_count,
COALESCE(SUM(oi.quantity * oi.price_at_purchase), 0) AS total_spent
FROM users u
LEFT JOIN orders o
ON o.user_id = u.id AND o.status = 'shipped'
LEFT JOIN order_items oi
ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY total_spent DESC;
Key decisions:
LEFT JOINon both joins ensures users with no shipped orders still appear- The
AND o.status = 'shipped'condition goes in theONclause (notWHERE) — otherwise it would exclude users with no shipped orders COALESCE(..., 0)convertsNULLtotal to0for cleaner displayGROUP BY u.id, u.name— always group by the primary key when grouping by a name, in case two users share a name
Summary: Key Takeaways
- Joins temporarily combine rows from multiple tables at query time, based on a matching condition
- A cartesian product (no
ONcondition) pairs every row in one table with every row in another — almost never what you want INNER JOINreturns only rows with matches in both tables; unmatched rows are excludedLEFT JOINreturns all rows from the left table plus matched right-side data; unmatched right-side columns areNULLNULLfrom aLEFT JOINis intentional — it means no match exists and is often the answer to "find rows with no related records"- Qualify ambiguous column names with the table name or alias when joining tables that share column names
- Table aliases (
FROM orders o) keep multi-join queries readable - Multiple joins can be chained to span any number of tables
- A
WHEREcondition on the right-side table converts aLEFT JOINinto an effectiveINNER JOIN— use theONclause instead when you want to preserve left-side rows
What's Next?
You can now read and combine data from any number of tables. But so far you've only been querying — not changing anything. Real applications also need to add new records, update existing ones, and remove obsolete data.
The next step is SQL Writes: INSERT, UPDATE, DELETE — where you'll learn to mutate ShopFlow data safely, understand why UPDATE without WHERE is one of the most dangerous commands in SQL, and get your first look at transactions as a safety net.