SQL Joins: Combining Tables

You've been querying one table at a time. But the ShopFlow schema has four tables — and the most valuable information lives in the connections between them. "Which users placed orders?" "What products are in each order?" "What's the total revenue per customer?" None of these can be answered from a single table.

This is the moment the relational model pays off. Let's discover joins.

Quick Reference

INNER JOIN — rows that match in both tables:

SELECT users.name, orders.status
FROM orders
INNER JOIN users ON orders.user_id = users.id;

LEFT JOIN — all rows from the left table, matched rows from the right (unmatched = NULL):

SELECT users.name, orders.id AS order_id
FROM users
LEFT JOIN orders ON orders.user_id = users.id;

Alias tables to keep queries readable:

SELECT u.name, o.status
FROM orders o
INNER JOIN users u ON o.user_id = u.id;

Gotchas:

⚠️ A join without ON produces a cartesian product — every row × every row
⚠️ Ambiguous column names (e.g. id exists in both tables) must be qualified: users.id
⚠️ LEFT JOIN returns NULL for unmatched right-side columns — this is intentional and useful

Version Information

Tested with:

PostgreSQL: 15.x, 16.x
Node.js: v18.x, v20.x, v22.x

What You Need to Know First

Required reading:

Relational Databases: How Structured Data Works — primary keys, foreign keys, and the ShopFlow schema
SQL Basics: Reading Data with SELECT — WHERE, ORDER BY, LIMIT
SQL Aggregations: Summarizing Data — GROUP BY and HAVING are used in the later examples

What We'll Cover in This Article

By the end of this guide, you'll understand:

Why joins exist and what problem they solve
How INNER JOIN works and when to use it
How LEFT JOIN works and what NULL in the result means
Why RIGHT JOIN is almost never necessary
How self-joins work
The cartesian product — what it is and how to avoid it
How to handle ambiguous column names
How to chain multiple joins to span several tables

What We'll Explain Along the Way

We'll introduce these concepts with full explanations:

Table aliases — why FROM orders o is different from FROM orders
Row-level diagram of what a join actually does to the data
Why duplicate rows after a join is a diagnostic signal, not a mystery

Part 1: Why Joins Exist

In the schema overview, we split ShopFlow data across four tables. A user's name is in users. Their orders are in orders. The products in those orders are in order_items and products.

This split is called normalization — and it's intentional. By storing each fact in exactly one place, you avoid the mess of keeping duplicates in sync. When Aisha changes her email, you update one row in users. You don't have to find and update every order record that mentioned her email.

But there's a cost: to answer questions that span multiple tables, you need to reassemble the data at query time. That's what a join does — it temporarily combines rows from multiple tables based on a matching condition, so you can query across them as if they were one.

Part 2: The Cartesian Product — What Happens Without ON

Before learning joins properly, let's see what happens without them. This is the mistake that produces bafflingly large result sets.

If you list two tables in FROM without specifying how they relate, the database returns every possible combination of rows — one row in the result for every pairing of a row from table A with a row from table B:

-- ❌ No join condition — produces a cartesian product
SELECT users.name, orders.id
FROM users, orders;

With 5 users and 6 orders, this returns 5 × 6 = 30 rows. Every user is paired with every order, regardless of who placed what. Almost none of these pairings are meaningful.

This is called a cartesian product (or cross join). It's almost never what you want, and it's a common cause of "why does my query return thousands of rows?" confusion. The fix is always to add the join condition that specifies which rows should match.

Part 3: INNER JOIN — Rows That Match in Both Tables

INNER JOIN returns only the rows where the join condition is satisfied in both tables. Rows that don't have a match are excluded entirely.

-- Which users placed orders? Show their name and order status.
SELECT
  users.name,
  orders.id     AS order_id,
  orders.status
FROM orders
INNER JOIN users ON orders.user_id = users.id;

name	order_id	status
Aisha Mwangi	1	shipped
Aisha Mwangi	2	pending
Marco Rossi	3	shipped
Priya Sharma	4	cancelled
Priya Sharma	5	shipped
Lin Wei	6	pending

Let's break down what's happening:

FROM orders — start with the orders table
INNER JOIN users — bring in the users table
ON orders.user_id = users.id — match each order to the user whose id equals the order's user_id

Notice Carlos Rivera (user id 4) doesn't appear. He has no orders, so there's nothing in orders with user_id = 4. INNER JOIN excludes him because there's no match on the right side.

Visualizing the Match

Here's what the join condition does at the row level:

orders table                    users table
────────────                    ───────────
id  user_id  status             id  name
1   1        shipped            1   Aisha    ← match: orders.user_id=1 = users.id=1
2   1        pending            2   Marco
3   2        shipped     ───►   3   Priya
4   3        cancelled          4   Carlos
5   3        shipped            5   Lin
6   5        pending

Result rows (only matched pairs):
order 1 + user 1 (Aisha)  → row
order 2 + user 1 (Aisha)  → row
order 3 + user 2 (Marco)  → row
order 4 + user 3 (Priya)  → row
order 5 + user 3 (Priya)  → row
order 6 + user 5 (Lin)    → row

Diagram: Each order is matched to the user whose id equals the order's user_id. Orders with no matching user are excluded (none in this dataset). Users with no orders are excluded (Carlos).

Handling Ambiguous Column Names

Both users and orders have an id column. If you try to select just id, PostgreSQL throws an error:

-- ❌ Error: ambiguous column name
SELECT id, name, status
FROM orders
INNER JOIN users ON orders.user_id = users.id;

-- ERROR: column reference "id" is ambiguous

Always qualify ambiguous column names with their table:

-- ✅ Qualify with table name
SELECT orders.id, users.name, orders.status
FROM orders
INNER JOIN users ON orders.user_id = users.id;

Table Aliases — Keeping Queries Readable

Typing the full table name repeatedly gets verbose. Table aliases give tables a short nickname for the duration of the query:

-- Assign aliases: o for orders, u for users
SELECT o.id, u.name, o.status
FROM orders o
INNER JOIN users u ON o.user_id = u.id;

The alias is defined right after the table name in FROM and JOIN clauses. Once defined, use the alias everywhere — you can't mix the full name and the alias in the same query.

Single-letter aliases (o, u) are fine for short queries. For complex queries with many joins, slightly longer aliases (ord, usr) can improve readability.

Part 4: LEFT JOIN — Keeping All Left-Side Rows

INNER JOIN only returns matched rows. But sometimes you want all rows from one table, whether or not they have a match in the other. That's LEFT JOIN.

-- All users and their orders (including users with no orders)
SELECT
  u.name,
  o.id     AS order_id,
  o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
ORDER BY u.name;

name	order_id	status
Aisha Mwangi	1	shipped
Aisha Mwangi	2	pending
Carlos Rivera	NULL	NULL
Lin Wei	6	pending
Marco Rossi	3	shipped
Priya Sharma	4	cancelled
Priya Sharma	5	shipped

Carlos Rivera now appears — with NULL in the order_id and status columns because he has no orders. The NULL is the database's way of saying "there was no matching row on the right side."

Reading the NULL as Information

The NULL from a LEFT JOIN is meaningful. It's not an error — it's the answer to "does this user have any orders?" The answer for Carlos is no.

You can use this to find users with no orders:

-- Users who have never placed an order
SELECT u.name, u.email
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.id IS NULL;

name	email
Carlos Rivera	carlos@example.com

This pattern — LEFT JOIN followed by WHERE right_side_column IS NULL — is the standard SQL way to find rows in table A with no corresponding rows in table B.

LEFT vs INNER: Which Side Drives the Result?

Think of it this way:

INNER JOIN: the intersection — only rows that exist in both tables
LEFT JOIN: all of the left table, plus matching rows from the right (with NULL for gaps)

In FROM users LEFT JOIN orders, users is the "left" table. Every user appears in the result. Orders that don't match any user would be excluded — but in a well-designed schema with foreign key constraints, every order must have a valid user, so this case shouldn't arise.

Part 5: RIGHT JOIN — And Why You Rarely Need It

RIGHT JOIN is the mirror of LEFT JOIN — it keeps all rows from the right table and matches from the left, with NULL for unmatched left rows.

-- All orders and their users (keep all orders, even if user is missing)
SELECT u.name, o.id AS order_id, o.status
FROM users u
RIGHT JOIN orders o ON o.user_id = u.id;

This is identical in result to:

-- LEFT JOIN with tables swapped — exactly the same result
SELECT u.name, o.id AS order_id, o.status
FROM orders o
LEFT JOIN users u ON o.user_id = u.id;

RIGHT JOIN exists for completeness, but most developers always use LEFT JOIN and swap the table order when needed. It keeps mental overhead lower — you only need to reason about one join direction.

Part 6: Chaining Multiple Joins

The real power of joins shows up when you chain several together to span multiple tables. ShopFlow's most interesting queries require all four tables.

Let's build the order summary — every order with its user, the products ordered, and the total:

-- Full order summary: user → order → items → products
SELECT
  u.name                                        AS customer,
  o.id                                          AS order_id,
  o.status,
  p.name                                        AS product,
  oi.quantity,
  oi.price_at_purchase,
  oi.quantity * oi.price_at_purchase            AS line_total
FROM orders o
INNER JOIN users u      ON o.user_id    = u.id
INNER JOIN order_items oi ON oi.order_id  = o.id
INNER JOIN products p   ON oi.product_id = p.id
ORDER BY o.id, p.name;

customer	order_id	status	product	quantity	price_at_purchase	line_total
Aisha Mwangi	1	shipped	USB-C Hub	2	39.99	79.98
Aisha Mwangi	1	shipped	Wireless Keyboard	1	89.99	89.99
Aisha Mwangi	2	pending	Monitor Light	1	59.99	59.99
Marco Rossi	3	shipped	Desk Pad	1	29.99	29.99
Marco Rossi	3	shipped	Mechanical Mouse	1	64.99	64.99
Priya Sharma	4	cancelled	Webcam 4K	1	129.99	129.99
Priya Sharma	5	shipped	Laptop Stand	1	49.99	49.99
Priya Sharma	5	shipped	Wireless Keyboard	1	89.99	89.99
Lin Wei	6	pending	USB-C Hub	3	39.99	119.97

Each join adds one table to the working set. The database processes them left to right, matching rows at each step. The final result is the fully assembled picture.

Combining Joins with Aggregations

Now let's collapse this into a per-order total — combining everything from the last three articles:

-- Total revenue per customer, sorted highest first
SELECT
  u.name                                          AS customer,
  COUNT(DISTINCT o.id)                            AS order_count,
  SUM(oi.quantity * oi.price_at_purchase)         AS total_spent
FROM users u
LEFT JOIN orders o      ON o.user_id    = u.id
LEFT JOIN order_items oi ON oi.order_id  = o.id
GROUP BY u.id, u.name
ORDER BY total_spent DESC NULLS LAST;

customer	order_count	total_spent
Aisha Mwangi	2	229.96
Priya Sharma	2	269.97
Lin Wei	1	119.97
Marco Rossi	1	94.98
Carlos Rivera	0	NULL

We use LEFT JOIN here so Carlos Rivera appears with 0 orders and NULL spent — rather than being silently excluded by an INNER JOIN. Notice COUNT(DISTINCT o.id) — we use DISTINCT because without it, Carlos would show order_count = 0 but the raw COUNT would count the single NULL row from the left join.

Part 7: Self-Joins

A self-join joins a table to itself. This is useful when rows in a table have a relationship to other rows in the same table.

ShopFlow doesn't have a natural self-join use case, so here's a brief illustration with a hypothetical employees table where each employee has a manager_id that references another employee's id:

-- For illustration only — not in the ShopFlow schema
SELECT
  emp.name  AS employee,
  mgr.name  AS manager
FROM employees emp
LEFT JOIN employees mgr ON emp.manager_id = mgr.id;

The trick is aliasing the same table twice (emp and mgr) so you can reference it in two different roles in the same query. You'll encounter this pattern when working with hierarchical data — category trees, org charts, threaded comments.

Common Misconceptions

❌ Misconception: Joins multiply your data / something went wrong

Reality: Joins can produce more rows than either source table — and that's often correct. Each order item generates one row per product in the order. A customer with two orders and three items each legitimately produces six rows.

Why this matters: Seeing "more rows than expected" after a join is a diagnostic signal, not automatically a bug. The question to ask is: "Is each row a meaningful pairing of these two tables?"

Check for the actual bug — a missing ON condition:

-- ❌ Cartesian product: every user × every order = 30 rows
SELECT u.name, o.id FROM users u, orders o;

-- ✅ Correct join: only matching pairs = 6 rows
SELECT u.name, o.id FROM users u
INNER JOIN orders o ON o.user_id = u.id;

❌ Misconception: LEFT JOIN NULL means the data is missing or broken

Reality: NULL from a LEFT JOIN is intentional — it means there was no matching row on the right side. This is often the answer to a meaningful question ("which users have no orders?").

Why this matters: Filtering out NULL rows after a LEFT JOIN (with WHERE right_col IS NOT NULL) is equivalent to an INNER JOIN. Use LEFT JOIN deliberately when you want the unmatched rows.

❌ Misconception: You always need to join on the primary key

Reality: You join on whatever columns express the relationship. Almost always this is a foreign key to a primary key — but technically, you can join on any column with matching values. In practice, joining on non-key columns is rare and usually signals a schema design issue.

Troubleshooting Common Issues

Problem: Way more rows than expected

Symptoms: A query that should return ~10 rows returns thousands.

Most likely cause: Missing or incorrect ON condition, producing a partial or full cartesian product.

Diagnostic steps:

-- Step 1: Count rows from each table separately
SELECT COUNT(*) FROM orders;    -- 6
SELECT COUNT(*) FROM users;     -- 5

-- Step 2: Run the join without WHERE/GROUP BY
-- and check the row count
SELECT COUNT(*) FROM orders o
INNER JOIN users u ON o.user_id = u.id;
-- Should be 6 (one per order), not 30 (cartesian)

-- Step 3: If count is table_a × table_b, your ON condition is wrong or missing

Problem: `ERROR: column reference "id" is ambiguous`

Symptoms: PostgreSQL refuses to run a join query with an ambiguity error.

Cause: Multiple joined tables have a column with the same name, and your query doesn't specify which table's column you mean.

Solution:

-- ❌ Ambiguous
SELECT id, name FROM orders INNER JOIN users ON orders.user_id = users.id;

-- ✅ Qualify every ambiguous column
SELECT orders.id, users.name FROM orders
INNER JOIN users ON orders.user_id = users.id;

-- ✅ Use aliases for cleaner code
SELECT o.id, u.name FROM orders o
INNER JOIN users u ON o.user_id = u.id;

Problem: Expected rows missing from LEFT JOIN result

Symptoms: You used LEFT JOIN to keep all left-side rows, but some are still missing.

Common cause: A WHERE condition on the right-side table is filtering out unmatched rows (turning your LEFT JOIN into an effective INNER JOIN).

-- ❌ The WHERE clause filters out NULL rows from the LEFT JOIN
-- Carlos Rivera disappears again
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.status = 'shipped';  -- NULL != 'shipped', so Carlos is excluded

-- ✅ Move the condition into the ON clause if you still want left-side rows
SELECT u.name, o.status
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped';
-- Carlos appears with NULL status, other users show only their shipped orders

Check Your Understanding

Quick Quiz

What's the difference between INNER JOIN and LEFT JOIN?

Show Answer
INNER JOIN returns only rows where the join condition is satisfied in both tables — unmatched rows from either side are excluded. LEFT JOIN returns all rows from the left table, with matching data from the right table where it exists, and NULL for right-side columns where there's no match.
This query returns 30 rows instead of 6. Why?
```
SELECT users.name, orders.id
FROM users, orders;
```
Show Answer
No join condition was specified. The database produces a cartesian product — every user paired with every order. With 5 users and 6 orders: 5 × 6 = 30 rows. Fix by adding a proper join:
SELECT u.name, o.id FROM orders o INNER JOIN users u ON o.user_id = u.id;
How would you find all products that have never appeared in any order?
Show Answer
Use a LEFT JOIN from products to order_items, then filter for NULL on the right side:
SELECT p.name FROM products p LEFT JOIN order_items oi ON oi.product_id = p.id WHERE oi.id IS NULL;
This works because products with no orders have no matching order_items rows — the left join includes them with NULL in oi.id, and WHERE oi.id IS NULL isolates exactly those products.

Hands-On Challenge

Task: Write a query that returns each user's name, their total number of shipped orders, and their total amount spent on shipped orders only. Include users with no shipped orders (show 0 and NULL or 0.00 respectively). Sort by total spent descending.

Show Solution

SELECT
  u.name,
  COUNT(DISTINCT o.id)                        AS shipped_order_count,
  COALESCE(SUM(oi.quantity * oi.price_at_purchase), 0) AS total_spent
FROM users u
LEFT JOIN orders o
  ON o.user_id = u.id AND o.status = 'shipped'
LEFT JOIN order_items oi
  ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY total_spent DESC;

Key decisions:

LEFT JOIN on both joins ensures users with no shipped orders still appear
The AND o.status = 'shipped' condition goes in the ON clause (not WHERE) — otherwise it would exclude users with no shipped orders
COALESCE(..., 0) converts NULL total to 0 for cleaner display
GROUP BY u.id, u.name — always group by the primary key when grouping by a name, in case two users share a name

Summary: Key Takeaways

Joins temporarily combine rows from multiple tables at query time, based on a matching condition
A cartesian product (no ON condition) pairs every row in one table with every row in another — almost never what you want
INNER JOIN returns only rows with matches in both tables; unmatched rows are excluded
LEFT JOIN returns all rows from the left table plus matched right-side data; unmatched right-side columns are NULL
NULL from a LEFT JOIN is intentional — it means no match exists and is often the answer to "find rows with no related records"
Qualify ambiguous column names with the table name or alias when joining tables that share column names
Table aliases (FROM orders o) keep multi-join queries readable
Multiple joins can be chained to span any number of tables
A WHERE condition on the right-side table converts a LEFT JOIN into an effective INNER JOIN — use the ON clause instead when you want to preserve left-side rows

What's Next?

You can now read and combine data from any number of tables. But so far you've only been querying — not changing anything. Real applications also need to add new records, update existing ones, and remove obsolete data.

The next step is SQL Writes: INSERT, UPDATE, DELETE — where you'll learn to mutate ShopFlow data safely, understand why UPDATE without WHERE is one of the most dangerous commands in SQL, and get your first look at transactions as a safety net.

Quick Reference​

Version Information​

What You Need to Know First​

What We'll Cover in This Article​

What We'll Explain Along the Way​

Part 1: Why Joins Exist​

Part 2: The Cartesian Product — What Happens Without ON​

Part 3: INNER JOIN — Rows That Match in Both Tables​

Visualizing the Match​

Handling Ambiguous Column Names​

Table Aliases — Keeping Queries Readable​

Part 4: LEFT JOIN — Keeping All Left-Side Rows​

Reading the NULL as Information​

LEFT vs INNER: Which Side Drives the Result?​

Part 5: RIGHT JOIN — And Why You Rarely Need It​

Part 6: Chaining Multiple Joins​

Combining Joins with Aggregations​

Part 7: Self-Joins​

Common Misconceptions​

❌ Misconception: Joins multiply your data / something went wrong​

❌ Misconception: LEFT JOIN NULL means the data is missing or broken​

❌ Misconception: You always need to join on the primary key​

Troubleshooting Common Issues​

Problem: Way more rows than expected​

Problem: ERROR: column reference "id" is ambiguous​

Problem: Expected rows missing from LEFT JOIN result​

Check Your Understanding​

Quick Quiz​

Hands-On Challenge​

Summary: Key Takeaways​

What's Next?​