Skip to main content

Indexes: Making Queries Fast

You've written queries. You've designed a schema. For a database with a few hundred rows, everything feels instant. Then the ShopFlow catalog grows to 100,000 products, orders reach a million rows, and that same WHERE category = 'Electronics' query suddenly takes three seconds.

The fix is usually an index. But indexes aren't magic — they trade write overhead for read speed, and adding the wrong ones can slow your database down instead of speeding it up. Let's understand how they actually work.


Quick Reference

Create an index:

-- Single column
CREATE INDEX idx_products_category ON products(category);

-- Composite (column order matters)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Partial index (only indexes rows matching a condition)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';

Check if your index is being used:

EXPLAIN ANALYZE
SELECT * FROM products WHERE category = 'Electronics';

Gotchas:

  • ⚠️ Every index slows down INSERT, UPDATE, and DELETE — don't index everything
  • ⚠️ In composite indexes, column order matters: (user_id, status) helps WHERE user_id = 1 but (status, user_id) does not
  • ⚠️ PostgreSQL may ignore your index if it estimates a sequential scan is faster (small tables, low selectivity)
  • ⚠️ Functions on indexed columns defeat the index: WHERE LOWER(email) = 'x' won't use an index on email

Version Information

Tested with:

  • PostgreSQL: 15.x, 16.x
  • Node.js: v18.x, v20.x, v22.x

Last verified: May 2025


What You Need to Know First

Required reading:

What We'll Cover in This Article

By the end of this guide, you'll understand:

  • What an index is and how B-tree indexes work internally (simplified)
  • How PostgreSQL decides between an index scan and a sequential scan
  • How to create single-column, composite, unique, and partial indexes
  • How to read EXPLAIN ANALYZE output
  • When indexes help and when they hurt
  • How to index the ShopFlow queries you've already written

What We'll Explain Along the Way

We'll introduce these concepts with full explanations:

  • The query planner — PostgreSQL's internal optimizer
  • Index selectivity — why some indexes are ignored
  • Write amplification — the cost every index adds to mutations
  • Dead tuples and VACUUM (briefly)

Part 1: What an Index Actually Is

A database index is the same idea as a book index. In a 500-page book, if you want every page that mentions "foreign key," you don't read every page — you check the index in the back, which maps the term to specific page numbers and takes you directly there.

A database index works the same way. Without an index on products.category, a query like WHERE category = 'Electronics' requires a sequential scan — reading every single row in the table to find the ones that match. With 100,000 products, that's 100,000 row reads.

With an index on category, the database maintains a separate, sorted data structure that maps category values to the rows that have them. The query becomes a fast lookup into that structure, followed by direct retrieval of only the matching rows.

B-Tree: The Default Index Type

PostgreSQL's default index type is a B-tree (balanced tree). You don't need to understand the implementation deeply, but the structure gives you an intuition for what queries it helps:

B-Tree for products.category:
[Furniture]
/ \
[Electronics] [Furniture]
/ \
[Accessories] [Electronics]

A B-tree keeps values sorted. This means it efficiently supports:

  • Exact matches: WHERE category = 'Electronics'
  • Range queries: WHERE price BETWEEN 40 AND 80
  • Starts-with searches: WHERE name LIKE 'Wireless%'
  • Sorting: ORDER BY price (data is already sorted in the index)

It does not help with:

  • Contains searches: WHERE name LIKE '%Keyboard%' (can't start in the middle of a sorted tree)
  • Function results: WHERE LOWER(email) = 'aisha@example.com' (the index stores email, not LOWER(email))

Part 2: The Query Planner — PostgreSQL Decides

You don't instruct PostgreSQL to use an index. You create the index, and the query planner (also called the optimizer) decides whether using it is faster than a sequential scan.

This decision is based on statistics that PostgreSQL maintains about each table:

  • How many rows are in the table
  • How many distinct values exist in each column
  • How evenly values are distributed

From these statistics, the planner estimates the selectivity of a query — what fraction of rows the condition will match. High selectivity (few matching rows) → index scan likely wins. Low selectivity (many matching rows) → sequential scan might win.

Here's a concrete example. If your products table has 100,000 rows and only 8 rows have category = 'Electronics', the index dramatically narrows the search. But if 90,000 of 100,000 rows have category = 'Electronics', the index would send the database to almost every row anyway — a sequential scan of the table is cheaper because it avoids the overhead of following index pointers.

This is why adding an index on a low-cardinality column (one with few distinct values, like a boolean) rarely helps — and sometimes slows things down.


Part 3: EXPLAIN ANALYZE — Seeing the Query Plan

EXPLAIN ANALYZE shows you exactly how PostgreSQL is executing a query. It's the most important diagnostic tool for query performance.

First, let's create an index to observe the difference:

-- Start without an index on category
EXPLAIN ANALYZE
SELECT name, price FROM products WHERE category = 'Electronics';

With a small table (our 8-row seed data), you'll see something like:

Seq Scan on products  (cost=0.00..1.08 rows=5 width=22)
(actual time=0.012..0.016 rows=5 loops=1)
Filter: (category = 'Electronics')
Rows Removed by Filter: 3
Planning Time: 0.118 ms
Execution Time: 0.031 ms

Now create an index and run again:

CREATE INDEX idx_products_category ON products(category);

EXPLAIN ANALYZE
SELECT name, price FROM products WHERE category = 'Electronics';

With 8 rows, PostgreSQL will likely still use a sequential scan — the table is tiny and there's no point using the index. This is the planner working correctly.

To see the index used, you need to either add significantly more data, or temporarily disable sequential scans for testing:

-- Force index usage for testing (don't do this in production)
SET enable_seqscan = OFF;

EXPLAIN ANALYZE
SELECT name, price FROM products WHERE category = 'Electronics';
Index Scan using idx_products_category on products
(cost=0.14..8.16 rows=5 width=22)
(actual time=0.022..0.029 rows=5 loops=1)
Index Cond: ((category)::text = 'Electronics'::text)
Planning Time: 0.213 ms
Execution Time: 0.047 ms

Reading EXPLAIN ANALYZE Output

The key parts to understand:

Seq Scan on products         ← Scan type (Seq Scan vs Index Scan vs Bitmap Index Scan)
(cost=0.00..1.08 ← Estimated cost: startup..total
rows=5 ← Estimated rows returned
width=22) ← Estimated average row width in bytes
(actual time=0.012..0.016 ← Actual time: startup..total (milliseconds)
rows=5 ← Actual rows returned
loops=1) ← How many times this node ran
Filter: (category = ...) ← The condition applied
Rows Removed by Filter: 3 ← Rows read but discarded

What to look for:

  • Seq Scan on a large table with a highly selective WHERE → missing index
  • Large difference between estimated rows and actual rows → stale statistics, run ANALYZE
  • High Rows Removed by Filter → the scan is reading many rows it doesn't need
  • Nested Loop joining large tables → might need a better index on the join column

Part 4: Creating the Right Indexes

Single-Column Indexes

The most common type. Index the column that appears in WHERE, JOIN ON, or ORDER BY:

-- Speed up filtering by category
CREATE INDEX idx_products_category ON products(category);

-- Speed up FK lookups (covered in Article 6 — adding them here for completeness)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Speed up ordering and filtering by date
CREATE INDEX idx_orders_created_at ON orders(created_at);

Composite Indexes — Column Order Matters

A composite index covers multiple columns. The critical rule: the index can only be used by queries that filter on the leftmost columns first.

-- Index: (user_id, status)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

This index helps:

-- ✅ Uses index: filters on user_id (leftmost column)
SELECT * FROM orders WHERE user_id = 1;

-- ✅ Uses index: filters on both columns
SELECT * FROM orders WHERE user_id = 1 AND status = 'shipped';

This index does NOT help:

-- ❌ Cannot use index: skips the leftmost column
SELECT * FROM orders WHERE status = 'shipped';

Think of a composite index like a phone book sorted by (last name, first name). You can look up "Smith" quickly, or "Smith, John" specifically. But you can't efficiently find all "Johns" without scanning every entry.

Column order strategy: Put the most selective column (fewest distinct values relative to total rows — wait, actually the opposite: most values, highest cardinality) first. Put columns used in equality conditions (=) before columns used in range conditions (>, <, BETWEEN).

Unique Indexes

CREATE UNIQUE INDEX enforces uniqueness and provides fast lookups:

-- Already created implicitly by UNIQUE constraint, but this is the explicit form
CREATE UNIQUE INDEX idx_users_email ON users(email);

Note: UNIQUE constraints in CREATE TABLE automatically create a unique index behind the scenes. You don't need to create both.

Partial Indexes — Index Only What You Query

A partial index only indexes rows matching a condition. The index is smaller, faster to maintain, and sometimes faster for the targeted query:

-- Only index pending orders (the ones actively needing processing)
-- Completed orders don't need to be in this index
CREATE INDEX idx_orders_pending_created
ON orders(created_at)
WHERE status = 'pending';

This index is used only when the query includes WHERE status = 'pending':

-- ✅ Uses the partial index
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;

-- ❌ Does not use this partial index (wrong status)
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;

Partial indexes shine when a small subset of rows is queried frequently — like active records, pending items, or flagged rows.

Expression Indexes — Index the Transformed Value

If you frequently query using a function on a column, index the function's output:

-- Query that a regular index on email WON'T speed up
SELECT * FROM users WHERE LOWER(email) = 'aisha@example.com';

-- Create an index on the expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Now this query uses the index
SELECT * FROM users WHERE LOWER(email) = 'aisha@example.com';

Part 5: The ShopFlow Index Strategy

Let's audit the queries we've written across this module and create the right indexes for each.

Indexes for ShopFlow

-- ─────────────────────────────────────────────────────
-- USERS
-- ─────────────────────────────────────────────────────

-- Email lookups (login, duplicate checks)
-- Already created by UNIQUE constraint on users.email
-- No additional index needed.

-- Case-insensitive email search (if your app needs it)
CREATE INDEX idx_users_email_lower ON users(LOWER(email));


-- ─────────────────────────────────────────────────────
-- PRODUCTS
-- ─────────────────────────────────────────────────────

-- Filter by category (common storefront query)
CREATE INDEX idx_products_category ON products(category);

-- Filter by category + price range (e.g., "Electronics under $100")
CREATE INDEX idx_products_category_price ON products(category, price);

-- Find in-stock products (high-frequency query)
CREATE INDEX idx_products_in_stock ON products(category)
WHERE stock_quantity > 0;

-- Text search on product name (basic prefix search)
CREATE INDEX idx_products_name ON products(name text_pattern_ops);
-- text_pattern_ops enables LIKE 'prefix%' queries on this index


-- ─────────────────────────────────────────────────────
-- ORDERS
-- ─────────────────────────────────────────────────────

-- FK lookup: all orders for a user
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Filter by status (dashboard, processing queue)
CREATE INDEX idx_orders_status ON orders(status);

-- Combined: orders for a user in a specific status
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Recent orders (date-filtered queries)
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Active order processing (partial index — only pending orders)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';


-- ─────────────────────────────────────────────────────
-- ORDER ITEMS
-- ─────────────────────────────────────────────────────

-- FK lookup: all items for an order
CREATE INDEX idx_order_items_order_id ON order_items(order_id);

-- FK lookup: all orders containing a product
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

Part 6: When Indexes Hurt

Every index has a cost. Every time a row is inserted, updated, or deleted, PostgreSQL must update every index that covers the affected columns. In a write-heavy table, too many indexes can slow writes more than they help reads.

Signs you have too many indexes:

  • INSERT performance is noticeably slow on a table
  • pg_stat_user_indexes shows indexes with zero or near-zero scans
  • The table has more indexes than it has columns

Finding unused indexes:

-- Check which indexes are actually being used
SELECT
schemaname,
tablename,
indexname,
idx_scan AS times_used,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

An index with idx_scan = 0 after weeks of production traffic is probably safe to drop.

Dropping an index:

DROP INDEX idx_products_category;

-- Drop without error if it doesn't exist
DROP INDEX IF EXISTS idx_products_category;

The Right Mental Model

Add indexes in response to observed slow queries, not preemptively for every column. The workflow:

  1. Measure: identify a slow query using EXPLAIN ANALYZE or application monitoring
  2. Understand: find the scan type and what condition is being filtered
  3. Index: create a targeted index for that specific query pattern
  4. Verify: run EXPLAIN ANALYZE again to confirm the index is used and the query is faster
  5. Monitor: watch pg_stat_user_indexes over time to confirm the index is earning its keep

Common Misconceptions

❌ Misconception: More indexes = faster database

Reality: Each index adds overhead to every write. A table with 15 indexes is slower to insert into than one with 3 indexes. And the query planner has to evaluate more options, which slightly increases planning time.

Fix: Index based on actual slow queries. Prefer fewer, well-chosen indexes over many speculative ones.

❌ Misconception: My index exists, so it's being used

Reality: PostgreSQL may choose a sequential scan even when a relevant index exists — if the table is small, the condition has low selectivity, or the planner's statistics suggest it's faster.

Verification:

EXPLAIN ANALYZE SELECT ... -- Always verify with EXPLAIN ANALYZE

❌ Misconception: An index on (a, b) is the same as separate indexes on a and b

Reality: A composite index (user_id, status) is different from two separate indexes on user_id and status.

  • (user_id, status) is optimally fast for queries that filter by both columns and for queries that filter by user_id alone
  • Two separate indexes let the planner use one, the other, or both (via bitmap index scan) — but the composite index is faster when both columns are used together

Use composite indexes when you frequently query by multiple columns together. Keep separate indexes when columns are more often queried independently.


Troubleshooting Common Issues

Problem: Query is slow despite having an index

Symptoms: EXPLAIN ANALYZE shows Seq Scan even though you created an index.

Common causes:

  1. Table is too small — PostgreSQL chose sequential scan because it's cheaper
  2. Low selectivity — too many rows match the condition
  3. Stale statistics — run ANALYZE table_name to update
  4. Function on the indexed column defeating the index

Diagnostic:

-- Update statistics first
ANALYZE products;

-- Then check the plan
EXPLAIN ANALYZE SELECT * FROM products WHERE category = 'Electronics';

-- If still using SeqScan on a large table, check selectivity:
SELECT category, COUNT(*) FROM products GROUP BY category;
-- If 80% of rows are 'Electronics', an index won't help much

Problem: Index is listed in pg_stat_user_indexes with 0 scans

Symptoms: The index exists but has never been used.

Cause: Either the index doesn't match any actual query patterns, or the table is small enough that PostgreSQL always chooses a sequential scan.

Solution: Check if the index is needed. If it's covering a query you intended to optimize, verify the query's WHERE clause matches the index columns exactly (including no functions on indexed columns).


Check Your Understanding

Quick Quiz

  1. Why might PostgreSQL choose a sequential scan even when an index exists?

    Show Answer

    PostgreSQL uses the query planner to estimate whether an index scan or sequential scan is cheaper. If the table is small, the condition matches most rows (low selectivity), or the statistics are stale, the planner may determine a sequential scan is faster. It also skips the index overhead of pointer-following. The planner's decision is based on cost estimation, not just index availability.

  2. You have a composite index on (user_id, status). Which of these queries will benefit from it?

    -- Query A
    WHERE user_id = 1 AND status = 'shipped'
    -- Query B
    WHERE user_id = 1
    -- Query C
    WHERE status = 'shipped'
    Show Answer

    Query A and Query B will benefit from the index. Query C will not — it skips the leftmost column (user_id) and a B-tree index on (user_id, status) can't efficiently look up by status alone. For Query C to use an index, you'd need a separate index on status.

  3. Why does WHERE LOWER(email) = 'aisha@example.com' fail to use an index on email?

    Show Answer

    The index on email stores the original values (e.g., 'Aisha@example.com'). The query is filtering on LOWER(email) — a transformed value. The index doesn't have LOWER(email) stored, so it can't be used. The fix is to create an expression index: CREATE INDEX ON users(LOWER(email)).

Hands-On Challenge

Task: The ShopFlow team reports that this query is running slowly on a large dataset:

SELECT u.name, COUNT(o.id) AS order_count, SUM(oi.price_at_purchase * oi.quantity) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped'
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY revenue DESC NULLS LAST;
  1. Identify which columns this query filters/joins on
  2. Write the CREATE INDEX statements that would speed it up
  3. Write the EXPLAIN ANALYZE command to verify
Show Solution

Step 1: Identify the access patterns

  • orders.user_id — join condition (FK lookup)
  • orders.status — filter condition (status = 'shipped')
  • order_items.order_id — join condition (FK lookup)

Step 2: Create indexes

-- orders: join on user_id, filter on status
-- Composite covers both conditions
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- order_items: join on order_id
CREATE INDEX idx_order_items_order_id ON order_items(order_id);

-- Partial alternative for orders: only index shipped orders
-- (better if "shipped" is a small fraction of all orders)
CREATE INDEX idx_orders_shipped_user ON orders(user_id)
WHERE status = 'shipped';

Step 3: Verify

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count, SUM(oi.price_at_purchase * oi.quantity) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped'
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY revenue DESC NULLS LAST;

Look for Index Scan or Bitmap Index Scan nodes in the output where you previously saw Seq Scan. The actual time should be significantly lower on a large dataset.


Summary: Key Takeaways

  • An index is a separate data structure that maps column values to row locations, enabling fast lookups instead of full table scans
  • PostgreSQL's default index type is a B-tree, which supports equality, range queries, and sorting
  • The query planner decides whether to use an index — it's not guaranteed even if an index exists
  • EXPLAIN ANALYZE shows the actual execution plan — always use it to verify index usage
  • CREATE INDEX idx_name ON table(column) creates a single-column index
  • Composite indexes are ordered — (user_id, status) helps queries on user_id alone or user_id + status, but not status alone
  • Partial indexes only index rows matching a condition — smaller, faster, and targeted
  • Expression indexes index the result of a function — needed when you filter with LOWER(), DATE(), etc.
  • Every index slows writes — add indexes based on observed slow queries, not speculation
  • Use pg_stat_user_indexes to find unused indexes and drop them

What's Next?

You've built a complete, well-indexed PostgreSQL schema and written efficient queries against it. Now it's time to use this from application code.

The next step is Connecting Node.js to PostgreSQL: node-postgres (pg) — the raw driver that all higher-level abstractions (Knex, Drizzle, Prisma) are built on. Understanding the driver first means you'll know exactly what ORMs are doing under the hood — and when to bypass them.