Indexes: Making Queries Fast
You've written queries. You've designed a schema. For a database with a few hundred rows, everything feels instant. Then the ShopFlow catalog grows to 100,000 products, orders reach a million rows, and that same WHERE category = 'Electronics' query suddenly takes three seconds.
The fix is usually an index. But indexes aren't magic — they trade write overhead for read speed, and adding the wrong ones can slow your database down instead of speeding it up. Let's understand how they actually work.
Quick Reference
Create an index:
-- Single column
CREATE INDEX idx_products_category ON products(category);
-- Composite (column order matters)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Partial index (only indexes rows matching a condition)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
Check if your index is being used:
EXPLAIN ANALYZE
SELECT * FROM products WHERE category = 'Electronics';
Gotchas:
- ⚠️ Every index slows down
INSERT,UPDATE, andDELETE— don't index everything - ⚠️ In composite indexes, column order matters:
(user_id, status)helpsWHERE user_id = 1but(status, user_id)does not - ⚠️ PostgreSQL may ignore your index if it estimates a sequential scan is faster (small tables, low selectivity)
- ⚠️ Functions on indexed columns defeat the index:
WHERE LOWER(email) = 'x'won't use an index onemail
Version Information
Tested with:
- PostgreSQL: 15.x, 16.x
- Node.js: v18.x, v20.x, v22.x
Last verified: May 2025
What You Need to Know First
Required reading:
- SQL Basics: Reading Data with SELECT — the queries we'll be optimizing
- Schema Design: Tables, Types, and Constraints — understanding PRIMARY KEY and UNIQUE indexes
- Foreign Keys and Relationships: Linking Tables — FK columns that need manual indexes
What We'll Cover in This Article
By the end of this guide, you'll understand:
- What an index is and how B-tree indexes work internally (simplified)
- How PostgreSQL decides between an index scan and a sequential scan
- How to create single-column, composite, unique, and partial indexes
- How to read
EXPLAIN ANALYZEoutput - When indexes help and when they hurt
- How to index the ShopFlow queries you've already written
What We'll Explain Along the Way
We'll introduce these concepts with full explanations:
- The query planner — PostgreSQL's internal optimizer
- Index selectivity — why some indexes are ignored
- Write amplification — the cost every index adds to mutations
- Dead tuples and
VACUUM(briefly)
Part 1: What an Index Actually Is
A database index is the same idea as a book index. In a 500-page book, if you want every page that mentions "foreign key," you don't read every page — you check the index in the back, which maps the term to specific page numbers and takes you directly there.
A database index works the same way. Without an index on products.category, a query like WHERE category = 'Electronics' requires a sequential scan — reading every single row in the table to find the ones that match. With 100,000 products, that's 100,000 row reads.
With an index on category, the database maintains a separate, sorted data structure that maps category values to the rows that have them. The query becomes a fast lookup into that structure, followed by direct retrieval of only the matching rows.
B-Tree: The Default Index Type
PostgreSQL's default index type is a B-tree (balanced tree). You don't need to understand the implementation deeply, but the structure gives you an intuition for what queries it helps:
B-Tree for products.category:
[Furniture]
/ \
[Electronics] [Furniture]
/ \
[Accessories] [Electronics]
A B-tree keeps values sorted. This means it efficiently supports:
- Exact matches:
WHERE category = 'Electronics' - Range queries:
WHERE price BETWEEN 40 AND 80 - Starts-with searches:
WHERE name LIKE 'Wireless%' - Sorting:
ORDER BY price(data is already sorted in the index)
It does not help with:
- Contains searches:
WHERE name LIKE '%Keyboard%'(can't start in the middle of a sorted tree) - Function results:
WHERE LOWER(email) = 'aisha@example.com'(the index storesemail, notLOWER(email))
Part 2: The Query Planner — PostgreSQL Decides
You don't instruct PostgreSQL to use an index. You create the index, and the query planner (also called the optimizer) decides whether using it is faster than a sequential scan.
This decision is based on statistics that PostgreSQL maintains about each table:
- How many rows are in the table
- How many distinct values exist in each column
- How evenly values are distributed
From these statistics, the planner estimates the selectivity of a query — what fraction of rows the condition will match. High selectivity (few matching rows) → index scan likely wins. Low selectivity (many matching rows) → sequential scan might win.
Here's a concrete example. If your products table has 100,000 rows and only 8 rows have category = 'Electronics', the index dramatically narrows the search. But if 90,000 of 100,000 rows have category = 'Electronics', the index would send the database to almost every row anyway — a sequential scan of the table is cheaper because it avoids the overhead of following index pointers.
This is why adding an index on a low-cardinality column (one with few distinct values, like a boolean) rarely helps — and sometimes slows things down.
Part 3: EXPLAIN ANALYZE — Seeing the Query Plan
EXPLAIN ANALYZE shows you exactly how PostgreSQL is executing a query. It's the most important diagnostic tool for query performance.
First, let's create an index to observe the difference:
-- Start without an index on category
EXPLAIN ANALYZE
SELECT name, price FROM products WHERE category = 'Electronics';
With a small table (our 8-row seed data), you'll see something like:
Seq Scan on products (cost=0.00..1.08 rows=5 width=22)
(actual time=0.012..0.016 rows=5 loops=1)
Filter: (category = 'Electronics')
Rows Removed by Filter: 3
Planning Time: 0.118 ms
Execution Time: 0.031 ms
Now create an index and run again:
CREATE INDEX idx_products_category ON products(category);
EXPLAIN ANALYZE
SELECT name, price FROM products WHERE category = 'Electronics';
With 8 rows, PostgreSQL will likely still use a sequential scan — the table is tiny and there's no point using the index. This is the planner working correctly.
To see the index used, you need to either add significantly more data, or temporarily disable sequential scans for testing:
-- Force index usage for testing (don't do this in production)
SET enable_seqscan = OFF;
EXPLAIN ANALYZE
SELECT name, price FROM products WHERE category = 'Electronics';
Index Scan using idx_products_category on products
(cost=0.14..8.16 rows=5 width=22)
(actual time=0.022..0.029 rows=5 loops=1)
Index Cond: ((category)::text = 'Electronics'::text)
Planning Time: 0.213 ms
Execution Time: 0.047 ms
Reading EXPLAIN ANALYZE Output
The key parts to understand:
Seq Scan on products ← Scan type (Seq Scan vs Index Scan vs Bitmap Index Scan)
(cost=0.00..1.08 ← Estimated cost: startup..total
rows=5 ← Estimated rows returned
width=22) ← Estimated average row width in bytes
(actual time=0.012..0.016 ← Actual time: startup..total (milliseconds)
rows=5 ← Actual rows returned
loops=1) ← How many times this node ran
Filter: (category = ...) ← The condition applied
Rows Removed by Filter: 3 ← Rows read but discarded
What to look for:
Seq Scanon a large table with a highly selectiveWHERE→ missing index- Large difference between estimated rows and actual rows → stale statistics, run
ANALYZE - High
Rows Removed by Filter→ the scan is reading many rows it doesn't need Nested Loopjoining large tables → might need a better index on the join column
Part 4: Creating the Right Indexes
Single-Column Indexes
The most common type. Index the column that appears in WHERE, JOIN ON, or ORDER BY:
-- Speed up filtering by category
CREATE INDEX idx_products_category ON products(category);
-- Speed up FK lookups (covered in Article 6 — adding them here for completeness)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
-- Speed up ordering and filtering by date
CREATE INDEX idx_orders_created_at ON orders(created_at);
Composite Indexes — Column Order Matters
A composite index covers multiple columns. The critical rule: the index can only be used by queries that filter on the leftmost columns first.
-- Index: (user_id, status)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
This index helps:
-- ✅ Uses index: filters on user_id (leftmost column)
SELECT * FROM orders WHERE user_id = 1;
-- ✅ Uses index: filters on both columns
SELECT * FROM orders WHERE user_id = 1 AND status = 'shipped';
This index does NOT help:
-- ❌ Cannot use index: skips the leftmost column
SELECT * FROM orders WHERE status = 'shipped';
Think of a composite index like a phone book sorted by (last name, first name). You can look up "Smith" quickly, or "Smith, John" specifically. But you can't efficiently find all "Johns" without scanning every entry.
Column order strategy: Put the most selective column (fewest distinct values relative to total rows — wait, actually the opposite: most values, highest cardinality) first. Put columns used in equality conditions (=) before columns used in range conditions (>, <, BETWEEN).
Unique Indexes
CREATE UNIQUE INDEX enforces uniqueness and provides fast lookups:
-- Already created implicitly by UNIQUE constraint, but this is the explicit form
CREATE UNIQUE INDEX idx_users_email ON users(email);
Note: UNIQUE constraints in CREATE TABLE automatically create a unique index behind the scenes. You don't need to create both.
Partial Indexes — Index Only What You Query
A partial index only indexes rows matching a condition. The index is smaller, faster to maintain, and sometimes faster for the targeted query:
-- Only index pending orders (the ones actively needing processing)
-- Completed orders don't need to be in this index
CREATE INDEX idx_orders_pending_created
ON orders(created_at)
WHERE status = 'pending';
This index is used only when the query includes WHERE status = 'pending':
-- ✅ Uses the partial index
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- ❌ Does not use this partial index (wrong status)
SELECT * FROM orders WHERE status = 'shipped' ORDER BY created_at;
Partial indexes shine when a small subset of rows is queried frequently — like active records, pending items, or flagged rows.
Expression Indexes — Index the Transformed Value
If you frequently query using a function on a column, index the function's output:
-- Query that a regular index on email WON'T speed up
SELECT * FROM users WHERE LOWER(email) = 'aisha@example.com';
-- Create an index on the expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this query uses the index
SELECT * FROM users WHERE LOWER(email) = 'aisha@example.com';
Part 5: The ShopFlow Index Strategy
Let's audit the queries we've written across this module and create the right indexes for each.
Indexes for ShopFlow
-- ─────────────────────────────────────────────────────
-- USERS
-- ─────────────────────────────────────────────────────
-- Email lookups (login, duplicate checks)
-- Already created by UNIQUE constraint on users.email
-- No additional index needed.
-- Case-insensitive email search (if your app needs it)
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- ─────────────────────────────────────────────────────
-- PRODUCTS
-- ─────────────────────────────────────────────────────
-- Filter by category (common storefront query)
CREATE INDEX idx_products_category ON products(category);
-- Filter by category + price range (e.g., "Electronics under $100")
CREATE INDEX idx_products_category_price ON products(category, price);
-- Find in-stock products (high-frequency query)
CREATE INDEX idx_products_in_stock ON products(category)
WHERE stock_quantity > 0;
-- Text search on product name (basic prefix search)
CREATE INDEX idx_products_name ON products(name text_pattern_ops);
-- text_pattern_ops enables LIKE 'prefix%' queries on this index
-- ─────────────────────────────────────────────────────
-- ORDERS
-- ─────────────────────────────────────────────────────
-- FK lookup: all orders for a user
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Filter by status (dashboard, processing queue)
CREATE INDEX idx_orders_status ON orders(status);
-- Combined: orders for a user in a specific status
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Recent orders (date-filtered queries)
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);
-- Active order processing (partial index — only pending orders)
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- ─────────────────────────────────────────────────────
-- ORDER ITEMS
-- ─────────────────────────────────────────────────────
-- FK lookup: all items for an order
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
-- FK lookup: all orders containing a product
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
Part 6: When Indexes Hurt
Every index has a cost. Every time a row is inserted, updated, or deleted, PostgreSQL must update every index that covers the affected columns. In a write-heavy table, too many indexes can slow writes more than they help reads.
Signs you have too many indexes:
INSERTperformance is noticeably slow on a tablepg_stat_user_indexesshows indexes with zero or near-zero scans- The table has more indexes than it has columns
Finding unused indexes:
-- Check which indexes are actually being used
SELECT
schemaname,
tablename,
indexname,
idx_scan AS times_used,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
An index with idx_scan = 0 after weeks of production traffic is probably safe to drop.
Dropping an index:
DROP INDEX idx_products_category;
-- Drop without error if it doesn't exist
DROP INDEX IF EXISTS idx_products_category;
The Right Mental Model
Add indexes in response to observed slow queries, not preemptively for every column. The workflow:
- Measure: identify a slow query using
EXPLAIN ANALYZEor application monitoring - Understand: find the scan type and what condition is being filtered
- Index: create a targeted index for that specific query pattern
- Verify: run
EXPLAIN ANALYZEagain to confirm the index is used and the query is faster - Monitor: watch
pg_stat_user_indexesover time to confirm the index is earning its keep
Common Misconceptions
❌ Misconception: More indexes = faster database
Reality: Each index adds overhead to every write. A table with 15 indexes is slower to insert into than one with 3 indexes. And the query planner has to evaluate more options, which slightly increases planning time.
Fix: Index based on actual slow queries. Prefer fewer, well-chosen indexes over many speculative ones.
❌ Misconception: My index exists, so it's being used
Reality: PostgreSQL may choose a sequential scan even when a relevant index exists — if the table is small, the condition has low selectivity, or the planner's statistics suggest it's faster.
Verification:
EXPLAIN ANALYZE SELECT ... -- Always verify with EXPLAIN ANALYZE
❌ Misconception: An index on (a, b) is the same as separate indexes on a and b
Reality: A composite index (user_id, status) is different from two separate indexes on user_id and status.
(user_id, status)is optimally fast for queries that filter by both columns and for queries that filter byuser_idalone- Two separate indexes let the planner use one, the other, or both (via bitmap index scan) — but the composite index is faster when both columns are used together
Use composite indexes when you frequently query by multiple columns together. Keep separate indexes when columns are more often queried independently.
Troubleshooting Common Issues
Problem: Query is slow despite having an index
Symptoms: EXPLAIN ANALYZE shows Seq Scan even though you created an index.
Common causes:
- Table is too small — PostgreSQL chose sequential scan because it's cheaper
- Low selectivity — too many rows match the condition
- Stale statistics — run
ANALYZE table_nameto update - Function on the indexed column defeating the index
Diagnostic:
-- Update statistics first
ANALYZE products;
-- Then check the plan
EXPLAIN ANALYZE SELECT * FROM products WHERE category = 'Electronics';
-- If still using SeqScan on a large table, check selectivity:
SELECT category, COUNT(*) FROM products GROUP BY category;
-- If 80% of rows are 'Electronics', an index won't help much
Problem: Index is listed in pg_stat_user_indexes with 0 scans
Symptoms: The index exists but has never been used.
Cause: Either the index doesn't match any actual query patterns, or the table is small enough that PostgreSQL always chooses a sequential scan.
Solution: Check if the index is needed. If it's covering a query you intended to optimize, verify the query's WHERE clause matches the index columns exactly (including no functions on indexed columns).
Check Your Understanding
Quick Quiz
-
Why might PostgreSQL choose a sequential scan even when an index exists?
Show Answer
PostgreSQL uses the query planner to estimate whether an index scan or sequential scan is cheaper. If the table is small, the condition matches most rows (low selectivity), or the statistics are stale, the planner may determine a sequential scan is faster. It also skips the index overhead of pointer-following. The planner's decision is based on cost estimation, not just index availability.
-
You have a composite index on
(user_id, status). Which of these queries will benefit from it?-- Query A
WHERE user_id = 1 AND status = 'shipped'
-- Query B
WHERE user_id = 1
-- Query C
WHERE status = 'shipped'Show Answer
Query A and Query B will benefit from the index. Query C will not — it skips the leftmost column (
user_id) and a B-tree index on(user_id, status)can't efficiently look up bystatusalone. For Query C to use an index, you'd need a separate index onstatus. -
Why does
WHERE LOWER(email) = 'aisha@example.com'fail to use an index onemail?Show Answer
The index on
emailstores the original values (e.g.,'Aisha@example.com'). The query is filtering onLOWER(email)— a transformed value. The index doesn't haveLOWER(email)stored, so it can't be used. The fix is to create an expression index:CREATE INDEX ON users(LOWER(email)).
Hands-On Challenge
Task: The ShopFlow team reports that this query is running slowly on a large dataset:
SELECT u.name, COUNT(o.id) AS order_count, SUM(oi.price_at_purchase * oi.quantity) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped'
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY revenue DESC NULLS LAST;
- Identify which columns this query filters/joins on
- Write the
CREATE INDEXstatements that would speed it up - Write the
EXPLAIN ANALYZEcommand to verify
Show Solution
Step 1: Identify the access patterns
orders.user_id— join condition (FK lookup)orders.status— filter condition (status = 'shipped')order_items.order_id— join condition (FK lookup)
Step 2: Create indexes
-- orders: join on user_id, filter on status
-- Composite covers both conditions
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- order_items: join on order_id
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
-- Partial alternative for orders: only index shipped orders
-- (better if "shipped" is a small fraction of all orders)
CREATE INDEX idx_orders_shipped_user ON orders(user_id)
WHERE status = 'shipped';
Step 3: Verify
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count, SUM(oi.price_at_purchase * oi.quantity) AS revenue
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status = 'shipped'
LEFT JOIN order_items oi ON oi.order_id = o.id
GROUP BY u.id, u.name
ORDER BY revenue DESC NULLS LAST;
Look for Index Scan or Bitmap Index Scan nodes in the output where you previously saw Seq Scan. The actual time should be significantly lower on a large dataset.
Summary: Key Takeaways
- An index is a separate data structure that maps column values to row locations, enabling fast lookups instead of full table scans
- PostgreSQL's default index type is a B-tree, which supports equality, range queries, and sorting
- The query planner decides whether to use an index — it's not guaranteed even if an index exists
EXPLAIN ANALYZEshows the actual execution plan — always use it to verify index usageCREATE INDEX idx_name ON table(column)creates a single-column index- Composite indexes are ordered —
(user_id, status)helps queries onuser_idalone oruser_id + status, but notstatusalone - Partial indexes only index rows matching a condition — smaller, faster, and targeted
- Expression indexes index the result of a function — needed when you filter with
LOWER(),DATE(), etc. - Every index slows writes — add indexes based on observed slow queries, not speculation
- Use
pg_stat_user_indexesto find unused indexes and drop them
What's Next?
You've built a complete, well-indexed PostgreSQL schema and written efficient queries against it. Now it's time to use this from application code.
The next step is Connecting Node.js to PostgreSQL: node-postgres (pg) — the raw driver that all higher-level abstractions (Knex, Drizzle, Prisma) are built on. Understanding the driver first means you'll know exactly what ORMs are doing under the hood — and when to bypass them.