Skip to main content

Schema Design: Tables, Types, and Constraints

So far, we've been working with a schema that was handed to you. You ran the seed SQL in Article 1 and started querying. But understanding how to design a schema — choosing the right types, adding the right constraints, thinking through what the database should enforce — is one of the most important skills in backend development.

A good schema prevents bad data from entering the database in the first place. A poorly designed schema relies on application code to catch every mistake — and application code has bugs.

Let's build the ShopFlow schema from scratch, and understand every decision along the way.


Quick Reference

Create a table:

CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
stock_quantity INTEGER NOT NULL DEFAULT 0,
category TEXT NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Common types:

Use caseType
Auto-incrementing IDSERIAL / BIGSERIAL
Whole numbersINTEGER / BIGINT
Decimal money valuesNUMERIC(10, 2)
Short textTEXT or VARCHAR(n)
Long textTEXT
True/falseBOOLEAN
Date and timeTIMESTAMP / TIMESTAMPTZ
Date onlyDATE

Gotchas:

  • ⚠️ Use NUMERIC (not FLOAT) for money — floating-point arithmetic introduces rounding errors
  • ⚠️ Use TIMESTAMPTZ (with timezone) for production apps — bare TIMESTAMP stores no timezone info
  • ⚠️ VARCHAR(255) has no meaningful performance advantage over TEXT in PostgreSQL

Version Information

Tested with:

  • PostgreSQL: 15.x, 16.x
  • Node.js: v18.x, v20.x, v22.x

What You Need to Know First

Required reading:

What We'll Cover in This Article

By the end of this guide, you'll understand:

  • How to create tables with CREATE TABLE
  • Which PostgreSQL data type to use for each kind of data
  • How NOT NULL, UNIQUE, DEFAULT, CHECK, and PRIMARY KEY constraints work
  • Why NUMERIC beats FLOAT for money
  • Why TIMESTAMPTZ beats TIMESTAMP for production
  • How to drop and modify tables safely
  • How to build the full ShopFlow schema from scratch

What We'll Explain Along the Way

We'll introduce these concepts with full explanations:

  • Why VARCHAR(255) is a historical artifact in PostgreSQL
  • What SERIAL actually is under the hood (a sequence)
  • The difference between a table constraint and a column constraint
  • What happens when a constraint is violated

Part 1: CREATE TABLE — The Basic Structure

A CREATE TABLE statement defines the table name, its columns, and the rules each column must follow.

CREATE TABLE table_name (
column_name data_type [constraints],
column_name data_type [constraints],
...
);

Let's start simple and build up. Here's the most minimal valid products table:

-- Minimal: no constraints, no defaults
CREATE TABLE products (
id INTEGER,
name TEXT,
price NUMERIC
);

This creates the table, but it's almost useless — nothing prevents inserting a product with no name, a negative price, or a duplicate ID. Constraints are what give the schema its teeth.

Dropping Tables

If you need to start over during development:

-- Drop a table (irreversible — all data is deleted)
DROP TABLE products;

-- Drop only if it exists (no error if it doesn't)
DROP TABLE IF EXISTS products;

-- Drop a table along with anything that references it
DROP TABLE products CASCADE;

In production, you never DROP TABLE — you use migrations to modify schema incrementally. We'll cover migrations in the Drizzle and Prisma articles.


Part 2: Choosing the Right Data Type

Every column has a data type that defines what values it can hold and how the database stores them. Choosing the right type matters for correctness, storage efficiency, and query performance.

Integer Types

TypeRangeStorageWhen to Use
SMALLINT-32,768 to 32,7672 bytesRarely — only for very constrained values
INTEGER-2.1B to 2.1B4 bytesDefault for counts, quantities, foreign keys
BIGINT-9.2 quintillion to 9.2 quintillion8 bytesIDs for very large tables, financial amounts in cents

For ShopFlow, INTEGER is fine for all IDs and quantities. If you expected millions of users or billions of orders, BIGINT would be the safer choice.

Auto-Incrementing IDs: SERIAL

SERIAL is PostgreSQL shorthand for "create an integer column with an auto-incrementing sequence attached":

-- These two are equivalent
id SERIAL PRIMARY KEY

-- Under the hood, SERIAL does this:
id INTEGER NOT NULL DEFAULT nextval('products_id_seq') PRIMARY KEY
-- (PostgreSQL creates the sequence automatically)

BIGSERIAL is the same for BIGINT. For most tables, SERIAL is sufficient. If you're building a system that might eventually have billions of rows, use BIGSERIAL from the start — retrofitting it later requires a migration.

Text Types

PostgreSQL has two main text types: TEXT and VARCHAR(n).

name         TEXT          -- unlimited length
short_code VARCHAR(10) -- maximum 10 characters
description TEXT -- unlimited, fine for long content

The VARCHAR(255) myth: In MySQL, VARCHAR(255) has performance implications. In PostgreSQL, it doesn't — TEXT and VARCHAR are stored identically. The (n) in VARCHAR(n) adds a length check constraint and nothing else. Using TEXT for most columns is the PostgreSQL idiom. Reserve VARCHAR(n) only when you genuinely want the database to enforce a character limit (like a two-character country code).

Numeric Types for Decimals

This is where many developers make a critical mistake.

-- ❌ WRONG for money: float arithmetic is imprecise
price FLOAT -- or DOUBLE PRECISION, REAL

-- ✅ CORRECT for money: exact decimal arithmetic
price NUMERIC(10, 2) -- 10 total digits, 2 after decimal point

Why does FLOAT fail for money?

-- Float arithmetic surprises:
SELECT 0.1 + 0.2; -- Returns 0.30000000000000004 in many contexts
SELECT 10.00::FLOAT - 9.99::FLOAT; -- Might return 0.009999999999999... not 0.01

Floats use binary representation that can't precisely represent most decimal fractions. For prices, use NUMERIC(precision, scale):

  • NUMERIC(10, 2) — up to 10 total digits, with exactly 2 after the decimal point
  • NUMERIC(15, 4) — for exchange rates or other high-precision decimals
  • NUMERIC with no arguments — arbitrary precision (rarely needed)

Boolean

is_active    BOOLEAN NOT NULL DEFAULT true
is_featured BOOLEAN NOT NULL DEFAULT false

PostgreSQL accepts TRUE/FALSE, 't'/'f', 1/0, 'yes'/'no' as boolean values. Using TRUE and FALSE in queries is the clearest.

Date and Time Types

This is another area where the wrong choice causes production headaches.

-- ❌ Stores date+time with NO timezone information
created_at TIMESTAMP

-- ✅ Stores date+time WITH timezone (UTC-normalized internally)
created_at TIMESTAMPTZ -- same as TIMESTAMP WITH TIME ZONE

With bare TIMESTAMP, the database stores whatever you give it and gives it back unchanged. If your server is in UTC and your app inserts 2024-03-15 14:00:00, that's what's stored. If you later deploy the app in a different timezone, the stored values become ambiguous.

TIMESTAMPTZ normalizes everything to UTC internally and converts on retrieval. Always use TIMESTAMPTZ for production. The only exception is when you genuinely want to record a "wall clock time" with no timezone (rare — think "alarm that fires at 09:00 regardless of timezone").

-- Other time types
appointment_date DATE -- date only, no time: 2024-03-15
duration INTERVAL -- a span of time: '2 hours', '30 minutes'
meeting_time TIME -- time only, no date: 14:30:00

Part 3: Constraints — Enforcing Business Rules

Constraints are rules the database enforces on every insert and update. They're the difference between a database that keeps itself consistent and one that relies on application code to catch every mistake.

NOT NULL — Required Fields

NOT NULL prevents a column from ever containing NULL:

name  TEXT NOT NULL

Without NOT NULL, you can insert a product with no name — the database will happily accept NULL. With NOT NULL, the insert fails with a clear error.

Rule of thumb: every column should be NOT NULL unless you have a specific reason to allow null. Nullable columns make queries harder (you always have to account for IS NULL), and usually indicate a schema design that can be improved.

-- ❌ Too permissive: name and price can be NULL
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
price NUMERIC(10, 2)
);

-- ✅ Correct: required fields are NOT NULL
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
price NUMERIC(10, 2) NOT NULL
);

DEFAULT — Automatic Values

DEFAULT provides a fallback value when a column is omitted from an INSERT:

stock_quantity  INTEGER NOT NULL DEFAULT 0
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
status TEXT NOT NULL DEFAULT 'pending'

With DEFAULT 0 on stock_quantity, you can insert a product without specifying stock:

-- stock_quantity will be 0 automatically
INSERT INTO products (name, price, category)
VALUES ('New Item', 29.99, 'Accessories');

NOW() is a PostgreSQL function that returns the current timestamp — perfect for created_at columns.

UNIQUE — No Duplicates

UNIQUE ensures no two rows have the same value in that column:

email  TEXT NOT NULL UNIQUE

Trying to insert two users with the same email fails:

-- ❌ Error: duplicate key value violates unique constraint
INSERT INTO users (name, email) VALUES ('Another Aisha', 'aisha@example.com');
-- ERROR: duplicate key value violates unique constraint "users_email_key"

Note: NULL is special with UNIQUE — multiple rows can have NULL in a unique column (because NULL != NULL in SQL's logic). This is usually the desired behavior for optional unique fields.

Multi-column UNIQUE — require a unique combination of values:

-- An order can only have one row per product
-- (quantity handles multiple units, not multiple rows)
UNIQUE (order_id, product_id)

This is written as a table constraint (after the column definitions) rather than a column constraint:

CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
UNIQUE (order_id, product_id) -- table-level constraint
);

CHECK — Custom Validation Rules

CHECK lets you write any boolean expression as a constraint:

-- Price must be non-negative
price NUMERIC(10, 2) NOT NULL CHECK (price >= 0)

-- Quantity must be at least 1
quantity INTEGER NOT NULL CHECK (quantity > 0)

-- Status must be one of the valid values
status TEXT NOT NULL CHECK (status IN ('pending', 'shipped', 'cancelled', 'refunded'))

CHECK violations produce clear errors:

-- ❌ Error: violates check constraint
INSERT INTO products (name, price, category, stock_quantity)
VALUES ('Bad Product', -10.00, 'Electronics', 5);
-- ERROR: new row for relation "products" violates check constraint "products_price_check"

Multi-column CHECK constraints go at the table level:

-- Discount price must be less than regular price
CREATE TABLE promotions (
id SERIAL PRIMARY KEY,
product_id INTEGER NOT NULL,
original_price NUMERIC(10, 2) NOT NULL,
sale_price NUMERIC(10, 2) NOT NULL,
CHECK (sale_price < original_price)
);

PRIMARY KEY — The Row's Identity

PRIMARY KEY combines NOT NULL and UNIQUE and designates the column as the table's unique row identifier. Every table should have one.

id  SERIAL PRIMARY KEY

For most tables, a single auto-incrementing integer is the right primary key. Composite primary keys (combining two columns) are sometimes used for junction tables:

-- Composite primary key on the junction table
-- (instead of a surrogate id column)
CREATE TABLE order_items (
order_id INTEGER NOT NULL REFERENCES orders(id),
product_id INTEGER NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
price_at_purchase NUMERIC(10, 2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);

Part 4: Building the Complete ShopFlow Schema

Let's put everything together. Here's the full ShopFlow schema, built from scratch with proper types and constraints.

Drop any existing tables first if you ran the seed from Article 1:

-- Drop in reverse dependency order (children before parents)
DROP TABLE IF EXISTS order_items;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS users;

Now build the schema:

-- ─────────────────────────────────────────────────
-- USERS
-- ─────────────────────────────────────────────────
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Why these choices:
-- SERIAL: auto-incrementing id, no manual management needed
-- TEXT NOT NULL: name is required
-- TEXT NOT NULL UNIQUE: email is required and must be unique across all users
-- TIMESTAMPTZ: stores timezone-aware timestamp, defaults to current time

-- ─────────────────────────────────────────────────
-- PRODUCTS
-- ─────────────────────────────────────────────────
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT, -- optional
price NUMERIC(10, 2) NOT NULL CHECK (price >= 0),
stock_quantity INTEGER NOT NULL DEFAULT 0 CHECK (stock_quantity >= 0),
category TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Why these choices:
-- description TEXT (no NOT NULL): products can have no description
-- NUMERIC(10,2): exact decimal for money — no floating-point errors
-- CHECK (price >= 0): database enforces non-negative prices
-- DEFAULT 0: new products start with no stock

-- ─────────────────────────────────────────────────
-- ORDERS
-- ─────────────────────────────────────────────────
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'shipped', 'cancelled', 'refunded')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Why these choices:
-- REFERENCES users(id): foreign key — orders must belong to a real user
-- CHECK on status: only valid status values are allowed
-- DEFAULT 'pending': new orders start in pending state

-- ─────────────────────────────────────────────────
-- ORDER ITEMS
-- ─────────────────────────────────────────────────
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER NOT NULL REFERENCES orders(id),
product_id INTEGER NOT NULL REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
price_at_purchase NUMERIC(10, 2) NOT NULL CHECK (price_at_purchase >= 0),
UNIQUE (order_id, product_id)
);

-- Why these choices:
-- REFERENCES orders(id): must belong to a real order
-- REFERENCES products(id): must reference a real product
-- CHECK (quantity > 0): can't order zero or negative units
-- price_at_purchase: snapshot of price at time of order (products.price can change)
-- UNIQUE (order_id, product_id): each product appears at most once per order

What the Constraints Enforce

Let's trace what the database now automatically rejects — without any application code:

Attempted actionConstraint that blocks it
Insert product with no nameNOT NULL on products.name
Insert product with negative priceCHECK (price >= 0)
Insert two users with same emailUNIQUE on users.email
Insert order for non-existent userREFERENCES users(id)
Set order status to 'delivered'CHECK (status IN (...))
Insert order item with quantity 0CHECK (quantity > 0)
Add same product twice to an orderUNIQUE (order_id, product_id)

Every one of these would be a data integrity bug if it reached the database unchecked. The schema catches all of them.


Part 5: Modifying Tables with ALTER TABLE

After a table is created, you can modify it without dropping and recreating it:

-- Add a new column
ALTER TABLE products
ADD COLUMN sku TEXT UNIQUE;

-- Make an existing column NOT NULL
-- (requires no existing NULLs in that column, or a DEFAULT)
ALTER TABLE products
ALTER COLUMN description SET NOT NULL;

-- Add a default to an existing column
ALTER TABLE orders
ALTER COLUMN status SET DEFAULT 'pending';

-- Add a constraint to an existing table
ALTER TABLE products
ADD CONSTRAINT check_positive_price CHECK (price >= 0);

-- Remove a constraint
ALTER TABLE products
DROP CONSTRAINT check_positive_price;

-- Rename a column
ALTER TABLE users
RENAME COLUMN name TO full_name;

-- Change a column's type (risky — existing data must be compatible)
ALTER TABLE products
ALTER COLUMN stock_quantity TYPE BIGINT;

In production applications, schema changes are managed through migrations — versioned SQL files that record every change in order. This ensures all environments (development, staging, production) have identical schemas. We'll cover migration workflows in a future Drizzle ORM article.


Common Misconceptions

❌ Misconception: VARCHAR(255) is better than TEXT for performance

Reality: In PostgreSQL, TEXT and VARCHAR use identical internal storage. The only difference is that VARCHAR(n) adds a length-check constraint. There's no performance advantage to VARCHAR(255) — it's a habit carried over from MySQL where the distinction matters.

Fix: Use TEXT by default. Use VARCHAR(n) only when you genuinely want to enforce a maximum length.

❌ Misconception: FLOAT is fine for money because the rounding errors are tiny

Reality: Financial calculations accumulate rounding errors. Summing thousands of transactions with float arithmetic can produce results that are off by cents, which is unacceptable for accounting.

-- Illustration of float imprecision
SELECT 0.1::FLOAT + 0.2::FLOAT;
-- Result: 0.30000000000000004 (not 0.3)

-- NUMERIC is exact
SELECT 0.1::NUMERIC + 0.2::NUMERIC;
-- Result: 0.3

Fix: Always use NUMERIC(precision, scale) for money. Many financial systems also store amounts as integer cents (INTEGER) to avoid decimal math entirely.

❌ Misconception: More constraints mean slower writes

Reality: The performance cost of constraint checks on individual row inserts is negligible. Constraints save far more in debugging time and data cleanup than they cost in write performance. The one constraint type with a meaningful performance cost is foreign keys (they require an index lookup on the referenced table) — and even that cost is usually worth the integrity guarantee.


Troubleshooting Common Issues

Problem: ERROR: column cannot be cast to type integer

Symptoms: ALTER TABLE ... ALTER COLUMN TYPE fails with a cast error.

Cause: Existing data in the column can't be automatically converted to the new type. For example, changing TEXT to INTEGER when some rows contain non-numeric text.

Solution:

-- Check what values exist before changing type
SELECT DISTINCT column_name FROM table_name;

-- If needed, provide an explicit USING clause for the conversion
ALTER TABLE products
ALTER COLUMN stock_quantity TYPE BIGINT
USING stock_quantity::BIGINT;

Problem: ERROR: null value in column violates not-null constraint

Symptoms: An INSERT fails because a required column was omitted.

Cause: A NOT NULL column has no DEFAULT and no value was provided in the INSERT.

Solution:

-- Either provide a value in every INSERT
INSERT INTO products (name, price, category, stock_quantity)
VALUES ('Item', 9.99, 'Accessories', 0); -- stock_quantity explicitly included

-- Or add a DEFAULT to the column
ALTER TABLE products ALTER COLUMN stock_quantity SET DEFAULT 0;

Problem: CHECK constraint name is unclear in error messages

Symptoms: ERROR: violates check constraint "products_price_check" — you're not sure which constraint this refers to.

Solution: Name your constraints explicitly for clearer error messages:

-- ❌ Auto-generated name: "products_price_check"
price NUMERIC(10, 2) CHECK (price >= 0)

-- ✅ Named constraint: "price_must_be_non_negative"
price NUMERIC(10, 2) CONSTRAINT price_must_be_non_negative CHECK (price >= 0)

Check Your Understanding

Quick Quiz

  1. Why should you use NUMERIC(10, 2) instead of FLOAT for price columns?

    Show Answer

    FLOAT uses binary floating-point representation, which cannot precisely represent most decimal fractions. Financial calculations accumulate these rounding errors. NUMERIC uses exact decimal arithmetic — 0.1 + 0.2 is exactly 0.3, not 0.30000000000000004.

  2. What's the difference between a column constraint and a table constraint?

    Show Answer

    A column constraint is written inline with the column definition and applies to that column only (price NUMERIC NOT NULL). A table constraint is written after all column definitions and can reference multiple columns (UNIQUE (order_id, product_id), CHECK (sale_price < original_price)). Some constraints (like PRIMARY KEY and UNIQUE) can be written either way — use the table form when multiple columns are involved.

  3. Why does the order_items table store price_at_purchase instead of just referencing products.price?

    Show Answer

    Product prices change over time. If order_items referenced products.price directly, old orders would show the current price instead of the price the customer actually paid. By storing price_at_purchase as a snapshot at the time of the order, historical order totals remain correct even when prices change later.

Hands-On Challenge

Task: Design a reviews table for ShopFlow. A review belongs to a product and a user, has a rating (1–5), optional text content, and a timestamp. Add appropriate constraints: a user can only review each product once; rating must be between 1 and 5; the review must belong to a real user and real product.

Show Solution
CREATE TABLE reviews (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
product_id INTEGER NOT NULL REFERENCES products(id),
rating SMALLINT NOT NULL CHECK (rating BETWEEN 1 AND 5),
content TEXT, -- optional review text
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (user_id, product_id) -- one review per user per product
);

Design decisions explained:

  • SMALLINT for rating — values 1–5 fit in 2 bytes; CHECK (rating BETWEEN 1 AND 5) enforces the valid range
  • content TEXT (no NOT NULL) — review text is optional; a user might just want to leave a star rating
  • UNIQUE (user_id, product_id) — table-level constraint preventing duplicate reviews
  • REFERENCES users(id) and REFERENCES products(id) — orphaned reviews (pointing to deleted users or products) are prevented

Summary: Key Takeaways

  • CREATE TABLE defines columns with data types and constraints — the schema is the database's rulebook
  • Use SERIAL for auto-incrementing IDs; BIGSERIAL for very large tables
  • Use NUMERIC(10, 2) for money — never FLOAT or DOUBLE PRECISION
  • Use TIMESTAMPTZ for timestamps in production — it preserves timezone information
  • TEXT and VARCHAR(n) perform identically in PostgreSQL — use TEXT by default
  • Constraints enforce business rules at the database level: NOT NULL, UNIQUE, DEFAULT, CHECK, PRIMARY KEY
  • Column constraints apply to one column; table constraints can span multiple columns
  • ALTER TABLE modifies existing tables without dropping them — in production, this is done through migrations
  • A well-constrained schema rejects bad data automatically, before it ever reaches application code

What's Next?

You've designed the ShopFlow schema with proper types and constraints. But the schema still refers to REFERENCES users(id) without explaining how foreign keys work in detail — what happens on delete, how cascades work, and how to design the right relationship for each scenario.

The next step is Foreign Keys and Relationships: Linking Tables — where you'll learn the full mechanics of foreign key constraints, the difference between ON DELETE CASCADE, RESTRICT, and SET NULL, and how to design one-to-many and many-to-many relationships correctly.