
Message Queues: The Problem They Solve

Imagine you walk into a coffee shop, place your order, and the barista says: "Don't move. Stand right here. I'm going to make your coffee now, and you cannot leave until it's done."

That's bizarre, right? In real life, you place your order, get a number, and sit down. The barista makes your coffee whenever they're ready. You don't have to wait at the counter — and the barista doesn't have to deal with a mob of people crowding the machine.

That gap between placing an order and receiving it is the core idea behind a message queue.

Most web applications start without one — and work perfectly well, until one day they don't. A slow task blocks your server. A downstream service crashes and takes your whole app with it. Traffic spikes and your API starts returning errors. These problems have a common root cause, and message queues are one of the most powerful structural fixes for all three.

By the end of this article, you'll understand exactly what a message queue is, why you need one, what happens when you try to build your own, and how to pick between the popular options. No prior knowledge assumed — we'll define every term as we go.


Quick Reference

What a message queue does in one sentence: It lets one part of your system hand off work to another part, without waiting for it to finish.

The simplest mental model:

Your API (producer)  →  [Queue]    →  Worker (consumer)
"Here's a job"          stores it     "Got it, I'll handle it"

When you need one:

  • A task takes more than ~200ms and the user doesn't need the result immediately
  • A downstream service (email provider, image resizer, payment gateway) might be slow or temporarily unavailable
  • You want to process work in the background without blocking your HTTP response
  • You need to retry failed work automatically

When you don't need one:

  • Simple CRUD apps with fast database reads and writes
  • Tasks that must complete before the user gets a response (e.g. verifying a login password)
  • Single-process scripts that run once and exit



Version Information

This article covers concepts, not specific library versions. Code examples use plain TypeScript with no dependencies to keep things clear.

When we do write code against a broker, we'll use:

  • Node.js: v22.x LTS (current stable)
  • RabbitMQ: 4.2.x (current stable — released October 2025)
  • amqplib: 0.10.7+ (minimum required for RabbitMQ 4.1.x and later)

Last verified: November 2025


What You Need to Know First

This is the first article in the Message Queues & RabbitMQ module. It's written for someone who has never used a message queue before.

You should be comfortable with:

  • TypeScript basics — functions, async/await, interfaces
  • What an API is — a server that receives HTTP requests and returns responses
  • The idea of a "server" — a program that runs continuously and handles requests

That's it. Everything else — broker, producer, consumer, AMQP, quorum queues — gets defined here or in later articles.

What We'll Cover in This Article

By the end of this guide, you'll understand:

  • Why synchronous processing breaks under real-world load
  • What a message queue is, in plain language
  • What happens when you try to build your own queue (and why it fails)
  • What a "broker" is and why you need one instead of a homemade solution
  • The four main options (BullMQ, RabbitMQ, Kafka, Inngest) — just enough to know which direction to look

What We'll Explain Along the Way

Don't worry if these are unfamiliar — we'll explain each one when it comes up:

  • Synchronous vs asynchronous — the difference between "wait for it" and "I'll handle it later"
  • Producer and consumer — the two roles in any queue system
  • Broker — the middleman that stores and delivers messages
  • Durability — whether jobs survive a server crash
  • Idempotency — why processing the same job twice needs to produce the same result

Part 1: The Problem with Doing Everything Immediately

Let's build a real scenario and watch it break.

You're building a SaaS platform where users can upload a profile photo. When the photo uploads, your API needs to:

  1. Save the original file to cloud storage
  2. Resize the image to three different sizes (thumbnail, medium, large)
  3. Update the user's profile with the new photo URLs
  4. Send a confirmation email

Here's how most developers write this initially:

// The naive approach: everything happens during the HTTP request
app.post("/upload-photo", async (req, res) => {
  // Step 1: Save to cloud storage (~100ms)
  const originalUrl = await saveToStorage(req.file);

  // Step 2: Resize the image (this is the slow one)
  // Resizing a high-resolution photo can take 1–3 seconds
  const resizedUrls = await resizeImage(originalUrl, [
    "100px",
    "400px",
    "800px",
  ]);

  // Step 3: Update the database (~20ms)
  await db.users.update(req.userId, { photoUrls: resizedUrls });

  // Step 4: Send confirmation email (~300ms, if the email service is fast)
  await sendEmail(req.userEmail, "Your photo has been updated!");

  // Only NOW does the user get a response
  res.json({ success: true, photoUrls: resizedUrls });
});

This works fine with five users. Let's see what happens with five hundred.

The three ways this breaks at scale

Problem 1: The user waits too long.

Every one of those steps happens before the user sees a response. Adding up the times (taking 2 seconds as a typical resize): 100ms + 2000ms + 20ms + 300ms = 2,420ms, or about 2.4 seconds for a single photo upload. That's a terrible experience. Users start clicking the button again, triggering duplicate uploads.

Problem 2: One slow service makes everything slow.

Imagine the email service goes down for 30 seconds. Every upload request now hangs for 30 seconds waiting for an email that will never send. Your entire upload feature is broken — not because of a bug in your code, but because a completely unrelated service had a hiccup.

This is called tight coupling — when one part of your system fails, it drags other parts down with it. The image resizer, the email service, and the API are all tied together in one fragile chain.

Problem 3: Your server can only handle so many simultaneous slow requests.

Node.js can handle thousands of quick requests at once. But if each request takes 2+ seconds, the server gets clogged. Slow requests pile up. New requests start timing out. Users get errors.

Here's what that failure looks like as a diagram:

User A uploads photo
        ↓
API starts processing
(resizing... resizing...)
        ↓
Email service is slow...
        ↓ (2+ seconds later)
User A finally gets a response ✅

Meanwhile:
User B uploads photo → API is busy → waits
User C uploads photo → API is busy → waits
User D uploads photo → timeout error ❌
User E uploads photo → timeout error ❌

Diagram: When all work happens inside the HTTP request, slow tasks back up the server. New users experience timeouts while earlier users finish.

The insight that changes everything

Here's the key question to ask about every task your server does:

Does the user need this to complete before they get a response?

For our photo upload:

  • "Did my file upload successfully?" → Yes, the user needs to know this immediately.
  • "Has the image been resized?" → Not really. They can see the original while resizing happens in the background.
  • "Has the confirmation email been sent?" → Definitely not. Nobody waits for an email before moving on.

The moment you separate "what the user needs immediately" from "what can happen in the background," you've discovered why message queues exist.
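As a rough illustration of that separation, the sketch below tags each step from the upload handler with whether the user needs it before the response, then adds up the timings quoted in Part 1. The `neededBeforeResponse` flag and the helper name are made up for this example, and the 2000ms resize figure is simply the midpoint of the 1–3 second range.

```typescript
interface Step {
  name: string;
  ms: number;
  neededBeforeResponse: boolean; // illustrative flag, not part of any library
}

const steps: Step[] = [
  { name: "save original to storage", ms: 100, neededBeforeResponse: true },
  { name: "resize image", ms: 2000, neededBeforeResponse: false },
  { name: "update database", ms: 20, neededBeforeResponse: false },
  { name: "send confirmation email", ms: 300, neededBeforeResponse: false },
];

// Sum the durations of a list of steps.
const totalMs = (list: Step[]): number =>
  list.reduce((sum, step) => sum + step.ms, 0);

// Everything inline: the user waits for every step.
const everythingInline = totalMs(steps);

// Only the steps the user actually needs before seeing a response.
const onlyWhatUserNeeds = totalMs(steps.filter((s) => s.neededBeforeResponse));

console.log(`inline: ${everythingInline}ms, deferred: ${onlyWhatUserNeeds}ms`);
// prints "inline: 2420ms, deferred: 100ms"
```

The deferred figure ignores the small cost of handing the work off, which depends on the queue you choose.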


Part 2: What a Message Queue Actually Is

Let's go back to the coffee shop. The key roles are:

  • Customer → places an order (doesn't care how it's made)
  • The order slip / ticket system → holds the order until a barista is ready
  • Barista → makes the coffee when they have capacity

In software:

  • Producer → the part of your system that creates work (your API)
  • Queue → where the work is stored temporarily
  • Consumer (also called a worker) → the part of your system that does the work

Here's the photo upload rewritten with this mental model:

User uploads photo
        ↓
API saves original to storage ✅ (fast, and the user needs this)
        ↓
API puts a job in the queue:
"Please resize photo-abc.jpg for user u_123"
        ↓
API immediately responds to user: "Upload successful!" ✅ (user is happy)

Meanwhile, separately:

Worker picks up the resize job from the queue
        ↓
Worker resizes the image (takes 2 seconds, but nobody is waiting)
        ↓
Worker updates the database
        ↓
Worker sends the confirmation email
        ↓
Worker marks the job as complete ✅

The user gets a fast response. The slow work happens in the background. If the email service is down, only the email job fails — the upload itself succeeded. The API and the image resizer no longer need to know about each other.

This is what loose coupling means. Each part of the system does one job and hands off to the next via the queue.
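Here is a minimal sketch of that hand-off in code. An in-memory array stands in for the queue purely so the example runs on its own; Part 3 explains why that stand-in is not acceptable in production. The function names (`handleUpload`, `runWorkerOnce`) are hypothetical.

```typescript
interface ResizeJob {
  imageKey: string;
  userId: string;
}

// Stand-in for a real queue so the sketch is self-contained (not durable).
const pending: ResizeJob[] = [];
const completed: string[] = [];

// Producer: do the fast work, hand off the slow work, respond right away.
function handleUpload(userId: string, fileName: string): { success: boolean } {
  const imageKey = `uploads/${fileName}`; // pretend the original was saved here
  pending.push({ imageKey, userId }); // "Please resize this photo for this user"
  return { success: true }; // the user gets a response immediately
}

// Consumer: runs separately from the producer and takes one waiting job.
function runWorkerOnce(): void {
  const job = pending.shift();
  if (!job) return; // nothing to do
  completed.push(`resized ${job.imageKey} for ${job.userId}`);
}

const response = handleUpload("u_123", "photo-abc.jpg");
console.log(response.success, "jobs waiting:", pending.length);

runWorkerOnce();
console.log(completed);
```

Note that the producer never calls the resize logic directly. It only describes the work and moves on.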

The vocabulary, defined clearly

Let's pin down the terms you'll see everywhere in this module:

Producer — any code that creates a message and puts it in the queue. In our example, the API route that handles photo uploads is the producer. It produces work for someone else to do.

Message — the unit of data passed through the queue. It's usually a JSON object describing what needs to be done:

{
  "jobType": "image-resize",
  "imageKey": "uploads/photo-abc.jpg",
  "userId": "u_123",
  "targetSizes": ["100px", "400px", "800px"]
}

Consumer (or worker) — any code that reads messages from the queue and does the work. Consumers run as separate processes from your API — they're long-running programs that loop forever, waiting for new messages.

Queue — the ordered list of messages waiting to be processed. Think of it as a to-do list that's shared between the producer and consumer.

Broker — the software that runs the queue, stores messages, and delivers them to consumers. This is the key piece we haven't talked about yet.
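Because a message crosses a process boundary as plain text, it can help to give it a type and check it on arrival. The sketch below types the example message shown above and validates it on the consumer side. The interface and guard names are assumptions for illustration, not part of any broker's API.

```typescript
interface ImageResizeMessage {
  jobType: "image-resize";
  imageKey: string;
  userId: string;
  targetSizes: string[];
}

// Type guard: check an unknown payload before trusting it.
function isImageResizeMessage(value: unknown): value is ImageResizeMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.jobType === "image-resize" &&
    typeof v.imageKey === "string" &&
    typeof v.userId === "string" &&
    Array.isArray(v.targetSizes) &&
    v.targetSizes.every((size) => typeof size === "string")
  );
}

// Producer side: serialize the message to text before handing it off.
const message: ImageResizeMessage = {
  jobType: "image-resize",
  imageKey: "uploads/photo-abc.jpg",
  userId: "u_123",
  targetSizes: ["100px", "400px", "800px"],
};
const wireFormat = JSON.stringify(message);

// Consumer side: parse, then validate before doing any work.
const received: unknown = JSON.parse(wireFormat);
const validResult = isImageResizeMessage(received);
const invalidResult = isImageResizeMessage({ jobType: "image-resize" });

console.log(validResult, invalidResult); // prints "true false"
```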


Part 3: Why You Can't Just Use an Array

"Wait," you might think. "Can't I just use an array as my queue? Push jobs to an array, have a worker pop from it — simple."

Let's try it:

// The homemade queue (looks fine at first)
const jobQueue: ResizeJob[] = [];

// Producer: push a job
function enqueueResizeJob(job: ResizeJob): void {
  jobQueue.push(job);
  console.log(`Job enqueued. Queue depth: ${jobQueue.length}`);
}

// Consumer: poll every 500ms and process the next job
setInterval(async () => {
  const job = jobQueue.shift(); // remove and return the first item
  if (!job) return;
  await resizeImage(job);
}, 500);

This actually works — in your local development environment, with one process, no crashes, and light load. The moment you take it to production, it fails in at least five different ways.

Failure 1: Your server restarts and all jobs vanish

Every time you deploy new code, your Node.js process restarts. The array is in memory — when the process dies, the array dies with it. Any jobs that were waiting get lost silently.

Sunday 3 PM: User uploads 47 photos
Sunday 3 PM: Deployment starts — server restarts
Sunday 3 PM: jobQueue = [] ← all 47 resize jobs are gone
Sunday 3 PM: 47 users never get their photos resized
No error. No log. Just silence.

What you need: Persistence — jobs stored somewhere that survives process restarts (a database or a dedicated broker).
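To show what persistence means at its simplest, here is a sketch that writes each job to a file before reporting it as enqueued, then rebuilds the waiting list from that file as if after a restart. The file location and function names are hypothetical. A real broker does considerably more, such as forcing writes to disk and removing completed jobs.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface ResizeJob {
  id: string;
  imageKey: string;
}

// Hypothetical location chosen for this sketch only.
const journalPath = path.join(os.tmpdir(), `resize-jobs-${process.pid}.jsonl`);

// Append one JSON line per job, so the job is written out before we return.
function enqueueDurably(job: ResizeJob): void {
  fs.appendFileSync(journalPath, JSON.stringify(job) + "\n");
}

// After a restart, rebuild the list of waiting jobs from the file.
function loadPendingJobs(): ResizeJob[] {
  if (!fs.existsSync(journalPath)) return [];
  return fs
    .readFileSync(journalPath, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ResizeJob);
}

enqueueDurably({ id: "job-1", imageKey: "uploads/a.jpg" });
enqueueDurably({ id: "job-2", imageKey: "uploads/b.jpg" });

// Simulated restart: nothing is held in memory, everything is re-read from disk.
const recovered = loadPendingJobs();
console.log(recovered.map((job) => job.id));

fs.unlinkSync(journalPath); // clean up the sketch's temporary file
```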

Failure 2: Two workers process the same job

Your app gets popular. You run two Node.js processes for redundancy. Each process has its own copy of jobQueue in memory. Jobs enqueued in Process A's array are invisible to Process B.

You move the queue to a shared database. Now both workers query the database and try to grab the next job — but without proper locking, both workers see the same job and process it simultaneously. One user's photo gets resized twice; two workers fight over the same row.

Worker A: SELECT * FROM jobs WHERE status='pending' LIMIT 1 → gets job #42
Worker B: SELECT * FROM jobs WHERE status='pending' LIMIT 1 → also gets job #42
Both workers call resizeImage(job42) at the same time → chaos

What you need: Atomic delivery — a guarantee that only one consumer receives each message, even with multiple workers running.
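The essence of atomic delivery is that "find the next job" and "mark it as taken" happen as one indivisible step. The sketch below shows the idea inside a single process, where a synchronous function gives that guarantee for free. Across several processes, a broker or database has to provide it. The job table and function name are illustrative.

```typescript
type JobStatus = "pending" | "processing";

interface JobRow {
  id: number;
  status: JobStatus;
  claimedBy?: string;
}

const jobs: JobRow[] = [
  { id: 42, status: "pending" },
  { id: 43, status: "pending" },
];

// Find-and-mark in one step: no other caller can see the job as "pending"
// between the lookup and the status change.
function claimNextJob(workerId: string): JobRow | undefined {
  const job = jobs.find((j) => j.status === "pending");
  if (!job) return undefined;
  job.status = "processing";
  job.claimedBy = workerId;
  return job;
}

const claimedByA = claimNextJob("worker-A");
const claimedByB = claimNextJob("worker-B");
const claimedByC = claimNextJob("worker-C");

console.log(claimedByA?.id, claimedByB?.id, claimedByC?.id);
// prints "42 43 undefined"
```

Contrast this with the failing example above, where the lookup and the status change were separate steps and two workers slipped in between them.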

Failure 3: A crashed worker loses the job forever

Your worker picks up job #42, starts resizing the image — and then crashes halfway through (out-of-memory error, network failure, anything).

The homemade queue already removed the job from the array when the worker picked it up. The job is gone. The work was never finished. Nobody knows.

setInterval(async () => {
  const job = jobQueue.shift(); // ← job removed HERE
                                // (it's gone from the queue)

  await resizeImage(job);       // ← worker crashes HERE
                                // (job was never finished)
                                // (no recovery possible)
}, 500);

What you need: Acknowledgement — the broker keeps the message until the consumer explicitly says "I finished this successfully." If the consumer crashes, the broker redelivers the message to another worker.
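A minimal sketch of acknowledgement, assuming a single process and a made-up `AckQueue` class: receiving a job moves it to an "in flight" list rather than deleting it, and only an explicit ack removes it. Real brokers detect a dead consumer on their own (for example when its connection drops). Here that moment is simulated by calling `requeueUnacked()` by hand.

```typescript
interface Job {
  id: string;
  imageKey: string;
}

class AckQueue {
  private ready: Job[] = [];
  private inFlight: Job[] = [];

  publish(job: Job): void {
    this.ready.push(job);
  }

  // Delivery moves the job to "in flight" instead of deleting it.
  receive(): Job | undefined {
    const job = this.ready.shift();
    if (job) this.inFlight.push(job);
    return job;
  }

  // Only an explicit ack removes the job for good.
  ack(id: string): void {
    this.inFlight = this.inFlight.filter((job) => job.id !== id);
  }

  // Simulates the broker noticing a dead consumer: unacked jobs go back
  // to the front of the ready list.
  requeueUnacked(): void {
    this.ready = [...this.inFlight, ...this.ready];
    this.inFlight = [];
  }

  get readyCount(): number {
    return this.ready.length;
  }
}

const queue = new AckQueue();
queue.publish({ id: "job-42", imageKey: "uploads/photo-abc.jpg" });

const firstAttempt = queue.receive(); // a worker takes the job...
queue.requeueUnacked(); // ...and crashes before acknowledging it

const secondAttempt = queue.receive(); // another worker gets the same job
if (secondAttempt) queue.ack(secondAttempt.id); // this one finishes

console.log(firstAttempt?.id, secondAttempt?.id, queue.readyCount);
// prints "job-42 job-42 0"
```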

Failure 4: The queue grows faster than workers can drain it

Your product goes on the front page of Hacker News. 10,000 users upload photos in one hour. Your workers can handle 100 per hour. The array grows to 9,900 items. Your Node.js process runs out of memory and crashes — taking the remaining queue with it.

What you need: Back-pressure — a mechanism for the broker to slow down producers when consumers can't keep up, and the ability to store a large message backlog safely on disk.
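One simple form of back-pressure is a queue with a fixed capacity that refuses new work when full, so the producer has to slow down or retry later instead of the process running out of memory. The `BoundedQueue` class below is a hypothetical sketch of that idea, not how any particular broker implements it.

```typescript
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns false instead of growing without limit. The caller decides
  // whether to wait, retry later, or tell the user to try again.
  tryEnqueue(item: T): boolean {
    if (this.items.length >= this.capacity) return false;
    this.items.push(item);
    return true;
  }

  dequeue(): T | undefined {
    return this.items.shift();
  }

  get depth(): number {
    return this.items.length;
  }
}

const uploads = new BoundedQueue<string>(3);
const accepted = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"].map((file) =>
  uploads.tryEnqueue(file),
);

console.log(accepted, "depth:", uploads.depth);

// Once a worker drains one item, there is room again.
uploads.dequeue();
const acceptedAfterDrain = uploads.tryEnqueue("d.jpg");
console.log(acceptedAfterDrain); // prints "true"
```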

Failure 5: You can't see what's happening

At 2 AM your phone buzzes. "Photo uploads are broken." You open your laptop. You have no idea how many jobs are in the queue, which jobs are failing, or why. You can't inspect individual jobs. You can't replay a failed job. You can't see how long jobs are taking.

What you need: Observability — a dashboard showing queue depth, consumer throughput, failure rate, and the ability to inspect and replay individual messages.

The gap in one table

Here's what a real message queue gives you that an array never will:

What you need                            Homemade array    Real broker
Jobs survive server restarts             No                Yes
Only one worker processes each job       No                Yes
Failed jobs get retried automatically    No                Yes
Slow consumers don't crash the system    No                Yes
Dashboard to see what's happening        No                Yes

Every single one of these gaps is a production incident waiting to happen.


Part 4: What a Broker Is — and Why It's the Missing Piece

A broker is the software that runs between your producers and consumers. It's a dedicated program (or cluster of programs) whose entire job is to:

  • Accept messages from producers
  • Store them safely (even across restarts)
  • Deliver them to consumers one at a time
  • Track which messages have been processed
  • Redeliver messages if a consumer crashes mid-processing
  • Provide visibility into what's happening

Think of the broker as a professional post office. You (the producer) drop off a package. The post office (broker) stores it safely, tracks it, and delivers it to the recipient (consumer). If the delivery driver crashes their van, the post office redelivers the package — it doesn't disappear.

WITHOUT a broker:
Producer → [array in memory] → Consumer
(fragile, no tracking, disappears on restart)

WITH a broker:
Producer → [Broker] → Consumer

Stored safely on disk
Tracks every message
Redelivers on failure
Shows you a dashboard

The broker is what transforms a fragile array into a reliable job queue.


Part 5: Delivery Guarantees — What a Broker Promises You

Before choosing a broker, you need to understand what it promises about message delivery. There are three levels:

At-most-once delivery

The broker delivers the message once and doesn't track whether it was processed. If the consumer crashes before finishing, the message is gone.

When it's okay: Non-critical, high-volume data where losing occasional messages is fine — live metrics, analytics events, real-time dashboard updates.

When it's dangerous: Anything the user cares about — payment processing, file uploads, email sending.

At-least-once delivery

The broker keeps the message until the consumer explicitly says "I'm done." If the consumer crashes before saying that, the broker redelivers the message to another consumer.

The catch: the consumer might receive the same message more than once (if it finishes the work but crashes before saying "done"). This means your consumer needs to be idempotent — processing the same job twice must produce the same result as processing it once.

Idempotent is a word that sounds scary but means something simple: "doing it twice doesn't cause problems." Resizing an image to 400px and saving it — doing that twice just overwrites the first result with an identical one. That's idempotent. Sending a "welcome" email to a user — doing that twice means the user gets two emails. That's not idempotent.

// ❌ Not idempotent: charging twice means the user pays twice
async function processPayment(job: PaymentJob): Promise<void> {
  await stripe.charges.create({ amount: job.amount, customer: job.customerId });
}

// ✅ Idempotent: with an idempotency key, Stripe ignores duplicates
async function processPayment(job: PaymentJob): Promise<void> {
  await stripe.charges.create(
    { amount: job.amount, customer: job.customerId },
    // In the Stripe Node library the key is a request option (second argument),
    // not a field of the charge itself
    { idempotencyKey: job.id }, // Stripe deduplicates requests with the same key
  );
}

At-least-once is the standard delivery guarantee in RabbitMQ and most job queue systems. Design your consumers to be idempotent, and at-least-once becomes safe and reliable.
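For side effects that have no built-in deduplication, such as the welcome email mentioned above, a common approach is to remember which job IDs have already been handled. The sketch below uses an in-memory `Set` as a stand-in for a database table. The names are illustrative, and a real implementation would need the "record" and the "side effect" to be tied together more carefully (for example with a transaction or a unique constraint).

```typescript
interface EmailJob {
  id: string;
  to: string;
}

const sentEmails: string[] = []; // stands in for the real side effect
const processedJobIds = new Set<string>(); // stands in for a database table

function handleWelcomeEmail(job: EmailJob): "sent" | "skipped" {
  if (processedJobIds.has(job.id)) return "skipped"; // duplicate delivery
  sentEmails.push(job.to);
  processedJobIds.add(job.id);
  return "sent";
}

const job: EmailJob = { id: "job-7", to: "ada@example.com" };

const firstDelivery = handleWelcomeEmail(job);
const redelivery = handleWelcomeEmail(job); // same message arrives again

console.log(firstDelivery, redelivery, sentEmails.length);
// prints "sent skipped 1"
```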

Exactly-once delivery

Each message is delivered to a consumer exactly one time — no duplicates, no losses. This sounds ideal, but it's the hardest and most expensive guarantee to provide. In practice, most systems layer at-least-once delivery with idempotent consumers to achieve the same safety at much lower cost.

The practical rule: Design consumers to be idempotent. Use at-least-once delivery. Sleep well.
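The difference between the first two guarantees comes down to when the message is acknowledged relative to the work. The following simulation is a simplified model, not real broker behaviour: it delivers one message to a consumer that crashes once, and counts how many times the work completes under each mode.

```typescript
type Mode = "at-most-once" | "at-least-once";

// One message, one consumer crash. The crash happens between the two steps
// (acknowledge and work), in whichever order the mode performs them.
function simulate(mode: Mode): { timesWorkCompleted: number } {
  let timesWorkCompleted = 0;
  let messageInBroker = true;
  let crashesLeft = 1;

  while (messageInBroker) {
    if (mode === "at-most-once") {
      messageInBroker = false; // acknowledged as soon as it is delivered
      if (crashesLeft > 0) {
        crashesLeft--;
        continue; // crash: the work never happens and the message is gone
      }
      timesWorkCompleted++;
    } else {
      timesWorkCompleted++; // the work happens first
      if (crashesLeft > 0) {
        crashesLeft--;
        continue; // crash before the ack: the broker redelivers
      }
      messageInBroker = false; // acknowledged only after the work
    }
  }
  return { timesWorkCompleted };
}

const atMostOnce = simulate("at-most-once").timesWorkCompleted;
const atLeastOnce = simulate("at-least-once").timesWorkCompleted;

console.log(atMostOnce, atLeastOnce); // prints "0 2"
```

Zero completions is a lost job. Two completions is a duplicate, which an idempotent consumer turns into a non-event.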


Part 6: The Broker Landscape — Your Options at a Glance

This module focuses on RabbitMQ in depth. But you'll make better decisions if you know what else exists. Here's a plain-language overview:

BullMQ

What it is: A job queue for Node.js built on top of Redis (a fast in-memory database).

The one-sentence pitch: If your team writes Node.js and you already have Redis, BullMQ gives you a production-ready job queue in under an hour — with retries, delayed jobs, priorities, and a monitoring UI — without running a separate broker program.

When it fits: You're building Node.js background jobs (sending emails, resizing images, generating reports) and your volume is moderate. It's the fastest path from zero to working.

When it doesn't fit: Your workers are written in Python, Go, or Java. You need complex routing logic (route this message to email service, but not to analytics). You need RabbitMQ's advanced guarantees.

RabbitMQ

What it is: A full-featured message broker that implements a protocol called AMQP (Advanced Message Queuing Protocol). Don't worry about AMQP yet; we'll explain it in depth in a later article.

The one-sentence pitch: RabbitMQ is the go-to choice when you need messages routed to different services based on rules, when your consumers are written in different programming languages, or when you need the most battle-tested reliability guarantees.

When it fits: Multi-service architectures where different teams run different languages. Complex routing (send this event to the email service AND the analytics service, but only if it's a "purchase" event). Enterprise environments with strict reliability requirements.

When it doesn't fit: Simple Node.js job queues where BullMQ would work fine. You don't want to run and operate a separate broker process.

Apache Kafka

What it is: A distributed event streaming platform. Unlike the other options, Kafka doesn't delete messages after they're consumed — it stores them in a log for a configurable period (days or weeks). Multiple different consumer groups can each read the same messages independently at their own pace.

The one-sentence pitch: Kafka is the right choice when multiple independent systems need to read the same events — including events that happened in the past. It's built for high-volume event streaming, not job queues.

When it fits: You're building an analytics pipeline, an audit log, or an event sourcing architecture. Multiple teams each need to process the same stream of events independently. You're dealing with millions of events per second.

When it doesn't fit: You just need background jobs. Kafka is significantly more complex to operate than RabbitMQ or BullMQ, and it introduces concepts (partitions, consumer groups, offsets) that require real expertise to get right.

Inngest

What it is: A managed platform where you define durable workflow functions in your code, and Inngest handles running them reliably — including retries, delays, and multi-step orchestration — without you running any broker infrastructure.

The one-sentence pitch: Inngest is the easiest way to write reliable multi-step background workflows (send email → wait 3 days → send follow-up → if no reply, notify sales team) without operating any infrastructure.

When it fits: You want zero infrastructure overhead. Your workflows involve multiple steps where each step should be independently retried on failure. You're deploying to serverless environments (Vercel, Cloudflare Workers).

When it doesn't fit: Your workers run inside a private network and can't receive HTTP requests. Your volume is so high that per-step pricing becomes expensive.

Which should you choose?

Here's the simplest decision guide:

Your team writes Node.js AND you have Redis?
→ Start with BullMQ

Your workers are in multiple languages, OR you need complex routing?
→ Use RabbitMQ

Multiple independent systems need to read the same events, including past ones?
→ Use Kafka

You want zero infrastructure and like the step-function model?
→ Use Inngest

This module teaches RabbitMQ from the ground up — because it's the most versatile option, and understanding RabbitMQ deeply makes every other queue system easier to understand.


Part 7: Why Not Build Your Own? (Revisited)

Earlier we saw how the homemade array fails. But what about a more serious DIY attempt — using your existing PostgreSQL database as a queue?

This is actually a legitimate approach for low volume, and we cover it in detail in a later article on 3rd-Party vs Custom Queues. The short answer:

  • Under ~10,000 jobs per day: A PostgreSQL queue that claims rows with SELECT ... FOR UPDATE SKIP LOCKED works well and requires zero extra infrastructure.
  • Above that: You start fighting the database's design. PostgreSQL wasn't built for queue access patterns. You'll see table bloat, slow queries, and eventually hit scaling limits that a dedicated broker handles naturally.
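As a rough sketch of the low-volume PostgreSQL approach, the query below claims one pending row while skipping rows that another worker has already locked. The table and column names (`jobs`, `status`, `payload`) are hypothetical, and `QueryFn` is an assumed minimal client shape rather than any specific library's API.

```typescript
// Hypothetical schema: jobs(id, status, payload).
const CLAIM_NEXT_JOB_SQL = `
  UPDATE jobs
  SET status = 'processing'
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, payload
`;

interface ClaimedJob {
  id: number;
  payload: string;
}

// Assumed client shape for this sketch: run SQL text, get rows back.
type QueryFn = (sql: string) => Promise<ClaimedJob[]>;

// Returns the claimed job, or undefined when nothing is pending.
async function claimNextJob(query: QueryFn): Promise<ClaimedJob | undefined> {
  const rows = await query(CLAIM_NEXT_JOB_SQL);
  return rows[0];
}

// Usage with a fake client, so the sketch runs without a database.
const fakeQuery: QueryFn = async () => [{ id: 42, payload: "{}" }];
claimNextJob(fakeQuery).then((job) => console.log("claimed:", job?.id));
```

SKIP LOCKED is what lets two workers run this query at the same moment and receive different rows instead of waiting on each other.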

A dedicated broker like RabbitMQ is purpose-built for queues. It uses data structures optimised for queue operations, handles back-pressure natively, and gives you a monitoring UI, retry policies, and dead letter queues — without any of the accidental complexity of forcing a relational database to do queue work.


Common Misconceptions

❌ Misconception: "A message queue is just async programming"

Reality: Async programming (using await in JavaScript) means your code doesn't block while waiting for a result — but it still runs in the same process. If the process crashes, all in-flight async operations are lost. A message queue is a separate system with its own storage, its own process, and its own guarantee that work will be completed even if your application crashes and restarts.

// ❌ This is NOT a queue: it's an async function call
// If the process crashes, the work is lost
app.post("/upload", async (req, res) => {
  const imageKey = await saveFile(req.file);
  resizeImage(imageKey).catch(console.error); // fire-and-forget, no guarantee
  res.json({ success: true });
});

// ✅ This IS using a queue: the job survives independently
app.post("/upload", async (req, res) => {
  const imageKey = await saveFile(req.file);
  await queue.publish("image-resize", { imageKey }); // stored in broker
  res.json({ success: true });
});

❌ Misconception: "Adding a queue always makes things faster"

Reality: A queue adds latency to processing — the job has to be serialized, stored in the broker, and delivered to a consumer. The benefit is not raw speed — it's resilience and decoupling. Your API responds faster (it no longer waits for processing), but the total time from upload to completed resize may be longer. That's a good trade-off for reliability, not a free speed boost.

❌ Misconception: "I'll add a real queue later when we need it"

Reality: "Later" rarely comes before "incident." By the time you notice you need durable job processing, you've built application logic around the in-memory queue's behavior, your consumers aren't idempotent, and you have no way to replay lost jobs. Starting with BullMQ or a simple PostgreSQL queue costs one afternoon. Migrating to one after a production incident costs a week.

❌ Misconception: "RabbitMQ is only for large enterprise systems"

Reality: RabbitMQ runs comfortably on a single $10/month virtual machine and powers everything from indie SaaS products to Fortune 500 companies. It's not heavy or expensive — it's well-designed. The "enterprise" label comes from its reliability story, not its hardware requirements.


Troubleshooting Common Issues

Problem: Jobs are disappearing silently in development

Symptoms: You enqueue a job, but the worker never picks it up. No errors in logs.

Common causes:

  1. Using an in-memory array as the queue: the producer and worker are in separate processes with no shared memory (a very common cause in early development)
  2. The worker process isn't running
  3. The producer and worker are connecting to different queue names

Diagnostic steps:

// Step 1: Add logging to the producer
async function enqueueResizeJob(payload: ResizeJob): Promise<void> {
  console.log("[Producer] Enqueuing job:", payload.imageKey);
  await queue.publish("image-resize", payload);
  console.log("[Producer] Job enqueued successfully");
}

// Step 2: Add logging to the consumer
queue.consume("image-resize", async (job) => {
  console.log("[Worker] Received job:", job.imageKey);
  // ... process
  console.log("[Worker] Job complete:", job.imageKey);
});

// Step 3: Verify both producer and consumer use exactly the same queue name
// 'image-resize' !== 'imageResize' !== 'image_resize'

Solution: Ensure producer and consumer connect to the same broker instance and use exactly the same queue name. In development, run the broker in Docker and check the monitoring UI (RabbitMQ has one built in at localhost:15672).
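One way to rule out mismatched queue names is to define them once and have both sides import the same constant. The sketch below shows the idea; `QUEUE_NAMES` and the two helper functions are hypothetical stand-ins for wherever your producer and worker pass the name to the broker.

```typescript
// A single shared module that both the producer and the worker import.
const QUEUE_NAMES = {
  imageResize: "image-resize",
} as const;

// The union of all valid queue names ("image-resize" in this sketch).
type QueueName = (typeof QUEUE_NAMES)[keyof typeof QUEUE_NAMES];

// Hypothetical stand-ins for the places where each side names its queue.
function producerQueueName(): QueueName {
  return QUEUE_NAMES.imageResize;
}

function workerQueueName(): QueueName {
  return QUEUE_NAMES.imageResize;
}

const namesMatch = producerQueueName() === workerQueueName();
console.log(namesMatch); // prints "true"
```

With the `QueueName` type in place, a typo such as "imageResize" in a function that expects a `QueueName` is caught by the compiler rather than at 2 AM.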


Problem: The same job runs twice

Symptoms: An email is sent twice, an image is resized twice, a database record is created twice.

Common causes:

  1. Consumer crashes after processing but before acknowledging (at-least-once redelivery working as designed)
  2. Two consumers are running and both grabbed the same job (missing atomic delivery)
  3. Network timeout caused the broker to redeliver a job still being processed

Diagnostic steps:

// Add a job ID and check for duplicate processing
channel.consume("image-resize", async (msg) => {
  if (!msg) return; // the broker cancelled this consumer
  const job = JSON.parse(msg.content.toString());

  // Step 1: Log the job ID and whether the broker flagged it as a redelivery
  console.log(
    `[Worker] Received job ${job.id}, redelivered: ${msg.fields.redelivered}`,
  );

  // Step 2: Check your database for a completion record
  const alreadyDone = await db.processedJobs.exists(job.id);
  if (alreadyDone) {
    console.log(`[Worker] Job ${job.id} already processed, skipping`);
    channel.ack(msg); // ack so broker stops sending it
    return;
  }

  // Step 3: Process, then record completion.
  // These two writes are not atomic: a crash between them means the job runs
  // again, so the work itself should also be safe to repeat.
  await resizeImage(job);
  await db.processedJobs.create({ jobId: job.id, completedAt: new Date() });
  channel.ack(msg);
});

Solution: Make your consumers idempotent by tracking job completion in your database. At-least-once delivery is the standard — design for occasional duplicates from the start.


Check Your Understanding

Quick Quiz

1. What is the difference between a producer, a consumer, and a broker?

Answer:
  • Producer — the code that creates work and puts it in the queue (your API)
  • Consumer (worker) — the code that reads from the queue and does the work (your background process)
  • Broker — the dedicated software that sits between them, storing messages, guaranteeing delivery, and providing monitoring (RabbitMQ, Kafka, etc.)

The broker is the critical piece that most people try to skip when building their first queue. Without it, you're back to an array in memory with all its failure modes.


2. Your e-commerce site needs to send an order confirmation email when a purchase is made. The user should see "Order placed successfully!" immediately. Which approach is correct, and why?

Option A: Send the email inside the POST /checkout handler, then return the response.

Option B: Put a "send confirmation email" job in a queue for a background worker, then return the response immediately.

Answer:

Option B is correct.

The user doesn't need confirmation that the email was sent — they need confirmation that their order was placed. Sending the email is background work that can (and should) happen independently.

With Option A:

  • If the email service is slow, every checkout takes longer
  • If the email service is down, every checkout returns an error — even though the order was placed successfully
  • You're holding the request (and its connection and memory) open for the whole email round-trip

With Option B:

  • The user gets an instant response
  • The email queue retries automatically if the email service is temporarily down
  • The checkout handler only fails if the database write fails — not if the email service is having a bad day

3. What is idempotency, and why does it matter when using at-least-once message delivery?

Answer:

Idempotency means that processing a job multiple times produces the same result as processing it once. It matters because at-least-once delivery can deliver the same message more than once — for example, if a consumer processes a job and then crashes before acknowledging it, the broker redelivers the message.

Without idempotency: a user gets two welcome emails or is charged twice, or a duplicate database record is created.

With idempotency: the second delivery is detected (via a unique job ID or a database check) and skipped safely.

Practical implementation: Store a job_id in a processed_jobs table after completing each job. Before processing, check if the job ID already exists. If it does, acknowledge the message and skip.


Hands-On Exercise

The scenario: You're building a document export feature. Users can click "Export as PDF" and the system should generate a PDF of their data. PDF generation takes 10–30 seconds.

Question: Without writing any code, design the queue-based architecture for this feature. Answer these questions:

  1. Who is the producer?
  2. What does the message contain?
  3. Who is the consumer?
  4. What does the user see while the PDF is being generated?
  5. How does the user know when the PDF is ready?

Answer:

1. Producer: The API route that handles POST /export/pdf. When the user clicks the button, the API creates a job in the queue and immediately returns.

2. Message contents:

{
  "jobId": "export-789",
  "userId": "u_123",
  "documentId": "doc_456",
  "exportFormat": "pdf",
  "requestedAt": "2025-11-15T10:30:00Z"
}

3. Consumer: A background worker process (separate from the API) that reads export jobs from the queue and generates PDFs using a library like Puppeteer or a dedicated PDF service.

4. User experience while generating: The user sees a "Your PDF is being generated. We'll notify you when it's ready." message — or a "pending" status in a downloads page that polls for updates.

5. Notifying the user: After generating the PDF, the worker saves it to cloud storage and updates a database row with the download URL. The user then finds out in one or both of these ways:

  • The worker sends a push notification or email with the download link
  • The frontend polls an API endpoint (GET /exports/789/status) until it returns "complete" with a download URL
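The polling option can be sketched as a small status record that the API creates and the worker later updates. The shapes and function names below are assumptions for illustration, and an in-memory object stands in for the database row.

```typescript
type ExportStatus =
  | { state: "pending" }
  | { state: "complete"; downloadUrl: string }
  | { state: "failed"; reason: string };

// Stand-in for the database table that tracks exports.
const exportsById: Record<string, ExportStatus> = {};

// API side: record the export as pending when the job is enqueued.
function requestExport(jobId: string): void {
  exportsById[jobId] = { state: "pending" };
}

// Worker side: record the result once the PDF has been stored.
function markComplete(jobId: string, downloadUrl: string): void {
  exportsById[jobId] = { state: "complete", downloadUrl };
}

// What a status endpoint such as GET /exports/:id/status could return.
function getExportStatus(jobId: string): ExportStatus | undefined {
  return exportsById[jobId];
}

requestExport("export-789");
const whileGenerating = getExportStatus("export-789");

markComplete("export-789", "https://example.com/exports/export-789.pdf");
const afterWorkerFinished = getExportStatus("export-789");

console.log(whileGenerating?.state, afterWorkerFinished?.state);
// prints "pending complete"
```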

Summary: Key Takeaways

Let's pull everything together:

  • Message queues exist to decouple fast and slow work. Your API can respond immediately while background workers handle time-consuming tasks independently.

  • Tight coupling causes cascading failures. When your API directly calls a slow service (image resizer, email provider), that service's problems become your API's problems. A queue breaks that dependency.

  • An array in memory is not a queue. It fails silently on restart, breaks with multiple workers, loses jobs on crashes, and provides no visibility. A real queue needs persistence, atomic delivery, acknowledgement, back-pressure, and observability.

  • A broker is the dedicated software that provides all of those guarantees. RabbitMQ and Kafka are standalone brokers; BullMQ is a library that relies on Redis to play that role.

  • At-least-once delivery is the practical standard. Design your consumers to be idempotent (processing the same job twice is safe), and at-least-once becomes reliable and predictable.

  • Pick your tool based on your actual requirements:

    • Node.js + Redis already available → BullMQ
    • Multi-language consumers or complex routing → RabbitMQ
    • Multiple independent consumer groups reading past events → Kafka
    • Multi-step workflows, zero infrastructure → Inngest

What's Next?

You now understand why message queues exist, what makes them different from a homemade array, and how the major options compare.

The next article, How Message Queues Work: Internals and Queue Types, opens the hood on what's actually happening inside a queue — the data structures, how acknowledgement works as a state machine, what "persistence" really means, and the five different types of queues (FIFO, priority, delay, circular, dead letter) and when to use each. Understanding the internals is what separates someone who uses a queue from someone who can debug it.

