Streams: Processing Data Piece by Piece
When you're working with large files, network requests, or any substantial amount of data, loading everything into memory at once can slow down your application—or worse, crash it entirely. Streams solve this problem by letting you process data incrementally, piece by piece, as it becomes available. Instead of waiting for a massive file to load completely, you can start working with the first chunk while the rest is still being read.
Think of it like watching a video online. You don't wait for the entire movie to download before you start watching—the video streams to your device, and you begin watching within seconds while the rest continues to load in the background. That's the power of streams: immediate processing without memory overload.
What You Need to Know First
This is a foundational article about streams—you don't need prior experience with Node.js streams to understand it.
However, you should be comfortable with:
- JavaScript basics: Variables, functions, and how to write basic code
- Asynchronous JavaScript: Understanding callbacks, Promises, or async/await (we'll show examples of each)
- Node.js environment: How to run JavaScript files using Node.js
If you're new to asynchronous JavaScript, we recommend getting familiar with Promises and async/await first, as streams work asynchronously.
What We'll Cover in This Article
By the end of this guide, you'll understand:
- What streams are and why they're essential for efficient data processing
- The real-world problem streams solve (memory efficiency)
- The four types of streams in Node.js
- When to use each type of stream
- How streams differ from loading all data at once
What We'll Explain Along the Way
Don't worry if you're unfamiliar with these—we'll explain them as we go:
- Chunks (small pieces of data)
- Backpressure (when data flows too fast)
- Piping (connecting streams together)
- Buffers (temporary data storage)
- Memory management concepts
What Is a Stream?
A stream is a sequence of data that flows continuously from one place to another. Instead of loading an entire file or dataset into memory all at once, streams let you work with data in small, manageable pieces called chunks.
The Core Problem Streams Solve
Let's say you need to process a 2GB video file. If you try to load the entire file into memory, here's what happens:
// ❌ Loading everything at once (Bad approach)
import { readFileSync } from "fs";
// This tries to load the ENTIRE 2GB file into memory
const hugeFile = readFileSync("video.mp4");
// Problems:
// 1. Your application might crash if it runs out of memory
// 2. User has to wait for the entire file to load before anything happens
// 3. If you only need to process part of the file, you've wasted time and memory
console.log("File size:", hugeFile.length); // 2,000,000,000 bytes in memory!
What went wrong?
- Memory overload: The entire 2GB file is loaded into RAM
- Long wait time: Nothing happens until the complete file is loaded
- Inefficient: Even if you only need the first 5 minutes of the video, you loaded all 2 hours
Now, here's the stream approach:
// ✅ Using streams (Better approach)
import { createReadStream } from "fs";
// Create a stream that reads the file in small chunks
const fileStream = createReadStream("video.mp4");
// Process data as it arrives, chunk by chunk
fileStream.on("data", (chunk: Buffer) => {
// Each chunk is typically 64KB (65,536 bytes)
console.log(`Processing ${chunk.length} bytes`);
// You can start working with this data immediately
// while the rest of the file is still being read
});
// Benefits:
// 1. Memory usage stays low (only one chunk at a time)
// 2. Processing starts immediately
// 3. You can stop reading if you find what you need
What's better here?
- Low memory footprint: Only a small chunk (typically 64KB) is in memory at any time
- Immediate processing: Work begins as soon as the first chunk arrives
- Flexibility: You can stop reading early if you've found what you need
Understanding Streams Through Analogy
Let's use a real-world analogy to make streams crystal clear.
The Water Tank Analogy
Imagine you need to transfer 10,000 liters of water from one tank to another:
Without Streams (All at Once):
Tank A (10,000L) ──→ [Trying to carry ALL water at once] ──→ Tank B
Problems:
- You'd need a massive container to hold all 10,000 liters
- You can't start filling Tank B until you've collected ALL the water
- If you drop the container, you lose everything
- Physically impossible and inefficient
With Streams (Piece by Piece):
Tank A ──→ [Hose flowing steadily] ──→ Tank B
                  ↑
       Water flows continuously in a steady stream
Benefits:
- Tank B starts filling immediately
- You only need the hose (minimal resources)
- If something goes wrong, you've only lost a small amount
- You can control the flow rate
- Both tanks can operate simultaneously
This is exactly how data streams work:
- Tank A = Source (file, network, database)
- Hose = Stream (the connection)
- Water flow = Data chunks moving continuously
- Tank B = Destination (another file, network, database)
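In code, the hose is usually a pipe() call connecting two streams. Here's a minimal sketch of that tank-to-tank transfer (the file names source.txt and copy.txt are just placeholders):
import { createReadStream, createWriteStream } from "fs";
// "Tank A" is the source file, "Tank B" is the destination file (both names are placeholders)
const source = createReadStream("./source.txt"); // Tank A
const destination = createWriteStream("./copy.txt"); // Tank B
// pipe() is the hose: it moves chunks across and regulates the flow rate for you
source.pipe(destination);
destination.on("finish", () => {
  console.log("Transfer complete");
});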
Practical Example: Reading a Large Log File
Let's see a complete, real-world example:
// Scenario: You need to find a specific error in a 500MB log file
import { createReadStream, readFileSync } from "fs";
import { once } from "events";
// ❌ Bad approach: Load entire file
function findErrorWithoutStream(filepath: string): void {
console.time("Without stream");
// Loads entire 500MB into memory
const entireFile = readFileSync(filepath, "utf-8");
// Now search through it
if (entireFile.includes("ERROR: Database connection failed")) {
console.log("Found the error!");
}
console.timeEnd("Without stream");
// Typical result: 2-3 seconds, 500MB memory used
}
// ✅ Good approach: Use stream
async function findErrorWithStream(filepath: string): Promise<void> {
console.time("With stream");
const stream = createReadStream(filepath, {
encoding: "utf-8",
highWaterMark: 64 * 1024, // Read 64KB at a time
});
let foundError = false;
// Process each chunk as it arrives
// (note: a match that happens to span two chunks would be missed; that's fine for this demo)
stream.on("data", (chunk: string) => {
if (chunk.includes("ERROR: Database connection failed")) {
console.log("Found the error!");
foundError = true;
// Stop reading the rest of the file
stream.destroy();
}
});
// Wait for the stream to finish or be destroyed
// ('end' never fires after destroy(), but 'close' fires in both cases)
await once(stream, "close");
console.timeEnd("With stream");
// Typical result: 0.1-0.5 seconds, only ~64KB memory used
}
// Usage:
findErrorWithStream("./app.log").catch(console.error);
// Why is the stream version faster?
// 1. Starts processing immediately (no wait for full file)
// 2. Stops as soon as error is found (doesn't read unnecessary data)
// 3. Uses minimal memory (only one 64KB chunk at a time)
// 4. The error might be near the beginning of the file
Breaking Down What Happens:
- Stream starts: Opens the file and begins reading the first 64KB chunk
- First chunk arrives: You start searching immediately (no waiting)
- Error found: Let's say the error is in chunk 5 (320KB into the file)
- Stream stops: You call stream.destroy() and stop reading
- Result: You've only read 320KB instead of 500MB
Memory comparison:
- Without stream: 500MB loaded into memory
- With stream: 64KB in memory at any given time (roughly 8,000x less memory!)
The Four Types of Streams in Node.js
Node.js provides four distinct types of streams, each designed for specific use cases. Let's explore each one in detail.
1. Readable Streams: Getting Data From a Source
Readable streams let you read data from a source such as a file, HTTP request, or user input.
What They Do:
- Pull data from somewhere (file system, network, database)
- Deliver data to your application in chunks
- You consume the data (read it)
Real-World Examples:
- Reading a file from disk
- Receiving data from an HTTP request
- Reading user input from the terminal
- Fetching data from a database cursor
import { createReadStream } from "fs";
// Example 1: Reading a text file
const fileReader = createReadStream("./document.txt", {
encoding: "utf-8", // Convert bytes to text
highWaterMark: 16 * 1024, // Read 16KB chunks
});
// Event: 'data' - Fires each time a chunk arrives
fileReader.on("data", (chunk: string) => {
console.log("Received chunk:", chunk.substring(0, 50) + "...");
console.log("Chunk size:", chunk.length, "characters");
});
// Event: 'end' - Fires when all data has been read
fileReader.on("end", () => {
console.log("Finished reading file");
});
// Event: 'error' - Fires if something goes wrong
fileReader.on("error", (error: Error) => {
console.error("Error reading file:", error.message);
});
// What happens behind the scenes:
// 1. Node.js opens the file
// 2. Reads the first 16KB → triggers 'data' event
// 3. Reads the next 16KB → triggers 'data' event again
// 4. Continues until file ends → triggers 'end' event
// 5. Closes the file automatically
Example 2: Receiving Data from HTTP Request
import http from "http";
const server = http.createServer((req, res) => {
// req is a Readable stream!
let bodyData = "";
// Read the request body chunk by chunk
req.on("data", (chunk: Buffer) => {
// Convert buffer to string and accumulate
bodyData += chunk.toString("utf-8");
console.log(`Received ${chunk.length} bytes`);
});
req.on("end", () => {
// All data received
console.log("Complete request body:", bodyData);
res.end("Received your data!");
});
});
server.listen(3000);
// When a client sends data:
// POST /api/data
// Body: { "user": "John", "message": "Hello" }
//
// The stream receives it in chunks:
// Chunk 1: { "user": "J
// Chunk 2: ohn", "message
// Chunk 3: ": "Hello" }
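Readable streams are also async iterables, so if you prefer async/await over 'data' events you can consume the same chunks with a for await...of loop. A small sketch, assuming the same document.txt file as above:
import { createReadStream } from "fs";
async function readWithAwait(): Promise<void> {
  const stream = createReadStream("./document.txt", { encoding: "utf-8" });
  // Each iteration waits for the next chunk; stream errors surface as thrown exceptions
  for await (const chunk of stream) {
    console.log("Received chunk of", chunk.length, "characters");
  }
  console.log("Finished reading file");
}
readWithAwait().catch(console.error);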
2. Writable Streams: Sending Data to a Destination
Writable streams let you write data to a destination such as a file, HTTP response, or database.
What They Do:
- Accept data from your application
- Write data to a destination (file, network, database)
- You produce the data (write it)
Real-World Examples:
- Writing to a file on disk
- Sending an HTTP response to a client
- Writing to a database
- Logging to a file or console
import { createWriteStream } from "fs";
// Example 1: Writing to a file
const fileWriter = createWriteStream("./output.txt", {
encoding: "utf-8",
});
// Write data to the stream
fileWriter.write("First line of text\n");
fileWriter.write("Second line of text\n");
fileWriter.write("Third line of text\n");
// Important: Signal that you're done writing
fileWriter.end("Final line of text\n");
// Event: 'finish' - Fires when all data has been written
fileWriter.on("finish", () => {
console.log("All data written to file");
});
// Event: 'error' - Fires if writing fails
fileWriter.on("error", (error: Error) => {
console.error("Error writing file:", error.message);
});
// What happens:
// 1. Node.js opens/creates the file
// 2. Writes "First line of text\n"
// 3. Writes "Second line of text\n"
// 4. Writes "Third line of text\n"
// 5. Writes "Final line of text\n" and closes the file
// 6. Triggers 'finish' event
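If you prefer Promises over the 'finish' event, Node's stream/promises module exposes a finished() helper you can await. A brief sketch of the same file write (output.txt as above):
import { createWriteStream } from "fs";
import { finished } from "stream/promises";
async function writeLines(): Promise<void> {
  const writer = createWriteStream("./output.txt", { encoding: "utf-8" });
  writer.write("First line of text\n");
  writer.write("Second line of text\n");
  writer.end("Final line of text\n");
  // Resolves once all data has been flushed; rejects if the stream errors
  await finished(writer);
  console.log("All data written to file");
}
writeLines().catch(console.error);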
Example 2: Streaming Data to HTTP Response
import http from "http";
import { createReadStream } from "fs";
const server = http.createServer((req, res) => {
// res is a Writable stream!
// Set headers
res.writeHead(200, { "Content-Type": "text/plain" });
// Send data in chunks
res.write("Starting data transfer...\n");
// Simulate processing and sending data over time
let count = 0;
const interval = setInterval(() => {
res.write(`Chunk ${++count}\n`);
if (count === 5) {
clearInterval(interval);
res.end("Transfer complete!\n");
}
}, 1000);
});
server.listen(3000);
// Client receives:
// Starting data transfer...
// Chunk 1 (after 1 second)
// Chunk 2 (after 2 seconds)
// Chunk 3 (after 3 seconds)
// Chunk 4 (after 4 seconds)
// Chunk 5 (after 5 seconds)
// Transfer complete!
3. Duplex Streams: Two-Way Communication
Duplex streams can both read and write data. They're like a telephone line—you can talk (write) and listen (read) at the same time.
What They Do:
- Read data from a source AND write data to a destination
- Both operations happen independently
- Useful for bidirectional communication
Real-World Examples:
- Network sockets (TCP connections)
- WebSocket connections
- Bidirectional IPC (Inter-Process Communication)
import { Duplex } from "stream";
// Example: Creating a custom Duplex stream
class SimpleDuplex extends Duplex {
private buffer: string = "";
// Reading side: Provide data when requested
_read(size: number): void {
// Push some data to be read
if (this.buffer) {
this.push(this.buffer);
this.buffer = "";
} else {
this.push(null); // Signal end of data
}
}
// Writing side: Accept data being written
_write(chunk: Buffer, encoding: string, callback: Function): void {
// Store the written data
this.buffer += chunk.toString().toUpperCase();
console.log("Received:", chunk.toString());
callback(); // Signal write completed successfully
}
}
// Usage:
const duplexStream = new SimpleDuplex();
// Write data to it
duplexStream.write("hello ");
duplexStream.write("world");
duplexStream.end();
// Read data from it
duplexStream.on("data", (chunk: Buffer) => {
console.log("Reading:", chunk.toString());
// Output: "HELLO WORLD"
});
// How it works:
// 1. You write "hello " → Stream stores it as "HELLO "
// 2. You write "world" → Stream stores it as "WORLD"
// 3. You call end() → Stream finishes writing
// 4. You read from it → Stream returns "HELLO WORLD"
Example 2: TCP Socket (Real Duplex Stream)
import net from "net";
// Create a TCP server
const server = net.createServer((socket) => {
// socket is a Duplex stream!
console.log("Client connected");
// Reading from the socket
socket.on("data", (data: Buffer) => {
const message = data.toString();
console.log("Received from client:", message);
// Writing to the socket
socket.write(`Server received: ${message}\n`);
});
socket.on("end", () => {
console.log("Client disconnected");
});
});
server.listen(8080, () => {
console.log("Server listening on port 8080");
});
// Client side:
const client = net.connect({ port: 8080 }, () => {
console.log("Connected to server");
// Write to server (socket is Duplex)
client.write("Hello, server!");
});
// Read from server
client.on("data", (data: Buffer) => {
console.log("Received from server:", data.toString());
});
// Communication flow:
// Client → Write "Hello, server!" → Server
// Client ← Read "Server received: Hello, server!" ← Server
4. Transform Streams: Modifying Data as It Flows
Transform streams are a special type of Duplex stream where the output is a transformation of the input. They read data, modify it, and write the modified data.
What They Do:
- Read data from a source
- Modify/transform the data
- Write the transformed data to a destination
Real-World Examples:
- Compressing files (ZIP, GZIP)
- Encrypting/decrypting data
- Converting data formats (CSV to JSON)
- Image resizing
- Data validation and sanitization
import { Transform } from "stream";
// Example 1: Converting text to uppercase
class UppercaseTransform extends Transform {
_transform(chunk: Buffer, encoding: string, callback: Function): void {
// Step 1: Get the input data
const input = chunk.toString();
// Step 2: Transform it
const output = input.toUpperCase();
// Step 3: Push the transformed data
this.push(output);
// Step 4: Signal we're done with this chunk
callback();
}
}
// Usage:
const uppercase = new UppercaseTransform();
// Write data to the transform stream
uppercase.write("hello ");
uppercase.write("world");
uppercase.end("!");
// Read transformed data
uppercase.on("data", (chunk: Buffer) => {
console.log(chunk.toString());
});
// Output: HELLO WORLD!
// (the transformed pieces may be delivered in one 'data' event or several)
// Data flow:
// Input: "hello " → Transform → Output: "HELLO "
// Input: "world" → Transform → Output: "WORLD"
// Input: "!" → Transform → Output: "!"
Example 2: Compressing Data
import { createReadStream, createWriteStream } from "fs";
import { createGzip } from "zlib";
import { pipeline } from "stream/promises";
// Compress a file using a Transform stream
async function compressFile(input: string, output: string): Promise<void> {
try {
// Create streams
const source = createReadStream(input); // Readable
const gzip = createGzip(); // Transform
const destination = createWriteStream(output); // Writable
// Connect them together
await pipeline(source, gzip, destination);
console.log("File compressed successfully");
} catch (error) {
console.error("Compression failed:", error);
}
}
// Usage:
await compressFile("./large-file.txt", "./large-file.txt.gz");
// What happens:
// 1. Read chunk from large-file.txt
// 2. Pass chunk to gzip (Transform stream)
// 3. gzip compresses the chunk
// 4. Write compressed chunk to large-file.txt.gz
// 5. Repeat until entire file is processed
// If the file is 100MB:
// - Without streams: Load 100MB → compress → write (uses 200MB+ memory)
// - With streams: Process in 64KB chunks → uses only ~64KB memory
Example 3: Data Validation Transform
import { Transform } from "stream";
// Transform stream that validates and filters JSON objects
class JSONValidator extends Transform {
constructor() {
super({ objectMode: true }); // Work with objects, not buffers
}
_transform(chunk: any, encoding: string, callback: Function): void {
try {
// Validation rules
if (typeof chunk !== "object") {
// Invalid: Not an object, skip it
return callback();
}
if (!chunk.id || !chunk.name) {
// Invalid: Missing required fields, skip it
return callback();
}
if (chunk.age && chunk.age < 0) {
// Invalid: Negative age, skip it
return callback();
}
// Valid: Pass it through
this.push(chunk);
callback();
} catch (error) {
callback(error);
}
}
}
// Usage:
const validator = new JSONValidator();
// Input data (some valid, some invalid)
const data = [
{ id: 1, name: "Alice", age: 30 }, // ✅ Valid
{ id: 2, name: "Bob", age: -5 }, // ❌ Invalid age
{ id: 3 }, // ❌ Missing name
{ id: 4, name: "Charlie", age: 25 }, // ✅ Valid
"invalid data", // ❌ Not an object
{ id: 5, name: "David" }, // ✅ Valid (age optional)
];
// Write each item to the validator
data.forEach((item) => validator.write(item));
validator.end();
// Read validated data
validator.on("data", (validItem: any) => {
console.log("Valid:", validItem);
});
// Output (only valid items):
// Valid: { id: 1, name: 'Alice', age: 30 }
// Valid: { id: 4, name: 'Charlie', age: 25 }
// Valid: { id: 5, name: 'David' }
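If you'd rather not call write() in a loop, you can wrap the array in a readable stream with Readable.from() and pipe it into the validator. A short sketch reusing the data array and JSONValidator defined above:
import { Readable } from "stream";
// Readable.from() turns any iterable into an object-mode readable stream
Readable.from(data)
  .pipe(new JSONValidator())
  .on("data", (validItem: any) => {
    console.log("Valid:", validItem);
  });
// Output is the same three valid items as above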
Comparing Stream Types: Side by Side
Here's a quick reference to help you choose the right stream type:
| Stream Type | Can Read? | Can Write? | Transforms Data? | Common Examples |
|---|---|---|---|---|
| Readable | ✅ Yes | ❌ No | ❌ No | Files (read), HTTP requests, Database queries |
| Writable | ❌ No | ✅ Yes | ❌ No | Files (write), HTTP responses, Database inserts |
| Duplex | ✅ Yes | ✅ Yes | ❌ No | TCP sockets, WebSockets, Network connections |
| Transform | ✅ Yes | ✅ Yes | ✅ Yes | Compression, Encryption, Data parsing |
When to Use Each Stream Type
Use Readable Streams When:
- You need to read data from a source
- Processing large files that don't fit in memory
- Receiving data over a network
- Reading database results row by row
- Monitoring log files in real-time
Use Writable Streams When:
- You need to write data to a destination
- Creating or appending to files
- Sending HTTP responses
- Writing to databases
- Logging data continuously
Use Duplex Streams When:
- You need bidirectional communication
- Working with network sockets
- Building real-time chat applications
- Implementing custom protocols
- Inter-process communication
Use Transform Streams When:
- You need to modify data as it flows
- Compressing or encrypting files
- Converting between formats
- Validating or filtering data
- Modifying images or videos
- Text processing (uppercase, lowercase, replacements)
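These four types become most useful when you combine them. As a rough sketch (the file names and the filter rule are placeholders, and a line split across two chunks could slip through), here is a Readable, a Transform, and a Writable chained into a single pipeline that keeps only lines containing "ERROR":
import { createReadStream, createWriteStream } from "fs";
import { Transform } from "stream";
import { pipeline } from "stream/promises";
// Transform: keep only lines that mention "ERROR" (a simplified line filter)
const errorFilter = new Transform({
  transform(chunk, encoding, callback) {
    const errorLines = chunk
      .toString()
      .split("\n")
      .filter((line: string) => line.includes("ERROR"));
    if (errorLines.length > 0) {
      this.push(errorLines.join("\n") + "\n");
    }
    callback();
  },
});
// Readable → Transform → Writable, connected with pipeline()
await pipeline(
  createReadStream("./app.log"), // Readable: the source
  errorFilter, // Transform: modifies data in flight
  createWriteStream("./errors.log") // Writable: the destination
);
console.log("Filtered log written to errors.log");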
Practical Benefits of Streams
Let's quantify the real-world benefits with concrete examples:
1. Memory Efficiency
// Scenario: Processing a 1GB CSV file
// ❌ Without streams: Load everything into memory
import { readFileSync } from "fs";
const entireFile = readFileSync("data.csv", "utf-8"); // Uses 1GB RAM
const lines = entireFile.split("\n");
// Memory usage: 1GB + (array overhead)
// ✅ With streams: Process line by line
import { createReadStream } from "fs";
import { createInterface } from "readline";
const stream = createReadStream("data.csv");
const rl = createInterface({ input: stream });
rl.on("line", (line: string) => {
// Process one line at a time
// Memory usage: ~few KB per line
});
// Result:
// Without streams: 1GB+ memory
// With streams: ~64KB memory (buffer size)
// That's a 16,000x reduction!
2. Time to First Byte (Performance)
// Scenario: Serving a large video file to a user
// (assumes an Express-style app and an fs import are already set up)
// ❌ Without streams: Load then send
app.get("/video", (req, res) => {
const video = fs.readFileSync("movie.mp4"); // Wait for 2GB to load
res.send(video); // User waits ~10 seconds
});
// ✅ With streams: Start sending immediately
app.get("/video", (req, res) => {
const stream = fs.createReadStream("movie.mp4");
stream.pipe(res); // User sees video in ~0.1 seconds
});
// Result:
// Without streams: 10+ second wait
// With streams: Video starts playing almost immediately
3. Scalability
// Scenario: Server handling multiple large file requests
// ❌ Without streams
// 10 users × 100MB per file = 1GB memory
// 100 users × 100MB per file = 10GB memory (server crashes!)
// ✅ With streams
// 10 users × 64KB buffer = 640KB memory
// 100 users × 64KB buffer = 6.4MB memory
// 1000 users × 64KB buffer = 64MB memory (no problem!)
// Result: Handle 100x more concurrent users with streams
Common Pitfalls and Solutions
Pitfall 1: Forgetting to Handle Errors
// ❌ Bad: No error handling
const stream = createReadStream("file.txt");
stream.on("data", (chunk) => {
console.log(chunk);
});
// If file doesn't exist, your application crashes!
// ✅ Good: Always handle errors
const stream = createReadStream("file.txt");
stream.on("data", (chunk) => {
console.log(chunk);
});
stream.on("error", (error) => {
console.error("Stream error:", error.message);
// Your app continues running
});
Pitfall 2: Not Ending Writable Streams
// ❌ Bad: Forgot to call end()
const writer = createWriteStream("output.txt");
writer.write("Some data");
// File might not be written completely!
// ✅ Good: Always call end()
const writer = createWriteStream("output.txt");
writer.write("Some data");
writer.end(); // Ensures data is flushed and file is closed
writer.on("finish", () => {
console.log("File written successfully");
});
Pitfall 3: Ignoring Backpressure
// ❌ Bad: Writing without checking if the stream can handle it
const writer = createWriteStream("output.txt");
for (let i = 0; i < 1000000; i++) {
writer.write(`Line ${i}\n`); // Might overwhelm the stream
}
// ✅ Good: Respect backpressure
const writer = createWriteStream("output.txt");
function writeData(i: number): void {
let ok = true;
while (i < 1000000 && ok) {
ok = writer.write(`Line ${i}\n`);
i++;
}
if (i < 1000000) {
// Stream buffer is full, wait for it to drain
writer.once("drain", () => {
writeData(i);
});
} else {
writer.end();
}
}
writeData(0);
// Explanation:
// - writer.write() returns false when buffer is full
// - Wait for 'drain' event before writing more
// - This prevents memory overflow
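Many of these pitfalls disappear if you let pipeline() from stream/promises connect your streams: it forwards errors from every stream, ends the destination for you, and manages backpressure automatically. A minimal sketch, assuming placeholder input.txt and output.txt files:
import { createReadStream, createWriteStream } from "fs";
import { pipeline } from "stream/promises";
try {
  // pipeline() forwards errors from any stream in the chain, closes everything on failure,
  // and pauses the source whenever the destination's buffer fills up (backpressure)
  await pipeline(
    createReadStream("./input.txt"),
    createWriteStream("./output.txt")
  );
  console.log("Copy finished");
} catch (error) {
  console.error("Pipeline failed:", error);
}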
Summary: Key Takeaways
Let's recap what you've learned about streams:
- Streams process data incrementally instead of loading everything into memory at once
- Memory efficiency: Streams use minimal memory (typically 64KB) regardless of file size
- Immediate processing: Start working with data as soon as the first chunk arrives
- Four types of streams:
- Readable: Read from a source (files, HTTP requests)
- Writable: Write to a destination (files, HTTP responses)
- Duplex: Both read and write (network sockets)
- Transform: Modify data as it flows (compression, encryption)
- Real-world benefits: Handle 100x more concurrent users, 16,000x less memory, 100x faster time-to-first-byte
- Always handle errors to prevent crashes
- Call .end() on writable streams to ensure data is written
- Respect backpressure to avoid memory overflow
What's Next?
Now that you understand the fundamentals of streams, you're ready to dive deeper:
- Node.js Readable Streams: Learn how to create custom readable streams and control data flow
- Node.js Writable Streams: Master writing data efficiently with backpressure handling
- Stream Piping: Connect streams together to create data processing pipelines
- Transform Streams: Build custom data transformers for your specific needs
- Stream Best Practices: Advanced patterns for production applications
Streams are one of the most powerful features in Node.js for building efficient, scalable applications. The concepts you learned here will serve as the foundation for working with files, networks, and data processing throughout your development career.