Common Stages

While there are dozens of aggregation stages, you will spend 90% of your time using just five of them. Mastering these “Big 5” is the key to becoming proficient.

Analogy: The Factory Assembly Line Think of the Aggregation Pipeline as a car factory assembly line. Your raw data (raw materials) enters the factory. At each “stage” (workstation), a specific operation is performed—filtering out bad parts ($match), grouping components together ($group), or reshaping the final product ($project). The output of one stage becomes the input for the next, until the final result is ready.

1. $match (Filter)

Analogy: Think of $match as a strict bouncer at a club—it checks IDs at the door and only lets the right documents into the rest of the pipeline.

The $match stage filters documents to pass only those that match the specified condition(s). It is the Aggregation equivalent of the find() command or SQL WHERE clause.

// Filter for active users over 21
{
  $match: {
    status: "active",
    age: { $gt: 21 }
  }
}

[!IMPORTANT] Performance Rule #1: Always place $match as early as possible (ideally first).

  1. It can use indexes to find documents efficiently.
  2. It reduces the number of documents subsequent stages have to process.

Index Scan vs. Collection Scan

Index Scan (Good)

Checking Index B-Tree...
Fast ⚡

Collection Scan (Bad)

Scanning every document...
Slow 🐢

2. $group (Aggregate)

Analogy: Think of $group as sorting a massive pile of loose coins into separate buckets for quarters, dimes, and nickels, and then counting the total value in each bucket.

The $group stage groups input documents by a specified _id expression and applies accumulators to each group. This is your SQL GROUP BY.

The _id Field

The _id field is mandatory. It determines the “bucket” that documents fall into.

  • _id: "$category": Group by category field.
  • _id: { region: "$region", year: "$year" }: Group by region AND year.
  • _id: null: Group all documents into one single bucket (useful for global totals).

Accumulators

You can calculate values for each group using accumulators:

  • $sum: Adds numeric values (or counts documents if you use $sum: 1).
  • $avg: Calculates the average.
  • $min / $max: Finds extreme values.
  • $push: Creates an array of values from the group.
  • $addToSet: Creates an array of unique values.
{
  $group: {
    _id: "$department",          // Group by department
    totalBudget: { $sum: "$budget" }, // Sum budget
    avgSalary: { $avg: "$salary" },   // Average salary
    employees: { $push: "$name" }     // List of employee names
  }
}

3. Interactive: $group Bucket Visualizer

Watch how raw items are sorted into buckets based on the grouping key.

Input Stream

Buckets

4. $project (Reshape)

Analogy: Think of $project as a packaging department. It takes the raw product, removes unnecessary wrapping, slaps a new label on it, and sends out a polished final item.

The $project stage passes along the documents with the requested fields to the next stage. It can:

  1. Select fields (like SQL SELECT).
  2. Rename fields.
  3. Compute new fields using expressions.
  4. Hide sensitive fields (e.g., exclude password).
{
  $project: {
    _id: 0,                   // Exclude _id
    fullName: "$name",        // Rename 'name' to 'fullName'
    status: 1,                // Include 'status'
    isAdult: { $gte: ["$age", 18] } // Compute boolean field
  }
}

5. $unwind (Expand)

Analogy: Think of $unwind as unboxing a multi-pack. If a document is a 6-pack of soda (an array), $unwind breaks it open and sends 6 individual cans down the conveyor belt.

$unwind is unique to document databases. It deals with arrays. It “deconstructs” an array field from the input documents to output a document for each element.

Example: Input: { id: 1, tags: ["A", "B"] } Output:

  1. { id: 1, tags: "A" }
  2. { id: 1, tags: "B" }
{ tags: ["A", "B"] }
1 Document
$unwind
{ tags: "A" }
{ tags: "B" }
2 Documents

This is crucial when you want to group or filter by individual array elements.

6. $sort (Order)

Analogy: Think of $sort as a mailroom clerk organizing packages by zip code before they go out for delivery.

The $sort stage reorders the document stream.

  • 1: Ascending (A-Z, 0-9)
  • -1: Descending (Z-A, 9-0)
{ $sort: { age: -1, name: 1 } } // Sort by age desc, then name asc

[!WARNING] Memory Limit Alert: $sort is a blocking stage. If you are sorting a large number of documents (more than 100MB of data), the query will fail unless you:

  1. Use { allowDiskUse: true } (slower, writes to temporary files).
  2. Ensure the sort is covered by an index and placed early in the pipeline (preferred).