Common Stages

While there are dozens of aggregation stages, you will spend most of your time using just five of them: $match, $group, $project, $unwind, and $sort. Mastering these “Big 5” is the key to becoming proficient.

1. $match (Filter)

The $match stage filters the document stream, passing along only the documents that match the specified condition(s). It is the aggregation equivalent of a find() filter or a SQL WHERE clause.

// Filter for active users over 21
{
  $match: {
    status: "active",
    age: { $gt: 21 }
  }
}

[!IMPORTANT] Performance Rule #1: Place $match as early as possible (ideally first).

  1. At the start of the pipeline, it can use indexes to find documents efficiently.
  2. It reduces the number of documents that subsequent stages have to process.
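As a sketch of this rule, the pipeline below puts $match first so that the later stage only sees the filtered documents. The `users` collection, the `city` field, and the `activeAdultsByCity` helper are assumptions made for illustration; the call shape follows the MongoDB Node.js driver.

```javascript
// Hypothetical pipeline: filter first, then aggregate the survivors.
const pipeline = [
  // Stage 1: $match runs first, so it can use an index on status/age
  // (if one exists) and shrinks the stream for every later stage.
  { $match: { status: "active", age: { $gt: 21 } } },
  // Stage 2: $group only receives documents that passed the filter.
  { $group: { _id: "$city", count: { $sum: 1 } } }
];

// Assumes `db` is a connected Db handle from the MongoDB Node.js driver.
async function activeAdultsByCity(db) {
  return db.collection("users").aggregate(pipeline).toArray();
}
```

Swapping the two stages would force $group to process every document in the collection before any filtering happens.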

Index Scan vs. Collection Scan

  • Index Scan (good): walks the index B-tree to reach the matching documents. Fast ⚡
  • Collection Scan (bad): reads every document in the collection. Slow 🐢

2. $group (Aggregate)

The $group stage groups input documents by a specified _id expression and applies accumulators to each group. This is your SQL GROUP BY.

The _id Field

The _id field is mandatory. It determines the “bucket” that documents fall into.

  • _id: "$category": Group by category field.
  • _id: { region: "$region", year: "$year" }: Group by region AND year.
  • _id: null: Group all documents into one single bucket (useful for global totals).

Accumulators

You can calculate values for each group using accumulators:

  • $sum: Adds numeric values (or counts documents if you use $sum: 1).
  • $avg: Calculates the average.
  • $min / $max: Finds extreme values.
  • $push: Creates an array of values from the group.
  • $addToSet: Creates an array of unique values.
{
  $group: {
    _id: "$department",               // Group by department
    totalBudget: { $sum: "$budget" }, // Sum budget
    avgSalary: { $avg: "$salary" },   // Average salary
    employees: { $push: "$name" }     // List of employee names
  }
}

3. Interactive: $group Bucket Visualizer

Watch how raw items are sorted into buckets based on the grouping key.

[Interactive visualizer: an Input Stream panel feeds items into a Buckets panel.]
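To make the bucket idea concrete, here is a plain-JavaScript model of what $group does with a stream of documents. It is an illustration only, not MongoDB code: the `groupByDepartment` function and the sample documents are hypothetical, and it mirrors just the $sum and $push accumulators.

```javascript
// Plain-JavaScript model of $group (illustration only, not MongoDB code).
// Each document is dropped into the bucket named by its grouping key,
// and that bucket's accumulators are updated as documents arrive.
function groupByDepartment(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc.department; // plays the role of _id: "$department"
    if (!buckets.has(key)) {
      buckets.set(key, { _id: key, totalBudget: 0, employees: [] });
    }
    const bucket = buckets.get(key);
    bucket.totalBudget += doc.budget; // like { $sum: "$budget" }
    bucket.employees.push(doc.name);  // like { $push: "$name" }
  }
  return [...buckets.values()];
}

// Hypothetical input stream.
const inputStream = [
  { name: "Ana", department: "eng", budget: 100 },
  { name: "Ben", department: "ops", budget: 40 },
  { name: "Cy", department: "eng", budget: 60 }
];

const grouped = groupByDepartment(inputStream);
// grouped holds one entry per department:
//   { _id: "eng", totalBudget: 160, employees: ["Ana", "Cy"] }
//   { _id: "ops", totalBudget: 40,  employees: ["Ben"] }
```

The model returns buckets in first-seen order. MongoDB itself does not guarantee any output order from $group, so add a $sort stage if the order matters.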

4. $project (Reshape)

The $project stage passes along the documents with the requested fields to the next stage. It can:

  1. Select fields (like SQL SELECT).
  2. Rename fields.
  3. Compute new fields using expressions.
  4. Hide sensitive fields (e.g., exclude password).
{
  $project: {
    _id: 0,                   // Exclude _id
    fullName: "$name",        // Rename 'name' to 'fullName'
    status: 1,                // Include 'status'
    isAdult: { $gte: ["$age", 18] } // Compute boolean field
  }
}
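The example above uses inclusion form. Hiding sensitive fields is usually written in exclusion form instead, because a single $project cannot mix included and excluded fields (the _id field is the one exception). A minimal sketch, assuming hypothetical `password`, `ssn`, `firstName`, and `lastName` fields:

```javascript
// Exclusion form: drop the listed fields and pass everything else through.
// `password` and `ssn` are hypothetical field names.
const hideSensitive = { $project: { password: 0, ssn: 0 } };

// Inclusion form with a computed field, built from hypothetical
// `firstName` and `lastName` string fields.
const publicProfile = {
  $project: {
    _id: 0,    // excluding _id is allowed alongside inclusions
    status: 1, // keep 'status' as-is
    fullName: { $concat: ["$firstName", " ", "$lastName"] } // compute a new field
  }
};
```

Exclusion form is convenient when documents have many fields and only a few need to be hidden. Inclusion form is safer when new sensitive fields might be added later, since anything not listed is dropped.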

5. $unwind (Expand)

$unwind has no direct counterpart in SQL because it deals with arrays. It “deconstructs” an array field from the input documents and outputs one document for each element, copying the other fields unchanged.

Example:

Input: { id: 1, tags: ["A", "B"] }

Output:

  1. { id: 1, tags: "A" }
  2. { id: 1, tags: "B" }

One document goes in, and $unwind emits two documents, one per array element.

This is crucial when you want to group or filter by individual array elements.
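As a sketch of that pattern, the pipeline below unwinds a `tags` array and then counts documents per tag. The `tagCounts` name is hypothetical. The `unwindTags` function is a plain-JavaScript model of the $unwind step, included only to show the transformation; it assumes every document has a `tags` array.

```javascript
// Pipeline: count how many documents carry each tag.
const tagCounts = [
  { $unwind: "$tags" },                             // one document per array element
  { $group: { _id: "$tags", count: { $sum: 1 } } }, // bucket by individual tag
  { $sort: { count: -1 } }                          // most common tags first
];

// Plain-JavaScript model of the $unwind step (illustration only).
// Every element of doc.tags yields a copy of doc with tags set to that element.
function unwindTags(docs) {
  return docs.flatMap(doc => doc.tags.map(tag => ({ ...doc, tags: tag })));
}

const unwound = unwindTags([{ id: 1, tags: ["A", "B"] }]);
// unwound is [{ id: 1, tags: "A" }, { id: 1, tags: "B" }]
```

By default, $unwind emits nothing for documents whose array is empty, null, or missing. If those documents must survive, use the long form: { $unwind: { path: "$tags", preserveNullAndEmptyArrays: true } }.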

6. $sort (Order)

The $sort stage reorders the document stream.

  • 1: Ascending (A-Z, 0-9)
  • -1: Descending (Z-A, 9-0)
{ $sort: { age: -1, name: 1 } } // Sort by age desc, then name asc

[!WARNING] Memory Limit Alert: $sort is a blocking stage: it must consume its entire input before it can emit the first document. If the sort needs more than 100 MB of memory, the pipeline will fail unless you:

  1. Use { allowDiskUse: true } (slower, writes to temporary files; MongoDB 6.0 and later enable this by default).
  2. Ensure the sort can use an index, which requires placing it early in the pipeline (preferred).
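Both options can be sketched together. The `users` collection, the `oldestUsers` pipeline, and the `sortLargeResult` helper are assumptions for illustration; the `allowDiskUse` option is passed the way the MongoDB Node.js driver accepts aggregate options.

```javascript
// Hypothetical "ten oldest users" pipeline.
const oldestUsers = [
  // Placed first, this sort can be served by an index on { age: -1, name: 1 },
  // if such an index exists.
  { $sort: { age: -1, name: 1 } },
  // A $limit right after $sort means only the top 10 documents are needed.
  { $limit: 10 }
];

// Assumes `db` is a connected Db handle from the MongoDB Node.js driver.
async function sortLargeResult(db) {
  // allowDiskUse lets blocking stages spill to temporary files
  // instead of failing at the memory limit.
  return db
    .collection("users")
    .aggregate(oldestUsers, { allowDiskUse: true })
    .toArray();
}
```

Treat allowDiskUse as a fallback. An index that matches the sort keys avoids the in-memory sort entirely.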