Advanced Patterns

Once you’ve mastered the basics, the Aggregation Framework opens up a world of complex data analysis capabilities, including joins, histograms, and multi-faceted search.

1. $lookup (Left Outer Join)

MongoDB is a document database, so we usually encourage embedding data. However, there are times when you need to reference data across collections. $lookup performs a Left Outer Join to another collection in the same database.

{
  $lookup: {
    from: "orders",           // Target collection
    localField: "_id",        // Field in THIS collection (users)
    foreignField: "userId",   // Field in TARGET collection (orders)
    as: "orderHistory"        // Output array field name
  }
}

[!NOTE] The result of $lookup is always an array, even if only one document matches (or none, in which case the array is empty). You often need to $unwind it if you want to merge the joined fields into the parent document.

Example

Users (input collection)

{ _id: 1, name: "Alice" }

Orders (the "from" collection)

{ _id: 101, userId: 1 }
{ _id: 102, userId: 1 }
{ _id: 103, userId: 2 }

// Result document (left outer join of Users with Orders)
{
  "_id": 1,
  "name": "Alice",
  "orderHistory": [
    { "_id": 101, "userId": 1 },
    { "_id": 102, "userId": 1 }
  ]
}
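To make the join semantics concrete, here is a minimal sketch in plain JavaScript that emulates what $lookup produces for simple scalar fields. The `lookup` helper and the user "Carol" are hypothetical and exist only for illustration; this is not how MongoDB executes the stage, and real $lookup has extra matching rules (for example around arrays and null values) that are not modeled here.

```javascript
// Illustration only: emulate $lookup's left-outer-join result in memory.
function lookup(input, foreign, { localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc,
    // Every input document is kept; unmatched ones get an empty array.
    [as]: foreign.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const users = [
  { _id: 1, name: "Alice" },
  { _id: 3, name: "Carol" }, // hypothetical user with no orders
];
const orders = [
  { _id: 101, userId: 1 },
  { _id: 102, userId: 1 },
  { _id: 103, userId: 2 },
];

const joined = lookup(users, orders, {
  localField: "_id",
  foreignField: "userId",
  as: "orderHistory",
});
console.log(JSON.stringify(joined, null, 2));

// The corresponding pipeline, with $unwind to get one document per
// (user, order) pair. preserveNullAndEmptyArrays keeps users whose
// orderHistory is empty, so the left-outer-join behavior survives.
const joinPipeline = [
  {
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "userId",
      as: "orderHistory",
    },
  },
  { $unwind: { path: "$orderHistory", preserveNullAndEmptyArrays: true } },
];
```

In mongosh, the pipeline would be run as `db.users.aggregate(joinPipeline)`, assuming collections named `users` and `orders` as in the example above.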

2. $bucket (Histograms)

Grouping by exact values is great, but sometimes you want to group by ranges. $bucket automatically categorizes data into ranges, perfect for histograms.

{
  $bucket: {
    groupBy: "$age",           // Field to group by
    boundaries: [0, 18, 30, 50, 80], // Ranges 0-17, 18-29, 30-49, 50-79 (lower bound inclusive, upper bound exclusive)
    default: "Other",          // Bucket for values outside the boundaries (e.g. 80+)
    output: {
      count: { $sum: 1 },
      names: { $push: "$name" }
    }
  }
}

Histogram Visualizer (example counts per bucket)

  0-18:  2
  18-30: 6
  30-50: 4
  50-80: 1
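The boundary rules are easy to get wrong, so here is a small plain-JavaScript sketch of how values are assigned to buckets: each bucket includes its lower bound, excludes its upper bound, and is identified by its lower bound. The `bucket` helper and the sample ages are hypothetical; this emulates the counting only, not MongoDB's implementation.

```javascript
// Illustration only: assign values to buckets the way $bucket does.
function bucket(values, boundaries, defaultId) {
  const counts = new Map();
  for (const v of values) {
    let id = defaultId; // used when no range matches
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Lower bound inclusive, upper bound exclusive.
      if (v >= boundaries[i] && v < boundaries[i + 1]) {
        id = boundaries[i]; // the bucket _id is its lower bound
        break;
      }
    }
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

// 18 lands in the 18-29 bucket, and 80 falls through to "Other".
const ages = [17, 18, 29, 30, 79, 80, 95];
const counts = bucket(ages, [0, 18, 30, 50, 80], "Other");
console.log(counts);
```

Note that without a `default`, a value outside every range makes the real $bucket stage fail, which is one reason to keep the "Other" bucket.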

3. $facet (Multi-Pipeline)

$facet is a game-changer for dashboards. It allows you to run multiple independent sub-pipelines on the same set of input documents within a single stage, so one query returns several result sets.

Imagine loading a product search page. You need:

  1. The list of products (paginated).
  2. The total count of products.
  3. A breakdown of products by category (for sidebar filters).

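With $facet, you do this in one go. The example below shows the pattern with two sub-pipelines, a paginated product list and summary statistics: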

{
  $facet: {
    // Pipeline 1: Get actual data
    "products": [
      { $match: { price: { $lt: 100 } } },
      { $skip: 0 },
      { $limit: 10 }
    ],
    // Pipeline 2: Get stats
    "stats": [
      { $match: { price: { $lt: 100 } } },
      { $group: { _id: null, avgPrice: { $avg: "$price" } } }
    ]
  }
}
Diagram: the same input documents flow into both the "products" pipeline ($match, $skip, $limit) and the "stats" pipeline ($match, $group).
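The three needs of the product search page (paginated list, total count, category breakdown) could be sketched as the pipeline below. The `buildSearchPipeline` helper, its `page` and `pageSize` parameters, and the `category` field are assumptions made for illustration. The shared $match is placed before $facet rather than repeated in each sub-pipeline, because $facet sub-pipelines cannot use indexes while a leading $match can.

```javascript
// Hypothetical helper: build a search pipeline for a 1-based page number.
function buildSearchPipeline(page, pageSize) {
  return [
    // Filter once, up front, where an index on price can be used.
    { $match: { price: { $lt: 100 } } },
    {
      $facet: {
        // 1. The paginated list (sorted so that paging is stable).
        products: [
          { $sort: { _id: 1 } },
          { $skip: (page - 1) * pageSize },
          { $limit: pageSize },
        ],
        // 2. The total number of matching products.
        totalCount: [{ $count: "total" }],
        // 3. Counts per category, for sidebar filters.
        byCategory: [{ $sortByCount: "$category" }],
      },
    },
  ];
}

const searchPipeline = buildSearchPipeline(2, 10);
console.log(JSON.stringify(searchPipeline, null, 2));
```

In mongosh this would be run as `db.products.aggregate(searchPipeline)`, assuming a `products` collection. The result is a single document whose `products`, `totalCount`, and `byCategory` fields each hold an array.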

4. Conditional Logic ($cond)

You can use conditional logic inside $project to create dynamic fields. It works like a ternary operator (if ? then : else).

{
  $project: {
    status: {
      $cond: {
        if: { $gte: ["$quantity", 10] },
        then: "In Stock",
        else: "Low Stock"
      }
    }
  }
}
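$cond also accepts a shorter array form, `[if, then, else]`, which reads even closer to a ternary. The sketch below shows that form next to an equivalent JavaScript ternary; `statusFor` is a hypothetical helper used only to illustrate the logic for numeric quantities.

```javascript
// Same projection as above, using the array form of $cond.
const projectStatus = {
  $project: {
    status: {
      $cond: [{ $gte: ["$quantity", 10] }, "In Stock", "Low Stock"],
    },
  },
};

// Equivalent logic as a plain JavaScript ternary (illustration only).
function statusFor(doc) {
  return doc.quantity >= 10 ? "In Stock" : "Low Stock";
}

console.log(statusFor({ quantity: 25 }));
console.log(statusFor({ quantity: 3 }));
```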

5. Performance Pitfalls

With great power comes great responsibility. Watch out for these common issues:

  • The Cartesian Product: If you $unwind a large array, you multiply the number of documents in your pipeline. 100 documents with an array of 100 items each become 10,000 documents!
  • $lookup on Unindexed Fields: Always ensure the foreignField in your $lookup is indexed. Otherwise, MongoDB has to scan the entire target collection for every input document.
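  • Memory Limits: Each pipeline stage is limited to 100MB of RAM, which matters most for blocking stages such as $sort and $group. Use indexes to avoid sorting in memory whenever possible, and set allowDiskUse: true when a stage genuinely needs to spill to disk.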
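The pitfalls above can be illustrated with a short sketch. The `unwind` helper is a hypothetical in-memory emulation that assumes every document has the array field; it only demonstrates the fan-out arithmetic. The index specification and aggregate options reuse the `orders.userId` field from the $lookup example.

```javascript
// Illustration only: emulate $unwind to show how document counts multiply.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((item) => ({ ...doc, [field]: item }))
  );
}

// 100 documents, each with a 100-element array.
const docs = Array.from({ length: 100 }, (_, i) => ({
  _id: i,
  items: Array.from({ length: 100 }, (_, j) => j),
}));
const unwound = unwind(docs, "items");
console.log(unwound.length); // 10000

// Index on the $lookup foreignField.
// In mongosh: db.orders.createIndex(foreignFieldIndex)
const foreignFieldIndex = { userId: 1 };

// Let memory-hungry stages spill to disk.
// In mongosh: db.users.aggregate(pipeline, aggregateOptions)
const aggregateOptions = { allowDiskUse: true };
```

Filtering with $match or trimming arrays before $unwind keeps the fan-out smaller than in this worst-case example.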