Module Review: Aggregation
[!NOTE] This module reviews the core principles of MongoDB aggregation: how pipelines are built, which stages matter most, and which memory and index constraints apply in production.
1. Key Takeaways
- Pipelines: Data processing in MongoDB is done via a pipeline of stages, processed in order.
- Filter First: Always use `$match` as early as possible to utilize indexes and reduce data volume.
- The Big 5: Master `$match`, `$group`, `$project`, `$unwind`, and `$sort` to handle 90% of use cases.
- Blocking vs Streaming: Be aware that `$sort` and `$group` block execution until all data is received, subject to a 100MB memory limit (unless `allowDiskUse: true`).
- Advanced Features: Use `$lookup` for joins, `$bucket` for histograms, and `$facet` for multi-pipeline dashboards.
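As a rough illustration of these takeaways, the sketch below combines the five core stages in one pipeline. The `orders` collection, its field names, and the index on `status` are assumptions made for this example, not part of the module. The pipeline is built as a plain array so it can be inspected without a running server.

```javascript
// Hypothetical `orders` documents (assumed shape):
// { status, customerId, items: [{ sku, qty, price }] }
const bigFivePipeline = [
  // Filter first so an index on `status` (assumed to exist) can be used
  { $match: { status: "shipped" } },
  // Emit one document per element of the `items` array
  { $unwind: "$items" },
  // Blocking stage: needs all input before it can emit groups
  {
    $group: {
      _id: "$customerId",
      revenue: { $sum: { $multiply: ["$items.qty", "$items.price"] } },
    },
  },
  // Reshape the output: rename _id to customerId
  { $project: { _id: 0, customerId: "$_id", revenue: 1 } },
  // Blocking stage: sorts the grouped results
  { $sort: { revenue: -1 } },
];

// Option that lets blocking stages spill to disk past the memory limit
const bigFiveOptions = { allowDiskUse: true };

// In mongosh (not executed here):
//   db.orders.aggregate(bigFivePipeline, bigFiveOptions)
```

Placing `$match` before `$unwind` matters here: filtering after the unwind would force every array element of every order through the pipeline first.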
2. Interactive Flashcards
Test your knowledge: each question below is followed by its answer.
Why should you place `$match` at the start of a pipeline?
1. To utilize indexes (performance).
2. To reduce the number of documents subsequent stages need to process.
Which stage is used to deconstruct an array field into multiple documents?
`$unwind`
What is the default memory limit for blocking stages like `$sort`?
100MB. If exceeded, the query fails unless `{ allowDiskUse: true }` is specified.
How do you perform a Left Outer Join in MongoDB?
Using the `$lookup` stage.
What does `$facet` allow you to do?
It lets you run multiple aggregation sub-pipelines within a single stage on the same set of input documents (great for dashboards).
What is the risk of using `$lookup` on a non-indexed foreign field?
MongoDB must perform a full collection scan on the target collection for *every* input document, which is extremely slow.
What is the difference between `$bucket` and `$bucketAuto`?
`$bucket` requires you to define boundaries manually. `$bucketAuto` automatically determines boundaries to evenly distribute documents.
How does the optimizer handle a `$sort` followed by a `$match`?
It moves the `$match` ahead of the `$sort`, so fewer documents need to be sorted.
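The advanced stages from the flashcards can be sketched as follows. The collection names (`orders`, `customers`), field names, and bucket boundaries are hypothetical and chosen only for illustration.

```javascript
// $lookup: left outer join from orders to a hypothetical `customers` collection.
// The foreign field is `_id`, which always has an index, so this avoids
// the per-document collection scan described above.
const lookupStage = {
  $lookup: {
    from: "customers",
    localField: "customerId",
    foreignField: "_id",
    as: "customer",
  },
};

// $bucket: boundaries are defined manually; `default` collects values outside them
const bucketStage = {
  $bucket: {
    groupBy: "$amount",
    boundaries: [0, 50, 100, 500],
    default: "other",
    output: { count: { $sum: 1 } },
  },
};

// $bucketAuto: only the number of buckets is given; boundaries are computed
const bucketAutoStage = {
  $bucketAuto: { groupBy: "$amount", buckets: 4, output: { count: { $sum: 1 } } },
};

// $facet: several sub-pipelines over the same filtered input documents
const dashboardPipeline = [
  { $match: { status: "shipped" } },
  {
    $facet: {
      amountHistogram: [bucketStage],
      topCustomers: [
        { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
        { $sort: { total: -1 } },
        { $limit: 5 },
      ],
    },
  },
];

// In mongosh (not executed here):
//   db.orders.aggregate([lookupStage])
//   db.orders.aggregate(dashboardPipeline)
```

The `$facet` stage returns a single document whose fields (`amountHistogram` and `topCustomers` here) each hold the array of results from one sub-pipeline.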
3. Cheat Sheet: SQL vs Aggregation
If you are coming from a Relational Database background, use this mapping.
| SQL Concept | Aggregation Stage | Description |
|---|---|---|
| WHERE | `$match` | Filter documents |
| GROUP BY | `$group` | Group documents |
| HAVING | `$match` | Filter groups (place after `$group`) |
| SELECT | `$project` | Pick/rename fields |
| ORDER BY | `$sort` | Sort results |
| LIMIT | `$limit` | Limit number of results |
| OFFSET | `$skip` | Skip results |
| JOIN | `$lookup` | Left outer join |
| UNION ALL | `$unionWith` | Combine two collections |
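To make the mapping concrete, here is one possible translation of a SQL query into a pipeline. The `orders` table/collection and its columns are assumptions for this example.

```javascript
// SQL (hypothetical schema):
//   SELECT customer_id, SUM(amount) AS total
//   FROM orders
//   WHERE status = 'shipped'
//   GROUP BY customer_id
//   HAVING SUM(amount) > 100
//   ORDER BY total DESC
//   LIMIT 10 OFFSET 20;
const sqlEquivalentPipeline = [
  { $match: { status: "shipped" } },                             // WHERE
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } }, // GROUP BY + SUM
  { $match: { total: { $gt: 100 } } },                           // HAVING (after $group)
  { $sort: { total: -1 } },                                      // ORDER BY ... DESC
  { $skip: 20 },                                                 // OFFSET
  { $limit: 10 },                                                // LIMIT
  { $project: { _id: 0, customer_id: "$_id", total: 1 } },       // SELECT
];

// In mongosh (not executed here): db.orders.aggregate(sqlEquivalentPipeline)
```

Note that `$skip` comes before `$limit`: stages run in order, so reversing them would keep 10 documents and then skip 20 of them, returning nothing.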
4. Glossary
For definitions of terms like Accumulator, Pipeline, and Cursor, check out the MongoDB Glossary.
5. Next Steps
Now that you’ve mastered the aggregation pipeline, it’s time to learn how to make your queries lightning fast with Indexing.