Module Review: Data Modeling

[!NOTE] This module reviews the core principles of MongoDB data modeling, deriving design decisions from first principles and hardware constraints to build production-ready expertise.

1. Key Takeaways

  • Access Patterns > Normalization: Design your schema based on how the application queries the data, not just the relationships.
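  • The Physics of I/O: Embedding keeps related data in one document, so a read is a single, largely sequential fetch rather than several scattered lookups (random I/O). The gap is widest on spinning disks, where random access can be orders of magnitude slower.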
  • The 16MB Limit: The hard limit on BSON document size forces you to Reference data when the relationship cardinality is unbounded (One-to-Squillions).
  • Write Amplification: Updating a small field in a large embedded document requires rewriting the entire document on disk. Be mindful of update-heavy workloads.
  • Schema Validation: Use JSON Schema to enforce data integrity while maintaining flexibility where needed.
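
As a rough illustration of the embed-versus-reference trade-off and the 16MB limit, the sketch below models a blog post both ways using plain Python dictionaries. The field names (`post_id`, `comments`, `author`, `text`) are hypothetical, and JSON byte length is used only as a stand-in for BSON size, so real sizes will differ somewhat.

```python
import json

# 16 MiB: MongoDB's maximum BSON document size.
MAX_BSON_BYTES = 16 * 1024 * 1024


def approx_size(doc):
    """Rough size estimate: JSON byte length as a stand-in for BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


def embedded_post(n_comments):
    """Embedded model: every comment lives inside the post document."""
    return {
        "_id": 1,
        "title": "Hello",
        "comments": [
            {"author": "a", "text": "x" * 200} for _ in range(n_comments)
        ],
    }


def referenced_post_and_comments(n_comments):
    """Referenced model: comments are separate documents pointing at the post."""
    post = {"_id": 1, "title": "Hello"}
    comments = [
        {"_id": i, "post_id": 1, "author": "a", "text": "x" * 200}
        for i in range(n_comments)
    ]
    return post, comments


def max_embedded_comments(comment_bytes, base_bytes=0):
    """How many comments of a given size fit before the limit is reached."""
    return (MAX_BSON_BYTES - base_bytes) // comment_bytes
```

Under this estimate, the embedded post grows with every comment while the referenced post stays the same size. At roughly 256 bytes per comment, `max_embedded_comments(256)` gives 65,536 comments, so a truly unbounded comment stream would eventually hit the limit.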

2. Flashcards

Test your retention of the core concepts.

Q: What is the "Golden Rule" of NoSQL modeling?
A: "Data that is accessed together should be stored together."

Q: What is the maximum size of a BSON document?
A: 16 megabytes.

Q: When should you use Referencing instead of Embedding?
A: 1. When the relationship is unbounded (1:Squillions). 2. When the child data is frequently accessed without the parent.

Q: What is "Write Amplification"?
A: The cost of rewriting an entire BSON document to disk even if you only update a single byte.

Q: What is the "Subset Pattern"?
A: Embedding a small subset (e.g., top 5 reviews) for fast reads, while referencing the full dataset for completeness.

Q: True or False: MongoDB supports ACID transactions.
A: True. Multi-document ACID transactions are available since v4.0 on replica sets and v4.2 on sharded clusters; single-document writes have always been atomic. Transactions carry a performance cost and shouldn't be the default.
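
The Subset Pattern can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: it assumes reviews are stored as separate documents with hypothetical `_id`, `rating`, and `text` fields, and that "top" means highest rating.

```python
def build_product_with_top_reviews(product, reviews, top_n=5):
    """Return a copy of `product` embedding only the top_n highest-rated reviews.

    The full `reviews` list is assumed to live in its own collection;
    only a small, bounded summary is copied into the product document.
    """
    top = sorted(reviews, key=lambda r: r["rating"], reverse=True)[:top_n]
    doc = dict(product)  # shallow copy so the input is not mutated
    doc["top_reviews"] = [
        {"review_id": r["_id"], "rating": r["rating"], "text": r["text"]}
        for r in top
    ]
    doc["review_count"] = len(reviews)
    return doc
```

The embedded array is capped at `top_n` entries, so the product document stays small no matter how many reviews accumulate. The cost is that the subset must be refreshed when a new review displaces one of the top entries.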

3. Cheat Sheet: Modeling Decisions

| Scenario | Strategy | Reason |
|---|---|---|
| User & Profile | Embed | 1:1 Relationship. Almost always accessed together. |
| Post & Tags | Embed | 1:Few. Array is small and bounded. |
| Post & Comments | Ref (or Bucket) | 1:Many. Comments can grow indefinitely. |
| IoT Sensor & Readings | Ref (Time-Series) | 1:Squillions. Massive volume, specialized query patterns. |
| Catalog & Products | Ref | Products are entities in their own right. |
| Product & Top Reviews | Subset Pattern | Embed the top 5 for speed, reference the rest. |
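
The "Bucket" option for Post & Comments can be illustrated with a small sketch. It assumes a bucket is a document holding a fixed-size slice of comments for one post; the field names (`post_id`, `bucket`, `count`, `comments`) and the default bucket size of 100 are hypothetical choices for illustration.

```python
def bucket_comments(post_id, comments, bucket_size=100):
    """Split a post's comments into bounded bucket documents.

    Each bucket holds at most `bucket_size` comments, so no single
    document grows without limit even if the comment stream does.
    """
    buckets = []
    for start in range(0, len(comments), bucket_size):
        chunk = comments[start:start + bucket_size]
        buckets.append({
            "post_id": post_id,
            "bucket": start // bucket_size,  # sequence number of this bucket
            "count": len(chunk),
            "comments": chunk,
        })
    return buckets
```

Bucketing sits between embedding and referencing: reading a page of comments touches one document, while each document's size stays bounded by the bucket size.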

4. Quick Revision

  • Schema Design: Did you model for the application’s most frequent queries?
  • 16MB Check: Are any of your arrays unbounded? If so, move to referencing.
  • Validation: Did you add schema validation rules for critical fields (email, age, status)?
  • Indexes: (Coming up next!) Have you supported your queries with indexes?
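
For the validation check, a `$jsonSchema` validator might look like the sketch below. The allowed `status` values, the age bounds, and the email pattern are assumptions for illustration. `create_users_collection` assumes `db` is a PyMongo `Database` handle, whose `create_collection` method accepts a `validator` option.

```python
# Hypothetical validation rules for a "users" collection.
USER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email", "status"],
        "properties": {
            # Deliberately loose pattern: something@something.
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
            "status": {"enum": ["active", "inactive", "banned"]},
        },
        # additionalProperties is left unset, so other fields stay flexible.
    }
}


def create_users_collection(db, name="users"):
    """Create a collection with the validator attached.

    `db` is assumed to be a PyMongo Database handle; the `validator`
    keyword is passed through to MongoDB's create command.
    """
    return db.create_collection(name, validator=USER_VALIDATOR)
```

Only the critical fields are constrained here; documents may still carry additional fields, which preserves the schema flexibility noted in the takeaways.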

5. Next Steps

Data modeling is only half the battle. Now that your data is stored efficiently, you need to query and transform it. In the next module, we dive into the Aggregation Framework—MongoDB’s powerful data processing pipeline.
