Review & Cheat Sheet

[!IMPORTANT] In this lesson, you will master:

  1. Quick Revision: High-level comparison of the “Elite” data systems.
  2. Database Cheat Sheet: Architectural trade-offs at a glance.
  3. Interactive Flashcards: Drilling the most common interview questions.

1. Quick Revision

For a full list of terms used in this module and course, check out the System Design Glossary.

  • Dynamo: The father of NoSQL. Prioritized Availability (AP). Combined and popularized Consistent Hashing, Vector Clocks, and Gossip in a single production system (the techniques themselves predate it).
  • Cassandra: The hybrid. BigTable Data Model (Wide Column) + Dynamo Architecture (Ring). Optimized for Writes (LSM Trees).
  • BigTable: The structured map. Master-Slave architecture. Used for Google Search/Maps. Scalable to Petabytes on GFS.
  • MapReduce: Distributed computing. Map (Parallel Processing) → Shuffle (Network Grouping) → Reduce (Aggregation).
  • Bloom Filters: Probabilistic membership. Used to skip expensive Disk Reads. No False Negatives.
  • Snowflake: Cloud Data Warehouse. Separation of Storage and Compute. Compute Isolation for concurrent workloads.
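The Map → Shuffle → Reduce flow listed above can be sketched in a single process. This is a toy word count, not a distributed implementation; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group values by key (this happens over the network in a real cluster).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))

counts = word_count(["the ring the gossip", "gossip the ring"])
print(counts)
```

In a real cluster each `map_phase` call would run on a different worker, and the shuffle would route every key to the reducer responsible for it.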

2. Cheat Sheet: Database Comparison

| Feature | Dynamo | Cassandra | BigTable | Snowflake |
|---|---|---|---|---|
| Data Model | Key-Value | Wide Column | Wide Column | Columnar |
| Architecture | P2P Ring | P2P Ring | Master-Slave | Shared-Data |
| Consistency | Eventual (AP) | Tunable | Strong (CP) | Strong |
| Storage | Local Disk | LSM (SSTable) | SSTable (GFS) | Micro-partitions (S3) |
| Gossip? | Yes | Yes | No (Master) | No (Services) |
| Primary Use | Shopping Cart | Activity Feed | Search Index | Cloud DW |

3. Interactive Flashcards

Test your knowledge.

What is a Tombstone?

A Deletion Marker

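In LSM Trees (Cassandra), you can't delete from immutable SSTables. Instead, you write a "Tombstone" that marks the data as deleted, and reads treat the key as absent. The tombstone and the data it shadows are physically removed during Compaction, once the grace period (`gc_grace_seconds` in Cassandra) has passed.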
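A minimal sketch of the idea, assuming each SSTable is modeled as a plain dict that is never modified after creation. The names `TOMBSTONE`, `read`, and `compact` are hypothetical, and this toy compaction purges tombstones immediately, whereas real systems typically wait for a grace period first.

```python
TOMBSTONE = object()  # sentinel value marking a key as deleted

def read(key, sstables):
    # Search newest to oldest; the first table that knows the key wins.
    for table in reversed(sstables):
        if key in table:
            value = table[key]
            return None if value is TOMBSTONE else value
    return None

def compact(sstables):
    # Merge oldest to newest so newer entries overwrite older ones,
    # then drop every key whose latest entry is a tombstone.
    merged = {}
    for table in sstables:
        merged.update(table)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

sstables = [{"user:1": "Ada", "user:2": "Bob"}]  # oldest first
sstables.append({"user:2": TOMBSTONE})           # a delete is just another write
```

After the delete, `read("user:2", sstables)` returns `None` even though the old value still sits in the first table; only `compact` actually reclaims the space.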

Vector Clock

Causality Tracker

A list of (Node, Counter) pairs used in Dynamo to detect conflicting updates in a distributed system, e.g., `[A:1, B:2]`. Two versions conflict when neither clock is greater than or equal to the other in every entry.
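The comparison rule can be sketched as follows, assuming a clock is represented as a dict from node name to counter, with missing nodes counting as 0. The helper names `increment`, `descends`, and `compare` are illustrative, not Dynamo's API.

```python
def increment(clock, node):
    # Return a new clock with this node's counter bumped by one.
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    # True if clock a has seen everything that clock b has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # neither dominates: a conflict to reconcile

base = {"A": 1, "B": 2}
via_a = increment(base, "A")  # {"A": 2, "B": 2}
via_b = increment(base, "B")  # {"A": 1, "B": 3}
```

Here `via_a` and `via_b` both descend from `base` but not from each other, so `compare(via_a, via_b)` reports a conflict that the application (or a merge rule) has to resolve.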

Bloom Filter Guarantee

No False Negatives

If a Bloom Filter says "No", the item is DEFINITELY not in the set. If it says "Yes", it MIGHT be (False Positive).
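A small sketch of why "No" is definitive: adding an item only ever sets bits, so a later lookup for that item always finds all of its bits set. The class below is a teaching example under assumed defaults (1024 bits, 3 hash positions derived by salting SHA-256), not the implementation used by any particular database.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for key in ("row:1", "row:2", "row:3"):
    bf.add(key)
```

A storage engine would check `bf.might_contain(key)` before opening an SSTable and skip the disk read entirely when the answer is `False`.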

Hinted Handoff

Temporary Storage

If a replica is down, another node accepts the write on its behalf and stores a "hint" so it can replay the write when the target node comes back online. This keeps writes available during temporary failures.
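The mechanism can be sketched with two in-memory nodes. The `Node` class and the `write` and `replay_hints` functions are hypothetical names for illustration; a real system would also persist hints and expire them after a time limit.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target_name, key, value) held for unreachable nodes

def write(key, value, target, fallback):
    if target.up:
        target.data[key] = value
    else:
        # The fallback accepts the write and remembers who it was meant for.
        fallback.hints.append((target.name, key, value))

def replay_hints(holder, target):
    # Called once the target is reachable again.
    remaining = []
    for name, key, value in holder.hints:
        if name == target.name and target.up:
            target.data[key] = value
        else:
            remaining.append((name, key, value))
    holder.hints = remaining

a, b = Node("A"), Node("B")
a.up = False
write("cart", "book", target=a, fallback=b)  # A is down, so B keeps a hint
a.up = True
replay_hints(b, a)                           # A catches up from B's hint
```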

Snowflake Isolation

Compute Isolation

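Multiple independent Virtual Warehouses can read the same data in cloud object storage (such as S3) simultaneously. Because each warehouse has its own compute resources, a heavy workload on one does not slow down queries on another.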

MemTable vs SSTable

RAM vs Disk

MemTable is the In-Memory buffer (Mutable). SSTable is the On-Disk file (Immutable). Data moves MemTable → SSTable.
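A toy version of the MemTable → SSTable path, assuming a flush is triggered by entry count and an SSTable is modeled as a sorted tuple of pairs. `TinyLSM` and its `flush_threshold` parameter are made up for this sketch, and the linear scan in `get` stands in for the indexed lookup a real engine would use.

```python
class TinyLSM:
    def __init__(self, flush_threshold=2):
        self.memtable = {}   # mutable, lives in memory
        self.sstables = []   # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable as a sorted, read-only run and start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # Newest data first: memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

db = TinyLSM(flush_threshold=2)
db.put("b", 2)
db.put("a", 1)   # second entry triggers a flush to an SSTable
db.put("a", 9)   # newer value stays in the memtable and shadows the old one
```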

Gossip Protocol

Epidemic Failure Detection

Each node periodically exchanges state information with a randomly chosen peer, so failures and membership changes spread through the cluster without a central master.
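A rough simulation of how membership information spreads, assuming each node keeps a map of member name to the highest heartbeat it has seen and that both sides merge on every exchange. `gossip_round` and `converged` are illustrative names, and failure detection itself (suspecting a node whose heartbeat stops advancing) is left out.

```python
import random

def gossip_round(views, rng):
    # views: node name -> {member name: highest heartbeat seen so far}
    for node in list(views):
        peer = rng.choice([n for n in views if n != node])
        # Both sides keep the higher heartbeat for every known member.
        merged = dict(views[node])
        for member, beat in views[peer].items():
            merged[member] = max(merged.get(member, 0), beat)
        views[node] = dict(merged)
        views[peer] = dict(merged)

def converged(views):
    first = next(iter(views.values()))
    return all(view == first for view in views.values())

# Every node starts out knowing only about itself.
views = {name: {name: 1} for name in "ABCDE"}
rng = random.Random(0)
rounds = 0
while not converged(views) and rounds < 100:
    gossip_round(views, rng)
    rounds += 1
```

No node ever contacts a coordinator, yet after a handful of rounds every node's view typically lists all five members.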

R + W > N

Quorum Formula

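The condition under which every read quorum overlaps every write quorum in at least one replica, so a read always contacts a node that holds the latest acknowledged write. R = read quorum size, W = write quorum size, N = replication factor. Example: with N=3, choosing W=2 and R=2 satisfies it; W=1 and R=1 does not.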
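The overlap argument can be checked by brute force for small clusters. The function name `quorums_always_overlap` is an assumption for this sketch; it enumerates every possible read set and write set and confirms they share at least one replica.

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    # True if every r-replica read set intersects every w-replica write set.
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

print(quorums_always_overlap(3, 2, 2))  # R + W = 4 > 3
print(quorums_always_overlap(3, 1, 1))  # R + W = 2 <= 3
```

For every small configuration, the brute-force answer matches the simple test `r + w > n`, which is why the inequality is enough to reason about in practice.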