Module Review: Architecture
Congratulations on completing the Cassandra Architecture module! You've taken a deep dive into the internals that let Cassandra achieve massive scalability and high availability.
Key Takeaways
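- Write Path: Each write is appended to the Commit Log (for durability) and applied to the in-memory MemTable (for speed). When a MemTable fills up, it is flushed to an immutable SSTable on disk.
- LSM Trees: Log-Structured Merge-Trees optimize for high write throughput by turning every write into a sequential append instead of a random, in-place disk update.
- Gossip Protocol: Nodes exchange cluster state peer-to-peer. The Phi Accrual Failure Detector adapts to each peer's observed heartbeat timing to decide when that peer is likely down.
- Read Path: Reads check the MemTable, the Row Cache (if enabled), Bloom Filters, the Partition Key Cache, the Partition Summary and Partition Index, and finally the SSTable data files, merging the results.
- Compaction: A background process that merges SSTables to reclaim space, discard outdated row versions, and purge deleted data (tombstones) once gc_grace_seconds has elapsed.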
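To make the write and read paths concrete, here is a minimal toy sketch of an LSM store in Python. Everything in it is hypothetical and simplified: the class name `TinyLSM`, the `flush_threshold` parameter, and the use of plain Python lists as stand-ins for the on-disk commit log and SSTable files. The toy resolves reads as "most recently flushed wins"; real Cassandra instead merges all candidate fragments by write timestamp.

```python
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM store: commit log + MemTable + immutable SSTables (illustrative only)."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # stand-in for an append-only file on disk
        self.memtable: Dict[str, str] = {}           # in-memory write buffer
        self.sstables: List[Tuple[Tuple[str, str], ...]] = []  # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold       # hypothetical flush trigger

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. append to the log for durability
        self.memtable[key] = value            # 2. update the in-memory buffer
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. write the MemTable out as a sorted, immutable run and start a fresh one
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # in this toy, flushed data no longer needs its log entries

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:             # the newest data lives in memory
            return self.memtable[key]
        for run in reversed(self.sstables):  # then search SSTables from newest to oldest
            for k, v in run:
                if k == key:
                    return v
        return None
```

Note that a write never rewrites existing data: it only appends to the log and updates memory, which is exactly why LSM designs favor write throughput.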
Interactive Flashcards
Test your knowledge with these flashcards.
What is the purpose of the Commit Log?
Durability
It persists every write to disk, so if a node crashes before its MemTable is flushed, the data can be recovered by replaying the log.
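The recovery idea can be sketched in a few lines of Python: append each write to a file and `fsync` it before acknowledging, then rebuild the MemTable after a crash by replaying the file in order. The JSON-lines format and the helper names `append_to_log` and `replay` are hypothetical; Cassandra's real commit log uses its own binary segment format and configurable sync modes (`commitlog_sync`).

```python
import json
import os
import tempfile
from typing import Dict


def append_to_log(path: str, key: str, value: str) -> None:
    """Append one write as a JSON line and force it to disk before acknowledging."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"k": key, "v": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # without this, the bytes may still sit in the OS page cache


def replay(path: str) -> Dict[str, str]:
    """Rebuild a MemTable by re-applying every logged write in order."""
    memtable: Dict[str, str] = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                memtable[entry["k"]] = entry["v"]  # later writes overwrite earlier ones
    return memtable


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
memtable: Dict[str, str] = {}
for k, v in [("user:1", "alice"), ("user:2", "bob"), ("user:1", "carol")]:
    append_to_log(log_path, k, v)
    memtable[k] = v

memtable = {}  # simulate a crash: the in-memory MemTable is lost
recovered = replay(log_path)
print(recovered)  # {'user:1': 'carol', 'user:2': 'bob'}
```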
Why are SSTables immutable?
Performance & Concurrency
Because SSTables are never modified in place, flushes are purely sequential writes (fast), and concurrent readers need no complex locking.
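A minimal sketch of why immutability helps: a hypothetical `SSTable` class built once from a MemTable, sorted in key order, and frozen afterward. Since nothing can change after creation, lookups are a lock-free binary search. The class and its fields are illustrative assumptions, not Cassandra's on-disk format.

```python
from bisect import bisect_left
from dataclasses import FrozenInstanceError, dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SSTable:
    """Toy SSTable: parallel, sorted, read-only tuples of keys and values."""
    keys: Tuple[str, ...]
    values: Tuple[str, ...]

    @classmethod
    def from_memtable(cls, memtable: Dict[str, str]) -> "SSTable":
        # A flush sorts once and emits the whole run in key order (a sequential write).
        items = sorted(memtable.items())
        return cls(tuple(k for k, _ in items), tuple(v for _, v in items))

    def get(self, key: str) -> Optional[str]:
        # Rows never change after creation, so any number of readers
        # can binary-search them concurrently without taking a lock.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


table = SSTable.from_memtable({"b": "2", "a": "1", "c": "3"})
print(table.get("b"))  # 2
try:
    table.keys = ("z",)  # any attempt to modify the table fails
except FrozenInstanceError:
    print("SSTable is immutable")
```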
What does a Bloom Filter do?
Avoid Disk Seeks
It probabilistically checks whether a partition key *might* exist in an SSTable, so reads can skip SSTables that definitely do not contain it. False positives are possible; false negatives are not.
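A minimal Bloom filter sketch in Python illustrates the one-sided guarantee. It uses SHA-256 from `hashlib` with double hashing to derive the probe positions; this is an assumption for illustration only. Cassandra's own implementation differs (its hash function, and sizing driven by the `bloom_filter_fp_chance` table option), and the parameters `num_bits` and `num_hashes` here are hypothetical.

```python
import hashlib
from typing import List


class BloomFilter:
    """Minimal Bloom filter: num_bits bits, num_hashes probe positions per key."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step keeps probes distinct when num_bits is a power of two
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False -> key is definitely absent, so the SSTable can be skipped.
        # True  -> key may be present (a false positive is possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter()
for k in ["user:1", "user:2", "user:3"]:
    bf.add(k)
print(bf.might_contain("user:2"))  # True
```

Every added key sets all of its bits, so a lookup for it can never return `False`; an absent key returns `True` only if all of its probe bits happen to have been set by other keys.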
What is Phi Accrual?
Failure Detection
An adaptive failure detection algorithm that computes a continuous suspicion level (φ) for each node from its heartbeat history; the node is marked down once φ exceeds a configured threshold (phi_convict_threshold).
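A small sketch of the φ calculation, assuming heartbeat inter-arrival times are exponentially distributed. Under that assumption the probability of hearing nothing for `elapsed` seconds is `exp(-elapsed / mean)`, and φ is the negative base-10 log of that probability. The class name and the `window` size are hypothetical; the threshold of 8 used below matches the default `phi_convict_threshold` in `cassandra.yaml`.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Phi accrual sketch assuming exponentially distributed heartbeat intervals."""

    def __init__(self, window: int = 1000) -> None:
        self.intervals: Deque[float] = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        if self.last_heartbeat is None or not self.intervals:
            return 0.0  # no history yet, so no suspicion
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # P(silence >= elapsed) = exp(-elapsed / mean), so
        # phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(10):  # heartbeats arrive every 1.0 s
    detector.heartbeat(float(t))
print(round(detector.phi(10.0), 3))  # 0.434 -> one mean interval since the last heartbeat
print(detector.phi(29.0) > 8)        # True -> 20 s of silence crosses a threshold of 8
```

Because φ scales with the observed mean interval, a node on a slow or jittery link is given proportionally more time before being suspected, which is what makes the detector adaptive.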
What happens during Compaction?
Merge & Purge
Multiple SSTables are merged into a new SSTable. Older versions of each row are discarded, data shadowed by tombstones is removed, and tombstones older than gc_grace_seconds are purged.
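A toy compaction merge in Python: for each key, the cell with the newest write timestamp wins, and tombstones (modeled here as a `None` value) are dropped once they are older than a grace period. This is a simplified sketch under stated assumptions: the function name `compact` and the `Cell` type are hypothetical, the toy treats the write timestamp as the deletion time, and real Cassandra performs further safety checks (for example, that no SSTable outside the compaction still holds data the tombstone shadows) before purging.

```python
from typing import Dict, List, Optional, Tuple

# Each cell is (write_timestamp, value); a value of None marks a tombstone.
Cell = Tuple[int, Optional[str]]


def compact(sstables: List[Dict[str, Cell]], now: int, gc_grace: int) -> Dict[str, Cell]:
    """Merge SSTables: newest timestamp wins; purge tombstones older than gc_grace."""
    merged: Dict[str, Cell] = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell  # keep only the newest version of each key
    result: Dict[str, Cell] = {}
    for key, (ts, value) in merged.items():
        if value is None and now - ts > gc_grace:
            continue  # expired tombstone: drop it entirely
        result[key] = (ts, value)
    return dict(sorted(result.items()))  # the output run is written in key order


old = {"a": (1, "v1"), "b": (2, "v2"), "c": (3, "v3")}
new = {"a": (5, "v1-updated"), "b": (4, None)}  # "a" updated, "b" deleted
print(compact([old, new], now=100, gc_grace=10))  # {'a': (5, 'v1-updated'), 'c': (3, 'v3')}
```

If the grace period had not yet passed, the tombstone for `"b"` would be kept, so that replicas which missed the delete cannot resurrect the old value.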
Cheat Sheet
| Component | Function | Location |
|---|---|---|
| Commit Log | Durability for writes. Replayed on restart after a crash. | Disk (Sequential) |
| MemTable | In-memory write buffer. Sorted map. | RAM |
| SSTable | Immutable data file. Created when MemTable flushes. | Disk |
| Bloom Filter | Probabilistic check for key existence. | RAM (Off-heap) |
| Partition Summary | Sampling of the Partition Index. | RAM |
| Partition Index | Maps keys to disk offsets. | Disk |
| Gossip | Cluster state propagation protocol. | Network |
| Seed Node | Introduction point for new nodes. | Config |
| Snitch | Determines network topology (Rack/DC). | Config |
| Tombstone | Marker for deleted data. | SSTable |
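The Partition Summary and Partition Index rows work together: the in-memory summary narrows the search to a small slice of the on-disk index, which then yields the data-file offset. The sketch below assumes a toy index sorted by plain string keys (real Cassandra orders partitions by token) and a hypothetical fixed sampling `interval`; Cassandra controls sampling through the `min_index_interval` and `max_index_interval` table options. The helper names are illustrative.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

IndexEntry = Tuple[str, int]  # (partition key, byte offset in the data file)


def build_summary(index: List[IndexEntry], interval: int) -> List[Tuple[str, int]]:
    """Sample every `interval`-th index entry as (key, position of that entry in the index)."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def find_offset(key: str, summary: List[Tuple[str, int]],
                index: List[IndexEntry], interval: int) -> Optional[int]:
    summary_keys = [k for k, _ in summary]
    slot = bisect_right(summary_keys, key) - 1  # last sampled key <= target
    if slot < 0:
        return None  # key sorts before everything in this SSTable
    start = summary[slot][1]
    for k, offset in index[start:start + interval]:  # one short scan of the on-disk index
        if k == key:
            return offset
    return None


index = [(f"key{i:03d}", i * 100) for i in range(10)]  # sorted keys -> data-file offsets
summary = build_summary(index, interval=4)              # small enough to keep in RAM
print(find_offset("key006", summary, index, 4))         # 600
print(find_offset("key999", summary, index, 4))         # None
```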