Module Review: Architecture

Congratulations on completing the Cassandra Architecture module! You’ve taken a deep dive into the internals of how Cassandra achieves massive scalability and high availability.

Key Takeaways

  • Write Path: Each write is appended to the Commit Log (durability) and applied to the MemTable (speed). MemTables flush to immutable SSTables on disk.
  • LSM Trees: Log-Structured Merge-Trees optimize for high write throughput by replacing random in-place disk updates with sequential writes.
  • Gossip Protocol: Nodes communicate peer-to-peer to propagate cluster state. The Phi Accrual Failure Detector adaptively detects failures.
  • Read Path: Reads check MemTable, Row Cache, Bloom Filters, Partition Key Cache, and finally SSTables.
  • Compaction: Background process that merges SSTables to reclaim space, discard overwritten data, and purge deleted data along with its markers (tombstones).
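
The write and read paths above can be sketched with a toy storage engine. This is a minimal illustration, not Cassandra's actual code: the class `ToyNode`, its `flush_threshold` parameter, and the use of plain Python lists and dicts in place of on-disk files are all assumptions made for the example. It also simplifies reads to "newest source wins", whereas a real node merges results by write timestamp.

```python
class ToyNode:
    """Toy model of one node's storage engine (hypothetical, for illustration)."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []      # stands in for the append-only log on disk
        self.memtable = {}        # in-memory write buffer
        self.sstables = []        # immutable sorted "files", oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))   # durability first
        self.memtable[key] = value             # then the fast in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable tuple of (key, value) pairs.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:                 # newest data lives in memory
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # then newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)   # second write fills the memtable and triggers a flush
node.write("a", 3)   # the newer value for "a" stays in the memtable
```

After these three writes, `node.read("a")` is served from the MemTable, while `node.read("b")` falls through to the single flushed SSTable.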

Interactive Flashcards

Test your knowledge with these flashcards.

What is the purpose of the Commit Log?

Durability

It ensures that acknowledged writes are safe on disk if a node crashes before the MemTable is flushed; on restart, the log is replayed to rebuild the lost in-memory data.
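
As a rough sketch of why the log gives durability, the snippet below rebuilds a MemTable from the writes that reached the log before a simulated crash. The function name `replay` and the `(key, value)` record layout are assumptions for illustration only; the real commit log format is more involved.

```python
def replay(commit_log):
    """Rebuild a memtable by re-applying logged writes in their original order."""
    memtable = {}
    for key, value in commit_log:
        memtable[key] = value   # later entries overwrite earlier ones
    return memtable


# Writes that reached the log before a simulated crash wiped the memtable.
surviving_log = [("user:1", "alice"), ("user:2", "bob"), ("user:1", "alicia")]
recovered = replay(surviving_log)
```

Because entries are applied in log order, the recovered MemTable holds the most recent value for each key.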

Why are SSTables immutable?

Performance & Concurrency

Immutability enables sequential writes (fast) and simplifies read/write concurrency without complex locking.

What does a Bloom Filter do?

Avoid Disk Seeks

It probabilistically checks if a partition key *might* exist in an SSTable. False positives are possible; false negatives are not.
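
The "might exist" behavior can be seen in a small Bloom filter sketch. The class `ToyBloomFilter`, its default sizes, and the choice of SHA-256 with a seed prefix as the hash family are assumptions for this example; they are not how Cassandra implements its filters.

```python
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter using a Python int as the bit array (illustrative)."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


bloom = ToyBloomFilter()
for partition_key in ["user:1", "user:2", "user:3"]:
    bloom.add(partition_key)
```

Every added key reports `True` from `might_contain`, which is the no-false-negatives guarantee. A key that was never added usually reports `False`, but may report `True` if its bit positions happen to collide with bits already set.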

What is Phi Accrual?

Failure Detection

An adaptive failure detection algorithm that calculates the suspicion level (φ) of a node failure based on heartbeat history.
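
One way to make the suspicion level concrete is the sketch below. It assumes heartbeat intervals follow an exponential distribution, so the probability of waiting longer than t is exp(-t / mean) and φ = -log10 of that probability. The function name `phi`, the sample intervals, and the threshold of 8 are illustrative assumptions, not values taken from this module.

```python
import math


def phi(time_since_last_heartbeat, intervals):
    """Suspicion level under an assumed exponential model of heartbeat gaps."""
    if not intervals:
        raise ValueError("need at least one observed heartbeat interval")
    mean = sum(intervals) / len(intervals)
    # P(gap > t) = exp(-t / mean), so phi = -log10(P) = t / (mean * ln 10).
    return time_since_last_heartbeat / (mean * math.log(10))


observed_intervals = [1.0, 1.1, 0.9, 1.0]   # seconds between past heartbeats
THRESHOLD = 8.0                              # illustrative conviction threshold

recent = phi(1.0, observed_intervals)        # heartbeat is only slightly late
overdue = phi(20.0, observed_intervals)      # heartbeat is long overdue
suspected = overdue > THRESHOLD
```

The detector is adaptive because the mean comes from observed history: a node on a slow link builds up longer intervals, so the same silence produces a lower φ than it would for a node that normally answers quickly.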

What happens during Compaction?

Merge & Purge

Multiple SSTables are merged into one. Tombstones are processed, deleted data is removed, and old versions of rows are discarded.
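
The merge-and-purge step can be sketched as follows. Each toy SSTable here is a dict mapping a key to a `(timestamp, value)` pair, and `TOMBSTONE` is a hypothetical sentinel for a delete marker. This simplification drops tombstones immediately, whereas a real cluster keeps them for a grace period (`gc_grace_seconds`) so that deletes can reach every replica first.

```python
TOMBSTONE = object()   # hypothetical sentinel marking a deleted value


def compact(sstables):
    """Merge SSTables into one, keeping only the newest version of each key."""
    latest = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            if key not in latest or timestamp > latest[key][0]:
                latest[key] = (timestamp, value)
    # Drop keys whose newest version is a tombstone; keep the rest sorted by key.
    return {k: v for k, v in sorted(latest.items()) if v[1] is not TOMBSTONE}


older = {"a": (1, "v1"), "b": (1, "v1")}
newer = {"a": (2, "v2"), "b": (3, TOMBSTONE), "c": (2, "v1")}
merged = compact([older, newer])
```

In the result, "a" keeps only its newer version, "b" disappears because its latest version is a tombstone, and "c" is carried over unchanged.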

Cheat Sheet

Component         | Function                                            | Location
Commit Log        | Durability for writes. Replayed on crash.           | Disk (Sequential)
MemTable          | In-memory write buffer. Sorted map.                 | RAM
SSTable           | Immutable data file. Created when MemTable flushes. | Disk
Bloom Filter      | Probabilistic check for key existence.              | RAM (Off-heap)
Partition Summary | Sampling of the Partition Index.                    | RAM
Partition Index   | Maps keys to disk offsets.                          | Disk
Gossip            | Cluster state propagation protocol.                 | Network
Seed Node         | Introduction point for new nodes.                   | Config
Snitch            | Determines network topology (Rack/DC).              | Config
Tombstone         | Marker for deleted data.                            | SSTable
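
To show how the Partition Summary and Partition Index rows relate, here is a small sketch of a sampled lookup. The index entries, the sampling interval `SAMPLE_EVERY`, and the function `find_offset` are all made up for illustration. The idea is that the in-memory summary narrows the search to a small slice of the on-disk index, which is then scanned for the exact key.

```python
import bisect

# Hypothetical on-disk partition index: sorted (key, data_file_offset) entries.
partition_index = [
    ("k01", 0), ("k02", 120), ("k03", 260), ("k04", 410),
    ("k05", 530), ("k06", 700), ("k07", 845), ("k08", 990),
]

SAMPLE_EVERY = 4   # illustrative sampling interval

# In-memory summary: every SAMPLE_EVERY-th key and its position in the index.
summary_pos = list(range(0, len(partition_index), SAMPLE_EVERY))
summary_keys = [partition_index[i][0] for i in summary_pos]


def find_offset(key):
    """Return the data-file offset for key, or None if it is not indexed."""
    # Locate the last sampled key that is <= the requested key.
    slot = bisect.bisect_right(summary_keys, key) - 1
    if slot < 0:
        return None
    start = summary_pos[slot]
    # Scan only the slice of the index that the summary points to.
    for indexed_key, offset in partition_index[start:start + SAMPLE_EVERY]:
        if indexed_key == key:
            return offset
    return None
```

With this data, `find_offset("k06")` uses the sampled entry for "k05" to jump into the second half of the index, so only four index entries are examined instead of all eight.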