Review & Cheat Sheet
Key Takeaways
- Tunable Consistency: Cassandra lets you choose between Strong Consistency (R + W > N) and High Availability per request.
- CAP Theorem: Cassandra is an AP system (Availability + Partition Tolerance) by default, but can be configured to behave like CP.
- Hinted Handoff: A temporary failure handling mechanism where the coordinator stores writes for down nodes. It ensures eventual consistency but is not a replacement for repair.
- Anti-Entropy Repair: The process of synchronizing data between replicas.
  - Read Repair: Lazy; fixes stale replicas when the data is read.
  - Nodetool Repair: Active; an operator-run (or scheduled) process that compares replicas using Merkle Trees.
- Merkle Trees: Hash trees used to efficiently compare massive datasets without transferring all data.
- Zombie Data: Data that reappears because a node missed a tombstone (deletion marker). Prevented by running repair within gc_grace_seconds.
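
To make the Merkle Tree takeaway concrete, here is a minimal Python sketch of how two replicas could locate a mismatched block by comparing hashes from the root downward. It is a toy model, not Cassandra's implementation: real repair hashes token ranges rather than individual values, and the names `build_merkle` and `differing_leaves` are hypothetical helpers invented for this illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves):
    """Return the tree as a list of levels: leaf hashes first, root hash last."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Odd-sized level: pair the last hash with itself.
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Walk two equally shaped trees from the root, following only mismatched hashes."""
    top = len(a) - 1
    if a[top][0] == b[top][0]:
        return []  # Roots match, so every block matches; nothing to transfer.
    suspects = [0]
    for depth in range(top - 1, -1, -1):
        next_suspects = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                if child < len(a[depth]) and a[depth][child] != b[depth][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


# Hypothetical replica contents: block 2 has diverged on replica B.
replica_a = [b"k1=a", b"k2=b", b"k3=c", b"k4=d"]
replica_b = [b"k1=a", b"k2=b", b"k3=STALE", b"k4=d"]
out_of_sync = differing_leaves(build_merkle(replica_a), build_merkle(replica_b))
```

Only the blocks listed in `out_of_sync` would need to be streamed between replicas; matching subtrees are skipped after a single hash comparison.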
Flashcards
What is the formula for Strong Consistency in Cassandra?
R + W > N
(Read Nodes + Write Nodes > Replication Factor)
What is Hinted Handoff?
A mechanism where the coordinator temporarily stores a write for a down node and replays it when the node comes back online.
What data structure makes Anti-Entropy Repair efficient?
Merkle Tree
(Allows comparing large datasets by hashing blocks)
Which Consistency Level forces a majority of replicas to acknowledge?
QUORUM (or LOCAL_QUORUM)
True or False: Hinted Handoff can store hints forever.
False.
Hints are only stored for a limited window (default 3 hours). If a node stays down longer than that, manual repair is needed.
What is a Tombstone?
A marker indicating that a row or cell has been deleted. It prevents deleted data from resurrecting (Zombie Data).
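
The tombstone and Zombie Data cards can be illustrated with a small Python simulation. This is a simplified sketch under stated assumptions: each cell is modeled as a `(timestamp, value)` pair with `None` as the tombstone value, reconciliation is plain last-write-wins, and `merge` and `purge_tombstone` are hypothetical helpers, not Cassandra APIs. The value 864000 seconds (10 days) is used as the `gc_grace_seconds` setting.

```python
from typing import Optional, Tuple

# A cell is (write_timestamp, value); a value of None marks a tombstone.
Cell = Tuple[int, Optional[str]]

GC_GRACE_SECONDS = 864000  # 10 days


def merge(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last-write-wins reconciliation of one cell held by two replicas."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[0] >= b[0] else b


def purge_tombstone(cell: Optional[Cell], now: int, gc_grace: int) -> Optional[Cell]:
    """Drop a tombstone older than gc_grace, roughly as compaction is allowed to do."""
    if cell is not None and cell[1] is None and now - cell[0] > gc_grace:
        return None
    return cell


replica_a: Cell = (100, None)     # saw the delete at t=100
replica_b: Cell = (50, "alice")   # was down and missed the delete

# Repair within gc_grace_seconds: the tombstone still exists and wins.
repaired_in_time = merge(replica_a, replica_b)

# Repair too late: the tombstone was purged first, so the old value comes back.
late_a = purge_tombstone(replica_a, now=100 + GC_GRACE_SECONDS + 1, gc_grace=GC_GRACE_SECONDS)
repaired_too_late = merge(late_a, replica_b)
```

In the first case the merged cell is still a tombstone, so the row stays deleted. In the second case `"alice"` is resurrected, which is exactly the Zombie Data scenario.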
Cheat Sheet: Consistency Levels
| Level | Read Behavior | Write Behavior | Best For |
|---|---|---|---|
| ONE | Returns data from closest replica. | Acks after 1 replica writes. | Analytics, Logs, “Likes” |
| QUORUM | Returns data from majority (N/2+1). | Acks after majority writes. | General Purpose, Strong Consistency |
| ALL | Waits for all replicas. | Waits for all replicas. | Avoid. Zero fault tolerance. |
| LOCAL_QUORUM | Majority in local DC. | Majority in local DC. | Multi-region apps (Low latency) |
| EACH_QUORUM | N/A (Not supported for reads). | Majority in each DC. | Global Consistency (Very slow) |
| ANY | N/A (Not supported for reads). | Acks if even 1 hint is stored. | Dangerous. Data loss if coordinator dies. |
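
As a quick way to check which read/write combinations from the table satisfy R + W > N, the arithmetic can be sketched in Python. The function names `quorum` and `is_strongly_consistent` are hypothetical, and the sketch assumes a single datacenter with replication factor N and integer division for the N/2 in the quorum formula.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3  # assumed replication factor for this example
replicas_required = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

# Print every read/write pairing and whether it guarantees overlap.
for read_level, r in replicas_required.items():
    for write_level, w in replicas_required.items():
        verdict = "strong" if is_strongly_consistent(r, w, N) else "eventual"
        print(f"read={read_level:<6} write={write_level:<6} -> {verdict}")
```

With N = 3, QUORUM reads paired with QUORUM writes give 2 + 2 > 3, while ONE paired with ONE gives 1 + 1, which is not greater than 3 and so only provides eventual consistency.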
Next Steps
Now that you understand how Cassandra keeps data consistent, let’s look at how it handles massive scale.