Chubby (Distributed Lock Service)
[!TIP] The Referee: In distributed systems, nodes constantly disagree. “I’m the Master!” “No, I am!” Chubby is the arbiter. It uses Paxos to provide a consistent source of truth. It is the spiritual ancestor of ZooKeeper and etcd.
1. The Problem: Who is in Charge?
In GFS and BigTable, we have a Single Master. But what if that machine dies?
- We need to elect a new Master.
- We need to ensure the old Master knows it’s fired (so it doesn’t corrupt data).
- Split Brain: The nightmare scenario where two nodes both think they are Master and both write to the disk.
2. Coarse-Grained Locking
Chubby is optimized for Coarse-Grained locks (held for hours/days), not Fine-Grained locks (seconds).
- Why? Running Paxos for every database transaction is too slow.
- Pattern: Clients use Chubby to elect a Leader (heavy operation, infrequent), and then the Leader manages the data (fast, frequent).
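A minimal sketch of this pattern, using a hypothetical in-memory `FakeLockService` stand-in (the real Chubby API differs):

```python
import time

class FakeLockService:
    """In-memory stand-in for a Chubby client; names are illustrative."""
    def __init__(self):
        self.holder = None
        self.sequencer = 0

    def try_acquire(self, path: str, node_id: str) -> bool:
        # First caller wins; the sequencer bumps on every ownership change.
        if self.holder is None:
            self.holder = node_id
            self.sequencer += 1
            return True
        return False

def run_node(chubby: FakeLockService, node_id: str) -> None:
    # Heavy and infrequent: one lock acquisition elects the leader.
    while not chubby.try_acquire("/ls/cell/my-app/leader", node_id):
        time.sleep(5)  # follower: retry slowly, don't hammer the cell
    # Fast and frequent: the leader now serves data traffic directly,
    # with no Paxos round per request (only the election paid that cost).
    print(f"{node_id} leads with sequencer {chubby.sequencer}")
```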
3. Architecture: The Cell
A Chubby Cell is a fault-tolerant cluster that uses Paxos to maintain a consistent file-system state across its replicas.
- 5 Replicas: A typical cell has 5 nodes to ensure quorum (3 nodes) for Paxos consensus.
- Chubby Master: All client RPCs (reads and writes) go to the Master. The Master acts as the Paxos Proposer.
- Strict Consistency & Invalidation: To ensure all clients see the same data, the Master runs a cache-invalidation flow on every write (sketched below):
  1. Write Request: A client sends a write to the Master.
  2. Invalidate: Before committing, the Master sends invalidation signals to every client caching that file.
  3. Ack: The Master waits for all acknowledgements (or for the laggards' leases to expire) before allowing the write to proceed.
- Trade-off: This makes writes slow but allows instant, local reads for the majority of clients.
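A minimal sketch of that flow from the Master's side; `cache_holders`, `send_invalidate`, `wait_for_ack_or_lease_expiry`, and `paxos_commit` are illustrative names, not the real API:

```python
def handle_write(master, path: str, new_value: bytes) -> None:
    # Which sessions hold a cached copy of this file? (assumed bookkeeping)
    holders = master.cache_holders.get(path, set())
    # 1. Invalidate: tell every caching client to drop its copy.
    for session in holders:
        session.send_invalidate(path)
    # 2. Ack: block until each client confirms, or its lease expires.
    for session in holders:
        session.wait_for_ack_or_lease_expiry(path)
    # 3. Commit: only now does the write go through Paxos and become visible.
    master.paxos_commit(path, new_value)
    master.cache_holders[path] = set()  # no one caches the new value yet
```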
4. Sessions and KeepAlives
Chubby uses Sessions and a KeepAlive (heartbeat) mechanism to manage client health.
- Handshake: Client connects, gets a Session ID.
- KeepAlive: Client sends a heartbeat every few seconds.
- Jeopardy: If the Master doesn’t reply (maybe Master crashed), the client enters “Jeopardy”. It waits for a Grace Period (e.g., 45s).
- If a new Master is elected within 45s, the session is saved.
- If not, the session (and all locks) are lost.
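A client-side sketch of this session loop, with an assumed `conn.keepalive` RPC helper and illustrative timings:

```python
import time

KEEPALIVE_INTERVAL = 7.0   # seconds between heartbeats (illustrative)
GRACE_PERIOD = 45.0        # jeopardy window from the text above

class SessionExpired(Exception):
    pass

def session_loop(conn, session_id: str) -> None:
    while True:
        # conn.keepalive blocks up to `timeout` and returns True on a reply.
        if conn.keepalive(session_id, timeout=KEEPALIVE_INTERVAL):
            continue  # lease extended; keep going
        # Jeopardy: no reply. Treat cached data as suspect, start the timer.
        deadline = time.monotonic() + GRACE_PERIOD
        while time.monotonic() < deadline:
            if conn.keepalive(session_id, timeout=1.0):
                break  # a (possibly new) Master answered: session saved
        else:
            raise SessionExpired(session_id)  # grace over: locks are lost
```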
5. Preventing Split Brain: Sequencers
The most dangerous problem in distributed systems is Split Brain.
- The Scenario: Client A acquires the lock. It freezes (Stop-the-World GC) for 1 minute.
- The Result: Chubby thinks A is dead. It gives the lock to Client B.
- The Conflict: Client A wakes up, thinking it still owns the lock, and tries to write to the database (GFS). If GFS accepts it, data is corrupted.
The Solution: Fencing Tokens (Sequencers)
- Chubby attaches a Monotonic Version Number to every lock (e.g., Token 10).
- When Client B takes the lock, the version increments (Token 11).
- When Client A (Zombie) tries to write with Token 10, the storage system (GFS) checks:
if (10 < 11) REJECT.
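A runnable sketch of that storage-side check, mirroring the zombie scenario (class and method names are illustrative):

```python
class FencedStore:
    """Storage-side enforcement of fencing tokens (illustrative)."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: bytes) -> None:
        if token < self.highest_token:
            # A zombie with a stale token: refuse the write.
            raise PermissionError(f"stale token {token} < {self.highest_token}")
        self.highest_token = token  # remember the newest token seen
        self.data[key] = value

store = FencedStore()
store.write(11, "row", b"from B")      # Client B (Token 11): accepted
try:
    store.write(10, "row", b"from A")  # Zombie A (Token 10)
except PermissionError as e:
    print("rejected:", e)              # Split Brain averted
```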
[!NOTE] War Story: The 30-Minute GC Pause At a major streaming company, a storage leader node experienced an unexpected 30-minute Garbage Collection pause. The rest of the cluster assumed it was dead and promptly elected a new leader. When the old leader finally woke up, it still believed it was in charge and attempted to flush its buffered writes to the persistent disk, which would have overwritten the new leader’s data. Because they had implemented Fencing Tokens (Sequencers) via their lock service, the storage layer recognized the old token and rejected the write, preventing a catastrophic Split Brain scenario.
6. Consensus Deep Dive: Paxos vs Raft
Chubby (and Google) famously uses Paxos, while modern open-source systems like etcd use Raft.
Why Raft?
Paxos is notoriously difficult to understand and implement correctly. Raft was designed specifically to be Understandable.
- Leader Election: Raft has a strong leader concept built-in. Paxos can run leaderless (but usually doesn’t).
- Log Replication: Raft enforces a stricter log consistency model (Leader appends only).
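For flavor, here is a much-simplified sketch of Raft's RequestVote rule, which encodes that strong-leader election; the `state` fields are assumed bookkeeping on the follower:

```python
def handle_request_vote(state, term: int, candidate: str,
                        last_log_term: int, last_log_index: int) -> bool:
    """Follower-side sketch of Raft's RequestVote rule (simplified)."""
    if term < state.current_term:
        return False  # stale candidate from an old term
    if term > state.current_term:
        state.current_term, state.voted_for = term, None  # step into new term
    # Grant at most one vote per term, and only if the candidate's log is
    # at least as up-to-date as ours (Raft's election safety restriction).
    up_to_date = (last_log_term, last_log_index) >= \
                 (state.last_log_term, state.last_log_index)
    if state.voted_for in (None, candidate) and up_to_date:
        state.voted_for = candidate
        return True
    return False
```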
Comparison Table
| Feature | Chubby (Paxos) | ZooKeeper (ZAB) | etcd (Raft) |
|---|---|---|---|
| Algorithm | Paxos (Complex) | ZAB (Atomic Broadcast) | Raft (Understandable) |
| Abstraction | File System (Files) | Directory (ZNodes) | Key-Value Store |
| Caching | Heavy (Client Invalidation) | None (Read from Replicas) | None (Read from Leader/Follower) |
| Consistency | Strict (Linearizable) | Sequential (Can be stale) | Strict (Linearizable) |
7. Observability & Tracing
Chubby is critical infrastructure. If it goes down, Google goes down.
RED Method
- Rate: KeepAlive RPCs/sec.
- Errors: Session Lost count. This is a critical alert.
- Duration: Paxos Round Latency (Commit Time). High latency means disk issues on the Master.
Health Checks
- Quorum Size: Alert if < 3 replicas are alive.
- Snapshot Age: Alert if DB snapshot is older than 1 hour.
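A sketch of these two checks as a periodic probe; the `replicas` and `alerts` objects are assumed:

```python
def check_cell(replicas, snapshot_age_s: float, alerts) -> None:
    # Quorum check: a 5-node cell needs 3 live replicas to make progress.
    alive = sum(1 for r in replicas if r.is_alive())
    if alive < 3:
        alerts.page(f"quorum at risk: only {alive}/5 replicas alive")
    # Snapshot freshness: a stale snapshot means slow recovery after a crash.
    if snapshot_age_s > 3600:
        alerts.page(f"stale snapshot: {snapshot_age_s:.0f}s old")
```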
8. Deployment Strategy
Updating the Lock Service is scary. We use Cell Migration.
- New Binary: Deploy to 1 replica.
- Wait: Ensure it rejoins quorum and catches up.
- Repeat: Deploy to remaining replicas one by one.
- Master Failover: Force a master election to ensure the new binary can lead.
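The same steps as a sketch, with hypothetical `deploy`, `in_quorum`, and `force_master_election` helpers:

```python
import time

def rolling_upgrade(replicas, new_binary: str) -> None:
    for replica in replicas:
        replica.deploy(new_binary)      # 1. New Binary on one replica
        while not replica.in_quorum():  # 2. Wait: rejoin quorum, catch up
            time.sleep(10)
                                        # 3. Repeat: loop moves to the next node
    force_master_election(replicas)     # 4. Master Failover on the new binary
```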
9. Requirements Traceability Matrix
| Requirement | Architectural Solution |
|---|---|
| Consistency | Paxos Consensus Algorithm. |
| Availability | 5-Node Cell (survives 2 failures) + Master Failover. |
| Scalability | Heavy Client-Side Caching (Invalidation). |
| Correctness | Fencing Tokens to prevent Split Brain. |
10. Interview Gauntlet
I. Consensus
- Why Paxos? To agree on values (lock ownership) even if nodes fail.
- Why 5 nodes? To survive 2 failures. 3 nodes only survive 1 (see the arithmetic below).
- What is a Proposer? The node that suggests a value. In Chubby, only the Master proposes.
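The quorum arithmetic behind "why 5 nodes": a Paxos cell of n = 2f + 1 replicas needs a quorum of f + 1 = ⌊n/2⌋ + 1 and tolerates f failures. With n = 5: quorum 3, f = 2. With n = 3: quorum 2, f = 1.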
II. Caching
- Why Invalidation? To ensure Strict Consistency. A client cannot read stale data.
- What if a client misses an Invalidation? The Master waits for Acks. If a client doesn’t ack, the Master drops its session.
III. System Design
- Can I use Chubby for a high-speed queue? No. Writes are slow (Paxos). Use Kafka.
- What is “Jeopardy”? A state where the client has lost contact with the Master but the Session is still valid (Grace Period).
11. Summary: The Whiteboard Strategy
If asked to design ZooKeeper/Chubby, draw this 4-Quadrant Layout:
1. Requirements
- Func: Lock, Elect Leader, Store Config.
- Non-Func: CP (Consistency), High Reliability.
- Scale: Small Data, Read Heavy.
2. Architecture
- Consensus: Paxos/Raft.
- Session: KeepAlives.
3. Metadata
- Abstraction: Small files in a hierarchical namespace (locks + config).
4. Reliability
- Split Brain: Fencing Tokens (Sequencers).
- Performance: Client Caching.
- Availability: Session Grace Periods.
This concludes the Deep Dive into Infrastructure. Now, let’s learn how to operate these monsters in Module 17: Ops Excellence.