Chubby (Distributed Lock Service)

[!TIP] The Referee: In distributed systems, nodes constantly disagree. “I’m the Master!” “No, I am!” Chubby is the arbiter. It uses Paxos to provide a consistent source of truth. It is the spiritual ancestor of ZooKeeper and etcd.

1. The Problem: Who is in Charge?

In GFS and BigTable, we have a Single Master. But what if that machine dies?

  • We need to elect a new Master.
  • We need to ensure the old Master knows it’s fired (so it doesn’t corrupt data).
  • Split Brain: The nightmare scenario where two nodes both think they are Master and both write to the disk.

2. Coarse-Grained Locking

Chubby is optimized for Coarse-Grained locks (held for hours/days), not Fine-Grained locks (seconds).

  • Why? Running Paxos for every database transaction is too slow.
  • Pattern: Clients use Chubby to elect a Leader (heavy operation, infrequent), and then the Leader manages the data itself (fast, frequent), as sketched below.
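
A minimal sketch of that pattern, assuming a hypothetical Python `chubby` client with `open`, `try_acquire`, and `still_held` calls (the real Chubby client library is a C++ file-handle API):

```python
import time

def run_as_leader(chubby, do_work):
    # Heavy, infrequent step: acquire the leader lock through the Paxos-backed cell.
    handle = chubby.open("/ls/cell/service/leader")   # hypothetical path and API
    while not handle.try_acquire():
        time.sleep(5)   # someone else is leading; retry occasionally

    # Fast, frequent step: we are the leader, so serve requests directly and
    # only go back to Chubby if we lose the session (and with it the lock).
    while handle.still_held():
        do_work()
```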

3. Architecture: The Cell

A Chubby Cell is a fault-tolerant cluster that uses Paxos to maintain a consistent filesystem state across replicas, as visualized in our architecture diagram.

  1. 5 Replicas: A typical cell has 5 nodes to ensure quorum (3 nodes) for Paxos consensus.
  2. Chubby Master: All client RPCs (reads and writes) go to the Master. The Master acts as the Paxos Proposer.
  3. Strict Consistency & Invalidation: To ensure all clients see the same data, the Master runs a Cache Invalidation flow on every write (a code sketch of this flow follows the diagram below):
    • Write Request: A client sends a write.
    • Invalidate: Before committing, the Master sends invalidation signals to all clients caching that file.
    • Ack: The Master waits for all acknowledgements before allowing the write to proceed.
    • Trade-off: This makes writes slow but allows instant, local reads for the majority of clients.
Diagram: Distributed State: The Chubby Cell (5-Node Paxos Quorum | Strict Consistency | Cache Invalidation). One Master (Paxos Proposer) and four Replica Acceptors form the quorum; Clients A and B keep local caches; a write to "/lock" triggers INVALIDATE CACHE messages, and the Master waits for acks before committing.
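
A minimal sketch of the Master's write path under these rules, with invented names (`cache_holders`, `next_ack_or_expiry`, `paxos_commit`) standing in for the real internals:

```python
def handle_write(master, path, new_value):
    # 1. Tell every client currently caching this file to drop its copy.
    waiting = set(master.cache_holders.get(path, ()))
    for client in waiting:
        client.send_invalidate(path)

    # 2. Block the write until every client has acked the invalidation
    #    (or its session has expired); this is what keeps local reads safe.
    while waiting:
        waiting.discard(master.next_ack_or_expiry())

    # 3. Only now run the mutation through Paxos and apply it.
    master.paxos_commit(("write", path, new_value))
    master.state[path] = new_value
```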

Interactive: Cache Invalidation Flow

(Demo: Client A holds "Foo" in its local cache, the Master stores "Foo", and Client B is a second client; the demo steps through a write and the resulting invalidation round-trip.)

4. Sessions and KeepAlives

Chubby uses Sessions and a KeepAlive (heartbeat) mechanism to manage client health; a client-side sketch follows the steps below.

  1. Handshake: Client connects, gets a Session ID.
  2. KeepAlive: Client sends a heartbeat every few seconds.
  3. Jeopardy: If the Master doesn’t reply (maybe Master crashed), the client enters “Jeopardy”. It waits for a Grace Period (e.g., 45s).
    • If a new Master is elected within 45s, the session is saved.
    • If not, the session (and all locks) are lost.
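
A client-side sketch of the session loop, with illustrative numbers and hypothetical `handshake`/`keepalive` calls:

```python
import time

KEEPALIVE_TIMEOUT = 7   # seconds between heartbeats (illustrative)
GRACE_PERIOD = 45       # how long to wait in Jeopardy before giving up

def session_loop(client):
    session = client.handshake()                    # 1. Handshake: get a Session ID
    while True:
        if client.keepalive(session, timeout=KEEPALIVE_TIMEOUT):
            continue                                # 2. KeepAlive answered; all good

        # 3. Jeopardy: no reply from the Master. Keep retrying for the grace period.
        deadline = time.time() + GRACE_PERIOD
        while time.time() < deadline:
            if client.keepalive(session, timeout=1):
                break                               # a new Master answered in time
        else:
            client.drop_locks_and_caches()          # session (and all locks) lost
            session = client.handshake()            # start over with a new session
```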

5. Preventing Split Brain: Sequencers

The most dangerous problem in distributed systems is Split Brain.

  • The Scenario: Client A acquires the lock. It freezes (Stop-the-World GC) for 1 minute.
  • The Result: Chubby thinks A is dead. It gives the lock to Client B.
  • The Conflict: Client A wakes up, thinking it still owns the lock, and tries to write to the database (GFS). If GFS accepts it, data is corrupted.

The Solution: Fencing Tokens (Sequencers)

  1. Chubby attaches a monotonically increasing version number (a Sequencer) to every lock (e.g., Token 10).
  2. When Client B takes the lock, the version increments (Token 11).
  3. When Client A (the Zombie) tries to write with Token 10, the storage system (GFS) checks: if (10 < 11) REJECT (see the storage-side sketch below).
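
A storage-side sketch of that fencing check (the kind of guard GFS would apply), using invented class and method names:

```python
class FencedStorage:
    def __init__(self):
        self.max_token_seen = 0
        self.data = {}

    def write(self, token, key, value):
        # Reject any writer presenting a token older than one already seen.
        if token < self.max_token_seen:
            raise PermissionError(f"stale token {token} < {self.max_token_seen}")
        self.max_token_seen = token
        self.data[key] = value

store = FencedStorage()
store.write(10, "row", "from A")      # Client A writes with Token 10: accepted
store.write(11, "row", "from B")      # Client B writes with Token 11: accepted
try:
    store.write(10, "row", "zombie")  # Zombie A retries with Token 10
except PermissionError as err:
    print("REJECTED:", err)           # if (10 < 11) REJECT
```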

Interactive Demo: Fencing the Zombie

Visualize a Zombie Client trying to write with an old token.

  • Storage: Tracks the Highest Token seen (Current: 10).
  • Client B: Active Leader (Token 11).
  • Client A: Zombie Leader (Token 10).

6. Consensus Deep Dive: Paxos vs Raft

Chubby (and Google) famously uses Paxos, while modern open-source systems like etcd use Raft.

Why Raft?

Paxos is notoriously difficult to understand and implement correctly. Raft was designed specifically to be Understandable.

  • Leader Election: Raft has a strong leader concept built-in. Paxos can run leaderless (but usually doesn’t).
  • Log Replication: Raft enforces a stricter log consistency model (Leader appends only).

Comparison Table

| Feature | Chubby (Paxos) | ZooKeeper (ZAB) | etcd (Raft) |
|:--------|:---------------|:----------------|:------------|
| Algorithm | Paxos (Complex) | ZAB (Atomic Broadcast) | Raft (Understandable) |
| Abstraction | File System (Files) | Directory (ZNodes) | Key-Value Store |
| Caching | Heavy (Client Invalidation) | None (Read from Replicas) | None (Read from Leader/Follower) |
| Consistency | Strict (Linearizable) | Sequential (Can be stale) | Strict (Linearizable) |


7. Observability & Tracing

Chubby is critical infrastructure. If it goes down, Google goes down.

RED Method

  1. Rate: KeepAlive RPCs/sec.
  2. Errors: Session Lost count. This is a critical alert.
  3. Duration: Paxos Round Latency (Commit Time). High latency usually indicates disk issues on the Master.

Health Checks

  • Quorum Size: Alert if < 3 replicas are alive.
  • Snapshot Age: Alert if the DB snapshot is older than 1 hour (a check sketch follows).
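
A sketch of those two health checks, with assumed helper objects and thresholds:

```python
import time

QUORUM = 3                  # minimum live replicas for a 5-node cell
MAX_SNAPSHOT_AGE = 3600     # seconds (1 hour)

def check_cell(replicas, last_snapshot_time):
    alerts = []
    alive = sum(1 for r in replicas if r.is_alive())   # r.is_alive() is assumed
    if alive < QUORUM:
        alerts.append(f"QUORUM AT RISK: only {alive} replicas alive")
    if time.time() - last_snapshot_time > MAX_SNAPSHOT_AGE:
        alerts.append("SNAPSHOT STALE: DB snapshot older than 1 hour")
    return alerts
```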

8. Deployment Strategy

Updating the Lock Service is scary, so we upgrade a cell one replica at a time (a rolling restart, sketched after the steps below).

  1. New Binary: Deploy to 1 replica.
  2. Wait: Ensure it rejoins quorum and catches up.
  3. Repeat: Deploy to remaining replicas one by one.
  4. Master Failover: Force a master election to ensure the new binary can lead.
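
The rollout, sketched with hypothetical `deploy`, `wait_until_caught_up`, and `force_master_election` helpers on a cell object:

```python
def rolling_upgrade(cell, new_binary):
    for replica in cell.replicas:
        replica.deploy(new_binary)        # 1. New Binary: one replica at a time
        replica.wait_until_caught_up()    # 2. Wait: it must rejoin the quorum and catch up
        # 3. Repeat: the loop covers the remaining replicas one by one
    cell.force_master_election()          # 4. Master Failover: prove the new binary can lead
```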

9. Requirements Traceability Matrix

| Requirement | Architectural Solution |
|:------------|:-----------------------|
| Consistency | Paxos Consensus Algorithm. |
| Availability | 5-Node Cell (survives 2 failures) + Master Failover. |
| Scalability | Heavy Client-Side Caching (Invalidation). |
| Correctness | Fencing Tokens to prevent Split Brain. |

10. Interview Gauntlet

I. Consensus

  • Why Paxos? To agree on values (lock ownership) even if nodes fail.
  • Why 5 nodes? To survive 2 failures. 3 nodes only survive 1.
  • What is a Proposer? The node that suggests a value. In Chubby, only the Master proposes.

II. Caching

  • Why Invalidation? To ensure Strict Consistency. A client cannot read stale data.
  • What if a client misses an Invalidation? The Master waits for Acks. If a client doesn’t ack, the Master drops its session.

III. System Design

  • Can I use Chubby for a high-speed queue? No. Writes are slow (Paxos). Use Kafka.
  • What is “Jeopardy”? A state where the client has lost contact with the Master but the Session is still valid (Grace Period).

11. Summary: The Whiteboard Strategy

If asked to design ZooKeeper/Chubby, draw this 4-Quadrant Layout:

1. Requirements

  • Func: Lock, Elect Leader, Store Config.
  • Non-Func: CP (Consistency), High Reliability.
  • Scale: Small Data, Read Heavy.

2. Architecture

[Client (Cache)]
|
[Master (Proposer)]
/ | \
[Replica] [Replica] [Replica]

  • Consensus: Paxos/Raft.
  • Session: KeepAlives.

3. Metadata

Node: /ls/cell/foo
Content: "10.0.0.1:8080"
Stat: {version: 10, ephemeral: true}

4. Reliability

  • Split Brain: Fencing Tokens (Sequencers).
  • Performance: Client Caching.
  • Availability: Session Grace Periods.

This concludes the Deep Dive into Infrastructure. Now, let’s learn how to operate these monsters in Module 17: Ops Excellence.