Chubby (Distributed Lock Service)

[!TIP] The Referee: In distributed systems, nodes constantly disagree. “I’m the Master!” “No, I am!” Chubby is the arbiter. It uses Paxos to provide a consistent source of truth. It is the spiritual ancestor of ZooKeeper and etcd.

1. The Problem: Who is in Charge?

In GFS and BigTable, we have a Single Master. But what if that machine dies?

  • We need to elect a new Master.
  • We need to ensure the old Master knows it’s fired (so it doesn’t corrupt data).
  • Split Brain: The nightmare scenario where two nodes both think they are Master and both write to the disk.

2. Coarse-Grained Locking

Chubby is optimized for Coarse-Grained locks (held for hours/days), not Fine-Grained locks (seconds).

  • Why? Running Paxos for every database transaction is too slow.
  • Pattern: Clients use Chubby to elect a Leader (heavy operation, infrequent), and then the Leader manages the data (fast, frequent).
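
The pattern above can be sketched as follows. `LockService` is a hypothetical stand-in for a Chubby client library, not the real API; it only shows the shape of the election.

```python
# Sketch of the coarse-grained election pattern (hypothetical API).
class LockService:
    def __init__(self):
        self._holder = None

    def try_acquire(self, path, node):
        # Grant the lock only if it is free; once granted it is
        # held for hours or days (coarse-grained).
        if self._holder is None:
            self._holder = node
            return True
        return False

def elect_leader(chubby, nodes):
    # Every candidate races for the same lock file; exactly one wins.
    for node in nodes:
        if chubby.try_acquire("/ls/cell/master-lock", node):
            return node  # winner serves traffic until its session dies
    return None

chubby = LockService()
leader = elect_leader(chubby, ["node-a", "node-b", "node-c"])
```

The heavy Paxos-backed operation happens once, at election time; afterwards the leader handles data traffic directly, without touching Chubby.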

3. Architecture: The Cell

A Chubby Cell is a small, fault-tolerant cluster that uses Paxos to maintain a consistent filesystem state across its replicas.

  1. 5 Replicas: A typical cell has 5 nodes, so Paxos can reach a majority quorum (3 of 5) even with up to 2 failures.
  2. Chubby Master: All client RPCs (reads and writes) go to the Master. The Master acts as the Paxos Proposer.
  3. Strict Consistency & Invalidation: To ensure all clients see the same data, the Master runs a cache-invalidation protocol:
    • Write Request: A client sends a write to the Master.
    • Invalidate: Before committing, the Master sends invalidation signals to every client caching that file.
    • Ack: The Master waits for all acknowledgements before allowing the write to proceed.
    • Trade-off: Writes are slow, but the vast majority of reads are served instantly from local caches.
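
The invalidate-then-commit flow can be sketched as below. Clients are in-process objects here; in the real system these are RPCs (invalidations piggyback on KeepAlive replies), so treat this as a minimal model of the ordering, not the wire protocol.

```python
# Minimal sketch of Chubby's invalidate-before-commit write path.
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}

    def invalidate(self, path):
        # Purge the cached entry and ack the Master.
        self.cache.pop(path, None)
        return True  # ack

class Master:
    def __init__(self):
        self.store = {}
        self.cachers = {}  # path -> set of clients caching that file

    def read(self, client, path):
        # Reads populate the client cache; later reads are local.
        client.cache[path] = self.store[path]
        self.cachers.setdefault(path, set()).add(client)
        return client.cache[path]

    def write(self, path, value):
        # 1. Invalidate every cached copy and wait for all acks...
        for client in self.cachers.get(path, set()):
            assert client.invalidate(path)
        self.cachers[path] = set()
        # 2. ...only then commit. Writes are slow; reads stay fast.
        self.store[path] = value
```

A client that misses an invalidation would block the write, which is why the Master ties acks to session health.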
Diagram (omitted): the Chubby Cell — a 5-node Paxos quorum with the Master as Proposer and four Replica Acceptors; Clients A and B hold local caches; a write to "/lock" triggers cache invalidations and waits for acks before committing.


4. Sessions and KeepAlives

Chubby uses Sessions and a KeepAlive (heartbeat) mechanism to manage client health.

  1. Handshake: Client connects, gets a Session ID.
  2. KeepAlive: Client sends a heartbeat every few seconds.
  3. Jeopardy: If the Master doesn’t reply (maybe Master crashed), the client enters “Jeopardy”. It waits for a Grace Period (e.g., 45s).
    • If a new Master is elected within 45s, the session is saved.
    • If not, the session (and all locks) are lost.
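
The client-side session lifecycle can be sketched as a small state machine. The lease length here is an illustrative assumption; the 45-second grace period matches the text.

```python
# Sketch of the client-side session state machine (times in seconds).
LEASE = 12   # illustrative local lease; renewed by each KeepAlive ack
GRACE = 45   # grace period from the text

def session_state(now, last_ack):
    # Healthy while the Master keeps acking KeepAlives within the lease.
    if now - last_ack <= LEASE:
        return "VALID"
    # Lease expired locally: enter Jeopardy and wait out the grace
    # period, hoping a new Master is elected and honors the session.
    if now - last_ack <= LEASE + GRACE:
        return "JEOPARDY"
    # Grace period exhausted: the session and all its locks are lost.
    return "EXPIRED"
```

While in Jeopardy, a careful client must stop acting on its locks, since it cannot tell whether the session will survive.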

5. Preventing Split Brain: Sequencers

The most dangerous problem in distributed systems is Split Brain.

  • The Scenario: Client A acquires the lock. It freezes (Stop-the-World GC) for 1 minute.
  • The Result: Chubby thinks A is dead. It gives the lock to Client B.
  • The Conflict: Client A wakes up, thinking it still owns the lock, and tries to write to the database (GFS). If GFS accepts it, data is corrupted.

The Solution: Fencing Tokens (Sequencers)

  1. Chubby attaches a Monotonic Version Number to every lock (e.g., Token 10).
  2. When Client B takes the lock, the version increments (Token 11).
  3. When Client A (Zombie) tries to write with Token 10, the storage system (GFS) checks: if (10 < 11) REJECT.
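
The fencing check lives in the storage layer (GFS in the text). A minimal sketch, using the token numbers from the scenario above:

```python
# Sketch of fencing at the storage layer: the store remembers the
# highest token it has seen and rejects anything older.
class FencedStore:
    def __init__(self):
        self.max_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.max_token:
            return False  # zombie with a stale token: REJECT
        self.max_token = token
        self.data[key] = value
        return True

store = FencedStore()
store.write(10, "k", "from A")   # Client A, Token 10: accepted
store.write(11, "k", "from B")   # Client B, Token 11: accepted
store.write(10, "k", "zombie")   # Client A wakes up: rejected
```

Note that the check is `<`, not `<=`: the current holder may keep writing with its own token, but any older token is fenced out.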

Interactive demo (omitted): fencing the zombie — storage (GFS) tracks the highest token seen (10); the Active leader (Client B, Token 11) writes successfully, while the Zombie (Client A, Token 10) is rejected.

6. Consensus Deep Dive: Paxos vs Raft

Chubby (and Google) famously uses Paxos, while modern open-source systems like etcd use Raft.

Why Raft?

Paxos is notoriously difficult to understand and implement correctly. Raft was designed specifically to be Understandable.

  • Leader Election: Raft has a strong leader concept built-in. Paxos can run leaderless (but usually doesn’t).
  • Log Replication: Raft enforces a stricter log consistency model (Leader appends only).

Comparison Table

| Feature     | Chubby (Paxos)              | ZooKeeper (ZAB)            | etcd (Raft)                     |
|-------------|-----------------------------|----------------------------|---------------------------------|
| Algorithm   | Paxos (complex)             | ZAB (atomic broadcast)     | Raft (understandable)           |
| Abstraction | File system (files)         | Directory tree (ZNodes)    | Key-value store                 |
| Caching     | Heavy (client invalidation) | None (read from replicas)  | None (read from leader/follower)|
| Consistency | Strict (linearizable)       | Sequential (can be stale)  | Strict (linearizable)           |

7. Observability & Tracing

Chubby is critical infrastructure. If it goes down, Google goes down.

RED Method

  1. Rate: KeepAlive RPCs/sec.
  2. Errors: Session Lost count. This is a critical alert.
  3. Duration: Paxos Round Latency (Commit Time). High latency means disk issues on the Master.

Health Checks

  • Quorum Size: Alert if < 3 replicas are alive.
  • Snapshot Age: Alert if DB snapshot is older than 1 hour.
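
The two health checks can be expressed as simple alert predicates, using the thresholds from the text:

```python
# Sketch of the health checks as alert predicates (thresholds from text).
def alerts(alive_replicas, snapshot_age_s):
    fired = []
    if alive_replicas < 3:
        fired.append("QUORUM_AT_RISK")  # < 3 of 5 alive: no Paxos majority
    if snapshot_age_s > 3600:
        fired.append("SNAPSHOT_STALE")  # DB snapshot older than 1 hour
    return fired
```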

8. Deployment Strategy

Updating the Lock Service is scary. We use Cell Migration.

  1. New Binary: Deploy to 1 replica.
  2. Wait: Ensure it rejoins quorum and catches up.
  3. Repeat: Deploy to remaining replicas one by one.
  4. Master Failover: Force a master election to ensure the new binary can lead.
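
The rollout above can be modeled as a toy simulation: upgrade one replica at a time, and never let the number of live replicas drop below the quorum size. Replica names and versions here are illustrative.

```python
# Toy simulation of the cell-migration rollout: upgrade one replica at
# a time without ever losing the Paxos majority (3 of 5).
def rolling_upgrade(replicas, quorum=3):
    history = []
    for name in sorted(replicas):
        # Taking this replica down for the upgrade leaves the rest alive.
        alive = len(replicas) - 1
        assert alive >= quorum, "would lose quorum"
        replicas[name] = "v2"          # new binary deployed
        history.append((name, alive))  # replica rejoins; move to the next
    return history

cell = {f"replica-{i}": "v1" for i in range(5)}
rolling_upgrade(cell)
```

After the loop, a forced master election (step 4) verifies that the new binary can actually lead, not just follow.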

9. Requirements Traceability Matrix

| Requirement  | Architectural Solution                                  |
|--------------|---------------------------------------------------------|
| Consistency  | Paxos consensus algorithm.                              |
| Availability | 5-node cell (survives 2 failures) + master failover.    |
| Scalability  | Heavy client-side caching (with invalidation).          |
| Correctness  | Fencing tokens to prevent split brain.                  |

10. Interview Gauntlet

I. Consensus

  • Why Paxos? To agree on values (lock ownership) even if nodes fail.
  • Why 5 nodes? To survive 2 failures. 3 nodes only survive 1.
  • What is a Proposer? The node that suggests a value. In Chubby, only the Master proposes.
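
The quorum arithmetic behind the "why 5 nodes" answer generalizes cleanly:

```python
# Quorum arithmetic: a cell of n replicas needs a majority to commit,
# so it tolerates floor((n - 1) / 2) simultaneous failures.
def quorum(n):
    return n // 2 + 1

def tolerated_failures(n):
    return (n - 1) // 2
```

Going from 3 to 5 replicas buys one extra tolerated failure; going from 5 to 7 buys another, at the cost of a slower Paxos round.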

II. Caching

  • Why Invalidation? To ensure Strict Consistency. A client cannot read stale data.
  • What if a client misses an Invalidation? The Master waits for Acks. If a client doesn’t ack, the Master drops its session.

III. System Design

  • Can I use Chubby for a high-speed queue? No. Writes are slow (Paxos). Use Kafka.
  • What is “Jeopardy”? A state where the client has lost contact with the Master but the Session is still valid (Grace Period).

11. Summary: The Whiteboard Strategy

If asked to design ZooKeeper/Chubby, draw this 4-Quadrant Layout:

1. Requirements

  • Func: Lock, Elect Leader, Store Config.
  • Non-Func: CP (Consistency), High Reliability.
  • Scale: Small Data, Read Heavy.

2. Architecture

```
[Client (Cache)]
        |
[Master (Proposer)]
     /  |  \
[Replica] [Replica] [Replica]
```

  • Consensus: Paxos/Raft.
  • Session: KeepAlives.

3. Metadata

```
Node:    /ls/cell/foo
Content: "10.0.0.1:8080"
Stat:    {version: 10, ephemeral: true}
```

4. Reliability

  • Split Brain: Fencing Tokens (Sequencers).
  • Performance: Client Caching.
  • Availability: Session Grace Periods.

This concludes the Deep Dive into Infrastructure. Now, let’s learn how to operate these monsters in Module 17: Ops Excellence.