Consensus: Agreeing on Truth

In the previous chapter, we elected a Leader. Now, that Leader has a job: Replicate Data.

In a distributed system, how do we get multiple nodes to agree on a sequence of values?

  • Problem: Nodes crash. Packets vanish. The network partitions.
  • Goal: All honest nodes agree on the same log of commands (Safety) and eventually decide (Liveness).

1. The Trinity: Paxos vs Raft vs ZAB

While Paxos is the academic holy grail, Raft is the engineer’s choice.

| Feature | Paxos (1989) | Raft (2014) | ZAB (ZooKeeper) |
| --- | --- | --- | --- |
| Philosophy | Theoretical purity. No specific leader required. | Understandability. Strong Leader is mandatory. | Primary-Backup ordering (FIFO). |
| Complexity | Extreme (Multi-Paxos is hard to implement). | Moderate. Decomposed into Election & Log Replication. | High. |
| Use Case | Google Spanner, Cassandra (LWT). | Kubernetes (etcd), Consul, CockroachDB. | Kafka (old controller), Hadoop. |

[!TIP] Interview Strategy: Explain Raft. It has a linear narrative: Leader Election → Log Replication → Commit. Paxos is a mesh of proposers and acceptors that is easy to get wrong.


2. Raft Log Replication

Once a Leader is elected, it handles all Client requests. The goal: Replicate SET X=5 to a majority of followers.

The Flow

  1. Client → Leader: SET X=5. The Leader appends the entry to its own log as Uncommitted.
  2. Leader → Followers: AppendEntries RPC. Each Follower appends X=5 to its log, still Uncommitted.
  3. Followers → Leader: Success (ACK). Once a Quorum has acknowledged, the Leader marks the entry COMMITTED.
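To make those three steps concrete, here is a minimal Go sketch of the leader side. It is deliberately not real Raft (no terms, no log-matching check, no retries, no persistence), and every type and function name here is invented for illustration.

```go
package main

import "fmt"

// Entry is a single log record, e.g. "SET X=5".
type Entry struct {
	Command string
}

// Follower is a drastically simplified follower: it appends whatever
// the leader sends and acknowledges, unless it is partitioned away.
type Follower struct {
	Log       []Entry
	Reachable bool
}

// AppendEntries is the (simplified) RPC the leader calls on each follower.
func (f *Follower) AppendEntries(e Entry) bool {
	if !f.Reachable {
		return false // no ACK during a partition
	}
	f.Log = append(f.Log, e)
	return true
}

// Leader holds its own log and the index of the last committed entry.
type Leader struct {
	Log         []Entry
	CommitIndex int
	Followers   []*Follower
}

// Propose appends the command locally (uncommitted), replicates it,
// and commits only if a majority of the cluster has stored it.
func (l *Leader) Propose(cmd string) bool {
	e := Entry{Command: cmd}
	l.Log = append(l.Log, e) // step 1: leader appends locally (uncommitted)

	acks := 1 // the leader counts itself toward the quorum
	for _, f := range l.Followers {
		if f.AppendEntries(e) { // step 2: AppendEntries RPC
			acks++
		}
	}

	clusterSize := len(l.Followers) + 1
	if acks > clusterSize/2 { // step 3: majority ACKed => commit
		l.CommitIndex = len(l.Log)
		return true
	}
	return false // stays uncommitted; a real leader keeps retrying
}

func main() {
	f1, f2 := &Follower{Reachable: true}, &Follower{Reachable: true}
	leader := &Leader{Followers: []*Follower{f1, f2}}

	fmt.Println("committed:", leader.Propose("SET X=5")) // true: 3/3 ACKs
}
```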

3. Interactive Demo: The Replication Loop

Cyberpunk Mode: Visualize the commit flow.

  • Mission: Replicate SET X=99 to the cluster.
  • Obstacle: Network Partition.
  • Visuals: Watch the logs turn from Yellow (Uncommitted) to Green (Committed).

[!TIP] Try it yourself:

  1. Click “🚀 Send SET X=99”. Watch the Leader replicate to Followers, get ACKs, and Commit.
  2. Click “✂️ Network” to cut the connection.
  3. Try sending again. Watch the Leader retry endlessly because it can’t reach a Quorum.
[Interactive simulation: Leader, Follower 1, and Follower 2 all start with empty logs and Commit Idx: 0; the system is idle, waiting for the Client.]
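What the demo is really showing is the majority rule: a 3-node cluster commits with 2 ACKs, but a partitioned Leader alone is only 1 of 3. A tiny self-contained sketch of that check (the helper name is made up):

```go
package main

import "fmt"

// hasQuorum reports whether acks is a strict majority of clusterSize.
func hasQuorum(acks, clusterSize int) bool {
	return acks > clusterSize/2
}

func main() {
	const clusterSize = 3

	// Healthy cluster: Leader + both Followers acknowledge SET X=99.
	fmt.Println(hasQuorum(3, clusterSize)) // true  -> commit

	// Partition: only the Leader has the entry. It keeps retrying
	// AppendEntries, but the entry can never be committed.
	fmt.Println(hasQuorum(1, clusterSize)) // false -> stays uncommitted
}
```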

4. Crash Fault vs Byzantine Fault

Standard Consensus (Raft/Paxos/ZAB) assumes CFT (Crash Fault Tolerance).

  • Nodes might fail, pause, or lose disks.
  • But nodes do NOT lie.

The Byzantine Generals Problem

What if a node is compromised and sends:

  • “I vote for A” to General 1.
  • “I vote for B” to General 2.

This is a Byzantine Fault. Raft cannot handle this. You need BFT (Byzantine Fault Tolerance) algorithms like PBFT or Proof of Work (Bitcoin).

| Type | Tolerance | Failure Examples | Example Algorithms |
| --- | --- | --- | --- |
| CFT | (N-1)/2 failures | Power outage, GC pause, disk crash. | Raft, Paxos |
| BFT | (N-1)/3 failures | Hacked node, software bug, malicious actor. | PBFT, Tendermint |
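To see what those fractions mean for real cluster sizes, here is a quick calculation (integer division gives the floor):

```go
package main

import "fmt"

func main() {
	// Maximum tolerable failures for a cluster of n nodes:
	//   CFT (Raft/Paxos): f = (n-1)/2   (a majority of n must survive)
	//   BFT (PBFT etc.):  f = (n-1)/3   (requires n >= 3f+1)
	for _, n := range []int{3, 4, 5, 7} {
		fmt.Printf("n=%d  crash faults=%d  byzantine faults=%d\n",
			n, (n-1)/2, (n-1)/3)
	}
}
```

This is why a 3-node cluster survives one crash but zero Byzantine nodes, and why BFT deployments start at 4 nodes.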

Summary

  • Leader orders the log.
  • Followers replicate the log.
  • Commit happens only when a Majority acknowledges.
  • Linearizability is guaranteed by reading from the Leader (who checks with a quorum).
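That last bullet is the idea behind Raft's ReadIndex optimization: before answering a read, the Leader confirms via a heartbeat quorum that it has not been silently replaced, so it cannot serve stale data. A minimal sketch of the idea (all types and names here are invented):

```go
package main

import (
	"errors"
	"fmt"
)

// peer is a stand-in for "did this follower ACK the leader's heartbeat?".
type peer struct{ ackedHeartbeat bool }

// linearizableRead only answers once a majority (the leader included)
// has confirmed the leader's authority; otherwise a newer leader might
// already hold newer data.
func linearizableRead(kv map[string]int, key string, peers []peer) (int, error) {
	acks := 1 // the leader counts itself
	for _, p := range peers {
		if p.ackedHeartbeat {
			acks++
		}
	}
	if acks <= (len(peers)+1)/2 {
		return 0, errors.New("lost quorum: cannot serve a linearizable read")
	}
	// Real Raft also waits until the state machine has applied everything
	// up to the commit index recorded at the start of the read.
	return kv[key], nil
}

func main() {
	kv := map[string]int{"X": 99}
	fmt.Println(linearizableRead(kv, "X", []peer{{true}, {true}}))   // 99 <nil>
	fmt.Println(linearizableRead(kv, "X", []peer{{false}, {false}})) // 0 lost quorum...
}
```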