
Every Junior Engineer knows Leader-Follower (Master-Slave) replication. A Staff Engineer understands Leaderless (Dynamo-style) replication, arguably the most important innovation in distributed databases of the last 20 years (powering Amazon's Dynamo, Cassandra, and Riak).
1. Leader-Based Replication (Postgres/MySQL)
- Writes: Go to Leader.
- Reads: Go to Leader (Strong Consistency) or Follower (Eventual Consistency).
- Failure Mode: If the Leader dies, a failover process (often coordinated by a consensus protocol such as Raft or Paxos) must elect a new one. Writes are rejected until it completes, causing Write Downtime (seconds to minutes).
2. Leaderless Replication (Dynamo/Cassandra)
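- Writes: The client sends the request to any replica. That node acts as coordinator, forwards the write to all $N$ replicas, and reports success once $W$ of them acknowledge it.
- Reads: The client (or coordinator) queries the replicas in parallel and waits for $R$ of them to respond.
- No Leader: All nodes are peers. If one dies, the others keep accepting writes, so there is no failover-induced Write Downtime as long as $W$ replicas remain reachable.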
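A minimal sketch of the leaderless write path described above, assuming a hypothetical in-memory `Replica` class, a hypothetical `coordinate_write` function, and plain integer version numbers. It loops over replicas sequentially and treats a `ConnectionError` as a missed acknowledgement; a real coordinator would fan out in parallel with timeouts.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica; up=False simulates a crashed node."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (value, version)

    def write(self, key, value, version):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        current = self.store.get(key)
        if current is None or version > current[1]:  # never overwrite a newer version
            self.store[key] = (value, version)


def coordinate_write(replicas, key, value, version, w):
    """Forward a write to all N replicas; report success once at least W acknowledge."""
    acks = 0
    for replica in replicas:  # sequential for clarity; real coordinators fan out in parallel
        try:
            replica.write(key, value, version)
            acks += 1
        except ConnectionError:
            pass  # this replica missed the write; repair must fix it later
    return acks >= w


replicas = [Replica("A"), Replica("B", up=False), Replica("C")]
ok = coordinate_write(replicas, "cart:42", ["book"], version=1, w=2)
print(ok)  # True
```

With $N = 3$ and $W = 2$, the write succeeds even though Node B is down, and no election is needed. Node B is simply left stale until Read Repair or anti-entropy catches it up.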
3. The Quorum Equation
How do we ensure we read the latest data without a leader?
\[R + W > N\]
- $N$: Replication Factor (usually 3).
- $W$: Write Quorum (number of replicas that must acknowledge a write).
- $R$: Read Quorum (number of replicas that must respond to a read).
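If $R + W > N$, every Read set must overlap every Write set in at least one node. That node holds the latest acknowledged write, so the coordinator can return it by picking the newest version among the $R$ responses.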
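The overlap claim follows from a pigeonhole count. Write $S_W$ for the set of $W$ replicas that acknowledged the latest write and $S_R$ for the set of $R$ replicas answering a read (both symbols introduced here for illustration); both are drawn from the same $N$ replicas, so $|S_R \cup S_W| \le N$:

\[
\begin{aligned}
|S_R \cap S_W| &= |S_R| + |S_W| - |S_R \cup S_W| \\
&\ge R + W - N
\end{aligned}
\]

Hence $R + W > N$ implies $|S_R \cap S_W| \ge 1$. For $N = 3$, choosing $R = W = 2$ guarantees an overlap of at least $2 + 2 - 3 = 1$ node, while $R = W = 1$ only gives a bound of $-1$, i.e., no guarantee at all.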
4. Quorum Calculator
Plug in $N$, $R$, and $W$: if $R + W > N$, the system is Consistent (Strong); otherwise it is Available (Eventual), and reads may return stale data.
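A small stand-in for a quorum calculator, as a sketch: `classify` applies the $R + W > N$ rule, and `always_overlaps` (both names hypothetical) brute-forces every possible read set against every possible write set, so you can check that the rule and the actual set overlap agree.

```python
from itertools import combinations


def classify(n: int, r: int, w: int) -> str:
    """Label an (N, R, W) configuration using the R + W > N rule."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= R <= N and 1 <= W <= N")
    return "strong (quorum overlap)" if r + w > n else "eventual (reads may be stale)"


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-subset of replicas intersect every W-subset?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)  # a non-empty intersection is truthy
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: {classify(n, r, w)}, overlap={always_overlaps(n, r, w)}")
```

The last configuration shows that $R + W = N$ is not enough: with $N = 5$, a read from nodes {0, 1} can miss a write acknowledged only by nodes {2, 3, 4}. Setting $R = 1, W = N$ (fast reads) or $R = N, W = 1$ (fast writes) also satisfies the rule, at the cost of availability on the slow side.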
5. Anti-Entropy: Fixing The Drift
In Leaderless systems, replicas will diverge (e.g., a node was down during a write). How do we fix it?
A. Read Repair (Lazy)
When a client reads from $R$ nodes, the coordinator compares their versions.
- Node A: v1
- Node B: v2 (Newer)
- Node C: v1
Action: The coordinator returns v2 to the client and asynchronously writes v2 to Nodes A and C.
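The Read Repair step above can be sketched as follows, under two stated assumptions: versions are plain integers compared with "highest wins" (Dynamo uses vector clocks and Cassandra uses write timestamps instead), and the repair writes happen inline rather than asynchronously. `Replica` and `read_with_repair` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding key -> (value, version) pairs."""
    name: str
    store: dict = field(default_factory=dict)


def read_with_repair(replicas, key, r):
    """Read from R replicas, return the newest value, and repair stale replicas."""
    responses = [(rep, rep.store.get(key)) for rep in replicas[:r]]
    found = [resp for _, resp in responses if resp is not None]
    if not found:
        return None
    newest = max(found, key=lambda pair: pair[1])  # highest version wins
    for rep, resp in responses:
        if resp is None or resp[1] < newest[1]:
            rep.store[key] = newest  # inline here; a real coordinator repairs asynchronously
    return newest[0]


a = Replica("A", {"k": ("v1", 1)})
b = Replica("B", {"k": ("v2", 2)})
c = Replica("C", {"k": ("v1", 1)})
print(read_with_repair([a, b, c], "k", r=3))  # v2
print(a.store["k"], c.store["k"])  # ('v2', 2) ('v2', 2)
```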
B. Anti-Entropy / Merkle Trees (Proactive)
What if data is cold and never read? Read Repair never triggers, so it stays stale forever. Solution: Background processes exchange Merkle Trees (hash trees over key ranges).
- If Root Hashes match -> Data is identical.
- If they mismatch -> Recurse only into child subtrees whose hashes differ, down to the leaves, to locate the exact divergent key ranges.
- Efficiency: Only the divergent rows are transferred, not the whole DB.
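A simplified sketch of the Merkle Tree comparison, assuming a hypothetical scheme: rows are split into a power-of-two number of buckets by hashing the key, each leaf hashes one bucket, and parents hash the concatenation of their two children. Real systems such as Cassandra and Dynamo build trees per token or key range with their own layouts; `bucketize`, `build_tree`, and `diff` are illustrative names only.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucketize(rows: dict, n_buckets: int = 4) -> list:
    """Split rows into buckets by hashing the key (hypothetical partitioning scheme)."""
    assert n_buckets & (n_buckets - 1) == 0, "use a power of two so the tree is complete"
    buckets = [{} for _ in range(n_buckets)]
    for key, value in rows.items():
        buckets[int.from_bytes(_h(key.encode())[:4], "big") % n_buckets][key] = value
    return buckets


def build_tree(buckets: list) -> list:
    """Return tree levels: levels[0] holds leaf hashes, levels[-1] == [root]."""
    level = [_h(repr(sorted(b.items())).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff(tree_a: list, tree_b: list, level=None, index: int = 0) -> list:
    """Leaf indices whose hashes differ, descending only into mismatched subtrees."""
    if level is None:
        level = len(tree_a) - 1  # start at the root
    if tree_a[level][index] == tree_b[level][index]:
        return []  # identical subtree: skip everything beneath it
    if level == 0:
        return [index]
    return (diff(tree_a, tree_b, level - 1, 2 * index)
            + diff(tree_a, tree_b, level - 1, 2 * index + 1))


replica_a = {"user:1": "alice", "user:2": "bob", "user:3": "carol"}
replica_b = {**replica_a, "user:2": "bobby"}  # one divergent row
bad = diff(build_tree(bucketize(replica_a)), build_tree(bucketize(replica_b)))
to_sync = {k: v for i in bad for k, v in bucketize(replica_b)[i].items()}
print(len(bad), "user:2" in to_sync)  # 1 True
```

Only one leaf bucket is flagged, so only the rows in that bucket need to be shipped; every subtree whose hash matches is skipped without being inspected.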
Staff Takeaway
Leaderless systems trade Latency (multiple requests per read) and Complexity (Read Repair) for High Availability (no single point of failure). Use them for “Always-On” write-heavy workloads (like User Carts, Event Logs).