When simple replication and sharding aren’t enough to handle global consistency, we move into the domain of Advanced Distributed Patterns.

This module covers the “final boss” topics of system design: how to reach agreement in a cluster where nodes can fail, how to order events across the globe without perfectly synchronized clocks, and how to manage multi-step transactions across independent services.

Chapters

1. Consensus: Raft & Paxos

  • Reaching Agreement: How etcd and CockroachDB stay consistent.
  • Operational Nuances: Raft Pre-Vote and Joint Consensus for membership changes.
  • Interactive: Raft Pre-Vote Election Simulator.

2. Distributed Time: HLC vs TrueTime

  • The Order Problem: Why NTP's clock skew makes wall-clock timestamps unsafe for ordering serializable transactions.
  • Modern Solutions: Hybrid Logical Clocks (HLC) and Google’s TrueTime (GPS receivers and atomic clocks).
  • Interactive: HLC Uncertainty Wait Simulator.

3. Distributed Transactions: TCC vs Sagas

  • Atomicity Across Services: When to use Sagas vs TCC (Try-Confirm-Cancel).
  • Trade-offs: Isolation levels vs operational complexity.
  • Interactive: TCC Booking Flow vs Saga Rollback Comparison.
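
As a small preview of the saga side of chapter 3, the rollback idea can be sketched in a few lines: run each step in order and, if one fails, run the compensations of the already completed steps in reverse. This is a minimal in-process illustration, not how any particular framework implements sagas. The names `run_saga`, `booking`, and the step names are hypothetical, and a real saga would persist its progress and call remote services.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): all three are illustrative assumptions.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo only the steps that actually finished, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return log
        completed.append(step)
        log.append(f"done:{name}")
    return log


def noop() -> None:
    """Stand-in for a call to a remote service."""


def fail_payment() -> None:
    raise RuntimeError("payment declined")


# Hypothetical booking flow in which the last step fails.
booking: List[Step] = [
    ("reserve_flight", noop, noop),
    ("reserve_hotel", noop, noop),
    ("charge_card", fail_payment, noop),
]

if __name__ == "__main__":
    for entry in run_saga(booking):
        print(entry)
```

Between the failure and the last compensation, other readers can observe the reserved flight and hotel. That window is the isolation trade-off listed above, and it is what TCC narrows by holding tentative reservations until an explicit confirm.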

Module Chapters

Chapter 1

Consensus: Raft & Paxos

A deep dive into Raft and Paxos: why Pre-Vote is essential for production etcd clusters and how Joint Consensus handles membership changes.
