Etcd: The Single Source of Truth
> [!WARNING]
> Back up Etcd! If you lose your Etcd data, you lose your entire cluster. The API Server is stateless. The Scheduler is stateless. Only Etcd holds the state.
1. First Principles: Why a Distributed Store?
To understand Etcd, we must understand the hardware reality of a distributed system. Imagine a cluster of 1,000 servers (Worker Nodes). When an administrator tells the Control Plane “deploy 5 copies of this web server”, that command cannot be stored on just one hard drive.
If the hard drive fails, the cluster forgets its mission. The data must be replicated.
Etcd is a distributed, consistent key-value store.
- Distributed: It runs on multiple servers (nodes) for high availability, meaning data is mirrored across physical disks.
- Consistent: It prioritizes Consistency over Availability (a CP system in the CAP theorem). If you write data to Etcd and get a success response, any subsequent read is guaranteed to return that data. This is achieved via a mathematical consensus protocol called Raft.
- Key-Value: Data is stored as simple directory-like keys (e.g., /registry/pods/default/mypod) and values (serialized as Protobuf for speed and compactness).
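The key space behaves like a flat map of path-like keys. A minimal Python sketch (a plain dict standing in for Etcd's actual index; the key names and placeholder values are illustrative) shows the two access patterns this enables, direct lookup and prefix listing:

```python
# Sketch of Etcd's key space: path-like keys mapping to serialized values.
# A plain dict stands in for the real store; values are placeholders.
store = {
    "/registry/pods/default/mypod": b"<protobuf-encoded Pod>",
    "/registry/pods/default/otherpod": b"<protobuf-encoded Pod>",
    "/registry/services/default/mysvc": b"<protobuf-encoded Service>",
}

# Direct lookup of a specific resource by its full key.
pod = store["/registry/pods/default/mypod"]

def list_prefix(store, prefix):
    """Range (prefix) query: how clients list e.g. all Pods in a namespace."""
    return sorted(k for k in store if k.startswith(prefix))

print(list_prefix(store, "/registry/pods/default/"))
# ['/registry/pods/default/mypod', '/registry/pods/default/otherpod']
```

Because every resource has exactly one canonical key, there is no schema to design and no JOIN to run; the hierarchy lives in the key names themselves.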
2. Hardware Reality: Why not MySQL or Postgres?
Why doesn’t Kubernetes use a traditional relational database? The answer lies in the specific constraints of the system:
- The Watch Mechanism (Push vs Pull): Relational databases are built for polling (pulling data). Kubernetes controllers (like the ReplicaSet Controller) must react instantly to state changes. If 50 controllers polled the database every second, it would burn CPU serving mostly redundant queries. Etcd instead uses long-polling HTTP/gRPC streams to push changes to watchers instantly, minimizing CPU cycles and network overhead.
- Concurrency Control: In a large cluster, hundreds of components might try to update a Pod’s status simultaneously. Traditional databases use heavy row locks. Etcd uses MVCC (Multi-Version Concurrency Control) and Optimistic Locking via resourceVersion. This means Etcd never locks; it simply rejects updates if the resourceVersion doesn’t match the latest sequence number, forcing the client to retry. This is significantly faster for read-heavy, high-contention workloads.
- Data Structure: Kubernetes configuration objects are JSON/YAML documents containing deep hierarchies. Flattening these into rigid tables (rows/columns) requires expensive JOINs. A Key-Value store mapped directly to a file system hierarchy (like /registry/pods/namespace/pod-name) provides O(1) lookups for any specific resource.
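The optimistic-locking pattern can be sketched in a few lines. This is an illustration of compare-and-swap on resourceVersion, not Etcd's actual API; the store here is just a dict plus a monotonic counter:

```python
# Illustrative sketch of optimistic locking via resourceVersion.
# Not the real Etcd API: a dict and an integer counter stand in for the store.

store = {}          # key -> (resource_version, value)
next_version = [1]  # global monotonic counter, like Etcd's revision number

def write(key, value, expected_version=None):
    """Reject (don't block) the write if the caller's resourceVersion is stale."""
    current = store.get(key)
    if expected_version is not None and current and current[0] != expected_version:
        return None  # conflict: caller must re-read and retry
    version = next_version[0]
    next_version[0] += 1
    store[key] = (version, value)
    return version

v1 = write("/registry/pods/default/mypod", "phase=Pending")
# A writer holding a stale version is rejected instantly, never blocked:
assert write("/registry/pods/default/mypod", "phase=Failed", expected_version=999) is None
# A writer holding the latest version succeeds:
v2 = write("/registry/pods/default/mypod", "phase=Running", expected_version=v1)
assert v2 > v1
```

The key property is that the failure path does no waiting: a stale writer learns immediately that it lost the race and retries with fresh data.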
3. How Etcd Works: The Raft Consensus Algorithm
In a distributed system, how do you ensure all nodes agree on the data? Etcd uses Raft.
The Problem: Split Brain
Imagine you have 3 Etcd nodes. If the network splits, you don’t want two different leaders accepting conflicting writes.
The Solution: Quorum
Raft requires a Quorum (majority) to commit a write.
- 3 Nodes: Need 2 to agree. Can survive 1 failure.
- 5 Nodes: Need 3 to agree. Can survive 2 failures.
- Even Numbers?: Bad idea. With 4 nodes, if the network splits 2 vs 2, neither side has a majority (3). The cluster stops accepting writes.
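The quorum arithmetic above generalizes: an n-member cluster needs floor(n/2) + 1 votes to commit, so it tolerates n minus that many failures. A quick sketch:

```python
def quorum(n):
    """Votes needed to commit a write in an n-member Raft cluster."""
    return n // 2 + 1

def fault_tolerance(n):
    """Members that can fail while the cluster keeps accepting writes."""
    return n - quorum(n)

for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum(n)}, survives {fault_tolerance(n)} failure(s)")
# 3 nodes: quorum=2, survives 1 failure(s)
# 4 nodes: quorum=3, survives 1 failure(s)  <- no better than 3 nodes
# 5 nodes: quorum=3, survives 2 failure(s)
```

This is why even-sized clusters are wasteful: a 4th node raises the quorum to 3 without raising fault tolerance beyond what 3 nodes already provide.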
Leader Election
One node is elected Leader. All writes MUST go to the Leader. The Leader replicates the data to the Followers. Once a majority acknowledge the write, the Leader commits it.
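The commit rule above can be modeled in a toy function, assuming followers acknowledge instantly; real Raft also tracks terms, log indices, and elections, none of which appear here:

```python
# Toy model of Raft's commit rule: the Leader commits a write once a
# majority of the cluster (itself included) has acknowledged it.

def replicate(write, followers_acking, cluster_size):
    """Return True if the write commits: the Leader's own copy plus
    follower acknowledgements must reach a majority of the cluster."""
    acks = 1 + followers_acking  # the Leader always counts itself
    return acks >= cluster_size // 2 + 1

# 3-node cluster: Leader + 1 follower ack = 2 of 3 -> committed.
assert replicate("put /registry/pods/default/mypod", followers_acking=1, cluster_size=3)
# 5-node cluster: Leader + 1 follower ack = 2 of 5 -> NOT yet committed.
assert not replicate("put /registry/pods/default/mypod", followers_acking=1, cluster_size=5)
```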
4. Interactive: Raft Consensus Visualizer
[Interactive widget: simulate a 3-node Etcd cluster and watch how writes are replicated.]
5. Summary
Etcd is the only stateful part of the Control Plane. It uses the Raft algorithm to ensure that your cluster configuration is safe, consistent, and available even if a machine fails.
In the next chapter, we explore the Declarative Model, which relies entirely on Etcd’s ability to store the “Desired State”.