Design Kubernetes (Control Plane Architecture)

[!NOTE] This module explores the core principles of Kubernetes control plane architecture, deriving the design from first principles and hardware constraints to build production-ready expertise.

1. Problem Statement

Managing a few Docker containers is easy. Managing 10,000 containers across 500 physical servers is a nightmare. You need a system that can:

  1. Deploy: Decide which server has enough CPU/RAM for a new container.
  2. Monitor: If a server dies, automatically restart its containers on a healthy server.
  3. Scale: Add or remove containers based on traffic.

Kubernetes (K8s) is the industry-standard solution for this. It is a Declarative System: you tell it the “Desired State” (e.g., “I want 3 copies of App X”), and K8s works tirelessly to make the “Actual State” match it.
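The declarative model can be illustrated with a minimal sketch. Note that `diff` and its dict-based state are purely illustrative, not Kubernetes APIs: the point is that you declare *what* you want, and the system computes the actions needed to get there.

```python
# Minimal sketch of the declarative model (illustrative only, not real
# Kubernetes APIs): you state the desired state, the system derives actions.

def diff(desired: dict, actual: dict) -> list[str]:
    """Compute the actions needed to move actual state toward desired state."""
    actions = []
    for app, want in desired.items():
        have = actual.get(app, 0)
        if have < want:
            actions.append(f"create {want - have} pod(s) for {app}")
        elif have > want:
            actions.append(f"delete {have - want} pod(s) for {app}")
    return actions

desired = {"app-x": 3}   # "I want 3 copies of App X"
actual = {"app-x": 2}    # only 2 are currently running
print(diff(desired, actual))  # → ['create 1 pod(s) for app-x']
```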


2. Requirements & Goals

Functional Requirements

  1. Scheduling: Assign containers (Pods) to nodes based on resource availability.
  2. Healing: Detect container failures and replace them.
  3. Service Discovery: Provide a stable IP/DNS for a set of dynamic pods.
  4. Security: Authenticate and authorize every action.

Non-Functional Requirements

  1. Scalability: Manage thousands of nodes.
  2. High Availability: The Control Plane must be redundant (if the “Brain” dies, the “Body” keeps running).
  3. Extensibility: Allow users to add custom resource types (CRDs).

3. High-Level Architecture (Control Plane vs. Data Plane)

Kubernetes follows a Master-Worker architecture.

Control Plane (Master Nodes)
  • API Server
  • etcd (State)
  • Scheduler
  • Controller Manager

Worker Node 1
  • Kubelet, Kube-Proxy
  • Pods: A, B

Worker Node 2
  • Kubelet, Kube-Proxy
  • Pods: C, D

4. The Heart: etcd & The API Server

  1. etcd: Every single piece of cluster state (the number of pods, node health, Secrets such as passwords) is stored in etcd. It is a distributed key-value store that uses the Raft Consensus Algorithm to ensure every etcd member agrees on the exact same view of the world.
  2. API Server: This is the gateway to the “Brain.” No person or component (even the Scheduler) can talk directly to etcd. They must go through the API Server, which handles authentication and validation.
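The gatekeeping role can be sketched with toy stand-ins (the `EtcdStore` and `APIServer` classes below are illustrative, not real Kubernetes code): the API Server is the single place where authentication and validation happen before anything touches the store.

```python
# Toy sketch of the API Server as the sole gateway to etcd
# (illustrative stand-ins, not real Kubernetes components).

class EtcdStore:
    """Stand-in for etcd: a plain key-value store."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class APIServer:
    """Gateway to the store: authenticates and validates every write."""
    def __init__(self, store, valid_tokens):
        self._store = store
        self._valid_tokens = valid_tokens

    def put(self, token, key, spec):
        if token not in self._valid_tokens:      # authentication
            raise PermissionError("unauthenticated request")
        if spec.get("replicas", 0) < 0:          # validation
            raise ValueError("replicas must be >= 0")
        self._store.put(key, spec)               # only now write to etcd

api = APIServer(EtcdStore(), valid_tokens={"scheduler-token"})
api.put("scheduler-token", "/deployments/app-x", {"replicas": 3})
```

Because every component goes through this one chokepoint, invalid or unauthorized writes never reach the state store.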

5. The Brain: The Controller Pattern

The Controller Manager runs a set of control loops that never stop. It is the “Real-Time Enforcer” of Kubernetes.

  1. Watch: It asks the API Server: “What is the desired state for App X?” (Result: 3 Pods)
  2. Compare: It checks the actual state: “How many Pods are running?” (Result: 2 Pods)
  3. Act: It sends a command to create 1 more Pod to close the gap.
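The Watch → Compare → Act loop can be sketched as a toy reconciler (illustrative only; the real controller manager uses long-lived watch streams against the API Server rather than the simple polling shown here, and `FakeAPI` is a hypothetical stand-in):

```python
# Toy reconciliation loop: Watch -> Compare -> Act
# (illustrative only, not real controller-manager code).

class FakeAPI:
    """Stand-in for the API Server, holding desired and actual state."""
    def __init__(self, desired, running):
        self.desired = desired
        self.running = running
    def get_desired(self, app): return self.desired[app]
    def count_running(self, app): return self.running[app]
    def create_pod(self, app): self.running[app] += 1
    def delete_pod(self, app): self.running[app] -= 1

def reconcile(api, app):
    desired = api.get_desired(app)    # Watch: desired state (e.g. 3 pods)
    actual = api.count_running(app)   # Compare: actual state (e.g. 2 pods)
    while actual < desired:           # Act: create pods to close the gap
        api.create_pod(app)
        actual += 1
    while actual > desired:           # Act: delete surplus pods
        api.delete_pod(app)
        actual -= 1

api = FakeAPI(desired={"app-x": 3}, running={"app-x": 2})
reconcile(api, "app-x")
print(api.count_running("app-x"))  # → 3
```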

[!TIP] Analogy: The Thermostat You set the desired temperature to 72°F. The thermostat (Controller) senses it’s 68°F (Actual State) and turns on the heater (Action). When it reaches 72°F, it turns it off. K8s does this for thousands of objects across a distributed system.


6. The Muscle: The Kubelet

Every worker node has a tiny agent called the Kubelet.

  • It receives commands from the Control Plane.
  • It talks to the Container Runtime (e.g., Docker/containerd) to actually start the Pods.
  • It reports the node’s “Vitals” (CPU/RAM/Health) back to the API Server so the Scheduler knows where to place future Pods.
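The Kubelet’s three duties above can be sketched together (a toy model with a hypothetical `FakeRuntime`; the real Kubelet speaks the CRI protocol to the container runtime):

```python
# Toy Kubelet sketch (illustrative only): receive assigned pod specs,
# ask the container runtime to start them, and report node vitals back.

class FakeRuntime:
    """Stand-in for a container runtime such as containerd."""
    def __init__(self):
        self.started = []
    def start_container(self, image):
        self.started.append(image)

class Kubelet:
    def __init__(self, runtime, cpu_total=4.0):
        self.runtime = runtime
        self.cpu_total = cpu_total
        self.cpu_used = 0.0

    def sync_pods(self, assigned_pods):
        """Start every pod the Control Plane assigned to this node."""
        for pod in assigned_pods:
            self.runtime.start_container(pod["image"])
            self.cpu_used += pod["cpu"]

    def node_status(self):
        """Vitals reported to the API Server for future scheduling."""
        return {"cpu_free": self.cpu_total - self.cpu_used,
                "pods": len(self.runtime.started)}

kubelet = Kubelet(FakeRuntime())
kubelet.sync_pods([{"image": "app-x:v1", "cpu": 1.5}])
print(kubelet.node_status())  # → {'cpu_free': 2.5, 'pods': 1}
```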

7. Interview Gauntlet

  1. What happens if etcd goes down?
    • Ans: The “Brain” is paralyzed. Existing pods will keep running, but you cannot deploy new ones, delete old ones, or recover from node failures. This is why etcd is always deployed in highly available clusters (odd number of nodes, e.g., 3 or 5).
  2. How does the Scheduler choose a node?
    • Ans: It uses a two-step process: Filtering (can this node fit the pod?) and Scoring (which filtered node is the best fit based on resource utilization and network locality?).
  3. Why use an API Server instead of a direct DB connection?
    • Ans: Security and Consistency. The API Server ensures only valid, authorized changes are made to the state, and it prevents multiple components from conflicting with each other’s updates.
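The two-step Filtering → Scoring process from question 2 can be sketched as follows (a simplified model using only free CPU; the real kube-scheduler runs many pluggable filter and score plugins):

```python
# Toy two-phase scheduler: Filtering then Scoring
# (illustrative only; real scheduling uses many more signals).

def schedule(pod, nodes):
    # Phase 1 - Filtering: keep only nodes that can fit the pod.
    feasible = [n for n in nodes if n["cpu_free"] >= pod["cpu"]]
    if not feasible:
        return None  # no node fits; the pod stays Pending
    # Phase 2 - Scoring: prefer the node with the most CPU left over.
    best = max(feasible, key=lambda n: n["cpu_free"] - pod["cpu"])
    return best["name"]

nodes = [
    {"name": "node-1", "cpu_free": 0.5},
    {"name": "node-2", "cpu_free": 2.0},
    {"name": "node-3", "cpu_free": 4.0},
]
print(schedule({"cpu": 1.0}, nodes))  # → node-3
```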

8. Summary

  • State: Managed by etcd (Consensus-based).
  • Logic: Driven by the Controller Loop (Watch → Compare → Act).
  • Communication: Centralized through the API Server.
  • Execution: Handled by the Kubelet on each worker node.