Scaling & Operations

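Scaling is one of Kubernetes' core strengths. It lets your applications absorb traffic spikes without manual intervention and shrink again when demand drops, which saves money while users are sleeping. Note that the standard HPA does not scale to zero replicas by default; that typically requires extra tooling such as KEDA or Knative.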

In this module, we will deconstruct the “Scaling Triad” and the operational primitives that keep your cluster healthy.

1. The Scaling Triad

  1. Horizontal Pod Autoscaler (HPA): Scales out. Adds more replicas when load increases.
  2. Vertical Pod Autoscaler (VPA): Scales up (or down). Adjusts CPU/Memory requests for “right-sizing”.
  3. Cluster Autoscaler (CA): Scales infrastructure. Adds nodes when pods cannot be scheduled, and removes nodes that stay underutilized.
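
To make the HPA's “scale out” decision concrete, here is a minimal sketch of the replica formula described in the Kubernetes HPA documentation: desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function name and the sample numbers are illustrative assumptions, not part of any Kubernetes API, and the real controller also applies min/max replica bounds and readiness handling that are omitted here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Hypothetical helper sketching the HPA replica calculation."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Within the tolerance band the HPA leaves the replica count alone,
    # which avoids reacting to small metric fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Otherwise scale proportionally and round up.
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 replicas at 90% average CPU against a 50% target.
    print(desired_replicas(4, 90.0, 50.0))  # prints 8
```

With 4 replicas at 90% CPU against a 50% target, the ratio is 1.8, so the sketch asks for ceil(7.2) = 8 replicas. A reading of 52% against the same target falls inside the tolerance band and leaves the count unchanged.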

2. Module Roadmap

  1. Horizontal Pod Autoscaler: Understand the control loop, stabilization windows, and how to scale on custom metrics.
  2. Vertical Pod Autoscaler: Stop guessing resource requests. Let VPA recommend the right size for your pods based on observed usage.
  3. Cluster Autoscaler: Learn how the cluster itself expands and shrinks based on pending pods.
  4. Metrics Server: The engine under the hood. How resource metrics are collected from each node's kubelet and served to the HPA and VPA through the Metrics API.
  5. DaemonSets: Running system agents (logs, monitoring) on every node.
  6. Module Review: Test your knowledge with flashcards and a cheat sheet.

3. Key Learning Objectives

  • Design autoscaling strategies that prevent “thrashing” (rapid scale up/down).
  • Debug common scaling issues like “HPA flapping” or “CA getting stuck”.
  • Optimize cost by using VPA to reclaim wasted resources.
  • Deploy system-wide services effectively using DaemonSets.
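
As a preview of how “thrashing” is prevented, the sketch below models a scale-down stabilization window: rather than acting on the latest recommendation, it acts on the highest recommendation seen within the window, so a brief dip in load does not immediately remove replicas. The 300-second default matches the documented HPA scale-down default; the class and method names are hypothetical, and this is a simplified model rather than the controller's actual implementation.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical model of an HPA-style scale-down stabilization window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replicas)

    def recommend(self, now: float, raw_recommendation: int) -> int:
        """Record a raw recommendation and return the stabilized one."""
        self._history.append((now, raw_recommendation))

        # Drop recommendations that have aged out of the window.
        while self._history and now - self._history[0][0] > self.window_seconds:
            self._history.popleft()

        # Use the highest recent recommendation, which delays scale-down
        # until load has stayed low for the whole window.
        return max(replicas for _, replicas in self._history)


if __name__ == "__main__":
    stabilizer = ScaleDownStabilizer(window_seconds=300.0)
    for timestamp, raw in [(0, 10), (60, 4), (120, 9), (400, 4), (700, 4)]:
        print(timestamp, raw, stabilizer.recommend(timestamp, raw))
```

In this run the recommendation of 4 at t=60 is held at 10 because the earlier peak is still inside the window. The count only drops to 4 at t=700, once every higher recommendation has aged out.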