Vertical Pod Autoscaler (VPA)

Developers are terrible at guessing resource requirements.

  • Guess Too High? You waste money on slack resources, and your cluster runs at 10% utilization.
  • Guess Too Low? Your application crashes with OOMKilled (Out Of Memory) errors or gets CPU throttled.

Vertical Pod Autoscaler (VPA) solves this by automatically adjusting the requests and limits based on historical usage.

1. How VPA Works: The Components

VPA is not built into the core controller manager like HPA is. It runs as a separate deployment (often installed via Helm).

  1. Recommender: Watches historical usage (from Metrics Server or Prometheus) and calculates the “ideal” request.
  2. Updater: Checks if running pods have requests that deviate significantly from the recommendation. If so, it evicts them.
  3. Admission Controller: When the pod is recreated (by the Deployment controller), the Admission Controller injects the new, correct resource requests.

2. Interactive: The “Right-Sizing” Visualizer

Observe how VPA adjusts the Request line (red dashed) to match the Actual Usage (blue wave).

[Chart: Actual Usage vs. Request (Limit)]

3. VPA Modes: From Safe to Disruptive

You can configure how aggressive VPA is via updateMode.

1. Off (Dry Run)

VPA calculates recommendations but applies nothing.

  • Use Case: Safe exploration. View recommendations via kubectl describe vpa.
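A dry-run VPA is an ordinary VPA object with updateMode set to "Off". A minimal sketch, assuming a Deployment named nginx exists in the same namespace:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa-dryrun
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Off"

The Recommender still populates the status with recommendations, so kubectl describe vpa shows targets without any running pod ever being touched.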

2. Initial

VPA only applies changes when a pod is created. It will never evict a running pod.

  • Use Case: Stateful applications where restarts are expensive.

3. Auto (The Default)

VPA will evict running pods if their requests are significantly wrong.

  • Use Case: Stateless web servers.
  • Warning: This causes restarts! Ensure you have PodDisruptionBudgets configured.
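Because Auto mode works by evicting pods, a PodDisruptionBudget keeps it from taking down too many replicas at once. A minimal sketch, assuming the Deployment’s pods carry the label app: nginx:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: nginx

The VPA Updater goes through the eviction API, so it honors this budget when replacing pods.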

4. Recreate

Rarely used. VPA assigns requests at pod creation and also evicts pods whenever the recommendation changes. Today this behaves like Auto; the difference is that Auto may adopt restart-free (in-place) updates once Kubernetes supports them, while Recreate guarantees a restart on every change.

4. The Conflict: VPA vs HPA

[!CAUTION] The Feedback Loop of Death
Do NOT use HPA and VPA on the same metric (CPU/Memory).

Imagine HPA targets 50% CPU, and VPA targets “Right Sizing”.

  1. Traffic Spikes: CPU goes to 80%.
  2. HPA Reacts: “CPU is high! Add replicas!” → Adds 2 pods.
  3. VPA Reacts: “CPU is high! Increase request size!” → Increases request from 100m to 200m.
  4. Result: The new pods are now LARGER, consuming even more cluster capacity. Because HPA measures utilization as usage ÷ request, raising the request lowers the reported utilization (80m of usage is 80% of a 100m request but only 40% of a 200m request), so HPA scales down, then scales up again: an oscillating loop.

Solution: keep HPA on CPU/Memory scaling, and either:

  • Run VPA in Off mode, purely for recommendations, OR
  • Restrict VPA to memory only, so it never shares a metric with an HPA scaling on CPU.
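Restricting VPA to memory can be expressed with controlledResources in the container policy, so VPA leaves CPU requests alone while HPA scales on CPU. A sketch, again assuming a Deployment named nginx:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa-memory
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        controlledResources: ["memory"]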

5. Implementation

VPA Manifest

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 50Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi

Checking Recommendations

Once VPA is running, check its status:

kubectl describe vpa nginx-vpa

Output:

Recommendation:
  Container Recommendations:
    Container Name:  nginx
    Lower Bound:     CPU: 25m, Memory: 262144k
    Target:          CPU: 25m, Memory: 262144k
    Uncapped Target: CPU: 25m, Memory: 262144k
    Upper Bound:     CPU: 1,   Memory: 500Mi

  • Lower Bound: The minimum resources VPA considers sufficient.
  • Target: The recommended request value.
  • Uncapped Target: What VPA would recommend if no min/max policy applied.
  • Upper Bound: The maximum resources VPA will ever set (capped here by maxAllowed).