Vertical Pod Autoscaler (VPA)
Developers are terrible at guessing resource requirements.
- Guess too high? You waste money on idle (slack) resources, and your cluster runs at 10% utilization.
- Guess too low? Your application crashes with OOMKilled (Out Of Memory) errors or gets CPU throttled.
Vertical Pod Autoscaler (VPA) solves this by automatically adjusting the requests and limits based on historical usage.
1. How VPA Works: The Components
VPA is not built into the core controller manager like HPA is. It runs as a separate deployment (often installed via Helm).
- Recommender: Watches historical usage (from Metrics Server or Prometheus) and calculates the “ideal” request.
- Updater: Checks if running pods have requests that deviate significantly from the recommendation. If so, it evicts them.
- Admission Controller: When the pod is recreated (by the Deployment controller), the Admission Controller injects the new, correct resource requests.
2. Interactive: The “Right-Sizing” visualizer
Observe how VPA adjusts the Request Line (Red Dashed) to match the Actual Usage (Blue Wave).
3. VPA Modes: From Safe to Disruptive
You can configure how aggressive VPA is via updateMode.
1. Off (Dry Run)
VPA calculates recommendations but applies nothing.
- Use Case: Safe exploration. View recommendations via kubectl describe vpa.
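A dry-run VPA attached to a Deployment might look like the sketch below (the target name nginx is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa-dryrun
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Off"   # recommendations only; no pods are evicted or mutated
```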
2. Initial
VPA only applies changes when a pod is created. It will never evict a running pod.
- Use Case: Stateful applications where restarts are expensive.
3. Auto (The Default)
VPA will evict running pods if their requests are significantly wrong.
- Use Case: Stateless web servers.
- Warning: This causes restarts! Ensure you have PodDisruptionBudgets configured.
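A minimal PodDisruptionBudget sketch for a workload that VPA will be evicting (the name and labels are assumptions; the selector must match your Deployment's pod labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 2          # keep at least 2 pods running during VPA evictions
  selector:
    matchLabels:
      app: nginx           # must match the pods VPA is managing
```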
4. Recreate
Rarely used. Currently behaves like Auto (evict and recreate), but it guarantees a full restart whenever the recommendation changes, whereas Auto may adopt restart-free in-place updates in the future.
4. The Conflict: VPA vs HPA
[!CAUTION] The Feedback Loop of Death: Do NOT use HPA and VPA on the same metric (CPU/Memory).
Imagine HPA targets 50% CPU, and VPA targets “Right Sizing”.
- Traffic Spikes: CPU goes to 80%.
- HPA Reacts: “CPU is high! Add replicas!” → Adds 2 pods.
- VPA Reacts: “CPU is high! Increase request size!” → Increases request from 100m to 200m.
- Result: The new pods are now LARGER, consuming even more cluster capacity. The HPA sees utilization drop (because VPA increased the denominator, requests), so it might scale down, then scale up again.
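The denominator effect can be sketched with toy numbers (all values hypothetical): HPA computes utilization as usage divided by request, so when VPA raises the request, the same usage suddenly looks "low" to HPA.

```shell
# HPA computes utilization as usage / request.
usage=80      # per-pod CPU usage in millicores
request=100   # per-pod CPU request in millicores
echo "before VPA: $(( usage * 100 / request ))% utilization"   # 80%

# VPA raises the request from 100m to 200m; usage is unchanged,
# but HPA's utilization metric halves, so HPA may scale down.
request=200
echo "after VPA: $(( usage * 100 / request ))% utilization"    # 40%
```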
Solution:
- Use HPA for CPU/Memory scaling.
- Use VPA only in Off mode for recommendations, OR
- Use VPA only for memory (if HPA is on CPU).
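If you take the memory-only route, VPA's containerPolicies can restrict which resources it manages via controlledResources. A fragment sketch:

```yaml
resourcePolicy:
  containerPolicies:
    - containerName: '*'
      controlledResources: ["memory"]   # VPA manages memory; HPA keeps CPU
```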
5. Implementation
VPA Manifest
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 50Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
```
Checking Recommendations
Once VPA is running, check its status:
```shell
kubectl describe vpa nginx-vpa
```
Output:
```
Recommendation:
  Container Recommendations:
    Container Name:  nginx
    Lower Bound:     CPU: 25m, Memory: 262144k
    Target:          CPU: 25m, Memory: 262144k
    Uncapped Target: CPU: 25m, Memory: 262144k
    Upper Bound:     CPU: 1,   Memory: 500Mi
```
- Lower Bound: The minimum resources the container is likely to need.
- Target: The recommended value VPA will apply.
- Uncapped Target: What the Target would be without the resourcePolicy limits.
- Upper Bound: The maximum resources VPA will recommend (capped by maxAllowed).
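For scripting, the same recommendation is available in the VPA's status field. A sketch using jsonpath (assumes the nginx-vpa above is running in a live cluster, so there is no output to show here):

```shell
# Print the recommended target for the first container of nginx-vpa
kubectl get vpa nginx-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```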