Metrics Server
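The Metrics Server is the heart of resource-based autoscaling. Without it, the Horizontal Pod Autoscaler (HPA) has no CPU or memory readings to act on, and the Vertical Pod Autoscaler (VPA) is equally blind.
It is a cluster-wide aggregator of resource usage data. It collects CPU and memory usage from the kubelet on each node and exposes it through the Kubernetes API server as the Metrics API.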
1. Architecture: The Data Flow
How does a CPU spike get to the HPA?
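- cAdvisor: Embedded in the kubelet on every node, it collects raw usage data for the containers started by the container runtime (e.g., containerd or Docker).
- Scrape: Metrics Server polls the kubelet on every node at a fixed interval (60s by default, configurable with --metric-resolution) to fetch this data.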
- Serve: It registers itself as an Aggregated API (metrics.k8s.io).
- Consume: HPA queries the API Server, which proxies the request to Metrics Server.
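The Serve and Consume steps can be exercised by hand, because the Metrics API is reachable through the API server like any other API group. A minimal sketch, assuming the `v1beta1` version of `metrics.k8s.io`; `metrics_path` is a hypothetical helper name, not part of kubectl.

```shell
# metrics_path (hypothetical helper) prints the raw Metrics API path for
# nodes, for all pods, or for the pods in one namespace.
metrics_path() {
  kind="$1"
  namespace="${2:-}"
  base="/apis/metrics.k8s.io/v1beta1"
  case "$kind" in
    nodes)
      printf '%s/nodes\n' "$base"
      ;;
    pods)
      if [ -n "$namespace" ]; then
        printf '%s/namespaces/%s/pods\n' "$base" "$namespace"
      else
        printf '%s/pods\n' "$base"
      fi
      ;;
    *)
      echo "unknown kind: $kind" >&2
      return 1
      ;;
  esac
}
```

Against a live cluster, `kubectl get --raw "$(metrics_path nodes)"` should return the JSON that the API server proxies from Metrics Server, and `kubectl get --raw "$(metrics_path pods default)"` does the same for pods in the `default` namespace.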
2. Key Limitations
[!IMPORTANT] No History: Metrics Server stores no history. It only knows the “current” usage (the most recent scrape window, 60s by default). If you need historical data (e.g., “What was CPU usage last week?”), you need a monitoring system such as Prometheus.
[!WARNING] Resolution: The default resolution is 60 seconds. This means HPA can take a minute or more to “see” a spike. You can lower this (e.g., --metric-resolution=15s), but more frequent scrapes increase load on the kubelets and on Metrics Server itself.
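To put rough numbers on that trade-off: a spike becomes visible only after the next scrape, and HPA then picks it up on its next sync. A back-of-the-envelope sketch, assuming the HPA controller's default 15-second sync period and ignoring the kubelet's own collection delay; `worst_case_delay` is a hypothetical helper, and the result is an estimate, not a guarantee.

```shell
# worst_case_delay (hypothetical helper): rough upper bound, in seconds, on
# how long a spike can go unseen by HPA.
#   $1 = Metrics Server --metric-resolution, in seconds
#   $2 = HPA sync period, in seconds (assumed default: 15)
worst_case_delay() {
  resolution="$1"
  sync_period="${2:-15}"
  echo $((resolution + sync_period))
}

worst_case_delay 60   # prints 75
worst_case_delay 15   # prints 30
```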
3. Installation
Many clusters, including EKS and most self-managed clusters, do not install Metrics Server by default. Apply the official release manifest:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verification
If installed correctly, these commands will work:
# Check Node Usage
kubectl top nodes
# Output:
# NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# minikube 250m 12% 850Mi 10%
# Check Pod Usage
kubectl top pods
# Output:
# NAME CPU(cores) MEMORY(bytes)
# nginx-pod 10m 50Mi
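The tabular output is easy to post-process. A minimal sketch that lists pods above a CPU threshold, reading `kubectl top pods --no-headers` output on standard input; `pods_over_cpu` is a hypothetical helper, and it assumes the pod name is in the first column and the CPU column is either millicores (`10m`) or whole cores (`2`).

```shell
# pods_over_cpu (hypothetical helper): print the names of pods whose CPU
# usage exceeds a threshold given in millicores.
# Input: lines of "NAME CPU MEMORY" on stdin, without a header row.
pods_over_cpu() {
  threshold_millicores="$1"
  awk -v limit="$threshold_millicores" '
    NF >= 2 {
      cpu = $2
      if (cpu ~ /m$/) {
        sub(/m$/, "", cpu)       # "300m" -> 300 millicores
        millicores = cpu + 0
      } else {
        millicores = cpu * 1000  # "2" -> 2000 millicores
      }
      if (millicores > limit) print $1
    }'
}
```

For example, `kubectl top pods --no-headers | pods_over_cpu 100` would print every pod in the current namespace using more than 100m of CPU.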
4. Troubleshooting
“Error: Metrics API not available”
This usually means the aggregation layer isn’t enabled on the API server, or the Metrics Server pod is crashing or not yet ready.
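A reasonable first check is the `Available` condition on the `v1beta1.metrics.k8s.io` APIService object. A minimal sketch; `metrics_api_available`, `diagnose_metrics_api`, and the `KUBECTL` override variable are hypothetical names, and the follow-up hints assume the default install (Deployment `metrics-server` in `kube-system`, labelled `k8s-app=metrics-server`).

```shell
# KUBECTL can be overridden (e.g. with a stub) for testing; defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds only if the metrics.k8s.io APIService reports Available=True.
metrics_api_available() {
  status="$("$KUBECTL" get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' \
    2>/dev/null)"
  [ "$status" = "True" ]
}

# Prints a verdict, plus follow-up commands when the API is unavailable.
diagnose_metrics_api() {
  if metrics_api_available; then
    echo "metrics.k8s.io: Available"
  else
    echo "metrics.k8s.io: NOT available. Next steps:"
    echo "  kubectl describe apiservice v1beta1.metrics.k8s.io"
    echo "  kubectl -n kube-system get pods -l k8s-app=metrics-server"
    echo "  kubectl -n kube-system logs deployment/metrics-server"
    return 1
  fi
}
```

Running `diagnose_metrics_api` against a cluster prints either the healthy verdict or the commands to inspect next; the `describe` output usually carries the reason the APIService is unavailable.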
“Readiness Probe Failed”
Metrics Server needs to talk to the kubelet on every node. If you are using a custom CNI or have restrictive firewall rules, ensure the Metrics Server pod can reach each node on TCP port 10250 (Kubelet API).
Self-Signed Certificates
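In test environments (like Minikube), kubelets often use self-signed certificates, so Metrics Server may fail to validate them. You can disable certificate verification (INSECURE, do not use in production):
# Add this arg to the metrics-server container in the Deployment
containers:
- args:
  - --kubelet-insecure-tls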
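Rather than editing the manifest by hand, the flag can be appended with a JSON patch. A minimal sketch, assuming the default install (Deployment `metrics-server` in `kube-system`, with Metrics Server as the first container); `append_arg_patch` is a hypothetical helper that only builds the patch document.

```shell
# append_arg_patch (hypothetical helper) prints a JSON Patch that appends one
# argument to the first container's args list. The argument must not contain
# characters that need JSON escaping (quotes, backslashes).
append_arg_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"%s"}]\n' "$1"
}

# Usage against a live cluster (test environments only):
#   kubectl -n kube-system patch deployment metrics-server --type=json \
#     -p "$(append_arg_patch --kubelet-insecure-tls)"
```

Building the patch separately from applying it makes it easy to review what will change before it touches the cluster.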