Monitoring with Prometheus
In a monolithic world, you could just SSH into a server and run top.
In Kubernetes, with 500 pods appearing and disappearing dynamically, this is impossible.
[!IMPORTANT] Prometheus is the de facto standard for Kubernetes monitoring. Unlike traditional systems, where agents push data to a central server, Prometheus pulls (scrapes) metrics from your applications.
1. The Architecture: Pull vs. Push
Traditional Push Model (e.g., Datadog, New Relic)
- Agent: Runs on the host or inside the app.
- Action: Sends data to a central server.
- Pros: Good for short-lived jobs, easy to set up behind firewalls.
- Cons: Many agents pushing at once can overwhelm the central server (you effectively DDoS yourself).
Prometheus Pull Model
- Application: Exposes an HTTP endpoint (usually `/metrics`).
- Prometheus: Scrapes this endpoint at a fixed interval (e.g., every 15s).
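- Pros: Prometheus controls the load. If the app is down, the scrape fails, which doubles as a health check: Prometheus records the result in the built-in `up` metric (1 for a successful scrape, 0 for a failed one).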
- Service Discovery: Prometheus queries the Kubernetes API to find new Pods automatically.
2. Dimensional Metrics & PromQL
Prometheus stores data as Time Series. Each series is identified by a metric name and a set of key-value pairs called Labels.
Example: http_requests_total{method="POST", handler="/api/checkout", status="200"}
This allows for powerful querying using PromQL (Prometheus Query Language).
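To make series identity concrete, here is a toy sketch in which a metric name plus its label pairs, sorted by label name, forms the key of a series. The `seriesKey` function and the in-memory map are hypothetical illustrations, not Prometheus's actual storage encoding. Label order does not matter, but changing any label value (such as `status`) creates a separate series.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey returns a canonical identity for a series: the metric name plus its
// label pairs sorted by label name, so label order never matters.
// Illustrative only; Prometheus uses its own internal encoding.
func seriesKey(name string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, k := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ", ") + "}"
}

func main() {
	// Toy in-memory store: one counter value per distinct series.
	store := map[string]float64{}
	inc := func(labels map[string]string) {
		store[seriesKey("http_requests_total", labels)]++
	}

	inc(map[string]string{"method": "POST", "handler": "/api/checkout", "status": "200"})
	inc(map[string]string{"status": "200", "method": "POST", "handler": "/api/checkout"}) // same series
	inc(map[string]string{"method": "POST", "handler": "/api/checkout", "status": "500"}) // new series

	// Print series in a stable order.
	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, store[k])
	}
}
```

The same reasoning explains why high-cardinality labels (such as user IDs) are expensive: every distinct label value becomes a new series.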
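PromQL's `rate()` turns an ever-increasing counter into a per-second rate. The sketch below is a simplified approximation: like `rate()`, it treats a drop in value as a counter reset (for example, an app restart), but it skips the extrapolation to the window edges that the real function performs, so its results can differ slightly. The `Sample` type, the `simpleRate` name, and the sample values are hypothetical.

```go
package main

import "fmt"

// Sample is one scraped counter value at a timestamp in seconds.
type Sample struct {
	T, V float64
}

// simpleRate approximates PromQL rate() over the samples of one range window:
// total increase divided by the time between the first and last sample.
// Like rate(), a drop in value is treated as a counter reset, so the
// post-reset value counts as new increase. Unlike rate(), it does not
// extrapolate to the window boundaries.
func simpleRate(samples []Sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false // a rate needs at least two samples
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0, false
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].V, samples[i].V
		if cur < prev {
			increase += cur // counter restarted from 0
		} else {
			increase += cur - prev
		}
	}
	return increase / elapsed, true
}

func main() {
	// Hypothetical scrapes every 15s; the app restarts between t=30 and t=45.
	window := []Sample{{0, 100}, {15, 115}, {30, 130}, {45, 15}, {60, 30}}
	if r, ok := simpleRate(window); ok {
		fmt.Printf("%.2f per second\n", r)
	}
}
```

Here the increases are 15 + 15 + 15 (post-reset) + 15 = 60 over 60 seconds, so the sketch reports 1 per second. A naive `(last - first) / elapsed` would report a negative rate for the same window, which is why reset handling matters and why `rate()` should only be applied to counters.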
3. Instrumentation (Go & Java)
To make your application visible to Prometheus, you must instrument it.
Go (Using promhttp)
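```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// opsProcessed counts processed events; the _total suffix is the
// Prometheus naming convention for counters.
var opsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "myapp_processed_ops_total",
	Help: "The total number of processed events",
})

func init() {
	// Register with the default registry, which promhttp.Handler() serves.
	// Without this, the counter never appears on /metrics.
	prometheus.MustRegister(opsProcessed)
}

func main() {
	// Record metrics (in a real app, call Inc() wherever an event is processed).
	opsProcessed.Inc()

	// Expose all registered metrics on :2112/metrics.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```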
Java (Using Micrometer)
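Micrometer is a vendor-neutral metrics facade, the “SLF4J for metrics”: you code against its API, and a registry such as `micrometer-registry-prometheus` exports the meters in Prometheus format.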
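```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class MyService {
    private final Counter requestCounter;

    // Spring Boot injects the MeterRegistry (Prometheus-backed when
    // micrometer-registry-prometheus is on the classpath).
    public MyService(MeterRegistry registry) {
        // Micrometer names use dots; the Prometheus registry exports this
        // counter as myapp_requests_total.
        this.requestCounter = Counter.builder("myapp.requests")
                .description("Total requests")
                .tag("region", "us-east-1")
                .register(registry);
    }

    public void handleRequest() {
        requestCounter.increment();
    }
}
```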
4. Alerting with Alertmanager
Prometheus is not just for graphs: it also evaluates alerting rules, and Alertmanager delivers the resulting notifications.
You define rules in YAML:
```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status="500"}[5m]) > 0.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "High error rate detected"
```
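If `expr` stays true for 10 minutes (`for: 10m`), Prometheus fires the alert and sends it to Alertmanager, which deduplicates, groups, and routes the notification to Slack, PagerDuty, or email. Note that this expression measures the per-second rate of HTTP 500 responses (more than 0.5 per second, averaged over 5 minutes), not the fraction of requests that failed.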
5. Summary
- Pull Model: Prometheus scrapes `/metrics` endpoints.
- Time Series: Data is stored as Metric + Labels + Time + Value.
- PromQL: Powerful language to aggregate and analyze metrics (`rate`, `sum`, `histogram_quantile`).
- Instrumentation: Use standard libraries (Micrometer, the Prometheus client libraries) to expose metrics.