Monitoring with Prometheus

In a monolithic world, you could just SSH into a server and run top. In Kubernetes, with 500 pods appearing and disappearing dynamically, there is no single server to log into, so this approach breaks down.

Prometheus is the de facto standard for Kubernetes monitoring. In traditional systems, agents push data to a central server. Prometheus instead pulls (scrapes) metrics from your applications.

1. The Architecture: Pull vs. Push

Traditional Push Model (e.g., Datadog, New Relic)

  • Agent: Runs on the host or inside the app.
  • Action: Sends data to a central server.
  • Pros: Good for short-lived jobs; easy to set up behind firewalls.
  • Cons: Many agents pushing at once can overwhelm the central server (you effectively DDoS yourself).

Prometheus Pull Model

  • Application: Exposes an HTTP endpoint (usually /metrics).
  • Prometheus: Scrapes this endpoint at a fixed interval (e.g., every 15s).
  • Pros: Prometheus controls the load. If the app is down, the scrape fails and Prometheus records up = 0 for that target (a built-in health check).
  • Service Discovery: Prometheus queries the Kubernetes API to find new Pods automatically.

2. Dimensional Metrics & PromQL

Prometheus stores data as time series. Each series is identified by a metric name and a set of key-value pairs called labels; every distinct combination of label values is a separate series.

Example: http_requests_total{method="POST", handler="/api/checkout", status="200"}

Labels let you filter and aggregate with PromQL (Prometheus Query Language), for example selecting only status="500" or summing requests across all handlers.
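To see why label order does not matter but label values do, the hypothetical seriesID helper below builds a canonical identity for a series from a metric name and its labels by sorting the label keys. It is an illustration of the identity rule, not how the Prometheus storage engine represents series internally.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesID (hypothetical) renders name{k1="v1",k2="v2",...} with label keys
// sorted, so the same label set always yields the same identity.
func seriesID(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = fmt.Sprintf("%s=%q", k, labels[k])
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	// Same labels, different order: same series.
	a := seriesID("http_requests_total", map[string]string{"method": "POST", "handler": "/api/checkout", "status": "200"})
	b := seriesID("http_requests_total", map[string]string{"status": "200", "handler": "/api/checkout", "method": "POST"})
	// Different status value: a separate series.
	c := seriesID("http_requests_total", map[string]string{"method": "POST", "handler": "/api/checkout", "status": "500"})
	fmt.Println(a)
	fmt.Println(a == b, a == c)
}
```

This prints http_requests_total{handler="/api/checkout",method="POST",status="200"} followed by true false: reordering labels changes nothing, while a new status value creates a new series.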

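Counters such as http_requests_total only go up (they are monotonic), except when the process restarts and the counter resets to zero. Their raw values say little on their own; PromQL functions such as rate() turn them into per-second rates, e.g. rate(http_requests_total[5m]).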
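As a rough sketch of what rate() computes, the hypothetical simpleRate helper below divides a counter's total increase across a window by the elapsed time, treating any drop in value as a counter reset. The real rate() additionally extrapolates to the edges of the range window, so its results can differ slightly.

```go
package main

import "fmt"

// Sample is one scraped counter value V at time T (seconds).
type Sample struct {
	T float64
	V float64
}

// simpleRate approximates PromQL's rate(): total increase divided by the
// elapsed time. A decrease is treated as a reset (the process restarted and
// counted up again from zero). No boundary extrapolation is applied.
func simpleRate(samples []Sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i].V - samples[i-1].V
		if delta < 0 { // counter reset
			delta = samples[i].V
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Raw counter scraped every 15s: the values only go up.
	steady := []Sample{{0, 100}, {15, 130}, {30, 160}, {45, 190}, {60, 220}}
	fmt.Println(simpleRate(steady)) // 120 increase over 60 s

	// The process restarts between t=15 and t=30, so the counter drops to 10.
	withReset := []Sample{{0, 100}, {15, 130}, {30, 10}, {45, 40}}
	fmt.Println(simpleRate(withReset))
}
```

For the steady series this prints 2. Handling the reset explicitly matters: a naive last-minus-first on the second series would report a negative rate.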

3. Instrumentation (Go & Java)

To make your application visible to Prometheus, you must instrument it: define metrics in code, update them as work happens, and expose them on an HTTP endpoint.

Go (Using promhttp)

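package main

import (
  "log"
  "net/http"

  "github.com/prometheus/client_golang/prometheus"
  "github.com/prometheus/client_golang/prometheus/promhttp"
)

var opsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
  Name: "myapp_processed_ops_total",
  Help: "The total number of processed events",
})

func init() {
  // NewCounter only creates the metric; it must be registered with the
  // default registry before promhttp.Handler() will expose it.
  prometheus.MustRegister(opsProcessed)
}

func main() {
  // Record metrics: call Inc() wherever an operation completes.
  opsProcessed.Inc()

  // Serve all registered metrics at /metrics for Prometheus to scrape.
  http.Handle("/metrics", promhttp.Handler())
  log.Fatal(http.ListenAndServe(":2112", nil))
}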
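What Prometheus actually receives from promhttp.Handler() is plain text in the Prometheus text exposition format. The stdlib-only sketch below hand-renders that format for the single counter above so you can see it; renderCounter, metricsHandler, and opsCount are hypothetical stand-ins, and real code should let the client library do this (it also handles escaping, which this sketch skips).

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"sync/atomic"
)

// opsCount stands in for the client library's counter (hypothetical).
var opsCount atomic.Uint64

// renderCounter writes one counter as a HELP line, a TYPE line, then
// "name value". HELP text escaping is not handled here.
func renderCounter(name, help string, value float64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", name)
	fmt.Fprintf(&b, "%s %s\n", name, strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

// metricsHandler serves the rendered counter with the classic text format content type.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderCounter("myapp_processed_ops_total",
		"The total number of processed events", float64(opsCount.Load())))
}

func main() {
	opsCount.Add(1)
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

After one increment this prints the three lines # HELP myapp_processed_ops_total The total number of processed events, # TYPE myapp_processed_ops_total counter, and myapp_processed_ops_total 1.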

Java (Using Micrometer)

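Micrometer is the “SLF4J for metrics”: a vendor-neutral facade over monitoring backends. With the micrometer-registry-prometheus dependency on the classpath, Spring Boot Actuator can expose the registered meters at /actuator/prometheus.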

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class MyService {
  private final Counter requestCounter;

  public MyService(MeterRegistry registry) {
    // Micrometer names are dot-separated; the Prometheus registry
    // exports this counter as myapp_requests_total.
    this.requestCounter = Counter.builder("myapp.requests")
      .description("Total requests")
      .tag("region", "us-east-1")
      .register(registry);
  }

  public void handleRequest() {
    requestCounter.increment();
  }
}

4. Alerting with Alertmanager

Prometheus is not just for graphs; it also drives alerting.

You define alerting rules in a YAML rule file that Prometheus loads (listed under rule_files in prometheus.yml):

groups:
- name: example
  rules:
  - alert: HighErrorRate
    # Fires when a series averages more than 0.5 HTTP 500 responses per second over 5m
    expr: rate(http_requests_total{status="500"}[5m]) > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "High error rate detected"

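If expr stays true for 10 minutes (the for duration), Prometheus marks the alert as firing and sends it to Alertmanager, which groups, deduplicates, and routes the notification to Slack, PagerDuty, or email.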

5. Summary

  • Pull Model: Prometheus scrapes /metrics endpoints.
  • Time Series: Data is stored as Metric + Labels + Time + Value.
  • PromQL: Powerful language to aggregate and analyze metrics (rate(), sum(), histogram_quantile()).
  • Instrumentation: Use standard libraries (Micrometer, the Prometheus client libraries) to expose metrics.