Monitoring with Prometheus

In a monolithic world, you could SSH into a server and run top. In Kubernetes, with 500 pods appearing and disappearing dynamically, that approach does not scale.

Important: Prometheus is the de facto standard for Kubernetes monitoring. Unlike traditional systems, where agents push data to a central server, Prometheus pulls (scrapes) metrics from your applications.

1. The Architecture: Pull vs. Push

Traditional Push Model (e.g., Datadog, New Relic)

  • Agent: Runs on the host or inside the app.
  • Action: Sends data to a central server.
  • Pros: Good for short-lived jobs; easy to set up behind firewalls.
  • Cons: Many agents pushing at once can overwhelm the central server (you effectively DDoS yourself).

Prometheus Pull Model

  • Application: Exposes an HTTP endpoint (usually /metrics).
  • Prometheus: Scrapes this endpoint every interval (e.g., 15s).
  • Pros: Prometheus controls the load. If the app is down, the scrape fails and Prometheus records up = 0 for that target (a built-in up/down check).
  • Service Discovery: Prometheus queries the Kubernetes API to find new Pods automatically.
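The pull model described above can be sketched with the Go standard library alone. This is an illustration, not how Prometheus itself is implemented: the stand-in target, the metric name demo_requests_total, its value, and the helper names scrape, newTarget, and runDemo are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// scrape performs one pull: an HTTP GET against a target's metrics URL.
// It returns the body and an "up" value (1 on success, 0 on failure),
// mirroring the idea of a built-in up/down check.
func scrape(client *http.Client, url string) (body string, up int) {
	resp, err := client.Get(url)
	if err != nil {
		return "", 0
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", 0
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", 0
	}
	return string(data), 1
}

// newTarget starts a stand-in application (hypothetical) that exposes
// /metrics in the plain-text exposition format.
func newTarget() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# TYPE demo_requests_total counter")
		fmt.Fprintln(w, "demo_requests_total 42")
	})
	return httptest.NewServer(mux)
}

// runDemo scrapes the target once while it is running and once after it stops.
func runDemo() (body string, upRunning, upStopped int) {
	target := newTarget()
	client := &http.Client{Timeout: 2 * time.Second}
	body, upRunning = scrape(client, target.URL+"/metrics")
	target.Close() // simulate the application going down
	_, upStopped = scrape(client, target.URL+"/metrics")
	return body, upRunning, upStopped
}

func main() {
	body, upRunning, upStopped := runDemo()
	fmt.Printf("while running: up=%d\n%s", upRunning, body)
	fmt.Printf("after stop:    up=%d\n", upStopped)
}
```

The scraper, not the application, decides when a request happens, which is why the load stays under the monitoring system's control and why a failed request doubles as a health signal.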

2. Dimensional Metrics & PromQL

Prometheus stores data as Time Series. Each series is identified by a metric name and a set of key-value pairs called Labels.

Example: http_requests_total{method="POST", handler="/api/checkout", status="200"}

This allows for powerful querying using PromQL (Prometheus Query Language).
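To make the "metric name plus labels" identity concrete, here is a minimal Go sketch. The Series type, the helpers seriesKey, sumWhere, and sampleSeries, and all sample values are hypothetical; the matching logic only loosely imitates what a query such as sum(http_requests_total{status="200"}) does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Series is one time series: a metric name, its labels, and a current value.
type Series struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// seriesKey builds a canonical identity from the metric name and labels.
// Labels are sorted so that insertion order does not matter.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

// sumWhere adds up the values of every series with the given name whose
// labels contain all key-value pairs in the selector.
func sumWhere(all []Series, name string, selector map[string]string) float64 {
	total := 0.0
	for _, s := range all {
		if s.Name != name {
			continue
		}
		matches := true
		for k, v := range selector {
			if s.Labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			total += s.Value
		}
	}
	return total
}

// sampleSeries returns made-up data for illustration only.
func sampleSeries() []Series {
	return []Series{
		{"http_requests_total", map[string]string{"method": "POST", "handler": "/api/checkout", "status": "200"}, 120},
		{"http_requests_total", map[string]string{"method": "POST", "handler": "/api/checkout", "status": "500"}, 3},
		{"http_requests_total", map[string]string{"method": "GET", "handler": "/api/cart", "status": "200"}, 300},
	}
}

func main() {
	all := sampleSeries()
	for _, s := range all {
		fmt.Println(seriesKey(s.Name, s.Labels), s.Value)
	}
	ok := sumWhere(all, "http_requests_total", map[string]string{"status": "200"})
	fmt.Println("sum where status=200:", ok)
}
```

Changing any single label value produces a different key and therefore a different series, which is why labels with many distinct values multiply the number of series that must be stored.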

PromQL functions transform raw counter data into meaningful rates. Raw counter values only ever go up (they are monotonic), so a query on the bare metric name shows an ever-growing total; wrapping it in rate() yields a per-second rate instead.
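Because a counter only goes up, the useful number is how fast it grows. The following Go sketch computes a simplified per-second rate from raw counter samples. It is an approximation written for illustration: the Sample type, the simpleRate function, and the sample values are assumptions, and unlike PromQL's rate() it does not extrapolate to the edges of the time window.

```go
package main

import "fmt"

// Sample is one scraped counter value at a timestamp in seconds.
type Sample struct {
	T float64 // time of the scrape, in seconds
	V float64 // counter value at that time
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset (for example a process
// restart), so the value after the reset counts as new increase.
func simpleRate(samples []Sample) float64 {
	if len(samples) < 2 {
		return 0 // at least two points are needed to measure growth
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].V - samples[i-1].V
		if delta < 0 {
			delta = samples[i].V // counter restarted from zero
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].T - samples[0].T
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Hypothetical scrapes every 15s; the process restarts before the last one.
	samples := []Sample{{0, 100}, {15, 130}, {30, 160}, {45, 10}}
	fmt.Printf("%.2f req/s\n", simpleRate(samples)) // prints "1.56 req/s"
}
```

The reset handling is the reason counters are preferred over gauges for event totals: a restart loses the absolute value but not the ability to compute a correct rate.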

3. Instrumentation (Go & Java)

To make your application visible to Prometheus, you must instrument it.

Go (Using promhttp)

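package main

import (
  "log"
  "net/http"

  "github.com/prometheus/client_golang/prometheus"
  "github.com/prometheus/client_golang/prometheus/promhttp"
)

// opsProcessed counts processed events; a counter only ever increases.
var opsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
  Name: "myapp_processed_ops_total",
  Help: "The total number of processed events",
})

func main() {
  // Register the counter with the default registry, which promhttp.Handler() serves.
  // Without this step the metric never appears on /metrics.
  prometheus.MustRegister(opsProcessed)

  // Record metrics
  opsProcessed.Inc()

  // Expose the endpoint that Prometheus scrapes; exit if the server fails.
  http.Handle("/metrics", promhttp.Handler())
  log.Fatal(http.ListenAndServe(":2112", nil))
}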

Java (Using Micrometer)

Micrometer is the “SLF4J for metrics”.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class MyService {
  private final Counter requestCounter;

  // Spring injects the MeterRegistry; the counter is registered once at construction.
  public MyService(MeterRegistry registry) {
    this.requestCounter = Counter.builder("myapp_requests_total")
      .description("Total requests")
      .tag("region", "us-east-1") // tags become Prometheus labels
      .register(registry);
  }

  public void handleRequest() {
    // Count one handled request.
    requestCounter.increment();
  }
}

4. Alerting with Alertmanager

Prometheus is not just for graphs; it’s for alerting.

You define rules in YAML:

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: rate(http_requests_total{status="500"}[5m]) > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "High error rate detected"

If expr stays true for 10m, Prometheus marks the alert as firing and sends it to Alertmanager, which routes the notification to Slack, PagerDuty, or email.
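The effect of the for clause can be sketched as a small state machine in Go. This is an illustration under stated assumptions rather than Prometheus's actual code: the Rule type, the AlertState names, the simulate helper, and the one-evaluation-per-minute schedule are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// AlertState is the lifecycle stage of an alert.
type AlertState string

const (
	Inactive AlertState = "inactive"
	Pending  AlertState = "pending"
	Firing   AlertState = "firing"
)

// Rule tracks how long a condition has been continuously true.
type Rule struct {
	For         time.Duration // how long the condition must hold before firing
	active      bool
	activeSince time.Time
}

// Eval records one evaluation of the condition at time now and returns
// the resulting state. Any false evaluation resets the timer.
func (r *Rule) Eval(now time.Time, condition bool) AlertState {
	if !condition {
		r.active = false
		return Inactive
	}
	if !r.active {
		r.active = true
		r.activeSince = now
	}
	if now.Sub(r.activeSince) >= r.For {
		return Firing
	}
	return Pending
}

// simulate evaluates the rule once per minute, one entry of conditions
// per evaluation, and returns the state after each evaluation.
func simulate(forMinutes int, conditions []bool) []AlertState {
	rule := &Rule{For: time.Duration(forMinutes) * time.Minute}
	start := time.Unix(0, 0)
	states := make([]AlertState, 0, len(conditions))
	for i, c := range conditions {
		now := start.Add(time.Duration(i) * time.Minute)
		states = append(states, rule.Eval(now, c))
	}
	return states
}

func main() {
	// True for 5 minutes, false once, then true for 11 more evaluations.
	conditions := make([]bool, 17)
	for i := range conditions {
		conditions[i] = i != 5
	}
	prev := AlertState("")
	for i, s := range simulate(10, conditions) {
		if s != prev {
			fmt.Printf("minute %d: %s\n", i, s)
			prev = s
		}
	}
}
```

In this run the single false evaluation at minute 5 restarts the clock, so the alert only fires at minute 16, ten minutes after the condition became true again. That is the purpose of the for clause: brief spikes stay pending and never page anyone.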

5. Summary

  • Pull Model: Prometheus scrapes /metrics endpoints.
  • Time Series: Data is stored as Metric + Labels + Time + Value.
  • PromQL: Powerful language to aggregate and analyze metrics (e.g., rate(), sum(), histogram_quantile()).
  • Instrumentation: Use standard libraries (Micrometer, Prometheus Client) to expose metrics.