Module 5: Metrics

Mission: Move beyond “Is it up?” to “How is it performing?”

Welcome to the Metrics module. While traces allow you to debug individual requests, metrics allow you to understand the aggregate health of your system.

If traces are the microscope, metrics are the dashboard of your car. You don’t look at the engine firing every millisecond (trace); you look at the speedometer (metric).

1. The Power of Aggregation

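In a high-throughput system processing 10,000 requests per second, storing a full trace for every request is usually too expensive, which is why traces are typically sampled.

But you can record a measurement for every request, because each measurement is folded into a running aggregate instead of being stored individually.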

  • Counter: “We served 10,000 requests.” (one number, no matter how many requests arrive)
  • Histogram: “99% of requests were under 200 ms.” (a fixed set of bucket counts, no matter how many requests arrive)

Metrics are cheap, fast, and essential for alerting.
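
To make the aggregation idea concrete, here is a minimal sketch in plain Python. It is not the OpenTelemetry API: the `RequestMetrics` class, the bucket bounds, and the synthetic latencies are all assumptions made for illustration. It shows that recording 10,000 requests still leaves only one counter value and six bucket counts in memory.

```python
import bisect
import random

# Hypothetical bucket upper bounds in milliseconds (an assumption for this sketch).
BUCKET_BOUNDS_MS = [50, 100, 200, 500, 1000]


class RequestMetrics:
    """Folds per-request measurements into a fixed-size summary."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        self.request_count = 0  # counter: only ever increases
        # histogram: one slot per bound, plus a final overflow slot
        self.bucket_counts = [0] * (len(self.bounds) + 1)

    def record(self, duration_ms):
        self.request_count += 1
        # bisect_left returns the index of the first bound >= duration_ms,
        # or len(bounds) when the value exceeds every bound (overflow slot).
        self.bucket_counts[bisect.bisect_left(self.bounds, duration_ms)] += 1

    def fraction_at_or_below(self, bound_ms):
        """Share of requests with duration <= bound_ms (must be a bucket bound)."""
        idx = self.bounds.index(bound_ms)
        return sum(self.bucket_counts[: idx + 1]) / self.request_count


def simulate(n_requests, seed=0):
    """Record n_requests synthetic latencies (mean of roughly 40 ms)."""
    rng = random.Random(seed)
    metrics = RequestMetrics(BUCKET_BOUNDS_MS)
    for _ in range(n_requests):
        metrics.record(rng.expovariate(1 / 40))
    return metrics


if __name__ == "__main__":
    m = simulate(10_000)
    print("requests:", m.request_count)
    print("stored histogram values:", len(m.bucket_counts))
    print("share under 200 ms:", m.fraction_at_or_below(200))
```

The storage cost depends on the number of buckets, not on the number of requests, which is what keeps metrics cheap at high throughput. The trade-off is precision: the histogram can only answer questions at its bucket boundaries.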

2. What You Will Learn

  1. Metrics Basics
    • Master the three core instruments: Counter, Gauge, and Histogram.
    • Understand the difference between UpAndDownCounter and Gauge.
    • Learn how to visualize these in Grafana.
  2. Module Review
    • Test your knowledge with flashcards.
    • Quick reference cheat sheet for metric instruments.

3. The Cardinality Trap

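One of the most dangerous pitfalls in observability is high cardinality. If you add a user_id attribute to a metric and you have 1 million users, you have just created up to 1 million time series for that one metric, and that number is multiplied again by every other attribute on it. This can exhaust the memory of your Prometheus server.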

[!WARNING] Cardinality is the Silent Killer. We will explore exactly how to avoid this trap and how to use Exemplars to link metrics to traces without blowing up your storage.
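
The arithmetic behind the trap can be sketched in a few lines of Python. The attribute names and their distinct-value counts below are hypothetical, and the product is a worst-case upper bound that assumes every combination of values actually occurs.

```python
from math import prod


def series_count(attribute_cardinalities):
    """Worst-case time series for one metric: the product of the
    number of distinct values of each attribute."""
    return prod(attribute_cardinalities.values())


# Hypothetical attributes on a request counter, with distinct-value counts.
bounded = {"http.method": 5, "http.status_code": 10, "region": 4}
unbounded = {**bounded, "user_id": 1_000_000}

print(series_count(bounded))    # 200
print(series_count(unbounded))  # 200000000
```

Attributes with a small, fixed set of values keep the series count manageable. A single unbounded attribute such as user_id multiplies the total by the size of your user base.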

4. Prerequisites

  • Module 1: Foundations: You should understand what a Metric is conceptually.
  • Module 2: Zero to Tracing: You should have a running OTel Collector and backend (Prometheus/Grafana).

Ready? Let’s start counting with Metrics Basics.