Module Review: Metrics

Key Takeaways

  • Metrics vs Traces: Traces record individual requests (high cardinality, so they are usually sampled). Metrics record aggregates over every event (low cardinality, no sampling, so trends are not skewed by dropped data).
  • Three Instrument Types:
      • Counter: Only goes up. Use for rates (RPS, Error Rate).
      • Gauge: Goes up and down. Use for snapshots (Memory, Queue Depth).
      • Histogram: Distribution buckets. Use for latency and sizes.
  • Cardinality Rule: NEVER put high-cardinality data (IDs, emails) in metric attributes. Every unique attribute combination becomes its own time series, which can exhaust the memory and storage of your time-series database.
  • Pipeline: App (OTel SDK) → Exporter → Prometheus (Storage) → Grafana (Visualization).
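To make the three instrument types concrete, here is a minimal toy model in plain Python. It is a sketch of the semantics only, not the OpenTelemetry SDK: the class names, the `add`/`set`/`record` methods, and the bucket bounds are all assumptions chosen for illustration.

```python
from bisect import bisect_left


class Counter:
    """Monotonic total: add() rejects negative increments."""

    def __init__(self):
        self.value = 0.0

    def add(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only go up")
        self.value += amount


class Gauge:
    """Point-in-time value: each set() overwrites the previous one."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; bounds are inclusive upper limits."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One extra slot for observations above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def record(self, value):
        # bisect_left finds the first bound that is >= value.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value


requests = Counter()
queue_depth = Gauge()
latency_ms = Histogram([50, 100, 250])  # hypothetical bucket bounds

for duration in (12, 80, 100, 400):
    requests.add()
    latency_ms.record(duration)

queue_depth.set(7)
queue_depth.set(3)  # only the latest value survives

print(requests.value, queue_depth.value, latency_ms.counts)
```

The counter keeps growing, the gauge remembers only its last value, and the histogram keeps per-bucket counts rather than the raw observations.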

Flashcards

📊

Which metric type should I use for "Total HTTP 500 Errors"?

Think: Does it ever go down?

Counter

Errors are events that accumulate. You want to count them monotonically.

📉

Which metric type should I use for "Current CPU Usage"?

It fluctuates up and down.

Gauge

CPU usage is a state at a specific point in time. It is not cumulative.

⚠️

What is the "Cardinality Explosion"?

Why can't I add UserID to my metric?

Too many unique series

Adding unbounded attributes (like IDs) creates millions of unique time series, exhausting memory/storage.
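A quick back-of-the-envelope calculation shows how fast this grows. The sketch below assumes the worst case, where every combination of attribute values is actually observed. The attribute names and the counts of distinct values are made up for illustration.

```python
from math import prod


def series_count(distinct_values_per_attribute):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each attribute can take."""
    return prod(distinct_values_per_attribute.values())


# Hypothetical attribute sets for a single metric.
bounded = {"method": 5, "status_code": 10, "route": 20}
unbounded = {**bounded, "user_id": 100_000}

print(series_count(bounded))    # 1000
print(series_count(unbounded))  # 100000000
```

One unbounded attribute turns a thousand series into a hundred million, and each series carries its own memory and index overhead in the time-series database.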

⏱️

Why OTel Histograms over Summaries?

Think about aggregation.

Aggregatable

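Histograms (buckets) can be merged across multiple service instances by adding their bucket counts. Summaries compute quantiles inside each instance, and those precomputed quantiles cannot be meaningfully averaged or merged afterwards.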
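The merge is plain addition of bucket counts, after which a quantile can be estimated from the combined distribution. The sketch below assumes both instances use the same bucket bounds and uses linear interpolation inside the target bucket, which is similar in spirit to PromQL's `histogram_quantile` but not a reimplementation of it. The bounds and counts are invented sample data.

```python
def merge(*histograms):
    """Add per-bucket counts from instances that share the same bounds."""
    return [sum(counts) for counts in zip(*histograms)]


def estimate_quantile(q, bounds, counts):
    """Estimate quantile q (0..1) by linear interpolation within a bucket.

    counts has one more entry than bounds: the last entry is the overflow
    bucket for observations above the largest bound.
    """
    total = sum(counts)
    if total == 0:
        return None
    rank = q * total
    cumulative = 0
    lower = 0.0
    for upper, count in zip(bounds, counts):
        if count and cumulative + count >= rank:
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
        lower = upper
    # The rank falls in the overflow bucket, so the best we can report
    # is the largest finite bound.
    return bounds[-1]


bounds_ms = [100, 250, 500]      # hypothetical latency buckets
instance_a = [80, 15, 4, 1]      # last entry is the overflow bucket
instance_b = [40, 50, 9, 1]

merged = merge(instance_a, instance_b)
p50 = estimate_quantile(0.50, bounds_ms, merged)
p95 = estimate_quantile(0.95, bounds_ms, merged)
print(merged, round(p50, 1), round(p95, 1))
```

Two Summaries that each report their own P95 offer no equivalent step, because the underlying distributions are already gone by the time the values are exported.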

Metrics Cheat Sheet

Feature     | Counter             | Gauge           | Histogram
Direction   | Up Only             | Up & Down       | Distribution
Data Type   | Cumulative Sum      | Current Value   | Buckets (Counts)
Aggregation | Rate (per sec)      | Last / Avg      | P99, P50, Max
Example     | http_requests_total | jvm_memory_used | http_request_duration
PromQL      | rate(x[5m])         | x               | histogram_quantile(...)
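The `rate(x[5m])` entry is worth unpacking: a counter's raw value is rarely interesting, but its per-second increase is. The sketch below is a simplified illustration of that idea, assuming a list of `(timestamp_seconds, value)` samples. It handles counter resets but omits the window-boundary extrapolation that Prometheus applies, so it is not a reimplementation of `rate()`.

```python
def per_second_rate(samples):
    """Average per-second increase over (timestamp, value) counter samples.

    A drop in value is treated as a counter reset (for example, a process
    restart), after which the counter starts again from zero.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset, the new value itself is the increase since zero.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# Hypothetical scrapes of http_requests_total, one per minute,
# with a restart between the second and third sample.
samples = [(0, 100), (60, 160), (120, 40), (180, 100)]
rate = per_second_rate(samples)
print(round(rate, 3))
```

This is also why counters must be monotonic: a decrease can only mean a reset, so the increase across a restart can still be approximated.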

Next Steps

Now that you have metrics and traces, you are missing the final piece: Logs.

👉 Trace-Correlated Logs
