Module Review: Metrics
Key Takeaways
- Metrics vs Traces: Traces track individual requests (high cardinality, usually sampled). Metrics track aggregates (low cardinality, computed from every event rather than a sample, so trends stay accurate).
- Three Metric Types:
  - Counter: Only goes up. Use for rates (RPS, Error Rate).
  - Gauge: Goes up and down. Use for snapshots (Memory, Queue Depth).
  - Histogram: Distribution buckets. Use for latency and sizes.
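- Cardinality Rule: NEVER put high-cardinality data (IDs, emails) in metric attributes. Every unique combination of attribute values becomes its own time series, which can exhaust the memory and storage of your time-series database.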
- Pipeline: App (OTel SDK) → Exporter → Prometheus (Storage) → Grafana (Visualization).
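To make the difference between the three metric types (Counter, Gauge, Histogram) concrete, here is a minimal toy model in plain Python. It is only a sketch of the semantics, not the OTel SDK API; the class names, the `bounds` parameter, and the sample values are assumptions chosen for illustration.

```python
import bisect


class Counter:
    """Monotonic sum: only accepts non-negative increments."""

    def __init__(self):
        self.value = 0.0

    def add(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only go up")
        self.value += amount


class Gauge:
    """Snapshot: the most recent value replaces the previous one."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket; each bound is an inclusive upper limit."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One slot per bound plus a final overflow bucket for larger values.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0.0

    def record(self, value):
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value


# Hypothetical instruments for a small HTTP service.
requests = Counter()
queue_depth = Gauge()
latency_ms = Histogram([50, 100, 250])

for duration in (12, 80, 100, 900):
    requests.add()
    latency_ms.record(duration)

queue_depth.set(7)
queue_depth.set(3)  # overwrites the earlier snapshot

print(requests.value)     # counter: accumulated total
print(queue_depth.value)  # gauge: latest value only
print(latency_ms.counts)  # histogram: observations per bucket
```

The counter rejects negative increments, the gauge forgets its history, and the histogram keeps only per-bucket counts rather than the raw observations.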
Interactive Flashcards
📊
Which metric type should I use for "Total HTTP 500 Errors"?
Think: Does it ever go down?
Counter
Errors are events that accumulate. You want to count them monotonically.
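Because a counter only accumulates, the useful number is usually how fast it grows. The sketch below shows one simplified way to turn raw counter samples into a per-second rate while tolerating a process restart. It is an illustration only: the real PromQL `rate()` also extrapolates to the window edges, and the sample values and 15-second scrape interval are assumptions.

```python
def increase(samples):
    """Total increase across consecutive counter samples, tolerating resets.

    A drop between two samples is treated as a restart: the counter began
    again from zero, so the whole new value counts as increase.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


def rate_per_second(samples, window_seconds):
    """Average per-second growth of the counter over the window."""
    return increase(samples) / window_seconds


# Hypothetical scrapes of an error counter, 15 s apart.
# The process restarts between the third and fourth scrape.
scrapes = [100, 130, 160, 10, 40]

print(increase(scrapes))
print(rate_per_second(scrapes, window_seconds=60))
```

This is why a monotonic counter is safe for error totals: a sudden drop can only mean a reset, so it can be detected and corrected for.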
📉
Which metric type should I use for "Current CPU Usage"?
It fluctuates up and down.
Gauge
CPU usage is a state at a specific point in time. It is not cumulative.
⚠️
What is the "Cardinality Explosion"?
Why can't I add UserID to my metric?
Too many unique series
Adding unbounded attributes (like IDs) creates millions of unique time series, exhausting memory/storage.
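The arithmetic behind the explosion is a simple product: in the worst case, the number of time series is the number of distinct values of each attribute multiplied together. The attribute names and counts below are hypothetical, picked only to show the scale.

```python
from math import prod


def series_count(distinct_values):
    """Worst-case number of time series for one metric.

    `distinct_values` maps each attribute name to how many distinct
    values it can take; every combination is a separate series.
    """
    return prod(distinct_values.values())


# Hypothetical bounded attributes on an HTTP request metric.
bounded = {"method": 5, "status_code": 6, "route": 20}

# The same metric after adding an unbounded attribute.
with_user_id = {**bounded, "user_id": 100_000}

print(series_count(bounded))
print(series_count(with_user_id))
```

With these assumed numbers, one extra attribute takes the metric from hundreds of series to tens of millions.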
⏱️
Why OTel Histograms over Summaries?
Think about aggregation.
Aggregatable
Histograms (buckets) can be merged across multiple service instances. Summaries cannot be mathematically merged.
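To see why buckets aggregate cleanly, the sketch below merges bucket counts from two instances by simple addition and then estimates a quantile from the merged counts using linear interpolation inside the target bucket. It is a simplified illustration in the spirit of `histogram_quantile`, not its exact implementation; the bucket bounds and counts are assumptions, and the counts are per-bucket rather than cumulative.

```python
def merge(*histograms):
    """Element-wise sum of per-bucket counts; valid only if bounds match."""
    return [sum(counts) for counts in zip(*histograms)]


def quantile(q, bounds, counts):
    """Estimate a quantile by interpolating linearly within one bucket.

    `counts` has one entry per bound plus a final overflow bucket.
    """
    rank = q * sum(counts)
    seen = 0
    lower = 0.0
    for upper, count in zip(bounds, counts):
        if count and seen + count >= rank:
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
        lower = upper
    # The rank falls in the overflow bucket: report the highest finite bound.
    return float(bounds[-1])


# Hypothetical latency buckets (ms) shared by two service instances.
bounds = [100, 200, 400]
instance_a = [90, 8, 2, 0]     # mostly fast requests
instance_b = [10, 30, 50, 10]  # a slower instance

merged = merge(instance_a, instance_b)
print(merged)
print(quantile(0.5, bounds, merged))
print(quantile(0.9, bounds, merged))
```

A summary would ship only each instance's precomputed quantiles, and there is no sound way to combine two P90 values into a fleet-wide P90 without the underlying distribution.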
Metrics Cheat Sheet
| Feature | Counter | Gauge | Histogram |
|---|---|---|---|
| Direction | Up Only | Up & Down | Distribution |
| Data Type | Cumulative Sum | Current Value | Buckets (Counts) |
| Aggregation | Rate (per sec) | Last / Avg | P99, P50, Max |
| Example | http_requests_total | jvm_memory_used | http_request_duration |
| PromQL | rate(x[5m]) | x | histogram_quantile(...) |
Next Steps
Now that you have metrics and traces, you are missing the final piece: Logs.