Module Review: Metrics

Key Takeaways

Metrics vs Traces: Traces record individual requests (high cardinality, so they are usually sampled). Metrics record pre-aggregated values (low cardinality and not sampled, so trends reflect every event).
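Three Instrument Types:
Counter: Only goes up (monotonic). Use for rates (RPS, error rate).
Gauge: Goes up and down. Use for point-in-time snapshots (memory usage, queue depth).
Histogram: Counts observations into buckets to capture a distribution. Use for latency and request/response sizes.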
Cardinality Rule: Never put high-cardinality data (IDs, emails) in metric attributes. Every unique value creates a new time series, which can exhaust your time-series database's memory and crash it.
Pipeline: App (OTel SDK) → Exporter → Prometheus (Storage) → Grafana (Visualization).
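To make the three instrument types concrete, here is a minimal in-memory sketch. The classes and methods (`Counter.add`, `Gauge.set`, `Histogram.record`) are hypothetical toys, loosely shaped like OpenTelemetry instruments but not the OTel SDK API; a real SDK also handles attributes, export, and thread safety.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Toy monotonic counter: the value can only increase."""
    value: float = 0.0

    def add(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


@dataclass
class Gauge:
    """Toy gauge: a point-in-time snapshot where the last value set wins."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Toy histogram: counts observations into buckets with fixed upper bounds."""
    bounds: list[float]
    counts: list[int] = field(init=False)
    total: float = 0.0

    def __post_init__(self) -> None:
        # One slot per finite upper bound, plus a final overflow (+Inf) bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value: float) -> None:
        # bisect_left returns the first bound >= value, i.e. the "le" bucket.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value


requests = Counter()
requests.add()
requests.add()
queue_depth = Gauge()
queue_depth.set(7)
queue_depth.set(3)
latency = Histogram(bounds=[0.1, 0.5, 1.0])  # seconds
for seconds in (0.05, 0.2, 0.3, 2.0):
    latency.record(seconds)
print(requests.value, queue_depth.value, latency.counts)  # 2.0 3 [1, 2, 0, 1]
```

The histogram uses inclusive upper bounds, matching Prometheus's `le` ("less than or equal") bucket convention, plus an overflow bucket for values above the largest bound.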

Question: Which instrument should track the number of errors?
Hint: Does the value ever go down?
Answer: Counter. Errors are events that accumulate, so you count them monotonically.
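Because a counter only goes up, the useful signal is how fast it grows, which is what `rate(x[5m])` computes. The sketch below is a simplified, hypothetical `counter_rate` helper: it sums the increases between consecutive samples and treats any drop as a counter reset (for example after a process restart). PromQL's real `rate()` also extrapolates toward the edges of the time window, so its results can differ slightly.

```python
def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase over (timestamp_seconds, counter_value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from 0, so the new value is all increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes every 15 s; the process restarted between t=30 and t=45.
errors = [(0, 100), (15, 130), (30, 160), (45, 30), (60, 60)]
print(counter_rate(errors))  # 2.0 errors per second
```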

Question: Which instrument should track CPU usage?
Hint: It fluctuates up and down.
Answer: Gauge. CPU usage is a state at a specific point in time; it is not cumulative.

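Question: Why can't I add UserID to my metric?
Answer: Too many unique series. Every distinct combination of attribute values becomes its own time series, so an unbounded attribute such as a user ID can create millions of series and exhaust the backend's memory and storage.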
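The series explosion is multiplicative: a metric gets one series per distinct combination of attribute values. The hypothetical `series_count` helper below computes that worst-case bound (real traffic may not hit every combination); the attribute names and counts are illustrative assumptions.

```python
from math import prod


def series_count(distinct_values: dict[str, int]) -> int:
    """Worst-case number of time series for one metric name."""
    # Each attribute multiplies the series count by its number of distinct values.
    # A metric with no attributes is a single series (prod of nothing is 1).
    return prod(distinct_values.values())


bounded = {"method": 5, "route": 20, "status_code": 5}
print(series_count(bounded))  # 500

with_user_id = {**bounded, "user_id": 1_000_000}
print(series_count(with_user_id))  # 500000000
```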

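Question: Why prefer histograms over summaries for latency?
Hint: Think about aggregation.
Answer: Histograms are aggregatable. Bucket counts from multiple service instances can be summed and quantiles computed afterward, whereas a summary's precomputed quantiles (such as a per-instance P99) cannot be mathematically merged.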
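The sketch below uses made-up latencies from two hypothetical instances to show why mergeability matters. Averaging each instance's precomputed P99 (all a summary leaves you with) gives 255 ms, while the true P99 across all requests is 10 ms. Summing histogram bucket counts instead preserves the combined distribution. The nearest-rank percentile definition, bucket bounds, and helper names are assumptions chosen for illustration.

```python
from bisect import bisect_left


def p99_nearest_rank(samples: list[float]) -> float:
    """Exact 99th percentile of raw samples (nearest-rank method)."""
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer math
    return ordered[rank - 1]


def bucket_counts(samples: list[float], bounds: list[float]) -> list[int]:
    """Non-cumulative counts per `le` bucket, plus a trailing overflow bucket."""
    counts = [0] * (len(bounds) + 1)
    for s in samples:
        counts[bisect_left(bounds, s)] += 1
    return counts


def quantile_bucket(counts: list[int], bounds: list[float], percent: int) -> float:
    """Upper bound of the bucket that contains the given percentile."""
    rank = -(-percent * sum(counts) // 100)
    seen = 0
    for bound, c in zip(bounds + [float("inf")], counts):
        seen += c
        if seen >= rank:
            return bound
    return float("inf")


instance_a = [10.0] * 90 + [500.0] * 10  # ms; a few slow requests
instance_b = [10.0] * 1000               # ms; uniformly fast
bounds = [25.0, 100.0, 250.0, 1000.0]    # ms

# Summary-style: each instance only reports its own P99.
avg_of_p99s = (p99_nearest_rank(instance_a) + p99_nearest_rank(instance_b)) / 2
print(avg_of_p99s)                                # 255.0 (misleading)
print(p99_nearest_rank(instance_a + instance_b))  # 10.0 (the true P99)

# Histogram-style: bucket counts add element-wise across instances.
merged = [a + b for a, b in zip(bucket_counts(instance_a, bounds),
                                bucket_counts(instance_b, bounds))]
print(merged)                               # [1090, 0, 0, 10, 0]
print(quantile_bucket(merged, bounds, 99))  # 25.0, so P99 is at most 25 ms
```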

| Feature | Counter | Gauge | Histogram |
|---|---|---|---|
| Direction | Up only | Up & down | Distribution |
| Data type | Cumulative sum | Current value | Bucket counts |
| Aggregation | Rate (per second) | Last / average | Quantiles (P50, P99), estimated from buckets |
| Example | `http_requests_total` | `jvm_memory_used` | `http_request_duration` |
| PromQL | `rate(x[5m])` | `x` | `histogram_quantile(...)` |
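`histogram_quantile(...)` can only estimate a quantile, because a histogram stores counts per bucket rather than raw values. The hypothetical `estimate_quantile` below sketches the linear-interpolation idea Prometheus documents: find the bucket containing the target rank and assume observations are spread evenly inside it. It takes non-cumulative counts and assumes the first bucket starts at 0; the real function works on cumulative `le` series and handles more edge cases.

```python
import math


def estimate_quantile(q: float, bounds: list[float], counts: list[int]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from bucket counts.

    bounds: ascending finite upper bounds; counts: non-cumulative counts with
    one extra trailing slot for the overflow (+Inf) bucket.
    """
    total = sum(counts)
    if total == 0:
        return math.nan
    rank = q * total
    seen = 0
    for i, c in enumerate(counts):
        if c > 0 and seen + c >= rank:
            if i == len(bounds):
                # Rank fell in the +Inf bucket: report the highest finite bound.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            # Assume observations are spread evenly within this bucket.
            return lower + (upper - lower) * (rank - seen) / c
        seen += c
    return bounds[-1]


bounds = [0.1, 0.5, 1.0]   # seconds
counts = [50, 40, 10, 0]   # hypothetical request counts per bucket
print(round(estimate_quantile(0.5, bounds, counts), 3))   # 0.1
print(round(estimate_quantile(0.99, bounds, counts), 3))  # 0.95
```

With these counts the estimated P99 is 0.95 s even if every slow request actually took 0.6 s; the estimate is only as precise as the bucket boundaries.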

Now that you have metrics and traces, you are missing the final piece: Logs.