Observability Fundamentals

Goal: Understand the data structures that power observability: Traces, Metrics, and Logs.

In the previous chapter, we saw why we need observability. Now, let’s look at what it actually is.

Observability is built on three pillars. Alone, they are useful. Together, they are powerful.

1. Traces: The Request Journey

A Trace tells the story of a request. It tracks the path of execution across multiple services.

A Trace is a tree of Spans.

  • Trace: The entire request (e.g., “Checkout”).
  • Span: An individual operation (e.g., “DB Query”, “HTTP Call”).

Example: Anatomy of a Trace

Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736

  • /checkout (Frontend): 350ms
  • /auth (Auth Service): 70ms
  • /charge (Payment Service): 210ms
  • INSERT INTO orders (DB): 140ms
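To make the "tree of Spans" idea concrete, here is a minimal sketch in plain Go (standard library only, not the OpenTelemetry API). The Span and Trace types, the short span IDs, and the parent/child nesting are assumptions made for illustration; in particular, the listing above does not say which span is the parent of the DB insert, so placing it under /charge is a guess.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Span is a simplified, hypothetical stand-in for a real span record.
type Span struct {
	SpanID   string
	ParentID string // empty for the root span
	Name     string
	Duration time.Duration
}

// Trace groups every span that shares one trace ID.
type Trace struct {
	TraceID string
	Spans   []Span
}

// Render returns the span tree as indented text, one span per line.
func (t Trace) Render() string {
	// Index spans by parent so each node can find its children.
	children := make(map[string][]Span)
	for _, s := range t.Spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}

	var b strings.Builder
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			fmt.Fprintf(&b, "%s%s %s\n", strings.Repeat("  ", depth), s.Name, s.Duration)
			walk(s.SpanID, depth+1)
		}
	}
	walk("", 0) // start from the spans that have no parent
	return b.String()
}

// exampleTrace reuses the names and durations from the example trace.
// The span IDs and the nesting are assumed, not taken from the source.
func exampleTrace() Trace {
	return Trace{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		Spans: []Span{
			{SpanID: "a1", ParentID: "", Name: "/checkout", Duration: 350 * time.Millisecond},
			{SpanID: "b2", ParentID: "a1", Name: "/auth", Duration: 70 * time.Millisecond},
			{SpanID: "c3", ParentID: "a1", Name: "/charge", Duration: 210 * time.Millisecond},
			{SpanID: "d4", ParentID: "c3", Name: "INSERT INTO orders", Duration: 140 * time.Millisecond},
		},
	}
}

func main() {
	fmt.Print(exampleTrace().Render())
}
```

Storing a parent ID on each span, rather than nesting spans inside each other, mirrors how spans are typically reported: each service emits its own spans independently, and the tree is reassembled later from the IDs.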

Code Example: Creating a Span

Java:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

Tracer tracer = GlobalOpenTelemetry.getTracer("my-service");

Span span = tracer.spanBuilder("process_payment").startSpan();
// Make the span current for the duration of the try block
try (Scope scope = span.makeCurrent()) {
  // Set attributes first so they are recorded even if the call below fails
  span.setAttribute("payment.amount", 99.99);
  processPayment(); // Your logic here
} catch (Exception e) {
  span.recordException(e);
  span.setStatus(StatusCode.ERROR);
  throw e;
} finally {
  span.end(); // CRITICAL: Always end the span!
}

Go:

import (
  "context"

  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/attribute"
)

func processPayment(ctx context.Context) {
  tracer := otel.Tracer("my-service")

  // Start a new span. The returned ctx carries it; pass ctx to any
  // downstream calls so their spans become children of this one.
  ctx, span := tracer.Start(ctx, "process_payment")
  defer span.End() // CRITICAL: Always end the span!

  // Your logic here
  span.SetAttributes(attribute.Float64("payment.amount", 99.99))
}

2. Metrics: The Pulse

Metrics are numeric measurements aggregated over time. Because each data point summarizes many events rather than recording every one, they are cheap to store and great for spotting trends.

Common metric types:

  • Counter: A value that only goes up (e.g., http.requests.total).
  • Gauge: A value that goes up and down (e.g., process.memory.usage).
  • Histogram: A distribution of values (e.g., http.server.duration).
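The difference between the three types is easiest to see in how each one updates its state. The following is a rough sketch in plain Go, not the OpenTelemetry SDK: the Counter, Gauge, and Histogram types, the bucket bounds, and the sample values are all hypothetical, and a real implementation would also need to be safe for concurrent use.

```go
package main

import (
	"fmt"
	"sort"
)

// Counter only ever increases.
type Counter struct{ total int64 }

// Add increases the counter; negative increments are ignored.
func (c *Counter) Add(n int64) {
	if n < 0 {
		return
	}
	c.total += n
}

// Gauge holds only the most recently observed value.
type Gauge struct{ value float64 }

// Set overwrites the gauge with the latest observation.
func (g *Gauge) Set(v float64) { g.value = v }

// Histogram counts observations per bucket instead of keeping each value.
// bounds are inclusive upper bounds; the last bucket catches everything larger.
type Histogram struct {
	bounds []float64
	counts []int64 // len(bounds)+1 buckets
	sum    float64
	count  int64
}

// NewHistogram copies and sorts the bounds and allocates the buckets.
func NewHistogram(bounds ...float64) *Histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &Histogram{bounds: b, counts: make([]int64, len(b)+1)}
}

// Record places v in the first bucket whose upper bound is >= v.
func (h *Histogram) Record(v float64) {
	i := sort.SearchFloat64s(h.bounds, v)
	h.counts[i]++
	h.sum += v
	h.count++
}

func main() {
	requests := &Counter{}
	requests.Add(1)
	requests.Add(1)

	memory := &Gauge{}
	memory.Set(512)
	memory.Set(256) // the earlier value is gone; a gauge has no history

	// Hypothetical request durations in milliseconds.
	duration := NewHistogram(100, 250)
	for _, ms := range []float64{70, 210, 140, 350} {
		duration.Record(ms)
	}

	fmt.Println("requests:", requests.total)
	fmt.Println("memory:", memory.value)
	fmt.Println("duration buckets:", duration.counts, "sum:", duration.sum)
}
```

Note that the histogram stores one count per bucket plus a running sum, regardless of how many values are recorded. That fixed size is what keeps metrics cheap compared with storing every event.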

Code Example: Recording a Metric

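Java:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

Meter meter = GlobalOpenTelemetry.getMeter("my-service");

// Create the counter once and reuse it
LongCounter requestCounter = meter
  .counterBuilder("http_requests_total")
  .setDescription("Total HTTP requests")
  .build();

requestCounter.add(1, Attributes.of(AttributeKey.stringKey("endpoint"), "/checkout"));

Go:

import (
  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/attribute"
  "go.opentelemetry.io/otel/metric"
)

meter := otel.Meter("my-service")

// Create the counter once and reuse it.
// The error is ignored here for brevity; check it in real code.
requestCounter, _ := meter.Int64Counter(
  "http_requests_total",
  metric.WithDescription("Total HTTP requests"),
)

// ctx is the context.Context of the request being handled
requestCounter.Add(ctx, 1, metric.WithAttributes(
  attribute.String("endpoint", "/checkout"),
))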

3. Logs: The Details

Logs in OpenTelemetry are structured. Instead of just a text line, a Log Record contains:

  • Timestamp
  • Severity
  • Body (message)
  • Attributes (key-value pairs)
  • Trace Context (Trace ID + Span ID)

[!TIP] Correlation is Magic. Because OTel logs contain the Trace ID, you can view a Trace and instantly see all logs generated by all services during that specific request. No more grepping through 10 different log files!
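As a rough illustration of why the Trace ID field matters, the sketch below models a log record with the fields listed above and filters a mixed stream of records down to one request. It uses plain Go rather than the OpenTelemetry Logs API; the LogRecord type, the ByTrace helper, and every sample record are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// LogRecord mirrors the fields listed above. It is a simplified,
// hypothetical type, not the OpenTelemetry log data model itself.
type LogRecord struct {
	Timestamp  time.Time
	Severity   string
	Body       string
	Attributes map[string]string
	TraceID    string
	SpanID     string
}

// ByTrace returns the records that carry the given trace ID, in order.
func ByTrace(records []LogRecord, traceID string) []LogRecord {
	var out []LogRecord
	for _, r := range records {
		if r.TraceID == traceID {
			out = append(out, r)
		}
	}
	return out
}

// checkoutTraceID is the trace ID used in the trace example; the other
// IDs and all log contents below are made up for this sketch.
const checkoutTraceID = "4bf92f3577b34da6a3ce929d0e0e4736"

func sampleLogs() []LogRecord {
	ts := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	return []LogRecord{
		{Timestamp: ts, Severity: "INFO", Body: "checkout started",
			Attributes: map[string]string{"service": "frontend"},
			TraceID:    checkoutTraceID, SpanID: "00f067aa0ba902b7"},
		{Timestamp: ts.Add(10 * time.Millisecond), Severity: "INFO", Body: "health check ok",
			Attributes: map[string]string{"service": "auth"},
			TraceID:    "00000000000000000000000000000001", SpanID: "0000000000000001"},
		{Timestamp: ts.Add(90 * time.Millisecond), Severity: "ERROR", Body: "card declined",
			Attributes: map[string]string{"service": "payment"},
			TraceID:    checkoutTraceID, SpanID: "00f067aa0ba902b8"},
	}
}

func main() {
	// Logs from several services, narrowed to a single request.
	for _, r := range ByTrace(sampleLogs(), checkoutTraceID) {
		fmt.Printf("%s [%s] %s (%s)\n",
			r.Timestamp.Format(time.RFC3339Nano), r.Severity, r.Body, r.Attributes["service"])
	}
}
```

An observability backend does the same lookup with an index instead of a loop, but the principle is identical: the Trace ID is the join key between a trace and its logs.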

4. Context Propagation

How does the Payment Service know it belongs to the same trace as the Checkout Service? Context Propagation.

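OTel injects the trace context into outgoing requests as HTTP headers. The receiving service extracts it and creates its own spans as children of the caller's span. The standard header format is W3C Trace Context: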

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

  • 00: Version
  • 4bf...: Trace ID (128-bit) - Unique for the whole request.
  • 00f...: Parent Span ID (64-bit) - The caller’s span ID.
  • 01: Flags (Sampled or not).
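The four fields can be pulled apart with a few lines of string handling. The sketch below is a simplified parser for version 00 headers only, written with the Go standard library; the TraceParent type and ParseTraceParent function are names invented here, and in practice the OTel propagators do this for you. It is also more lenient than the specification in one respect: it accepts uppercase hex digits.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a traceparent header.
type TraceParent struct {
	Version  string
	TraceID  string // 128-bit, 32 hex characters
	ParentID string // 64-bit, 16 hex characters
	Sampled  bool   // lowest bit of the flags byte
}

// ParseTraceParent splits and validates a version-00 style header.
func ParseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}

	// Expected length of each field in hex characters.
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return TraceParent{}, fmt.Errorf("traceparent: field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not hex: %w", i, err)
		}
		// All-zero trace IDs and parent IDs are invalid.
		if (i == 1 || i == 2) && strings.Trim(p, "0") == "" {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is all zeros", i)
		}
	}

	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return TraceParent{
		Version:  parts[0],
		TraceID:  parts[1],
		ParentID: parts[2],
		Sampled:  flags[0]&0x01 == 0x01,
	}, nil
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("trace ID: ", tp.TraceID)
	fmt.Println("parent ID:", tp.ParentID)
	fmt.Println("sampled:  ", tp.Sampled)
}
```

A receiving service would keep the Trace ID, use the Parent Span ID as the parent of its first span, and generate fresh span IDs for its own work.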

5. Summary

  1. Traces connect the dots across services.
  2. Metrics show the health trends.
  3. Logs provide the detailed events.
  4. Context Propagation is the glue: it carries the Trace ID from service to service so that spans and logs can be tied to the same request.

In the next module, we will start instrumenting an application from scratch.