Observability Fundamentals
Goal: Understand the data structures that power observability: Traces, Metrics, and Logs.
In the previous chapter, we saw why we need observability. Now, let’s look at what it actually is.
Observability is built on three pillars: traces, metrics, and logs. Each is useful on its own. Correlated with one another, they are far more powerful.
1. Traces: The Request Journey
A Trace tells the story of a request. It tracks the path of execution across multiple services.
A Trace is a tree of Spans.
- Trace: The entire request (e.g., “Checkout”).
- Span: An individual operation (e.g., “DB Query”, “HTTP Call”).
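To make the "tree of Spans" idea concrete, here is a minimal sketch in plain Go with no OpenTelemetry dependency. The `Span` struct, `buildTree`, and `render` are hypothetical names chosen for illustration; real OTel spans carry more fields (timestamps, attributes, status) and use binary IDs rather than short strings.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a simplified, hypothetical model of one operation in a trace.
type Span struct {
	TraceID  string // shared by every span in the same request
	SpanID   string // unique per operation
	ParentID string // empty for the root span
	Name     string
}

// buildTree finds the root span and groups the others by parent ID.
func buildTree(spans []Span) (root Span, children map[string][]Span) {
	children = make(map[string][]Span)
	for _, s := range spans {
		if s.ParentID == "" {
			root = s
			continue
		}
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	return root, children
}

// render returns the tree as indented text, one span per line.
func render(s Span, children map[string][]Span, depth int) string {
	out := strings.Repeat("  ", depth) + s.Name + "\n"
	for _, c := range children[s.SpanID] {
		out += render(c, children, depth+1)
	}
	return out
}

func main() {
	spans := []Span{
		{TraceID: "t1", SpanID: "a", ParentID: "", Name: "Checkout"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "HTTP Call"},
		{TraceID: "t1", SpanID: "c", ParentID: "b", Name: "DB Query"},
	}
	root, children := buildTree(spans)
	fmt.Print(render(root, children, 0))
}
```

Every span shares the same trace ID, and the parent ID is what turns a flat list of spans into a tree.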
Code Example: Creating a Span
Java:
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

Tracer tracer = GlobalOpenTelemetry.getTracer("my-service");
Span span = tracer.spanBuilder("process_payment").startSpan();
// Make the span current so work done inside the block is attached to it
try (Scope scope = span.makeCurrent()) {
    processPayment(); // Your logic here
    span.setAttribute("payment.amount", 99.99);
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR); // Mark the span as failed
    throw e;
} finally {
    span.end(); // CRITICAL: Always end the span!
}
Go:
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func processPayment(ctx context.Context) {
	tracer := otel.Tracer("my-service")

	// Start a new span
	ctx, span := tracer.Start(ctx, "process_payment")
	defer span.End() // CRITICAL: Always end the span!

	// Your logic here
	span.SetAttributes(attribute.Float64("payment.amount", 99.99))
}
2. Metrics: The Pulse
Metrics are numeric measurements aggregated over time. Because they store aggregates rather than individual events, they are cheap to store and well suited to spotting trends.
Common metric types:
- Counter: A value that only goes up (e.g., http.requests.total).
- Gauge: A value that goes up and down (e.g., process.memory.usage).
- Histogram: A distribution of values (e.g., http.server.duration).
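The difference between the three types is easiest to see in their data structures. The following is a rough sketch in plain Go, not the OpenTelemetry SDK: the `Counter`, `Gauge`, and `Histogram` types and the bucket bounds (100 and 500 ms) are assumptions made for illustration.

```go
package main

import "fmt"

// Counter only ever increases.
type Counter struct{ value int64 }

// Add ignores negative increments, because a counter never goes down.
func (c *Counter) Add(n int64) {
	if n < 0 {
		return
	}
	c.value += n
}

// Gauge keeps only the most recently observed value.
type Gauge struct{ value float64 }

func (g *Gauge) Set(v float64) { g.value = v }

// Histogram counts observations per bucket.
// bounds are inclusive upper bounds; the final count is the overflow bucket.
type Histogram struct {
	bounds []float64
	counts []int64
}

func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]int64, len(bounds)+1)}
}

// Record increments the first bucket whose upper bound is >= v.
func (h *Histogram) Record(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
			return
		}
	}
	h.counts[len(h.bounds)]++
}

func main() {
	var requests Counter
	requests.Add(1)
	requests.Add(1)

	var memoryMB Gauge
	memoryMB.Set(512)
	memoryMB.Set(480) // a gauge may go down

	latency := NewHistogram(100, 500) // bucket bounds in milliseconds
	for _, ms := range []float64{20, 150, 90, 900} {
		latency.Record(ms)
	}

	fmt.Println(requests.value, memoryMB.value, latency.counts)
	// prints: 2 480 [2 1 1]
}
```

Note that the histogram stores only bucket counts, not the four individual durations. That aggregation is what keeps metrics cheap compared with traces and logs.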
Code Example: Recording a Metric
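Java:
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

Meter meter = GlobalOpenTelemetry.getMeter("my-service");
LongCounter requestCounter = meter
    .counterBuilder("http_requests_total")
    .setDescription("Total HTTP requests")
    .build();
requestCounter.add(1, Attributes.of(AttributeKey.stringKey("endpoint"), "/checkout"));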
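Go:
// Imports: go.opentelemetry.io/otel, .../otel/attribute, .../otel/metric
meter := otel.Meter("my-service")
requestCounter, err := meter.Int64Counter(
	"http_requests_total",
	metric.WithDescription("Total HTTP requests"),
)
if err != nil {
	return err // Do not discard instrument-creation errors
}
requestCounter.Add(ctx, 1, metric.WithAttributes(
	attribute.String("endpoint", "/checkout"),
))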
3. Logs: The Details
Logs in OpenTelemetry are structured. Instead of just a text line, a Log Record contains:
- Timestamp
- Severity
- Body (message)
- Attributes (key-value pairs)
- Trace Context (Trace ID + Span ID)
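As a rough illustration, the fields above map naturally onto a struct. This sketch uses only the Go standard library; the `LogRecord` type, the `encode` and `decode` helpers, and the JSON field names are hypothetical and are not the OTLP wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// LogRecord mirrors the fields listed above. The JSON names are illustrative.
type LogRecord struct {
	Timestamp  time.Time         `json:"timestamp"`
	Severity   string            `json:"severity"`
	Body       string            `json:"body"`
	Attributes map[string]string `json:"attributes"`
	TraceID    string            `json:"trace_id"`
	SpanID     string            `json:"span_id"`
}

// encode serializes a record as a single JSON line.
func encode(rec LogRecord) (string, error) {
	b, err := json.Marshal(rec)
	return string(b), err
}

// decode parses a JSON line back into a record.
func decode(line string) (LogRecord, error) {
	var rec LogRecord
	err := json.Unmarshal([]byte(line), &rec)
	return rec, err
}

func main() {
	rec := LogRecord{
		Timestamp:  time.Date(2024, 1, 15, 10, 30, 0, 0, time.UTC),
		Severity:   "ERROR",
		Body:       "payment declined",
		Attributes: map[string]string{"endpoint": "/checkout"},
		TraceID:    "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:     "00f067aa0ba902b7",
	}
	line, err := encode(rec)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(line)
}
```

Because every field has a name and a type, a backend can index and query the record directly instead of parsing free text.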
[!TIP] Correlation is Magic. Because OTel logs contain the Trace ID, you can view a Trace and instantly see all logs generated by all services during that specific request. No more grepping through 10 different log files!
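Conceptually, that correlation is just a filter on the Trace ID. The sketch below assumes a hypothetical, trimmed-down `LogRecord` with a `Service` field and an in-memory slice of records; a real observability backend would run the equivalent query against an index.

```go
package main

import "fmt"

// LogRecord is a trimmed-down, hypothetical record used only for this sketch.
type LogRecord struct {
	Service string
	TraceID string
	Body    string
}

// logsForTrace returns every record that carries the given trace ID,
// regardless of which service emitted it.
func logsForTrace(records []LogRecord, traceID string) []LogRecord {
	var matched []LogRecord
	for _, r := range records {
		if r.TraceID == traceID {
			matched = append(matched, r)
		}
	}
	return matched
}

func main() {
	records := []LogRecord{
		{Service: "checkout", TraceID: "trace-1", Body: "order received"},
		{Service: "payment", TraceID: "trace-2", Body: "card charged"},
		{Service: "payment", TraceID: "trace-1", Body: "payment declined"},
	}
	for _, r := range logsForTrace(records, "trace-1") {
		fmt.Printf("[%s] %s\n", r.Service, r.Body)
	}
}
```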
4. Context Propagation
How does the Payment Service know it belongs to the same trace as the Checkout Service? Context Propagation.
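OTel injects the trace context into HTTP headers on outgoing requests, and the receiving service extracts it to continue the same trace. The standard format is W3C Trace Context, carried in the traceparent header: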
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
- 00: Version.
- 4bf...: Trace ID (128-bit) - Unique for the whole request.
- 00f...: Parent Span ID (64-bit) - The caller’s span ID.
- 01: Flags (Sampled or not).
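To see how a receiving service could read these fields, here is a simplified parser sketch using only the Go standard library. `TraceParent` and `parseTraceParent` are hypothetical names, and the sketch handles only the four-field version 00 layout shown above. It does not, for example, reject uppercase hex. In practice the OTel SDK's propagator does this work for you.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a version-00 traceparent header.
type TraceParent struct {
	Version  string
	TraceID  string // 32 hex characters (128 bits)
	ParentID string // 16 hex characters (64 bits)
	Sampled  bool   // lowest bit of the flags byte
}

// parseTraceParent splits and validates a traceparent header value.
func parseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return TraceParent{}, fmt.Errorf("traceparent: field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not hex: %w", i, err)
		}
	}
	// An all-zero trace ID or parent ID is invalid.
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: all-zero ID")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated as hex above
	return TraceParent{
		Version:  parts[0],
		TraceID:  parts[1],
		ParentID: parts[2],
		Sampled:  flags[0]&0x01 == 0x01,
	}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("invalid header:", err)
		return
	}
	fmt.Printf("trace=%s parent=%s sampled=%t\n", tp.TraceID, tp.ParentID, tp.Sampled)
}
```

The receiving service keeps the Trace ID, uses the Parent Span ID as the parent of its own new span, and writes its own span ID into the header of any further outgoing calls.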
5. Summary
- Traces connect the dots across services.
- Metrics show the health trends.
- Logs provide the detailed events.
- Context Propagation is the glue that binds them together.
In the next module, we will start instrumenting an application from scratch.