Module Review: Production

[!NOTE] This module explores the core principles of Module Review: Production, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. Key Takeaways

  • Deployment Patterns:
  • DaemonSet: One per node. Saves memory and context switches. Best for metrics/logs.
  • Gateway: Centralized cluster. Best for sampling, auth, and buffering.
  • Sidecar: One per pod. Expensive (RAM/CPU). Use only for strict isolation.
  • Performance Tuning:
  • Memory Limiter: Must be the first processor to prevent OOM.
  • Go Runtime: Set GOMEMLIMIT to 90% of container memory to force aggressive GC before OOM.
  • Batch Processor: Optimizes network I/O but introduces latency.
  • Sampling:
  • Head-Based: Cheap, probabilistic.
  • Tail-Based: Expensive (buffers everything), but guarantees capturing errors.

2. Interactive Flashcards

DaemonSet vs Sidecar

Why is the DaemonSet pattern generally preferred over the Sidecar pattern for large clusters?

Resource Efficiency

DaemonSets run one process per node, sharing overhead. Sidecars run one process per pod, leading to huge memory waste (100 pods = 100 idle collectors) and increased kernel context switching.

Pipeline Order

What happens if you place the batch processor before the memory_limiter?

Risk of OOM Crash

The batch processor will buffer data in memory. If it receives a huge spike, it might consume all available RAM before the memory limiter even sees the data to reject it.

GOMEMLIMIT

What is the purpose of setting the GOMEMLIMIT environment variable for the Collector?

Prevent OOM Kills

It tells the Go runtime to run Garbage Collection more aggressively as heap usage approaches this limit, preventing the OS from killing the container due to memory overuse.

Tail Sampling Cost

What is the primary trade-off of using Tail Sampling?

High Memory Usage

The collector must hold every single span in memory until the trace is complete (or times out) to make a decision. This requires significant RAM capacity.

3. Production Cheat Sheet

Component Setting Recommendation
Go Runtime GOMEMLIMIT 90% of container hard limit. Critical for Go 1.19+.
Memory Limiter limit_mib 80% of container hard limit.
Memory Limiter spike_limit_mib 20% of container hard limit.
Batch Processor timeout 200ms - 1s. Higher = better compression.
Batch Processor send_batch_size 1000 - 8192. Tuned to MTU.
Exporter retry_on_failure Enabled. Max elapsed time 300s.
Receivers otlp Enable grpc (4317) and http (4318).
Deployment DaemonSet Default choice for K8s (One per node).
Deployment Sidecar Use only for strict tenant isolation.

4. Next Steps

You have mastered the Production module! Ensure you verify your knowledge with the project implementation.