Module Review: Production
[!NOTE] This review recaps the Production module: Collector deployment patterns, performance tuning derived from memory and hardware constraints, and sampling strategies.
1. Key Takeaways
- Deployment Patterns:
  - DaemonSet: One per node. Saves memory and context switches. Best for metrics/logs.
  - Gateway: Centralized cluster. Best for sampling, auth, and buffering.
  - Sidecar: One per pod. Expensive (RAM/CPU). Use only for strict isolation.
- Performance Tuning:
  - Memory Limiter: Must be the first processor to prevent OOM.
  - Go Runtime: Set `GOMEMLIMIT` to 90% of container memory so the garbage collector runs more aggressively before the container reaches its limit.
  - Batch Processor: Optimizes network I/O but introduces latency.
- Sampling:
  - Head-Based: Cheap, probabilistic.
  - Tail-Based: Expensive (buffers every span until the trace completes), but can keep every trace that contains an error.
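
As a rough illustration of the head-based option, the fragment below uses the `probabilistic_sampler` processor from the Collector contrib distribution. The 10% rate is an assumed value for the sketch, not a recommendation from this module.

```yaml
processors:
  # Head-based: the keep/drop decision is derived from the trace ID,
  # so nothing has to be buffered while the trace is still in flight.
  probabilistic_sampler:
    sampling_percentage: 10   # assumed rate; keeps roughly 1 in 10 traces
```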
2. Interactive Flashcards
DaemonSet vs Sidecar
Why is the DaemonSet pattern generally preferred over the Sidecar pattern for large clusters?
Resource Efficiency
DaemonSets run one process per node, sharing overhead. Sidecars run one process per pod, leading to huge memory waste (100 pods = 100 idle collectors) and increased kernel context switching.
Pipeline Order
What happens if you place the batch processor before the memory_limiter?
Risk of OOM Crash
The batch processor will buffer data in memory. If it receives a huge spike, it might consume all available RAM before the memory limiter even sees the data to reject it.
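
A minimal sketch of the safe ordering, assuming the `otlp` receiver and exporter and both processors are defined elsewhere in the same config file:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in list order. memory_limiter comes first so it can
      # refuse data before batch starts buffering it.
      processors: [memory_limiter, batch]
      exporters: [otlp]
```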
GOMEMLIMIT
What is the purpose of setting the GOMEMLIMIT environment variable for the Collector?
Prevent OOM Kills
It sets a soft memory limit for the Go runtime, which runs Garbage Collection more aggressively as memory usage approaches that limit. This makes it far less likely that the kernel kills the container for exceeding its memory limit.
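
One way this might look in a Kubernetes container spec, assuming a 2 GiB memory limit. The container name and the 2 GiB figure are illustrative assumptions; 90% of 2048 MiB is 1843.2 MiB, rounded down here.

```yaml
# Fragment of a pod template, not a complete manifest.
containers:
  - name: otel-collector                            # hypothetical name
    image: otel/opentelemetry-collector-contrib     # pin a specific version tag in practice
    resources:
      limits:
        memory: 2Gi                                 # container hard limit (assumed)
    env:
      - name: GOMEMLIMIT
        value: "1843MiB"                            # ~90% of 2048 MiB
```

The value is a soft limit for the Go runtime, so the memory limiter processor is still needed to shed load.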
Tail Sampling Cost
What is the primary trade-off of using Tail Sampling?
High Memory Usage
The collector must hold every single span in memory until the trace is complete (or times out) to make a decision. This requires significant RAM capacity.
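
The memory cost shows up directly in the configuration of the contrib `tail_sampling` processor. The sketch below keeps every trace with an error status and a share of the rest; the wait time, trace count, and percentage are assumed values for illustration.

```yaml
processors:
  tail_sampling:
    decision_wait: 10s     # how long spans are buffered before a decision (assumed)
    num_traces: 50000      # upper bound on traces held in memory
    policies:
      # A trace is kept if any policy decides to sample it.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10   # assumed rate
```

Raising `decision_wait` or `num_traces` increases the RAM needed roughly in proportion to span volume.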
3. Production Cheat Sheet
| Component | Setting | Recommendation |
|---|---|---|
| Go Runtime | `GOMEMLIMIT` | 90% of container hard limit. Supported in Go 1.19+. |
| Memory Limiter | `limit_mib` | 80% of container hard limit, expressed in MiB. |
| Memory Limiter | `spike_limit_mib` | 20% of container hard limit, expressed in MiB. |
| Batch Processor | `timeout` | 200ms - 1s. Higher = larger batches and better compression, at the cost of latency. |
| Batch Processor | `send_batch_size` | 1000 - 8192. Counted in items (spans, data points, log records), not bytes, so tune it to the backend's request size limit rather than the network MTU. |
| Exporter | `retry_on_failure` | Enabled. Max elapsed time 300s. |
| Receivers | `otlp` | Enable grpc (4317) and http (4318). |
| Deployment | DaemonSet | Default choice for K8s (one per node). |
| Deployment | Sidecar | Use only for strict tenant isolation. |
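
Put together, the settings above might look like the following Collector config. This is a sketch under stated assumptions: a 2 GiB container limit (so 80% is about 1638 MiB and 20% about 409 MiB), a hypothetical backend endpoint, and batch values taken from within the recommended ranges.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1638          # ~80% of an assumed 2048 MiB limit
    spike_limit_mib: 409     # ~20% of an assumed 2048 MiB limit
  batch:
    timeout: 1s              # within the 200ms - 1s range
    send_batch_size: 8192    # items per batch, not bytes

exporters:
  otlp:
    endpoint: backend.example.com:4317   # hypothetical backend address
    retry_on_failure:
      enabled: true
      max_elapsed_time: 300s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```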
4. Next Steps
You have mastered the Production module! Ensure you verify your knowledge with the project implementation.