Module Review: Production
[!NOTE] This review recaps the core principles of the Production module, deriving solutions from first principles and hardware constraints to build production-ready expertise.
1. Key Takeaways
- Deployment Patterns:
  - DaemonSet: One per node. Saves memory and context switches. Best for metrics/logs.
  - Gateway: Centralized cluster. Best for sampling, auth, and buffering.
  - Sidecar: One per pod. Expensive (RAM/CPU). Use only for strict isolation.
- Performance Tuning:
  - Memory Limiter: Must be the first processor in the pipeline to prevent OOM.
  - Go Runtime: Set `GOMEMLIMIT` to 90% of container memory to force aggressive GC before OOM.
  - Batch Processor: Optimizes network I/O but introduces latency.
- Sampling:
  - Head-Based: Cheap, probabilistic.
  - Tail-Based: Expensive (buffers everything), but can retain every trace that contains an error.
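
To make the "cheap, probabilistic" nature of head-based sampling concrete, here is a minimal sketch of a stateless keep/drop decision derived only from the trace ID. The function name `head_sample` and the SHA-256 bucketing are assumptions made for illustration; the Collector's own probabilistic sampler uses its own hashing scheme.

```python
import hashlib


def head_sample(trace_id: str, sampling_percentage: float) -> bool:
    """Decide at the start of a trace, using only its ID (nothing is buffered)."""
    # Hashing the ID makes the decision deterministic: every collector that
    # sees this trace ID reaches the same keep/drop verdict.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < sampling_percentage * 100


# Hypothetical trace IDs; the kept fraction should land close to 10%.
kept = sum(head_sample(f"trace-{i}", 10.0) for i in range(100_000))
print(kept / 100_000)
```

Because the decision needs no per-trace state, memory cost stays flat regardless of traffic. The price is that the sampler cannot know whether a trace will later contain an error.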
2. Interactive Flashcards
Why is the DaemonSet pattern generally preferred over the Sidecar pattern for large clusters?
Resource Efficiency
DaemonSets run one process per node, sharing overhead. Sidecars run one process per pod, leading to huge memory waste (100 pods = 100 idle collectors) and increased kernel context switching.
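
The arithmetic behind this answer can be sketched as follows. The helper name and the 50 MiB idle footprint per collector are illustrative assumptions, not measured values.

```python
def collector_overhead_mib(nodes: int, pods_per_node: int, idle_mib_per_collector: int) -> dict:
    """Compare baseline collector memory for DaemonSet vs. Sidecar deployments."""
    # DaemonSet: one collector per node.
    daemonset = nodes * idle_mib_per_collector
    # Sidecar: one collector per pod.
    sidecar = nodes * pods_per_node * idle_mib_per_collector
    return {"daemonset_mib": daemonset, "sidecar_mib": sidecar}


# Hypothetical cluster: 10 nodes, 10 pods each (100 pods), 50 MiB per idle collector.
print(collector_overhead_mib(10, 10, 50))
# {'daemonset_mib': 500, 'sidecar_mib': 5000}
```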
What happens if you place the batch processor before the memory_limiter?
Risk of OOM Crash
The batch processor will buffer data in memory. If it receives a huge spike, it might consume all available RAM before the memory limiter even sees the data to reject it.
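
One way to guard against this mistake is a small check over a pipeline's processor list before deploying. This is a hypothetical helper, not part of the Collector; it only assumes that component IDs take the form `type` or `type/name`.

```python
from typing import List


def memory_limiter_is_first(processors: List[str]) -> bool:
    """Return True when a memory_limiter instance leads the processor list."""
    if not processors:
        return False
    # Component IDs look like "type" or "type/name", e.g. "memory_limiter/traces".
    return processors[0].split("/", 1)[0] == "memory_limiter"


print(memory_limiter_is_first(["memory_limiter", "batch"]))  # True
print(memory_limiter_is_first(["batch", "memory_limiter"]))  # False
```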
What is the purpose of setting the GOMEMLIMIT environment variable for the Collector?
Prevent OOM Kills
It sets a soft memory limit for the Go runtime, which runs garbage collection more aggressively as heap usage approaches that limit. This makes it less likely that the kernel kills the container for exceeding its memory limit.
What is the primary trade-off of using Tail Sampling?
High Memory Usage
The collector must hold every single span in memory until the trace is complete (or times out) to make a decision. This requires significant RAM capacity.
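
The memory cost is easier to see in a toy model. The class below is a simplified sketch, not the Collector's tail sampling processor: `TailSampler` and its methods are hypothetical names, and `finish_trace` stands in for "trace complete or timed out".

```python
from collections import defaultdict


class TailSampler:
    """Toy tail sampler: buffer spans per trace, decide once the trace ends."""

    def __init__(self):
        self.buffer = defaultdict(list)  # trace_id -> spans held in memory

    def add_span(self, trace_id, span):
        self.buffer[trace_id].append(span)

    def buffered_spans(self):
        return sum(len(spans) for spans in self.buffer.values())

    def finish_trace(self, trace_id):
        """Return the spans to export, or [] if the trace is dropped."""
        spans = self.buffer.pop(trace_id, [])
        keep = any(span["status"] == "ERROR" for span in spans)
        return spans if keep else []


sampler = TailSampler()
sampler.add_span("t1", {"name": "GET /cart", "status": "OK"})
sampler.add_span("t1", {"name": "db.query", "status": "ERROR"})
sampler.add_span("t2", {"name": "GET /health", "status": "OK"})
print(sampler.buffered_spans())         # 3 spans held before any decision
print(len(sampler.finish_trace("t1")))  # 2: kept because one span errored
print(len(sampler.finish_trace("t2")))  # 0: dropped
```

Every span sits in `buffer` until its trace is decided, so memory grows with span throughput multiplied by how long traces stay open.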
3. Production Cheat Sheet
| Component | Setting | Recommendation |
|---|---|---|
| Go Runtime | `GOMEMLIMIT` | 90% of container hard limit. Requires Go 1.19+. |
| Memory Limiter | `limit_mib` | 80% of container hard limit. |
| Memory Limiter | `spike_limit_mib` | 20% of container hard limit. |
| Batch Processor | `timeout` | 200ms - 1s. Higher = larger batches and better compression, at the cost of latency. |
| Batch Processor | `send_batch_size` | 1000 - 8192 items per batch (spans, data points, or log records, not bytes). |
| Exporter | `retry_on_failure` | Enabled. Max elapsed time 300s. |
| Receivers | `otlp` | Enable grpc (4317) and http (4318). |
| Deployment | DaemonSet | Default choice for K8s (one per node). |
| Deployment | Sidecar | Use only for strict tenant isolation. |
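
As a worked example of the memory rows in the cheat sheet, the sketch below derives the three values from a container limit. The helper name and the 2048 MiB limit are assumptions for illustration. The `soft_limit_mib` entry is not a setting: it shows the threshold the memory limiter derives as `limit_mib - spike_limit_mib`.

```python
def collector_memory_settings(container_limit_mib: int) -> dict:
    """Apply the cheat-sheet percentages to a container hard limit (in MiB)."""
    limit_mib = container_limit_mib * 80 // 100
    spike_limit_mib = container_limit_mib * 20 // 100
    return {
        # GOMEMLIMIT accepts a byte count with a unit suffix such as MiB.
        "GOMEMLIMIT": f"{container_limit_mib * 90 // 100}MiB",
        "limit_mib": limit_mib,
        "spike_limit_mib": spike_limit_mib,
        # Derived for reference only; the memory limiter computes this itself.
        "soft_limit_mib": limit_mib - spike_limit_mib,
    }


# Hypothetical container with a 2048 MiB hard limit.
print(collector_memory_settings(2048))
```

With these percentages the limiter starts refusing data at roughly 60% of the container limit, leaving headroom for spikes before the Go soft limit at 90% is reached.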
4. Next Steps
You have mastered the Production module! Ensure you verify your knowledge with the project implementation.