Module Review: The Collector
Key Takeaways
- Decoupling: The Collector decouples your applications from backend vendors, allowing you to switch vendors without redeploying apps.
- ETL Pipeline: The Collector works like an ETL pipeline. Data flows through Receivers (Input) → Processors (Transform) → Exporters (Output).
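- Order Matters: Processors run sequentially, in the order listed in the pipeline. Put the Memory Limiter first to guard against out-of-memory crashes, and place Batching after it (and after any sampling or filtering processors) for performance.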
- Deployment: Use DaemonSets (Agents) for logs/host metrics and Deployments (Gateways) for traces/sampling.
- Sampling: Use Head-based sampling for simple volume reduction and Tail-based sampling to keep critical traces (errors/latency) while dropping noise.
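The Receivers → Processors → Exporters flow and the processor ordering described above can be sketched as a minimal Collector configuration. This is an illustrative sketch rather than a production config: the limits are example values, and the `debug` exporter assumes a recent Collector release (older releases ship the same functionality as `logging`).

```yaml
# Minimal sketch of a traces pipeline; all values are illustrative.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
  batch:
    send_batch_size: 1024
    timeout: 10s

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the order listed here, not the order they are defined above.
      processors: [memory_limiter, batch]
      exporters: [debug]
```

Defining a component under `processors:` does not enable it; only components referenced in a `service.pipelines` entry actually run.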
Interactive Flashcards
What are the 3 main components of a Collector?
Receivers, Processors, Exporters
Receivers ingest data, Processors transform it, and Exporters send it to backends.
Why must the Memory Limiter be the first processor?
To prevent OOM crashes
It checks memory usage at a regular interval and, once usage crosses the configured limits, refuses incoming data and forces garbage collection, protecting the process from being killed.
What is the difference between Agent and Gateway deployment?
Scope & Scale
Agent runs on every node (DaemonSet) for local collection. Gateway runs as a centralized cluster (Deployment) for processing and sampling.
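As a rough illustration of the agent side of this split, the sketch below shows an agent that collects host metrics and pod logs locally and forwards everything to a gateway over OTLP. It assumes the contrib distribution (for the `hostmetrics` and `filelog` receivers); the gateway address, log path, and memory limits are hypothetical values chosen for the example.

```yaml
# Agent sketch (one instance per node, e.g. run as a DaemonSet).
# Assumes the contrib distribution, which bundles hostmetrics and filelog.
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
  filelog:
    include: [/var/log/pods/*/*/*.log]  # assumed log location on the node
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  batch:

exporters:
  otlp:
    # Hypothetical in-cluster address of the gateway Service.
    endpoint: otel-gateway.observability.svc.cluster.local:4317
    tls:
      insecure: true  # assumption: plaintext inside the cluster; enable TLS otherwise

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [filelog]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```

The gateway would then run its own OTLP receiver on the matching port and apply the heavier processing, such as sampling, before exporting to the backend.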
What is Tail-Based Sampling?
Post-Trace Decision
The Collector buffers the entire trace and makes a sampling decision based on the complete data (e.g., "Keep all errors").
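A "keep all errors" style rule could look roughly like the sketch below, which uses the `tail_sampling` processor from the contrib distribution. The policy names, the 10s wait, the 1000 ms latency threshold, and the 10% baseline are illustrative assumptions, not recommended values.

```yaml
# Gateway sketch: tail-based sampling on the traces pipeline.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
  tail_sampling:
    decision_wait: 10s        # how long to buffer spans before deciding
    policies:
      - name: keep-errors     # keep traces that contain an error status
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-slow       # keep traces slower than the threshold
        type: latency
        latency:
          threshold_ms: 1000
      - name: baseline        # keep a small random share of everything else
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
  batch:

exporters:
  debug:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [debug]
```

A trace is kept if any one of the policies matches it. For head-based sampling, the contrib `probabilistic_sampler` processor with a `sampling_percentage` setting is the simpler alternative, since it decides without buffering the trace.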
Collector Cheat Sheet
| Component | Purpose | Key Configuration |
|---|---|---|
| OTLP Receiver | Ingests data via gRPC/HTTP | protocols: grpc: endpoint: 0.0.0.0:4317 |
| Memory Limiter | Prevents Out-Of-Memory crashes | check_interval: 1s, limit_mib: 1024, spike_limit_mib: 256 |
| Batch Processor | Batches data to reduce network IO | send_batch_size: 1024, timeout: 10s |
| Resource Processor | Adds metadata (env, cluster) | attributes: [{key: env, value: prod, action: insert}] |
| Attributes Processor | Filters or modifies attributes | actions: [{key: user.id, action: delete}] |
| OTLP Exporter | Sends to vendor/collector | endpoint: "api.vendor.com:4317", headers: {api-key: ...} |
| Debug Exporter (replaces the deprecated Logging Exporter) | Debugging (prints telemetry to the console) | verbosity: detailed |
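The table rows above flatten each setting onto one line; in an actual config file they are nested YAML. The sketch below expands the Resource Processor, Attributes Processor, and OTLP Exporter rows and wires them into a pipeline. It assumes a distribution that bundles the `resource` and `attributes` processors (such as contrib), and `VENDOR_API_KEY` is a hypothetical environment variable standing in for the elided API key.

```yaml
# Cheat-sheet entries expanded into nested YAML; values come from the table.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256
  resource:
    attributes:
      - key: env
        value: prod
        action: insert      # add only if the key is not already present
  attributes:
    actions:
      - key: user.id
        action: delete      # strip the attribute before export
  batch:
    send_batch_size: 1024
    timeout: 10s

exporters:
  otlp:
    endpoint: "api.vendor.com:4317"
    headers:
      # Hypothetical variable name; the Collector expands ${env:...} at startup.
      api-key: ${env:VENDOR_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, attributes, batch]
      exporters: [otlp]
```

Reading the key from the environment keeps the secret out of the config file, which matters if the file is stored in version control or a ConfigMap.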
Next Steps
Now that you’ve mastered the Collector, it’s time to learn how to intelligently reduce your data volume without losing visibility.
Start Module 08: Sampling Strategies
[!NOTE] Need a refresher on terms? Check the OpenTelemetry Glossary.