The Collector: Architecture & Deployment
The OpenTelemetry Collector is the single most critical component in a production observability stack. While the OTel SDKs generate data, the Collector is what makes that data manageable at scale.
[!IMPORTANT] Why do you need this? Without a Collector, every microservice in your fleet is tightly coupled to your backend vendor (Datadog, Honeycomb, New Relic). If you want to switch vendors, scrub PII, or reduce data volume, you have to redeploy every single service.
With a Collector, your applications just send data to “localhost”, and the Collector handles the rest. It is your vendor-agnostic control plane.
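In practice, pointing an application at a local Collector is usually just an environment-variable change, not a code change. A minimal sketch using the standard OTel SDK environment variables (the service name here is a hypothetical placeholder; 4317 is the Collector's default OTLP/gRPC port):

```shell
# Point any OTel SDK at the local Collector instead of a vendor endpoint.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
# Hypothetical service name for illustration:
export OTEL_SERVICE_NAME="checkout-service"
```

Because the vendor endpoint and credentials now live only in the Collector's config, swapping backends never touches application deployments.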
In this module, we will deconstruct the Collector’s architecture, build a robust production configuration from scratch, and simulate how data flows through its internal pipeline.
1. Why use a Collector?
You can configure the OTel SDK in your application to send data directly to a backend. This is called the “Direct-to-Vendor” approach, and it is almost always a mistake for production systems.
| Feature | Direct Export (Bad) | With Collector (Good) |
|---|---|---|
| Coupling | Apps know backend credentials | Apps only know “localhost” |
| Data Control | Hard to filter/redact centrally | Centralized PII scrubbing |
| Network | Many connections to vendor | Batched, compressed, persistent connections |
| Sampling | Head-based only (limited) | Tail-based sampling (powerful) |
| Migration | Redeploy all apps to switch vendors | Update Collector config only |
[!TIP] Pro Tip: Even in development, run a local Collector. It allows you to tee traffic to a local console exporter for debugging without changing your application code.
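A minimal development config along those lines might look like the following sketch. It assumes the Collector's built-in `debug` exporter (the console exporter, which replaced the deprecated `logging` exporter in recent Collector versions):

```yaml
# Dev sketch: receive OTLP from local apps, print spans to the console.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  debug:
    verbosity: detailed   # print full span contents for debugging
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```

To tee traffic, you would add your vendor's exporter alongside `debug` in the same `exporters` list; the application config never changes.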
2. Interactive Pipeline Simulator
[Interactive element: a pipeline simulator visualizing how data flows through the Collector. Toggling “Batching” shows how spans are grouped; toggling “Filtering” simulates dropping noise.]
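The two behaviors the simulator demonstrates map directly onto real Collector processors. A hedged sketch (the batch sizes and the OTTL condition are illustrative assumptions, and `/healthz` is a hypothetical route):

```yaml
processors:
  # Batching: group spans before export to cut connection overhead.
  batch:
    send_batch_size: 512
    timeout: 5s
  # Filtering: drop noisy spans, e.g. health-check traffic.
  filter/drop-health-checks:
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
```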
3. Architecture Deep Dive
The Collector is essentially an ETL pipeline built from three primary components: Receivers (which ingest data into the Collector), Processors (which transform, filter, and batch it in flight), and Exporters (which send it on to one or more backends).
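The three component types are wired together in the `service.pipelines` section of the config. A skeleton sketch (the backend endpoint is a hypothetical placeholder; `memory_limiter` and `batch` are the two processors the Collector docs recommend for production pipelines):

```yaml
receivers:
  otlp:
    protocols:
      grpc:   # default port 4317
      http:   # default port 4318
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  batch:
exporters:
  otlphttp:
    endpoint: https://backend.example.com   # hypothetical vendor endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```

Note that a component is inert until it is referenced in a pipeline: defining a receiver or processor at the top level does nothing on its own.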