Production Deployment Strategies
Deploying OpenTelemetry in production requires balancing observability fidelity against infrastructure cost and operational complexity.
In this chapter, we dissect the three primary deployment patterns, analyzing them from the kernel up to the cloud bill.
1. The Three Pillars of OTel Deployment
- Agent / Sidecar: A Collector instance runs alongside each application instance.
- DaemonSet: A single Collector instance runs on each node (host).
- Gateway: A centralized cluster of Collectors that aggregates data.
2. Deep Dive: Kernel Reality
Why choose DaemonSet over Sidecar? It’s not just about “convenience”—it’s about hardware efficiency.
The Cost of a Sidecar
In a Sidecar model, every application Pod has a helper container (the Collector).
- Context Switching: The Linux kernel must schedule both the App and the Sidecar. If you have 50 pods on a node, that’s 50 extra processes fighting for CPU time slices.
- Memory Overhead: Even an idle Go process consumes ~10-20MB (RSS). Multiplied by 100 pods, that’s 1-2GB of RAM wasted just on idle collectors.
- Loopback Traffic: Data travels via localhost (the loopback interface). While fast, it still traverses the kernel’s TCP/IP stack, incurring serialization/deserialization costs.
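The memory overhead above is easy to sanity-check. A quick sketch of the arithmetic, using the 10–20MB idle RSS figure:

```go
package main

import "fmt"

// idleOverheadGB returns the total idle Collector memory, in GB, for a
// fleet of sidecars, given the per-process RSS in MB.
func idleOverheadGB(pods int, rssMB float64) float64 {
	return float64(pods) * rssMB / 1024.0
}

func main() {
	// 100 pods at 10–20MB each: roughly 1–2GB of idle overhead.
	fmt.Printf("low:  %.2f GB\n", idleOverheadGB(100, 10))
	fmt.Printf("high: %.2f GB\n", idleOverheadGB(100, 20))
}
```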
The DaemonSet Advantage
- Shared Resources: One process per Node. 100 pods share 1 collector.
- Host Network: The DaemonSet can run in the host network namespace, allowing it to scrape the kubelet and hardware metrics directly without permission gymnastics.
[!TIP] Performance Rule: Use DaemonSets by default. Use Sidecars only if you need strict isolation (e.g., multi-tenant clusters where Team A cannot see Team B’s spans) or specialized transformation logic per app.
3. Implementing the Sidecar Pattern
Sometimes you do need a Sidecar (e.g., for FaaS or strict isolation). In Kubernetes, this is typically done via a MutatingAdmissionWebhook.
Here is how you would programmatically inject a sidecar in Go. This is simplified logic similar to what the OTel Operator does.
package main

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// sidecarContainer is the Collector container injected into each Pod.
var sidecarContainer = corev1.Container{
	Name:  "otel-collector",
	Image: "otel/opentelemetry-collector-contrib:0.88.0",
	Args:  []string{"--config=/conf/config.yaml"},
	// Limits are critical for sidecars to avoid starving the app.
	Resources: corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			"memory": resource.MustParse("100Mi"),
			"cpu":    resource.MustParse("200m"),
		},
	},
}

// toJSON marshals the container for embedding in the JSONPatch string.
func toJSON(v interface{}) string {
	b, _ := json.Marshal(v)
	return string(b)
}

func mutatePod(w http.ResponseWriter, r *http.Request) {
	// 1. Decode the AdmissionReview request.
	var admissionReview admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&admissionReview); err != nil {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}

	// 2. Create the patch that appends the container.
	// In reality, you'd use a library to generate the JSONPatch.
	patch := `[{"op": "add", "path": "/spec/containers/-", "value": ` + toJSON(sidecarContainer) + `}]`

	// 3. Construct the response and send it back to the API server.
	patchType := admissionv1.PatchTypeJSONPatch
	admissionReview.Response = &admissionv1.AdmissionResponse{
		UID:       admissionReview.Request.UID,
		Allowed:   true,
		Patch:     []byte(patch),
		PatchType: &patchType,
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(admissionReview)
}
4. Total Cost of Ownership (TCO)
Which architecture is cheaper? A rough estimate of monthly infrastructure cost: assume each Sidecar Collector uses 50MB of RAM, each DaemonSet Collector uses 500MB per node, and memory costs $2.00/GB/mo.
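These assumptions reduce to simple arithmetic. A sketch using the stated figures (the pod and node counts are illustrative; adjust them to your cluster):

```go
package main

import "fmt"

const pricePerGBMonth = 2.00 // $/GB/mo, assumed cloud memory price

// monthlyCost returns the memory cost of n collectors at mbEach RSS.
func monthlyCost(n int, mbEach float64) float64 {
	return float64(n) * mbEach / 1024.0 * pricePerGBMonth
}

func main() {
	pods, nodes := 100, 5
	fmt.Printf("Sidecar:   $%.2f/mo\n", monthlyCost(pods, 50))   // one Collector per pod
	fmt.Printf("DaemonSet: $%.2f/mo\n", monthlyCost(nodes, 500)) // one Collector per node
}
```

Even at only 5 nodes, the DaemonSet fleet is cheaper than 100 sidecars, and the gap widens with pod density.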
5. Summary
- DaemonSets are the default choice for Kubernetes. They dramatically reduce RAM usage and context-switching overhead compared to Sidecars.
- Gateways become necessary when you need centralized authentication, sampling, or buffering of data before sending it to a SaaS vendor.
- Sidecars are a niche optimization for strict isolation, not a default deployment strategy.
Next, we will look at how to tune these collectors so they don’t crash under load.