DaemonSets

[!NOTE] This module explores the core principles of DaemonSets, reasoning from first principles about how the controller places, updates and schedules node-level Pods, to build production-ready expertise.

The Problem: Imagine you manage a Kubernetes cluster with 1,000 nodes. You need to collect logs from every single node using an agent like Fluentd, and you must ensure network monitoring software runs everywhere to detect intrusions. A standard Deployment only guarantees a replica count, not a per-node distribution. Nothing stops the scheduler from placing 5 logging agents on Node A and none on Node B. How do you guarantee that exactly one instance of an application runs on every single machine?

A DaemonSet is the solution. It ensures that all (or a subset of) Nodes run a copy of a specific Pod. As new nodes are added to the cluster, Pods are automatically added to them. As nodes are removed, those Pods are safely garbage collected.

The Analogy: Think of a Deployment like the waiters in a large restaurant. There is a fixed number of them, and they go wherever the busy tables are. A DaemonSet is more like a smoke detector: every room gets exactly one, no matter how many rooms you add or remove.

1. Use Cases

When do you fundamentally need a DaemonSet instead of a Deployment? Whenever the workload is tied to the infrastructure (the node) rather than the application scale.

Cluster Storage Daemons: Running a storage daemon (like glusterd or ceph) on every node to pool local storage into a cluster-wide filesystem.
Log Collection: Running a log collector (like fluentd or logstash) on every node to aggregate container stdout/stderr streams.
Node Monitoring: Running a node monitoring daemon (like Prometheus Node Exporter, collectd, or Datadog agent) to scrape hardware metrics (CPU, Memory, Disk IO) from the host machine.
Networking/Security Proxies: Running networking overlay components (like kube-proxy or Calico/Cilium agents) to manage node-level IP routing and firewall rules.

War Story: At a major streaming platform, an engineer once accidentally deployed the logging agent (fluentd) on a 500-node cluster as a Deployment with 50 replicas instead of as a DaemonSet. The scheduler packed multiple loggers onto a few nodes, starving those nodes of memory and causing cascading Out-Of-Memory (OOM) crashes. Meanwhile, at least 450 nodes had no log collection at all. Switching to a DaemonSet immediately stabilized the cluster by enforcing a strict 1:1 Pod-to-Node ratio.

2. Node Lifecycle

The DaemonSet controller reacts to node changes on its own. Unlike a Deployment, you don't scale replicas manually; you scale nodes. When a node joins, the controller creates a Pod on it. When a node is removed, its Pod is garbage collected. The controller continuously reconciles toward exactly one Pod per eligible node.
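The reconciliation idea can be sketched as a toy model in Python. This is a minimal sketch under simplifying assumptions, not the real controller. `Node`, `eligible` and `reconcile` are hypothetical names. Eligibility is reduced to a plain label match (similar to a `nodeSelector`), and taints, resources and Pod readiness are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical toy node: just a name and its labels.
    name: str
    labels: dict = field(default_factory=dict)


def eligible(node: Node, node_selector: dict) -> bool:
    # A node qualifies when it carries every key/value in the selector;
    # an empty selector matches every node.
    return all(node.labels.get(k) == v for k, v in node_selector.items())


def reconcile(nodes, pods_by_node, node_selector=None):
    """Toy model: return (to_create, to_delete) so each eligible node has one pod.

    nodes: list of Node objects currently in the cluster
    pods_by_node: dict mapping node name -> number of DaemonSet pods on it
    """
    node_selector = node_selector or {}
    wanted = {n.name for n in nodes if eligible(n, node_selector)}
    # Eligible nodes without a pod get one.
    to_create = sorted(n for n in wanted if pods_by_node.get(n, 0) == 0)
    to_delete = {}
    for name, count in pods_by_node.items():
        if name not in wanted:
            # Node is gone or no longer matches: remove all of its pods.
            to_delete[name] = count
        elif count > 1:
            # Duplicates on one node: keep one, remove the rest.
            to_delete[name] = count - 1
    return to_create, to_delete
```

For example, with three nodes and a stale pod left on a removed `node-4`, `reconcile([Node("node-1"), Node("node-2"), Node("node-3")], {"node-1": 1, "node-4": 1})` returns `(['node-2', 'node-3'], {'node-4': 1})`. The desired state is derived from the node list, never from a replica count.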


3. DaemonSet vs Deployment

To solidify the difference, let’s look at a concrete comparison:

| Feature | Deployment | DaemonSet |
|---|---|---|
| Replica control | You define an exact count (`replicas: 3`). | No `replicas` field; the Pod count follows the number of matching Nodes. |
| Placement | The scheduler decides (several replicas can land on the same Node). | Exactly one Pod per matching Node. |
| Primary use case | Stateless applications (web servers, APIs). | Node-level agents (logging, monitoring, networking). |
| Failure handling | If a Node dies, its Pods are recreated on healthy Nodes. | The Pod is bound to its Node and is not moved elsewhere; a Pod is created again when the Node recovers or a new Node joins. |
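The Failure handling row can be made concrete with a toy Python model. The function names are hypothetical. The "fewest replicas first" rule is a stand-in assumption for the real scheduler's scoring, which weighs many more factors.

```python
from collections import Counter


def fail_node_deployment(placement: dict, failed: str, healthy: list) -> dict:
    """Toy Deployment: replicas from the failed node are recreated elsewhere.

    placement maps node name -> replica count. Each replacement goes to the
    healthy node with the fewest replicas (ties broken by name), a
    simplification of real scheduling.
    """
    result = Counter({n: c for n, c in placement.items() if n != failed})
    for _ in range(placement.get(failed, 0)):
        target = min(healthy, key=lambda n: (result[n], n))
        result[target] += 1
    return dict(result)


def fail_node_daemonset(placement: dict, failed: str) -> dict:
    """Toy DaemonSet: the failed node's pod is simply gone, not moved."""
    return {n: c for n, c in placement.items() if n != failed}
```

With `{"a": 2, "b": 1}` and node `a` failing, the Deployment model still runs 3 replicas (`{"b": 2, "c": 1}`). The DaemonSet model drops from 3 pods to 2, one per surviving node.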

4. Anatomy of a DaemonSet Manifest

The YAML definition looks very similar to a Deployment's, but it uses `kind: DaemonSet` and, critically, has no `replicas` field.

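```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  # No `replicas` field: the Pod count follows the number of eligible nodes.
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch   # must match spec.selector.matchLabels
    spec:
      tolerations:
      # Run on control plane nodes too: current clusters use the control-plane
      # taint, older clusters used the master taint. Remove these if your
      # control plane nodes should not run this agent.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      # Mount the node's log directory so the agent can read container logs.
      - name: varlog
        hostPath:
          path: /var/log
```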
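Two rules from this manifest are easy to check mechanically. There must be no `replicas` field, and the selector must match the Pod template's labels. Below is a minimal Python sketch of these checks. It assumes the manifest has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `check_daemonset` is a hypothetical helper that only inspects `matchLabels`, not `matchExpressions`. The API server performs the authoritative validation.

```python
def check_daemonset(manifest: dict) -> list:
    """Hypothetical lint: return a list of problems in a parsed DaemonSet manifest."""
    problems = []
    if manifest.get("apiVersion") != "apps/v1":
        problems.append("apiVersion should be apps/v1")
    if manifest.get("kind") != "DaemonSet":
        problems.append("kind should be DaemonSet")
    spec = manifest.get("spec", {})
    if "replicas" in spec:
        # The DaemonSet spec has no replica count; it scales with nodes.
        problems.append("DaemonSets have no spec.replicas; remove it")
    selector = spec.get("selector")
    if not selector:
        problems.append("spec.selector is required in apps/v1")
    else:
        # Only matchLabels is checked here; matchExpressions are ignored.
        match = selector.get("matchLabels", {})
        labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
        if any(labels.get(k) != v for k, v in match.items()):
            problems.append("spec.selector.matchLabels must match template labels")
    return problems
```

Run against a dict version of the fluentd manifest above, the function returns an empty list. Adding `replicas: 3` or changing the template label to something else makes it report each problem.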

5. Update Strategy: Rolling Update

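Just like Deployments, DaemonSets support rolling updates, and `RollingUpdate` is the default strategy. Because there is only one Pod per node, Kubernetes by default deletes the old Pod on a node before creating its replacement there. This avoids conflicts such as two Pods binding the same hostPort, and it avoids double-booking node resources. Since Kubernetes 1.25, you can set `rollingUpdate.maxSurge` to start the new Pod first instead, provided the old and new Pods can safely run side by side on the node.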

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```

maxUnavailable: The maximum number of DaemonSet Pods that can be unavailable during the update, given as an absolute number or as a percentage. The default is 1. During a logging agent upgrade, this means log collection pauses on at most one node at a time.
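The effect of `maxUnavailable` on rollout pacing can be sketched in Python. This is a simplified model. It groups nodes into fixed waves, whereas the real controller starts another replacement as soon as an unavailable slot frees up. The percentage handling follows the apps/v1 API reference, which states that percentages are rounded up. `resolve_max_unavailable` and `rollout_waves` are hypothetical names.

```python
def resolve_max_unavailable(value, total: int) -> int:
    """Turn maxUnavailable (an int or an "N%" string) into a pod count.

    Assumption: percentages round up, per the apps/v1 DaemonSet API reference.
    """
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        return (total * pct + 99) // 100  # integer ceiling of total * pct / 100
    return int(value)


def rollout_waves(nodes, max_unavailable=1):
    """Group nodes into update waves under the default delete-then-create mode.

    Each wave deletes at most `max_unavailable` old pods and creates their
    replacements; a real cluster waits for them to become ready before
    continuing. This models maxSurge = 0 only.
    """
    step = resolve_max_unavailable(max_unavailable, len(nodes))
    if step < 1:
        raise ValueError("maxUnavailable must be at least 1 when maxSurge is 0")
    return [nodes[i:i + step] for i in range(0, len(nodes), step)]
```

With five nodes and `max_unavailable=2`, `rollout_waves(["n1", "n2", "n3", "n4", "n5"], 2)` yields `[["n1", "n2"], ["n3", "n4"], ["n5"]]`. A larger value finishes sooner but leaves more nodes uncovered at once.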

6. Taints and Tolerations (Breaking the Rules)

Normally, the Kubernetes scheduler keeps ordinary application Pods off control plane nodes (historically called master nodes) because those nodes carry a NoSchedule taint.

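However, system daemons such as networking plugins or cluster-wide log collectors often must run everywhere, including the control plane. To be scheduled onto a tainted node, the DaemonSet's Pod template must explicitly tolerate the taint. The DaemonSet controller automatically adds tolerations for some built-in node conditions (for example node.kubernetes.io/not-ready and node.kubernetes.io/unschedulable), but not for the control plane taint.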

Common taints: node-role.kubernetes.io/control-plane:NoSchedule on current clusters, and node-role.kubernetes.io/master:NoSchedule on older ones.

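Tolerations (included in spec.template.spec):

```yaml
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
```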

This tells the scheduler: “I acknowledge this node is reserved for control plane workloads, but I am allowed to run here anyway.” A toleration only permits scheduling onto the tainted node; it does not force the Pod there.
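The matching rules behind tolerations can be expressed as a small Python sketch. It follows the rules in the Kubernetes taints-and-tolerations documentation:

- An omitted operator means `Equal`.
- `Exists` ignores the value.
- An empty key with `Exists` matches every taint.
- An empty effect matches every effect.

The classes and functions here are hypothetical, and the sketch only models hard scheduling decisions. The taint key `example.com/soft` is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str               # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key + Exists tolerates every taint
    operator: str = "Equal"   # Kubernetes default when operator is omitted
    value: str = ""
    effect: str = ""          # empty effect matches all effects


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    # Operator "Equal": key and value must both match.
    return tol.key == taint.key and tol.value == taint.value


def schedulable(tolerations, taints) -> bool:
    """True if every hard taint (NoSchedule/NoExecute) is tolerated.

    PreferNoSchedule is only a soft preference, so it never blocks here.
    """
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
```

This sketch also shows why a toleration for the legacy `master` key does nothing on a node tainted only with `control-plane`. Keys must match exactly unless the toleration uses an empty key with `Exists`.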
