Logging: The EFK Stack

In a traditional VM environment, logs are written to /var/log/app.log and rotated daily. In Kubernetes, Pods are ephemeral. If a Pod is deleted or its Node is recycled, its logs are gone unless you ship them somewhere else (kubectl logs --previous only covers the most recent container restart).

[!IMPORTANT] The EFK Stack is the standard open-source solution:

  • Elasticsearch: Stores and indexes logs.
  • Fluentd (or Fluent Bit): Collects and ships logs.
  • Kibana: Visualizes logs.

1. The Architecture: Node-Level Logging

The most common pattern is DaemonSet Logging.

  1. Application: Writes logs to stdout / stderr.
  2. Container runtime (containerd, CRI-O): Captures these streams and writes them to files on the Node, exposed via the kubelet under /var/log/containers/*.log.
  3. Fluentd: Runs as a DaemonSet (one per Node), tails these files, parses them, and sends them to Elasticsearch.
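A rough sketch of step 3 as a manifest. Assume a hypothetical setup: the stock fluent/fluentd-kubernetes-daemonset image (tag illustrative), an Elasticsearch Service at elasticsearch.logging.svc, and the RBAC/ConfigMap pieces omitted for brevity:

```yaml
# Minimal Fluentd DaemonSet sketch (illustrative names; real deployments
# also need a ServiceAccount, RBAC rules, and a Fluentd ConfigMap).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-elasticsearch8-1
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc"   # assumed ES Service name
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log     # where the kubelet exposes container logs
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```

Because it is a DaemonSet, the scheduler places exactly one Fluentd Pod on every Node, and the hostPath mount gives it the node-local log files to tail.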

2. Structured Logging (JSON)

If your app logs plain text: 2023-10-27 10:00:00 INFO User logged in id=123

Elasticsearch treats this as a single string. Searching for user_id: 123 is hard.

If your app logs JSON: {"timestamp": "2023-10-27T10:00:00Z", "level": "INFO", "message": "User logged in", "user_id": 123}

Elasticsearch indexes user_id as a field. You can now filter, aggregate, and visualize efficiently.
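To make that concrete, here is a sketch of a query against a hypothetical logs-* index (the index name is an assumption) that matches only documents where the indexed user_id field equals 123 — impossible to do reliably against a single unparsed string:

```json
POST /logs-*/_search
{
  "query": {
    "term": { "user_id": 123 }
  }
}
```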

3. The Log Pipeline

Here is how a raw log line is transformed by a Fluentd parser into a structured document.

1. Raw Log (stdout)

[2023-10-27 14:00:00] [INFO] Request processed in 45ms

2. Fluentd Parser (Regex)

/^\[(?<time>[^\]]*)\] \[(?<level>[^\]]*)\] (?<msg>.*)$/

3. Elasticsearch Doc (JSON)

{"time": "2023-10-27 14:00:00", "level": "INFO", "msg": "Request processed in 45ms"}
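The same named-capture extraction can be sketched in Go. Note one syntax difference: Fluentd's Ruby regexes write named groups as (?&lt;name&gt;...), while Go's regexp package spells them (?P&lt;name&gt;...).

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// parseLine applies the same pattern as the Fluentd parser above and
// returns the named captures as a map (nil if the line doesn't match).
func parseLine(line string) map[string]string {
	re := regexp.MustCompile(`^\[(?P<time>[^\]]*)\] \[(?P<level>[^\]]*)\] (?P<msg>.*)$`)
	m := re.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	doc := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" {
			doc[name] = m[i]
		}
	}
	return doc
}

func main() {
	doc := parseLine("[2023-10-27 14:00:00] [INFO] Request processed in 45ms")
	out, _ := json.Marshal(doc) // map keys are marshaled in sorted order
	fmt.Println(string(out))
	// → {"level":"INFO","msg":"Request processed in 45ms","time":"2023-10-27 14:00:00"}
}
```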

4. Structured Logging Code Examples

Ideally, your application should output JSON directly, skipping the complex regex parsing step.

Go (Using Zap)

Uber’s Zap logger is high-performance and structured.

package main

import (
  "time"

  "go.uber.org/zap"
)

func main() {
  logger, _ := zap.NewProduction()
  defer logger.Sync()

  // Key-value pairs become JSON fields
  logger.Info("failed to fetch URL",
    zap.String("url", "http://example.com"),
    zap.Int("attempt", 3),
    zap.Duration("backoff", time.Second),
  )
}
// Output: {"level":"info","ts":1594930000.123,"msg":"failed to fetch URL","url":"http://example.com","attempt":3,"backoff":1}

Java (Using Logback + LogstashEncoder)

Add logstash-logback-encoder to your pom.xml, then configure logback.xml (note the <root> element — without it, the appender is defined but never used):

<configuration>
  <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="CONSOLE"/>
  </root>
</configuration>

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static net.logstash.logback.argument.StructuredArguments.kv;

public class MyService {
  private static final Logger logger = LoggerFactory.getLogger(MyService.class);

  public void process(String userId) {
    logger.info("Processing user", kv("user_id", userId), kv("status", "active"));
  }
}
// Output: {"@timestamp":"...","message":"Processing user","user_id":"123","status":"active",...}

5. DaemonSet vs. Sidecar

| Strategy | DaemonSet | Sidecar |
| --- | --- | --- |
| Concept | One agent per Node | One agent per Pod |
| Resource usage | Low (shared) | High (duplicated per Pod) |
| Complexity | Simple (standard manifests) | Complex (every Pod spec changes) |
| Use case | Standard stdout logs | Legacy apps writing to files on disk |
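For the sidecar case, a rough sketch of the pattern: the legacy app writes to a file on a shared emptyDir volume, and a shipper container (Fluent Bit here; the app image name, file path, and tags are all illustrative) tails that file from the same Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
  - name: app
    image: legacy-app:1.0        # hypothetical app that logs to /logs/app.log
    volumeMounts:
    - name: logs
      mountPath: /logs
  - name: log-shipper
    image: fluent/fluent-bit:2.2 # tag illustrative; needs a config tailing /logs/app.log
    volumeMounts:
    - name: logs
      mountPath: /logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}                 # shared scratch volume, deleted with the Pod
```

This is why the table marks the sidecar as "duplicated": every Pod carries its own shipper container and its manifest must be edited to add it.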

6. Summary

  • Centralize: Never rely on kubectl logs. Ship them to Elasticsearch.
  • Structure: Log in JSON to make logs queryable.
  • DaemonSet: Use Fluentd/Fluent Bit as a DaemonSet to collect logs efficiently from all pods on a node.