Logging: The EFK Stack
In a traditional VM environment, logs are written to /var/log/app.log and rotate daily.
In Kubernetes, Pods are ephemeral. kubectl logs --previous can recover the output of the last crashed container, but once a Pod is deleted or its Node is recycled, its logs are gone unless you ship them somewhere else.
[!IMPORTANT] The EFK Stack is the standard open-source solution:
- Elasticsearch: Stores and indexes logs.
- Fluentd (or Fluent Bit): Collects and ships logs.
- Kibana: Visualizes logs.
1. The Architecture: Node-Level Logging
The most common pattern is DaemonSet Logging.
- Application: Writes logs to stdout/stderr.
- Docker/containerd: Captures these streams and writes them to /var/log/containers/*.log on the Node.
- Fluentd: Runs as a DaemonSet (one per Node), tails these files, parses them, and sends them to Elasticsearch.
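The node-level pattern above can be sketched as a minimal DaemonSet manifest. The image tag, namespace, and the Elasticsearch Service name elasticsearch.logging are illustrative assumptions, not fixed values:

```yaml
# Minimal sketch of a node-level Fluentd collector.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging"   # assumed Service name
          volumeMounts:
            - name: varlog
              mountPath: /var/log              # where container log files live
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```

Because it is a DaemonSet, the scheduler places exactly one collector Pod on each Node, and the hostPath mount gives it access to every container's log files on that Node.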
2. Structured Logging (JSON)
If your app logs plain text:
2023-10-27 10:00:00 INFO User logged in id=123
Elasticsearch treats this as a single string. Searching for user_id: 123 is hard.
If your app logs JSON:
{"timestamp": "2023-10-27T10:00:00Z", "level": "INFO", "message": "User logged in", "user_id": 123}
Elasticsearch indexes user_id as a field. You can now filter, aggregate, and visualize efficiently.
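Once user_id is indexed as its own field, the search that was "hard" against plain text becomes a one-line term query. A sketch in Kibana Dev Tools syntax, assuming the default logstash-* index pattern produced by Fluentd's Elasticsearch output:

```
GET logstash-*/_search
{
  "query": {
    "term": { "user_id": 123 }
  }
}
```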
3. Interactive: The Log Pipeline
Visualize how a raw log line is transformed by Fluentd filters into a structured document.
1. Raw Log (stdout)
2. Fluentd Parser
/^\[(?<time>[^\]]*)\] \[(?<level>[^\]]*)\] (?<msg>.*)$/
3. Elasticsearch Doc
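In Fluentd configuration, step 2 corresponds to a parser filter applying that regex. A minimal sketch; the tag pattern kubernetes.** and the log key match the common Kubernetes setup but are assumptions:

```
<filter kubernetes.**>
  @type parser
  key_name log        # field holding the raw log line
  reserve_data true   # keep existing fields alongside the parsed ones
  <parse>
    @type regexp
    expression /^\[(?<time>[^\]]*)\] \[(?<level>[^\]]*)\] (?<msg>.*)$/
  </parse>
</filter>
```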
4. Structured Logging Code Examples
Ideally, your application should output JSON directly, skipping the complex regex parsing step.
Go (Using Zap)
Uber’s Zap logger is high-performance and structured.
package main

import (
	"time"

	"go.uber.org/zap"
)

func main() {
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// Key-value pairs become JSON fields
	logger.Info("failed to fetch URL",
		zap.String("url", "http://example.com"),
		zap.Int("attempt", 3),
		zap.Duration("backoff", time.Second),
	)
}
// Output: {"level":"info","ts":159493,"msg":"failed to fetch URL","url":"http://example.com","attempt":3,"backoff":1}
Java (Using Logback + LogstashEncoder)
Add the logstash-logback-encoder dependency to your pom.xml, then point logback.xml at the LogstashEncoder:
<configuration>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>
</configuration>
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static net.logstash.logback.argument.StructuredArguments.kv;
public class MyService {
private static final Logger logger = LoggerFactory.getLogger(MyService.class);
public void process(String userId) {
logger.info("Processing user", kv("user_id", userId), kv("status", "active"));
}
}
// Output: {"@timestamp":"...","message":"Processing user","user_id":"123","status":"active",...}
5. DaemonSet vs. Sidecar
| Strategy | DaemonSet | Sidecar |
|---|---|---|
| Concept | One agent per Node | One agent per Pod |
| Resource Usage | Low (Shared) | High (Duplicated) |
| Complexity | Simple (Standard) | Complex (Manifest changes) |
| Use Case | Standard stdout logs | Legacy apps writing to files on disk |
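For the legacy-file case in the right-hand column, the sidecar pattern looks roughly like this. The container names, image, and the /var/log/app path are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
    - name: app
      image: legacy-app:1.0          # assumed image; writes /var/log/app/app.log
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-tailer               # sidecar: streams the file to stdout
      image: busybox:1.36
      args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app/app.log']
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}
```

The sidecar turns the file back into stdout, so the node-level DaemonSet can pick it up like any other container log. This is the cost the table refers to: an extra container per Pod and a manifest change for every legacy app.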
6. Summary
- Centralize: Never rely on kubectl logs alone. Ship logs to Elasticsearch.
- Structure: Log in JSON to make logs queryable.
- DaemonSet: Use Fluentd/Fluent Bit as a DaemonSet to collect logs efficiently from all pods on a node.