Logging: The EFK Stack
In a traditional VM environment, logs are written to /var/log/app.log and rotate daily.
In Kubernetes, Pods are ephemeral. When a Pod is deleted, evicted, or rescheduled to another Node, its logs disappear with it unless you ship them somewhere else.
> [!IMPORTANT]
> The EFK Stack is the standard open-source solution:
> - Elasticsearch: Stores and indexes logs.
> - Fluentd (or Fluent Bit): Collects and ships logs.
> - Kibana: Visualizes logs.
1. The Architecture: Node-Level Logging
The most common pattern is DaemonSet Logging.
- Application: Writes logs to stdout/stderr.
- Docker/containerd: Captures these streams and writes them to /var/log/containers/*.log on the Node.
- Fluentd: Runs as a DaemonSet (one per Node), tails these files, parses them, and sends them to Elasticsearch.
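A minimal sketch of this pattern as a Kubernetes manifest. The image tag, namespace, and Elasticsearch address are placeholders, not a tested production configuration:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system   # assumption: logging agents often live here
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-elasticsearch8-1  # illustrative tag
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: elasticsearch.logging.svc   # assumption: your ES Service
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        # hostPath gives the agent access to the Node's container log files
        - name: varlog
          hostPath:
            path: /var/log
```

Because it is a DaemonSet, the scheduler places exactly one Fluentd Pod on every Node, so a single agent covers all containers running there.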
2. Structured Logging (JSON)
If your app logs plain text:
2023-10-27 10:00:00 INFO User logged in id=123
Elasticsearch treats this as a single string. Searching for user_id: 123 is hard.
If your app logs JSON:
{"timestamp": "2023-10-27T10:00:00Z", "level": "INFO", "message": "User logged in", "user_id": 123}
Elasticsearch indexes user_id as a field. You can now filter, aggregate, and visualize efficiently.
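As a sketch of what that buys you: sent to Elasticsearch's _search API (the index name app-logs-* is an assumption), a term query matches every document whose indexed user_id field equals 123:

```json
{
  "query": {
    "term": { "user_id": 123 }
  }
}
```

With plain-text logs, answering the same question would mean full-text matching on the message string instead of an exact lookup on an indexed field.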
3. The Log Pipeline
A raw log line passes through three stages on its way to becoming a structured, searchable document:
1. Raw Log (stdout)
2. Fluentd Parser (Regex)
3. Elasticsearch Doc (JSON)
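The parsing stage can be sketched in Go. The regex pattern and field names below are assumptions shaped to the sample line from the previous section, not Fluentd's actual configuration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Named capture groups become fields in the structured document.
// The pattern mirrors the sample line above; field names are illustrative.
var lineRe = regexp.MustCompile(
	`^(?P<timestamp>\S+ \S+) (?P<level>\w+) (?P<message>.*?) id=(?P<user_id>\d+)$`)

// parseLine turns one raw stdout line into a field map, the way a
// Fluentd regex parser would before shipping to Elasticsearch.
func parseLine(raw string) map[string]string {
	match := lineRe.FindStringSubmatch(raw)
	if match == nil {
		return nil
	}
	doc := map[string]string{}
	for i, name := range lineRe.SubexpNames() {
		if name != "" {
			doc[name] = match[i]
		}
	}
	return doc
}

func main() {
	doc := parseLine("2023-10-27 10:00:00 INFO User logged in id=123")
	out, _ := json.Marshal(doc)
	fmt.Println(string(out))
	// Prints a JSON document with timestamp, level, message, and user_id fields
}
```

Every log format needs its own pattern, and unparseable lines need handling, which is exactly the per-app complexity that logging JSON directly avoids.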
4. Structured Logging Code Examples
Ideally, your application should output JSON directly, skipping the complex regex parsing step.
Go (Using Zap)
Uber’s Zap logger is high-performance and structured.
package main
import (
    "time"
    "go.uber.org/zap"
)
func main() {
logger, _ := zap.NewProduction()
defer logger.Sync()
// Key-Value pairs become JSON fields
logger.Info("failed to fetch URL",
zap.String("url", "http://example.com"),
zap.Int("attempt", 3),
zap.Duration("backoff", time.Second),
)
}
// Output: {"level":"info","ts":159493,"msg":"failed to fetch URL","url":"http://example.com","attempt":3,"backoff":1}
Java (Using Logback + LogstashEncoder)
Add logstash-logback-encoder to your pom.xml.
<configuration>
<appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
</appender>
</configuration>
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static net.logstash.logback.argument.StructuredArguments.kv;
public class MyService {
private static final Logger logger = LoggerFactory.getLogger(MyService.class);
public void process(String userId) {
logger.info("Processing user", kv("user_id", userId), kv("status", "active"));
}
}
// Output: {"@timestamp":"...","message":"Processing user","user_id":"123","status":"active",...}
5. DaemonSet vs. Sidecar
| Strategy | DaemonSet | Sidecar |
|---|---|---|
| Concept | One agent per Node | One agent per Pod |
| Resource Usage | Low (Shared) | High (Duplicated) |
| Complexity | Simple (Standard) | Complex (Manifest changes) |
| Use Case | Standard stdout logs | Legacy apps writing to files on disk |
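For the sidecar case, a hedged sketch of the Pod shape: a legacy app writes to a file on a shared emptyDir volume, and a Fluent Bit sidecar tails it. Image names and the log path are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
    - name: app
      image: my-legacy-app:1.0       # assumption: writes /var/log/app/app.log
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-shipper
      image: fluent/fluent-bit:2.2   # illustrative tag; needs its own config
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    # emptyDir lives and dies with the Pod, so the sidecar must ship
    # logs out before the Pod is deleted
    - name: app-logs
      emptyDir: {}
```

Note the duplication the table warns about: every such Pod carries its own shipper container, its own volume, and its own Fluent Bit configuration.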
6. Summary
- Centralize: Never rely on kubectl logs. Ship logs to Elasticsearch.
- Structure: Log in JSON to make logs queryable.
- DaemonSet: Use Fluentd/Fluent Bit as a DaemonSet to collect logs efficiently from all pods on a node.