Ingestion: Pipelines & Grok
[!NOTE] This module covers Elasticsearch ingest pipelines and Grok parsing: why they replaced standalone Logstash for many workloads, and what they cost on the write path.
1. The Architecture Shift: Logstash → Ingest Nodes
Old Way:
App → Filebeat → Logstash (Heavy Java Process) → Elasticsearch
- Pros: Powerful transformations (rich filter and plugin ecosystem).
- Cons: Another cluster to manage, an extra network hop, and a potential single point of failure.
New Way (Ingest Pipelines):
App → Filebeat → Elasticsearch (Ingest Node)
- Pros: Serverless ETL. Runs inside ES.
- Mechanism: Before indexing, the document passes through a chain of Processors.
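The processor-chain mechanism can be sketched in a few lines of Python. This is an illustration of the idea, not the Elasticsearch API: each processor takes a document, mutates it, and hands it to the next one, exactly as an ingest pipeline applies its processors in order before the document is indexed. The processor names here (`set_field`, `lowercase_field`) are invented for the sketch.

```python
# Sketch of the ingest-pipeline idea: a document flows through an
# ordered chain of processors before indexing. Illustrative only --
# not the real Elasticsearch processor API.

def set_field(field, value):
    """Processor factory: add a constant field (like the 'set' processor)."""
    def processor(doc):
        doc[field] = value
        return doc
    return processor

def lowercase_field(field):
    """Processor factory: lowercase a field (like the 'lowercase' processor)."""
    def processor(doc):
        doc[field] = doc[field].lower()
        return doc
    return processor

# A pipeline is just an ordered list of processors.
pipeline = [set_field("pipeline", "logs-v1"), lowercase_field("level")]

def run_pipeline(doc, processors):
    for processor in processors:
        doc = processor(doc)
    return doc

doc = run_pipeline({"level": "ERROR", "msg": "Failed login"}, pipeline)
print(doc)
# {'level': 'error', 'msg': 'Failed login', 'pipeline': 'logs-v1'}
```

The real thing works the same way, except the chain is defined as JSON (`PUT _ingest/pipeline/<name>`) and executed by the ingest node.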
2. The Power of Grok
Unstructured logs are nearly impossible to query or aggregate:
"2023-10-01 12:00:00 ERROR [Auth] Failed login"
Grok turns this into structured JSON using named regex patterns:
%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:module}\] %{GREEDYDATA:msg}
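Under the hood, each `%{PATTERN:field}` expands to a named regex capture group. The sketch below approximates the pattern above in plain Python; the regexes are simplified stand-ins for the real Grok definitions of `TIMESTAMP_ISO8601`, `LOGLEVEL`, `DATA`, and `GREEDYDATA`.

```python
import re

# Rough regex equivalent of:
# %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:module}\] %{GREEDYDATA:msg}
GROK_EQUIVALENT = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2})\s+"  # TIMESTAMP_ISO8601 (simplified)
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"           # LOGLEVEL (simplified)
    r"\[(?P<module>.*?)\]\s+"                                    # DATA, lazy match
    r"(?P<msg>.*)"                                               # GREEDYDATA
)

line = "2023-10-01 12:00:00 ERROR [Auth] Failed login"
match = GROK_EQUIVALENT.match(line)
print(match.groupdict())
# {'timestamp': '2023-10-01 12:00:00', 'level': 'ERROR',
#  'module': 'Auth', 'msg': 'Failed login'}
```

This is also why Grok is CPU-hungry (see the hardware section below): every log line pays the cost of regex matching, and ambiguous patterns like `DATA`/`GREEDYDATA` can backtrack heavily.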
3. Interactive: Grok Debugger
Test your patterns against raw logs before deploying them. Kibana ships a Grok Debugger under Dev Tools for exactly this: paste a sample line and a pattern, and it shows the resulting structured JSON.
4. Hardware Reality: Ingest CPU
Ingest pipelines run on the write path.
- Heavy Regex (Grok) = High CPU usage.
- Impact: Slows down indexing throughput.
- Solution: Use dedicated ingest nodes (node.roles: [ingest]). Keep them separate from data/master nodes so your search performance doesn’t degrade during log spikes.
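A minimal sketch of what that separation looks like in elasticsearch.yml (node names are illustrative):

```yaml
# elasticsearch.yml on a dedicated ingest node: it parses and enriches
# documents, but holds no shards and cannot be elected master.
node.name: ingest-01
node.roles: [ ingest ]

# By contrast, a data node would declare e.g.:
#   node.roles: [ data ]
# so Grok CPU spikes on ingest nodes never steal cycles from search.
```

Beats and other clients can then be pointed at the ingest tier, while queries go to the data tier.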