Aggregations: Real-Time Analytics
[!NOTE] This module explores the core principles of Aggregations: Real-Time Analytics, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Pivot: From Search to SQL Group By
Search finds needles. Aggregations describe the haystack.
- SQL: SELECT type, AVG(price) FROM products GROUP BY type
- Elasticsearch: One request matches documents AND builds the summary table.
Benefit: You get the “Search Result List” AND the “Faceted Sidebar” (price ranges, categories) in one query.
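To make the "one request" idea concrete, here is a minimal sketch of a search body that carries both a query (for the result list) and an aggregation (for the sidebar). The helper name `build_search_request` and the field names `name`, `type`, and `price` are assumptions for illustration; only the `query`, `aggs`, `terms`, and `avg` keys come from the Elasticsearch query DSL.

```python
import json


def build_search_request(text, size=10):
    """Build one request body that returns hits and a per-type summary.

    Hypothetical helper: the field names below are illustrative only.
    """
    return {
        "size": size,
        # Search part: which documents match (drives the result list).
        "query": {"match": {"name": text}},
        # Aggregation part: GROUP BY type, then AVG(price) inside each group.
        "aggs": {
            "by_type": {
                "terms": {"field": "type"},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


body = build_search_request("running shoes")
print(json.dumps(body, indent=2))
```

The nested `aggs` under `by_type` plays the role of the SQL SELECT list: the metric is computed once per bucket, over the documents the query matched.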
2. The Anatomy of an Aggregation
Every aggregation belongs to one of three main types:
A. Buckets (The “Group By”)
Creates bins of documents.
- terms: Group by “Category”.
- date_histogram: Group by “Month”.
- range: Group by “Price > 100”.
B. Metrics (The “Select”)
Calculates numbers inside a bucket.
- avg, sum, min, max.
- cardinality: Approximate distinct count (HyperLogLog).
C. Pipeline Aggregations (The “Having”)
Input is another aggregation, not documents.
- derivative: Calculate rate of change.
- moving_avg: Smooth out noise (deprecated in Elasticsearch 6.4 and removed in 8.0; moving_fn is its replacement).
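A pipeline aggregation never sees documents, only the values its parent buckets already produced. The sketch below imitates that in plain Python over a hypothetical list of monthly totals. The function names and sample numbers are assumptions, and the trailing mean is a generic smoother, not a reproduction of Elasticsearch's exact windowing rules.

```python
def derivative(values):
    """First difference between consecutive bucket values.

    The first bucket has no predecessor, so its derivative is None.
    """
    if not values:
        return []
    out = [None]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def trailing_mean(values, window):
    """Mean of the current value and up to window - 1 values before it."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical output of a date_histogram + sum: (month, total sales).
monthly_sales = [("2024-01", 100.0), ("2024-02", 140.0), ("2024-03", 120.0)]
totals = [total for _, total in monthly_sales]

deltas = derivative(totals)
smoothed = trailing_mean(totals, window=2)
print(deltas)
print(smoothed)
```

Both functions take bucket values as input and return one value per bucket, which is the defining trait of this aggregation family.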
3. Interactive: The Aggregation Tree
Visualize how docs flow into buckets and compute metrics.
Documents → Buckets (Terms: Color) → Metric (Avg Price)
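The same flow (documents into color buckets, then an average price per bucket) can be sketched in plain Python. This is an in-memory imitation with made-up sample documents, not how Elasticsearch executes the aggregation. The function name `terms_avg` is hypothetical.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 20.0},
    {"color": "red", "price": 30.0},
]


def terms_avg(docs, bucket_field, metric_field):
    """Bucket docs by one field, then average another field per bucket."""
    buckets = defaultdict(list)
    # Step 1 (bucket): route each document to the bin for its term.
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    # Step 2 (metric): reduce each bin to a doc count and an average.
    return {
        key: {"doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in buckets.items()
    }


result = terms_avg(docs, "color", "price")
print(result)
```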
4. Hardware Reality: Global Ordinals
How does ES group by strings (“Color”) so fast?
It replaces "Red" with an integer 1 and "Blue" with 2.
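This mapping (Global Ordinals) is built lazily: by default it is created on the first request that needs it, and it must be rebuilt after a refresh changes the shard's segments.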
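A simplified model of the idea: pay once to turn strings into integers, then count with array indexing instead of string hashing. The segment layout and function names here are assumptions for illustration. Real global ordinals map per-segment ordinals to shard-wide ones over doc values, which this sketch does not reproduce. Ordinals below follow sorted term order, so the exact numbers differ from the "Red"/"Blue" example above.

```python
# Hypothetical shard with two segments; one keyword value per document.
segments = [
    ["red", "blue", "red"],
    ["green", "blue"],
]


def build_global_ordinals(segments):
    """The one-time cost: assign each distinct term an integer."""
    terms = sorted({term for seg in segments for term in seg})
    ord_of = {term: i for i, term in enumerate(terms)}
    encoded = [[ord_of[term] for term in seg] for seg in segments]
    return terms, encoded


def count_by_ordinal(encoded, num_terms):
    """The cheap part: counting touches only integers and a flat array."""
    counts = [0] * num_terms
    for seg in encoded:
        for ordinal in seg:
            counts[ordinal] += 1
    return counts


terms, encoded = build_global_ordinals(segments)
counts = count_by_ordinal(encoded, len(terms))
# Strings are only looked up again at the end, once per bucket.
result = dict(zip(terms, counts))
print(result)
```

The work in `build_global_ordinals` grows with the number of distinct terms, while `count_by_ordinal` stays a tight integer loop.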
Warning: The first aggregation on a high-cardinality keyword field is slow (it has to build this map).
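Pre-load it: Set eager_global_ordinals to true on the field in your mapping if you rely on low-latency aggregations. The map is then built at refresh time, so the cost moves from the first search to indexing.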
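As a sketch, the index-creation body for such a field could look like the following. The helper `keyword_mapping` and the field name `color` are hypothetical; `eager_global_ordinals` is the mapping parameter named above.

```python
import json


def keyword_mapping(field, eager=True):
    """Build an index body with one keyword field.

    Hypothetical helper; pass the result as the body of an index-creation
    request using whichever client you prefer.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "keyword",
                    # Build global ordinals at refresh time, not first search.
                    "eager_global_ordinals": eager,
                }
            }
        }
    }


index_body = keyword_mapping("color")
print(json.dumps(index_body, indent=2))
```

Enabling this on every keyword field would slow down refreshes for no benefit, so it is usually reserved for fields that are aggregated on frequently.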