Aggregations: Real-Time Analytics

[!NOTE] This module explores the core principles of Aggregations: Real-Time Analytics, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Pivot: From Search to SQL Group By

Search finds needles. Aggregations describe the haystack.

SQL: SELECT type, AVG(price) FROM products GROUP BY type
Elasticsearch: One request matches documents AND builds the summary table.

Benefit: You get the “Search Result List” AND the “Faceted Sidebar” (Price ranges, Categories) in 1 query.
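As a rough sketch of what such a combined request could look like, the snippet below builds a search body as a plain Python dict. The field names (`name`, `type`, `price`) and the search term are assumptions for illustration, and `type` is assumed to be mapped as a keyword field. The `query` part drives the result list, while the `aggs` part mirrors the SQL `GROUP BY type` with `AVG(price)`.

```python
import json

# Hypothetical fields: "name" (text), "type" (keyword), "price" (numeric).
search_body = {
    # The search part: which documents match and feed the result list.
    "query": {"match": {"name": "laptop"}},
    "size": 10,
    # The aggregation part: computed over the same matching documents.
    "aggs": {
        "by_type": {                                   # GROUP BY type
            "terms": {"field": "type"},
            "aggs": {
                "avg_price": {"avg": {"field": "price"}}   # AVG(price)
            },
        }
    },
}

# This JSON is what would be sent as the body of a _search request.
print(json.dumps(search_body, indent=2))
```

The response to a request like this carries both the `hits` (the result list) and an `aggregations` section (the sidebar data), which is why one round trip is enough.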

2. The Anatomy of an Aggregation

Aggregations come in three main types:

A. Buckets (The “Group By”)

Creates bins of documents.

terms: Group by “Category”.
date_histogram: Group by “Month”.
range: Group by “Price > 100”.

B. Metrics (The “Select”)

Calculates numbers inside a bucket.

avg, sum, min, max.
cardinality (approximate distinct count, based on the HyperLogLog++ algorithm).

C. Pipeline Aggregations (The “Having”)

Input is another aggregation, not documents.

derivative: Calculate rate of change.
moving_avg: Smooth out noise (deprecated since 6.4 and removed in 8.0; newer versions use moving_fn instead).
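To show how the three types nest, here is a minimal sketch of a request body that combines a bucket (`date_histogram`), a metric (`sum`), and a pipeline aggregation (`derivative`). The field names `sold_at` and `price` are hypothetical, and `calendar_interval` assumes Elasticsearch 7.2 or later (older versions used `interval`).

```python
import json

# Hypothetical fields: "sold_at" (date) and "price" (numeric).
monthly_revenue = {
    "size": 0,  # skip the hit list; only the aggregation results are wanted
    "aggs": {
        "per_month": {
            # Bucket: one bin of documents per calendar month.
            "date_histogram": {"field": "sold_at", "calendar_interval": "month"},
            "aggs": {
                # Metric: a number computed from the documents in each bucket.
                "revenue": {"sum": {"field": "price"}},
                # Pipeline: reads the "revenue" metric of neighbouring buckets,
                # not the documents, to get the month-over-month change.
                "revenue_change": {"derivative": {"buckets_path": "revenue"}},
            },
        }
    },
}

print(json.dumps(monthly_revenue, indent=2))
```

Note that `buckets_path` points at a sibling aggregation by name, which is what makes the derivative operate on aggregation output rather than on documents.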

3. Interactive: The Aggregation Tree

Visualize how docs flow into buckets and compute metrics.

Documents → Buckets (Terms: Color) → Metric (Avg Price)
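The flow from documents to buckets to a metric can be imitated in a few lines of plain Python. This is only a toy model of the idea, not how Elasticsearch implements it; the sample documents and the helper `terms_avg` are made up for illustration.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 20.0},
    {"color": "red", "price": 30.0},
    {"color": "blue", "price": 40.0},
    {"color": "green", "price": 5.0},
]


def terms_avg(documents, bucket_field, metric_field):
    """Group documents by bucket_field, then average metric_field per bucket."""
    buckets = defaultdict(list)
    for doc in documents:
        # Step 1 (bucket): route each document into the bin for its term.
        buckets[doc[bucket_field]].append(doc[metric_field])
    # Step 2 (metric): compute a number from the documents in each bin.
    return {
        key: {"doc_count": len(values), "avg_price": sum(values) / len(values)}
        for key, values in buckets.items()
    }


result = terms_avg(docs, "color", "price")
for color, stats in result.items():
    print(color, stats)
# red {'doc_count': 2, 'avg_price': 20.0}
# blue {'doc_count': 2, 'avg_price': 30.0}
# green {'doc_count': 1, 'avg_price': 5.0}
```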

4. Hardware Reality: Global Ordinals

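How does ES group by strings (“Color”) so fast? It replaces each distinct string with an integer: "Red" becomes 1 and "Blue" becomes 2. Counting integers is much cheaper than comparing strings. This mapping (Global Ordinals) is built lazily, on the first aggregation that needs it.

Warning: The first aggregation on a high-cardinality keyword field is slow, because it has to build this map. The map also has to be rebuilt after a refresh changes the shard's segments.

Pre-load it: Set eager_global_ordinals to true in the field mapping if you rely on low-latency aggregations. The map is then built at refresh time, which moves the cost from the first search to indexing.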
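The sketch below illustrates both halves of this section. The first part is a toy model of the ordinal idea (strings replaced by small integers, then counted in an array); it is an illustration only, not the actual Lucene data structure, and it numbers ordinals from 0 in sorted order. The second part is an index body that enables `eager_global_ordinals` on a hypothetical keyword field named `color`.

```python
import json

# --- Toy model of ordinals (illustrative, not the real implementation) ---
values = ["Red", "Blue", "Red", "Green"]

# Assign each distinct term a small integer, in sorted term order.
ordinals = {term: i for i, term in enumerate(sorted(set(values)))}
encoded = [ordinals[v] for v in values]

# Counting per bucket becomes an array increment instead of string hashing.
counts = [0] * len(ordinals)
for ordinal in encoded:
    counts[ordinal] += 1

print(ordinals)  # {'Blue': 0, 'Green': 1, 'Red': 2}
print(encoded)   # [2, 0, 2, 1]
print(counts)    # [1, 1, 2]

# --- Index body that builds global ordinals eagerly ---
# "color" is a hypothetical field name.
index_body = {
    "mappings": {
        "properties": {
            "color": {"type": "keyword", "eager_global_ordinals": True}
        }
    }
}

print(json.dumps(index_body, indent=2))
```

Enabling the setting trades slower refreshes for predictable aggregation latency, so it is usually worth reserving for fields that are aggregated on frequently.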