Aggregations: Real-Time Analytics

[!NOTE] This module explores the core principles of Aggregations: Real-Time Analytics, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Pivot: From Search to SQL Group By

Search finds needles. Aggregations describe the haystack.

  • SQL: SELECT type, AVG(price) FROM products GROUP BY type
  • Elasticsearch: One request matches documents AND builds the summary table.

Benefit: You get the “Search Result List” AND the “Faceted Sidebar” (Price ranges, Categories) from a single query.
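As a rough illustration of the "one request, two results" idea, the sketch below builds a search body as a plain Python dict. The field names (name, type, price) are hypothetical, and it assumes type is mapped as a keyword so it can be used in a terms aggregation. The by_type block mirrors the SQL above; price_ranges is the kind of block a faceted sidebar would use.

```python
import json

# Hypothetical fields: "name" (text), "type" (keyword), "price" (numeric).
search_body = {
    # The query drives the search result list.
    "query": {"match": {"name": "laptop"}},
    "size": 10,
    # The aggregations drive the faceted sidebar, computed over the same matches.
    "aggs": {
        # Equivalent in spirit to: SELECT type, AVG(price) ... GROUP BY type
        "by_type": {
            "terms": {"field": "type"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        # Price facets: under 100, 100 to 500, 500 and above.
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [{"to": 100}, {"from": 100, "to": 500}, {"from": 500}],
            }
        },
    },
}

print(json.dumps(search_body, indent=2))
```

Sending this body to the _search endpoint of an index would return both a hits section and an aggregations section in the same response.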


2. The Anatomy of an Aggregation

Aggregations come in three main types:

A. Buckets (The “Group By”)

Creates bins of documents.

  • terms: Group by “Category”.
  • date_histogram: Group by “Month”.
  • range: Group by “Price > 100”.

B. Metrics (The “Select”)

Calculates numbers inside a bucket.

  • avg, sum, min, max.
  • cardinality (approximate distinct count, based on HyperLogLog++).

C. Pipeline Aggregations (The “Having”)

Input is another aggregation, not documents.

  • derivative: Calculate rate of change.
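  • moving_fn: Smooth out noise with a sliding-window function (it replaces moving_avg, which was deprecated in 6.4 and removed in Elasticsearch 8.0).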
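The three types nest inside one another. A minimal sketch, assuming hypothetical fields sold_at (date) and price (numeric) and an Elasticsearch version that accepts calendar_interval (7.2 or later): a date_histogram bucket holds a sum metric, and a derivative pipeline reads that metric through buckets_path.

```python
import json

# Hypothetical fields: "sold_at" (date), "price" (numeric).
monthly_revenue_body = {
    "size": 0,  # skip the hit list; return only aggregation results
    "aggs": {
        # Bucket: one bin per calendar month.
        "per_month": {
            "date_histogram": {"field": "sold_at", "calendar_interval": "month"},
            "aggs": {
                # Metric: computed from the documents inside each month.
                "revenue": {"sum": {"field": "price"}},
                # Pipeline: computed from the "revenue" values of adjacent
                # buckets, not from documents.
                "revenue_change": {"derivative": {"buckets_path": "revenue"}},
            },
        }
    },
}

print(json.dumps(monthly_revenue_body, indent=2))
```

The derivative sits next to the metric it reads, inside the histogram, because it needs an ordered sequence of buckets to compare.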

3. Interactive: The Aggregation Tree

Documents flow into buckets, and a metric is computed inside each bucket:

Documents → Buckets (Terms: Color) → Metric (Avg Price)
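The same flow can be imitated in a few lines of plain Python. This is only a toy model of the idea (group documents by a field, then compute a metric per group), not how Elasticsearch implements it; the sample documents and the terms_avg helper are made up for illustration.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 30.0},
    {"color": "red", "price": 20.0},
    {"color": "blue", "price": 50.0},
    {"color": "green", "price": 5.0},
]


def terms_avg(documents, bucket_field, metric_field):
    """Group documents by bucket_field, then average metric_field per group."""
    buckets = defaultdict(list)
    for doc in documents:
        # Bucket step: route each document into the bin for its term.
        buckets[doc[bucket_field]].append(doc[metric_field])
    # Metric step: reduce each bin to a count and an average.
    return {
        key: {"doc_count": len(values), "avg_price": sum(values) / len(values)}
        for key, values in buckets.items()
    }


result = terms_avg(docs, "color", "price")
for color, stats in sorted(result.items()):
    print(color, stats)
```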


4. Hardware Reality: Global Ordinals

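How does ES group by strings (“Color”) so fast? Within each shard it replaces every distinct term with a small integer, assigned in sorted term order (for example, "Blue" becomes 0 and "Red" becomes 1), and then counts integers instead of comparing strings. This mapping (Global Ordinals) is built lazily by default, on the first request that needs it, and must be rebuilt after a refresh changes the shard's segments.

Warning: The first aggregation on a high-cardinality keyword field after such a refresh is slow, because it has to build this map.

Pre-load it: Set eager_global_ordinals to true in the field mapping if you rely on low-latency aggregations. The map is then built at refresh time, so the cost moves from the first search to indexing.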
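A toy sketch of the ordinal idea follows, together with a mapping body that turns on eager loading. The build_ordinals helper and the color field are hypothetical and only imitate the concept; the real structure is built per shard from Lucene segments. The eager_global_ordinals mapping parameter itself is a real Elasticsearch setting.

```python
import json


def build_ordinals(terms):
    """Assign each distinct term a small integer, in sorted term order."""
    return {term: ordinal for ordinal, term in enumerate(sorted(set(terms)))}


# Made-up column of keyword values, one per document.
colors = ["red", "blue", "red", "green", "blue", "red"]

ordinals = build_ordinals(colors)

# Counting becomes integer indexing into an array instead of string hashing.
counts = [0] * len(ordinals)
for color in colors:
    counts[ordinals[color]] += 1

# Index-creation body that asks for the ordinals to be built at refresh time.
mapping_body = {
    "mappings": {
        "properties": {
            "color": {"type": "keyword", "eager_global_ordinals": True}
        }
    }
}

print(ordinals)
print(counts)
print(json.dumps(mapping_body, indent=2))
```

Eager loading is a trade-off: every refresh gets more expensive, so it tends to pay off only on fields that are aggregated often.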