Query DSL: Speaking JSON

[!NOTE] This module covers the core principles of the Elasticsearch Query DSL, working from first principles and hardware constraints.

1. The Two Contexts: Score vs No-Score

Every clause in Elasticsearch runs in one of two contexts. Mixing them up is one of the most common causes of slow clusters.

| Feature     | Query Context (`"query": ...`)     | Filter Context (`"filter": ...`)   |
|-------------|------------------------------------|------------------------------------|
| Question    | “How well does this match?”        | “Does this match? (Yes/No)”        |
| Output      | `_score` (float)                   | Boolean (true/false)               |
| Performance | Slower (calculates relevance)      | Fast (cached as a BitSet)          |
| Use Case    | Full-text search (“best pizza”)    | Exact filtering (“status=active”)  |

Golden Rule: If you don’t care about ranking (e.g., filtering by Date, Status, ID), ALWAYS use Filter Context.
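As a sketch of the Golden Rule, the same exact-match lookup can be written in either context; the `status` field and index layout here are illustrative:

```json
{
  "query": {
    "bool": {
      "filter": [                              // no scoring, result cached
        { "term": { "status": "active" } }
      ]
    }
  }
}
```

Putting the same `term` clause under `must` instead of `filter` would return the same documents, but Elasticsearch would compute a `_score` for each one and skip the BitSet cache.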


2. The Compound bool Query

The bool query is the wrapper for combining clauses. It has four clause types:

  1. must (AND): Must match. Contributes to score.
  2. filter (AND): Must match. Ignores score. Cached.
  3. should (OR): Optional. Boosts the score of matching documents; minimum_should_match can make some of them required.
  4. must_not (NOT): Must NOT match. Ignores score. Cached.

Pattern:

{
  "query": {
    "bool": {
      "must":   [ { "match": { "title": "pizza" }} ],  // calculates score
      "filter": [ { "term":  { "city": "NYC" }} ]      // cached BitSet
    }
  }
}
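A fuller sketch combining all four clause types (the field names and the minimum_should_match value are illustrative, not from the original example):

```json
{
  "query": {
    "bool": {
      "must":     [ { "match": { "title": "pizza" }} ],      // scored
      "filter":   [ { "term":  { "city": "NYC" }} ],         // cached, no score
      "must_not": [ { "term":  { "status": "closed" }} ],    // cached, no score
      "should":   [ { "match": { "description": "wood-fired" }} ],  // boosts score
      "minimum_should_match": 0                              // set to 1 to require a should clause
    }
  }
}
```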

3. The BitSet Cache

Elasticsearch caches frequently used filters as BitSets (arrays of 0s and 1s, one bit per document). Combining two cached filters is just a bitwise AND of their BitSets:

Doc IDs:                 0 1 2 3 4
Filter "status=active":  1 0 1 1 0
Filter "category=tech":  1 1 0 1 0
Result (bitwise AND):    1 0 0 1 0
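The intersection step can be modeled in a few lines of Python. This is an illustrative sketch of the idea, not Elasticsearch's actual implementation (real segments use compressed Roaring Bitmaps, not Python lists):

```python
# Simulate intersecting cached filter BitSets: one bit per document,
# 1 = the document matches that filter.

def intersect(*bitsets):
    """Bitwise-AND any number of equal-length bitsets."""
    return [int(all(bits)) for bits in zip(*bitsets)]

status_active = [1, 0, 1, 1, 0]  # docs 0, 2, 3 are active
category_tech = [1, 1, 0, 1, 0]  # docs 0, 1, 3 are tech

matches = intersect(status_active, category_tech)
print(matches)  # [1, 0, 0, 1, 0] -> only docs 0 and 3 match both filters
```

Because each cached filter is already a BitSet, adding another filter to a query costs one more AND pass rather than another index lookup.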

4. Hardware Reality: CPU Instructions

Why are Filter Contexts so fast? Elasticsearch uses Roaring Bitmaps. It doesn’t just loop through arrays. It uses SIMD (Single Instruction, Multiple Data) CPU instructions to AND together thousands of bits in a single CPU cycle. 1 Filter = 1 BitSet lookup. Fast as light. Query Context = Floating point math. Slow.