Scoring: The Math of Relevance (BM25)

[!NOTE] This module explores the core principles of Scoring: The Math of Relevance (BM25), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Magic Number: _score

When you search for “Best Pizza”, Elasticsearch returns documents sorted by _score, highest first. Where does this number come from? By default it comes from BM25 (Best Match 25), a refinement of classic TF-IDF and the default similarity since Elasticsearch 5.0.


2. The 3 Variables of Scoring

A. TF (Term Frequency)

“How many times does ‘Pizza’ appear in this doc?”

  • More is better.
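  • BM25 Saturation: With raw term frequency, 100 occurrences count 10x as much as 10. Lucene's classic TF-IDF softened this with a square root, but the contribution still grew without limit. In BM25, the benefit saturates (levels off): 100 occurrences score only slightly higher than 10. The k1 parameter (default 1.2) controls how quickly the curve flattens.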
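The saturation effect is easy to check numerically. The sketch below compares three ways of turning a raw count into a TF weight; the function names are illustrative, and the BM25 variant assumes a document of exactly average length (so length normalization drops out) and omits the constant (k1 + 1) factor that some formulations include, since it does not affect ranking.

```python
import math

def raw_tf(freq):
    # Raw count: the weight grows linearly with the number of occurrences.
    return float(freq)

def classic_tf(freq):
    # Lucene's classic TF-IDF similarity dampens the count with a square root.
    return math.sqrt(freq)

def bm25_tf(freq, k1=1.2):
    # BM25 term-frequency component for an average-length document.
    # It approaches 1.0 as freq grows, so extra occurrences add less and less.
    return freq / (freq + k1)

for freq in (1, 10, 100, 1000):
    print(freq, raw_tf(freq), round(classic_tf(freq), 2), round(bm25_tf(freq), 3))

# How much better is 100 occurrences than 10 under each scheme?
print("raw ratio:    ", raw_tf(100) / raw_tf(10))
print("classic ratio:", round(classic_tf(100) / classic_tf(10), 2))
print("bm25 ratio:   ", round(bm25_tf(100) / bm25_tf(10), 2))
```

Raising k1 makes the curve flatten later (more credit for repeated terms); lowering it makes the first occurrence carry almost all of the weight.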

B. IDF (Inverse Document Frequency)

“How rare is the word ‘Pizza’ in the whole index?”

  • Rare words (‘Arugula’) are worth MORE than common words (‘Pizza’).
  • Stopwords (‘The’, ‘And’) are worth almost zero.
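As a rough illustration, the IDF form used by Lucene's BM25 implementation can be written in a few lines. The index size and document frequencies below are made-up numbers chosen only to show the ordering of rare, common, and stopword-like terms.

```python
import math

def bm25_idf(doc_count, doc_freq):
    # doc_count: total number of documents that have the field
    # doc_freq:  number of documents that contain the term
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

DOC_COUNT = 1_000_000  # hypothetical index size

# Hypothetical document frequencies for three terms.
terms = {"arugula": 100, "pizza": 50_000, "the": 990_000}

for term, doc_freq in terms.items():
    print(f"{term:8s} idf = {bm25_idf(DOC_COUNT, doc_freq):.3f}")
```

The "1 +" inside the logarithm keeps the value positive even when a term appears in nearly every document, which is why stopwords end up close to zero rather than negative.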

C. Field Length Norm

“How long is the field?”

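  • Finding ‘Pizza’ in a Tweet (short) is a stronger signal than finding it in a Book (long).
  • Length is measured against the average field length in the index: shorter-than-average fields get a boost, longer ones are penalized. The b parameter (default 0.75) controls how strongly length affects the score.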
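A minimal sketch of how field length enters the BM25 term-frequency component, assuming the standard k1 and b parameters with their common defaults (1.2 and 0.75). The field lengths are invented for illustration, and the constant (k1 + 1) factor found in some formulations is left out because it does not change ranking.

```python
def bm25_tf_norm(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    # Length normalization: 1.0 for an average-length field,
    # below 1.0 for shorter fields, above 1.0 for longer ones.
    norm = 1 - b + b * field_len / avg_field_len
    return freq / (freq + k1 * norm)

AVG_LEN = 200  # hypothetical average field length, in terms

tweet = bm25_tf_norm(freq=1, field_len=20, avg_field_len=AVG_LEN)
book = bm25_tf_norm(freq=1, field_len=2000, avg_field_len=AVG_LEN)
print("tweet:", round(tweet, 3))
print("book: ", round(book, 3))

# With b=0, length is ignored and both fields get the same weight.
print("b=0:  ", round(bm25_tf_norm(1, 20, AVG_LEN, b=0), 3),
      round(bm25_tf_norm(1, 2000, AVG_LEN, b=0), 3))
```

Setting b to 0 can be reasonable for fields where length carries no meaning (tags, for example), while values closer to 1 penalize long fields more heavily.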

3. Interactive: BM25 Calculator

The calculator takes three inputs and combines them into a final score:

  • Term frequency: how often the term appears in the doc.
  • Rarity: a lower value means a rarer term and a higher score.
  • Field length: shorter fields score higher.
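As a static stand-in for the calculator, the three inputs can be combined in a few lines of Python. This is a sketch of the single-term BM25 formula under stated assumptions, not Elasticsearch's exact implementation: the function name, the sample index statistics, and the omission of constant boost factors are all choices made here for illustration.

```python
import math

def bm25_score(freq, doc_freq, doc_count, field_len, avg_field_len,
               k1=1.2, b=0.75):
    """Score one term against one document field."""
    if freq == 0:
        return 0.0
    # IDF: rarer terms (small doc_freq) are worth more.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # TF with saturation (k1) and field-length normalization (b).
    norm = 1 - b + b * field_len / avg_field_len
    tf = freq / (freq + k1 * norm)
    return idf * tf

# Hypothetical index statistics.
stats = dict(doc_count=100_000, avg_field_len=100)

baseline = bm25_score(freq=2, doc_freq=1_000, field_len=100, **stats)
more_tf = bm25_score(freq=20, doc_freq=1_000, field_len=100, **stats)
rarer = bm25_score(freq=2, doc_freq=10, field_len=100, **stats)
shorter = bm25_score(freq=2, doc_freq=1_000, field_len=10, **stats)

for name, score in [("baseline", baseline), ("more occurrences", more_tf),
                    ("rarer term", rarer), ("shorter field", shorter)]:
    print(f"{name:17s} {score:.3f}")
```

For a multi-term query such as “Best Pizza”, the per-term scores are summed. To see the numbers Elasticsearch actually computed for a hit, the search request can be sent with "explain": true.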

4. Tuning Relevance (Boosts)

Don’t like the defaults?

  1. Field Boost: "title^3" (matches in title count 3x as much as matches in unboosted fields).
  2. Function Score: Multiply the score by a document signal such as popularity or recency.
  3. Rescoring: Run a cheap query first, then run a heavy scoring script on only the top results (for example, the top 50 from each shard).
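The three options might look like the following search request bodies, written here as Python dictionaries and printed as JSON. The field names (title, body, popularity) and the script are hypothetical; adapt them to your own mapping before sending them to the _search endpoint.

```python
import json

# 1. Field boost: matches in "title" count 3x as much as matches in "body".
field_boost = {
    "query": {
        "multi_match": {
            "query": "best pizza",
            "fields": ["title^3", "body"],
        }
    }
}

# 2. Function score: multiply the text score by a popularity signal.
function_score = {
    "query": {
        "function_score": {
            "query": {"match": {"body": "best pizza"}},
            "field_value_factor": {
                "field": "popularity",   # hypothetical numeric field
                "modifier": "log1p",     # dampen very large values
                "missing": 1,
            },
            "boost_mode": "multiply",
        }
    }
}

# 3. Rescoring: cheap query first, expensive script on the top 50 per shard.
rescore = {
    "query": {"match": {"body": "best pizza"}},
    "rescore": {
        "window_size": 50,
        "query": {
            "score_mode": "multiply",
            "rescore_query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        # Hypothetical "heavy" script; runs only in the window.
                        "source": "Math.log(2 + doc['popularity'].value)"
                    },
                }
            },
        },
    },
}

for name, body in [("field_boost", field_boost),
                   ("function_score", function_score),
                   ("rescore", rescore)]:
    print(name)
    print(json.dumps(body, indent=2))
```

Rescoring keeps the expensive work bounded: the script runs on at most window_size documents per shard, regardless of how many documents matched the first query.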