Why Search Needs Elasticsearch

[!NOTE] This module explores the core principles of Why Search Needs Elasticsearch, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Problem: The SQL “LIKE” Trap

You have a users table with 100 million rows. You want to find users named “John”.

SELECT * FROM users WHERE name = 'John';

Fast (O(\log N)). The B-Tree index jumps straight to “John”.

Now you want to find users with “John” in their bio.

SELECT * FROM users WHERE bio LIKE '%John%';

Slow (O(N)). The B-Tree is useless for middle-of-string matches. The DB must scan every single row. On a generic SSD, scanning 100GB of data takes ~50 seconds. The user is long gone.


2. Hardware Reality: Random vs Sequential I/O

Why is the B-Tree bad for text search?

  1. Text is Messy: We search for tokens (“John”), not exact strings.
  2. Pointer Chasing: B-Trees require jumping around disk (Random I/O) to follow pointers.

Latency Numbers:

  • Sequential Read: ~500 MB/s (Fast)
  • Random Read: ~100 MB/s (Slow)

Elasticsearch is designed to turn Random Search problems into Sequential Scan problems.


3. The Solution: The Inverted Index

Instead of mapping Row ID &rarr; Text, we map Word &rarr; List<Row IDs>.

Document 1: “The quick brown fox” Document 2: “The quick blue fox”

Inverted Index:

  • “blue”: [2]
  • “brown”: [1]
  • “fox”: [1, 2]
  • “quick”: [1, 2]
  • “the”: [1, 2]

To find “quick fox”:

  1. Get list for “quick”: [1, 2]
  2. Get list for “fox”: [1, 2]
  3. Intersect: [1, 2] (Math! Not Disk Scan!)

This intersection happens in Memory (or using CPU-optimized bitmaps), making it orders of magnitude faster than a disk scan.


4. Interactive: SQL Scan vs Inverted Index

Click Run to see the difference.

  • SQL (Left): Must check every row (Red).
  • Elasticsearch (Right): Jumps straight to the IDs (Green).

SQL Table Scan (O(N))

Checked: 0

Inverted Index Lookup (O(1))

Term: "Fox" → IDs: [4, 12, 45, 88]
Direct Hits: 0

5. Summary

  • SQL = Row-oriented. Good for Transaction (UPDATE user SET ...). Bad for Search.
  • Elasticsearch = Term-oriented (Inverted Index). Bad for Transactions. God-tier for Search.
  • Physics: We trade Write Speed (updating index is immutable/expensive) for Read Speed (intersections are fast).