Why Search Needs Elasticsearch
[!NOTE] This module explores the core principles of Why Search Needs Elasticsearch, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Problem: The SQL “LIKE” Trap
You have a users table with 100 million rows. You want to find users named “John”.
SELECT * FROM users WHERE name = 'John';
Fast (O(\log N)). The B-Tree index jumps straight to “John”.
Now you want to find users with “John” in their bio.
SELECT * FROM users WHERE bio LIKE '%John%';
Slow (O(N)). The B-Tree is useless for middle-of-string matches. The DB must scan every single row. On a generic SSD, scanning 100GB of data takes ~50 seconds. The user is long gone.
2. Hardware Reality: Random vs Sequential I/O
Why is the B-Tree bad for text search?
- Text is Messy: We search for tokens (“John”), not exact strings.
- Pointer Chasing: B-Trees require jumping around disk (Random I/O) to follow pointers.
Latency Numbers:
- Sequential Read: ~500 MB/s (Fast)
- Random Read: ~100 MB/s (Slow)
Elasticsearch is designed to turn Random Search problems into Sequential Scan problems.
3. The Solution: The Inverted Index
Instead of mapping Row ID → Text, we map Word → List<Row IDs>.
Document 1: “The quick brown fox” Document 2: “The quick blue fox”
Inverted Index:
- “blue”:
[2] - “brown”:
[1] - “fox”:
[1, 2] - “quick”:
[1, 2] - “the”:
[1, 2]
To find “quick fox”:
- Get list for “quick”:
[1, 2] - Get list for “fox”:
[1, 2] - Intersect:
[1, 2](Math! Not Disk Scan!)
This intersection happens in Memory (or using CPU-optimized bitmaps), making it orders of magnitude faster than a disk scan.
4. Interactive: SQL Scan vs Inverted Index
Click Run to see the difference.
- SQL (Left): Must check every row (Red).
- Elasticsearch (Right): Jumps straight to the IDs (Green).
SQL Table Scan (O(N))
Inverted Index Lookup (O(1))
5. Summary
- SQL = Row-oriented. Good for Transaction (
UPDATE user SET ...). Bad for Search. - Elasticsearch = Term-oriented (Inverted Index). Bad for Transactions. God-tier for Search.
- Physics: We trade Write Speed (updating index is immutable/expensive) for Read Speed (intersections are fast).