The Structures That Hold The World
When you do SELECT * FROM users WHERE id = 5, what happens?
The database doesn’t scan the whole disk. It uses a Tree. But not a Binary Tree.
1. The Hardware Constraint: Disk I/O
- RAM Access: Nanoseconds.
- Disk Access: Milliseconds (1,000,000x slower).
- Block Size: Disks read data in blocks (e.g., 4KB or 16KB).
A standard Binary Tree spreads nodes across memory. Reading Node → Left → Left might trigger 3 disk seeks. Too slow.
2. B-Trees and B+Trees
A B-Tree is a “fat” tree. Each node contains many keys (e.g., 100) and many children.
- Height: Very low. A height-3 B-Tree can store millions of rows.
- Locality: One node = One Disk Page. Reading a node gets 100 keys in 1 seek.
B+Tree (The Standard)
Used by MySQL (InnoDB) and PostgreSQL.
- Internal Nodes: Only keys (for routing).
- Leaf Nodes: Actual Data. Linked together for fast range scans.
3. Interactive: B+Tree Split
Visualize how a node “splits” when full, pushing the median up.
Tree Empty.
4. LSM Trees (Log-Structured Merge)
Used by Cassandra, RocksDB, Kafka.
- Problem: B-Trees do random writes (slow on HDD/SSD).
- Solution: Write sequentially to an append-only log (MemTable).
- Flush: When RAM is full, flush to disk as a sorted file (SSTable).
- Merge: Background process merges SSTables.
[!TIP] Read vs Write:
- B-Tree: Balanced. Good read, decent write.
- LSM Tree: Fast write (append-only), slower read (check MemTable + SSTables).