Database Trees
Database Trees
[!NOTE] This module explores the core principles of Database Trees, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Conflict: Read vs. Write
Databases handle two main workloads:
- OLTP (Transactions): Fast lookups, row updates. (SQL).
- OLAP/Big Data: Massive ingestion, analytics. (NoSQL/Cassandra).
The data structure you choose determines which workload you survive.
2. B-Trees (The SQL Standard)
Used by: PostgreSQL, MySQL (InnoDB), Oracle.
- Structure: A short, fat tree. High fan-out (100+ keys per node).
- Why?: Minimizes Disk Seeks. If the tree height is 3, you only need 3 disk jumps to find any record in billions.
- Pro: Blazing fast Reads (O(\logB N)).
- Con: Random Writes. Updating a key might require jumping to a random disk sector to update the leaf. Bad for heavy write loads.
3. LSM Trees (The Write-Heavy Hero)
Used by: Cassandra, RocksDB, LevelDB.
- Log-Structured Merge Tree.
- Mechanism:
- Writes go to an in-memory buffer (MemTable) → Instant RAM speed.
- When full, flush to disk as a sorted file (SSTable) → Sequential Write (Fast!).
- Reads checks MemTable, then SSTables.
- Pro: Insane Write Throughput (Sequential I/O).
- Con: Slower Reads (Need to check multiple files/levels).
4. Hardware Truth: Random vs. Sequential I/O
Why does this matter?
- Sequential Write: ~500 MB/s (HDD/SSD). The head just spins and writes.
- Random Write: ~1 MB/s (HDD). The disk head must physically move (seek) to the sector.
- Impact: LSM Trees turn random database updates into sequential log appends, making them 10s or 100s of times faster for writing than B-Trees.
5. Deep Dive Strategy Lab: Database Trees
Intuition Through Analogy
Think of this chapter like running a high-traffic consumer app. The goal is not to memorize a fixed trick, but to repeatedly answer:
- What is the bottleneck?
- Which constraint dominates (time, memory, latency, correctness)?
- Which representation makes the bottleneck easier to eliminate?