Databases & Scaling

Welcome to the Databases & Scaling module.

At the Staff level, we stop asking “Postgres or Mongo?” and start asking “How does this system handle Network Partitions?” and “What is the operational cost of Resharding?”

Module Structure

1. Replication Models

Leader-Follower: Async vs Sync, and the “Monotonic Read” problem.
Leaderless (Dynamo): Quorums (W+R > N), Read Repair, and Anti-Entropy.
Interactive: Quorum Calculator.
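The quorum rule can be checked with a few lines of arithmetic. The following is a minimal sketch, not the Quorum Calculator's actual implementation: the function name quorum_properties and its output keys are hypothetical. It assumes strict quorums over a fixed set of N replicas. Overlap (W + R > N) only guarantees that every read set contains at least one replica that acknowledged the latest successful write. It does not by itself give linearizability once sloppy quorums or concurrent writes enter the picture.

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """Classify a Dynamo-style quorum: N replicas, W write acks, R read replies.

    Hypothetical helper for illustration; assumes strict (not sloppy) quorums.
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must each be between 1 and N")
    return {
        # W + R > N: every read set shares at least one replica with every write set.
        "read_write_overlap": w + r > n,
        # W > N/2: two writes cannot both succeed on disjoint replica sets.
        "write_write_overlap": 2 * w > n,
        # How many replicas can be down while writes / reads still succeed.
        "writes_tolerate_down": n - w,
        "reads_tolerate_down": n - r,
    }


for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1), (5, 3, 3)]:
    print((n, w, r), quorum_properties(n, w, r))
```

The (3, 3, 1) row shows the usual tension. Reads survive two dead replicas, but a single dead replica blocks every write.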

2. Sharding & Partitioning

Partitioning Strategies: Range (TiKV) vs Hash (Cassandra).
Operational Pain: The “Resharding Storm” and how Virtual Nodes (vnodes) reduce data skew.
Interactive: Consistent Hashing Simulator.
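The core idea behind consistent hashing with vnodes fits in a small class. This is a rough sketch, not the simulator's code. The class HashRing, the MD5-based token function, and the default of 100 vnodes per node are all illustrative assumptions. Each physical node is hashed onto the ring many times. A key belongs to the first vnode clockwise from its own hash. When a node joins, it takes over only the keys that fall just before its new vnodes.

```python
import bisect
import hashlib


def token(s: str) -> int:
    # Stable 64-bit position on the ring (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring where each physical node owns many virtual nodes (hypothetical)."""

    def __init__(self, nodes, vnodes_per_node=100):
        self.vnodes_per_node = vnodes_per_node
        self.ring = []  # sorted (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes_per_node):
            bisect.insort(self.ring, (token(f"{node}#vnode{i}"), node))

    def remove_node(self, node):
        self.ring = [(t, n) for t, n in self.ring if n != node]

    def owner(self, key):
        # Walk clockwise: first vnode whose token is >= the key's token, wrapping to 0.
        idx = bisect.bisect_left(self.ring, (token(key),))
        return self.ring[idx % len(self.ring)][1]


ring = HashRing(["a", "b", "c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("d")
moved = [k for k in keys if ring.owner(k) != before[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved; all to 'd':",
      all(ring.owner(k) == "d" for k in moved))
```

Adding a fourth node should move roughly a quarter of the keys, and every moved key lands on the new node. A plain hash(key) % N scheme would instead reshuffle most keys. Many vnodes per node also even out each node's share of the ring, which is the skew reduction described above.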

3. Consistency & CAP/PACELC

CAP is a Lie: Why you can’t “choose CA” when network partitions are not optional.
PACELC: The real trade-off (Latency vs Consistency) in healthy systems.
Models: Linearizability vs Serializability vs Eventual.
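The “Else” half of PACELC can be made concrete with a small simulation. A coordinator that waits for k replica replies is only as fast as the k-th fastest replica. Waiting on more replicas narrows the staleness window, but it pays in tail latency even when nothing is partitioned. This is a toy model, and its lognormal per-replica latency distribution is an assumption chosen only for illustration. The helper names are hypothetical.

```python
import random
import statistics


def quorum_latency(replica_latencies_ms, k):
    # The coordinator answers once the k-th fastest replica has replied.
    return sorted(replica_latencies_ms)[k - 1]


def p99_by_quorum_size(n=3, trials=20_000, seed=7):
    rng = random.Random(seed)
    samples = {k: [] for k in range(1, n + 1)}
    for _ in range(trials):
        # Assumed per-replica latency model (lognormal, median ~2.7 ms); illustrative only.
        latencies = [rng.lognormvariate(1.0, 0.6) for _ in range(n)]
        for k in samples:
            samples[k].append(quorum_latency(latencies, k))
    # statistics.quantiles(..., n=100) returns 99 cut points; index 98 is the p99.
    return {k: statistics.quantiles(v, n=100)[98] for k, v in samples.items()}


for k, p99 in p99_by_quorum_size().items():
    print(f"wait for {k} of 3 replicas: p99 = {p99:.1f} ms")
```

Quorum size is only one knob. The same shape appears when you wait for a synchronous follower or a consensus majority.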

Key Takeaway

Database scaling is a game of trade-offs. You can have strong consistency, but you pay for it in latency (PACELC). You can scale writes horizontally, but you pay for it in complexity (Sharding).

Module Chapters

Chapter 1

Databases 101: Excel to Postgres

Why can't we just save everything in a text file? Learn how databases keep your data organized, searchable, and safe from crashes.

Chapter 2

Sharding: The Partitioning Nightmare

Scaling horizontally by splitting data. Horizontal vs Vertical partitioning, and avoiding the 'Hot Spot' problem.
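A common way to create a hot spot is to range-partition on an auto-incrementing ID. Every new row carries the largest key so far, so every insert hits the same shard. The sketch below contrasts this with hash partitioning. The four-shard layout, the boundary values, and the helper names are hypothetical.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical range boundaries for 4 shards: [0, 250k), [250k, 500k), [500k, 750k), [750k, ...).
RANGE_BOUNDARIES = [250_000, 500_000, 750_000]
NUM_SHARDS = len(RANGE_BOUNDARIES) + 1


def range_shard(key: int) -> int:
    # Range partitioning: neighboring keys land on the same shard (good for range scans).
    return bisect.bisect_right(RANGE_BOUNDARIES, key)


def hash_shard(key: int) -> int:
    # Hash partitioning: a stable digest scatters neighboring keys across shards.
    digest = hashlib.sha1(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# New rows with auto-incrementing IDs always have the largest keys seen so far.
new_ids = range(1_000_000, 1_010_000)
range_load = Counter(range_shard(i) for i in new_ids)
hash_load = Counter(hash_shard(i) for i in new_ids)
print("range:", dict(range_load))  # every write hits the last shard
print("hash: ", dict(hash_load))   # roughly 2,500 writes per shard
```

Hashing spreads the writes, but it gives up efficient range scans. A plain modulo also reshuffles most keys whenever NUM_SHARDS changes, which is the resharding problem consistent hashing is meant to soften.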

Chapter 3

Consistency & CAP: The Hard Truths

Understanding the trade-offs of distributed state. Strong vs Eventual consistency and the truth about CAP vs PACELC.

Chapter 4

Replication: Leader vs. Leaderless

How to keep data in sync across nodes. Synchronous vs Asynchronous replication, and the cost of the 'Lag'.
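The cost of the lag shows up as soon as a follower acknowledges nothing before the leader replies. The following is a toy, in-memory sketch under stated assumptions, not a real replication protocol. The Replica and Leader classes are hypothetical, and the backlog list stands in for the replication stream. In async mode, a read from the follower right after a write can return stale data. In sync mode, the leader applies the change on the follower before acknowledging.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    # Changes shipped by the leader but not yet applied: the replication lag.
    backlog: list = field(default_factory=list)

    def apply_pending(self):
        for key, value in self.backlog:
            self.data[key] = value
        self.backlog.clear()


class Leader:
    def __init__(self, followers, synchronous):
        self.data = {}
        self.followers = followers
        self.synchronous = synchronous

    def write(self, key, value):
        self.data[key] = value
        for follower in self.followers:
            follower.backlog.append((key, value))
            if self.synchronous:
                # Sync: don't acknowledge until the follower has applied the change.
                follower.apply_pending()
        # In async mode this ack can be sent while followers still lag behind.
        return "ack"


async_follower = Replica()
Leader([async_follower], synchronous=False).write("balance", 100)
print("async follower read:", async_follower.data.get("balance"))  # None (stale)
async_follower.apply_pending()  # the lag catches up later
print("after catch-up:", async_follower.data.get("balance"))  # 100

sync_follower = Replica()
Leader([sync_follower], synchronous=True).write("balance", 100)
print("sync follower read:", sync_follower.data.get("balance"))  # 100
```

The price of sync mode is that a slow or failed follower now delays or blocks every write. Many systems therefore offer a middle ground, such as keeping only one follower synchronous.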