
Replication gives you Read Scalability. Partitioning (Sharding) gives you Write Scalability and effectively unbounded storage.
But it comes at a cost. “Resharding” a live 100 TB database is one of the most dangerous operations in engineering.
1. Partitioning Strategies
How do we decide which node holds User:101?
A. Range Partitioning (Google BigTable, HBase, TiKV)
Assign keys by range: [A-C] -> Node 1, [D-F] -> Node 2.
- Pros: Efficient Range Scans (SELECT * FROM users WHERE name LIKE 'A%' touches only Node 1).
- Cons: Hot Spots. If everyone registers with ‘A’, Node 1 melts.
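A minimal routing sketch makes the idea concrete. The boundary letters and node names below are illustrative; real systems (HBase, TiKV) keep this mapping in a metadata service and split ranges as they grow.

```python
# A minimal sketch of range-based routing by first letter.
# Boundaries and node names are hypothetical.
import bisect

# Exclusive upper bound of each range, and the node that owns it.
BOUNDARIES = ["d", "g", "m", "t", "{"]  # "{" sorts just after "z"
NODES = ["node-1", "node-2", "node-3", "node-4", "node-5"]

def route(key: str) -> str:
    """Return the node owning the first range whose upper bound exceeds the key."""
    idx = bisect.bisect_right(BOUNDARIES, key[0].lower())
    return NODES[idx]

print(route("alice"))  # node-1 (a < d)
print(route("grace"))  # node-3 (g <= g < m)
```

Because "alice" and "adam" land on the same node, a LIKE 'A%' scan touches one node instead of all of them; that locality is exactly what creates the hot-spot risk.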
B. Hash Partitioning (Cassandra, DynamoDB)
Assign keys by hash: hash(key) % N.
- Pros: Uniform distribution. No hot spots (usually).
- Cons: Range Scans become scatter-gather. Adjacent keys land on different nodes, so you must query all N nodes.
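A quick sketch shows both the simplicity and the hidden trap. The key set and node counts below are arbitrary; the point is what happens to hash(key) % N when N changes:

```python
# A minimal sketch of mod-N routing, plus a measurement of how many keys
# move when the cluster grows from 3 to 4 nodes. Key names are illustrative.
import hashlib

def route(key: str, n_nodes: int) -> int:
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(route(k, 3) != route(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # ~75%, i.e. ~N/(N+1)
```

That ~75% reshuffle is the fatal flaw of modulo hashing, and it is exactly what consistent hashing fixes.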
C. Consistent Hashing (The Industry Standard)
Instead of mod N, we map both Nodes and Keys to a Circle (the hash space, pictured as 0-360°).
- Placement: A key belongs to the first Node found clockwise on the ring.
- Adding a Node: Only keys between the New Node and the Previous Node move.
- Modulo Hashing: Reshuffles nearly all keys (~$N/(N+1)$).
- Consistent Hashing: Reshuffles only ~$1/N$ of keys (those that land on the new node).
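Here is a bare-bones ring, assuming a 64-bit hash space standing in for the 0-360° circle. Node names are illustrative, and there are no virtual nodes yet:

```python
# A minimal consistent-hash ring (one token per node, no vnodes).
import bisect
import hashlib

def h(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes):
        self.points = sorted((h(n), n) for n in nodes)

    def route(self, key: str) -> str:
        # First node clockwise from the key's position (wrap around to 0).
        idx = bisect.bisect(self.points, (h(key), ""))
        return self.points[idx % len(self.points)][1]

ring = Ring(["node-1", "node-2", "node-3"])
keys = [f"user:{i}" for i in range(100_000)]
before = {k: ring.route(k) for k in keys}

ring = Ring(["node-1", "node-2", "node-3", "node-4"])
moved = sum(before[k] != ring.route(k) for k in keys)
# Expect roughly 1/N instead of ~75%; with one token per node the actual
# fraction varies a lot, which is the motivation for virtual nodes below.
print(f"{moved / len(keys):.0%} of keys moved")
```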
2. Interactive Sharding Simulator
Visualize why Virtual Nodes (VNodes) are critical. Without VNodes, if you have 3 nodes, one node might get a huge arc (say, 50% of the ring) by bad luck. VNodes break each physical node into many small slices to average out the variance.
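A rough simulation of what the simulator shows: ownership shares for 3 physical nodes with a single token each versus 256 vnodes each. Node names and the vnode count are illustrative:

```python
# Compare ring ownership with and without virtual nodes.
import hashlib

SPACE = 2**64

def h(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

def ownership(nodes, vnodes_per_node):
    # Each vnode is a point on the ring; a point owns the arc back to its
    # predecessor (wrapping around zero).
    points = sorted(
        (h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes_per_node)
    )
    share = {n: 0 for n in nodes}
    for (pos, node), (prev, _) in zip(points, [points[-1]] + points[:-1]):
        share[node] += (pos - prev) % SPACE
    return {n: round(s / SPACE, 3) for n, s in share.items()}

nodes = ["node-1", "node-2", "node-3"]
print(ownership(nodes, 1))    # one token each: shares can be wildly uneven
print(ownership(nodes, 256))  # 256 vnodes each: shares close to 1/3
```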
3. The Resharding Storm
You have 3 nodes. You add a 4th.
- Ideally: Node 4 ends up with 25% of the data, drawn evenly from Nodes 1, 2, and 3.
- Reality: Node 4 starts empty. It has to stream terabytes of data from Nodes 1, 2, and 3:
  - Network bandwidth saturates.
  - Disk I/O spikes.
  - Queries time out.
Staff Insight: Never introduce a new shard to a live cluster at full capacity. Always over-provision beforehand, or use Virtual Node Splitting (like DynamoDB) to split partitions locally before moving them.
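A back-of-envelope sketch shows why the storm lasts so long. Every number here (dataset size, streaming throttle) is a made-up input, not a vendor figure:

```python
# Rough rebalance-time estimate for adding a 4th node to a 3-node cluster.
TOTAL_DATA_TB = 100   # live dataset spread across the original 3 nodes
NODES_AFTER = 4
STREAM_GBPS = 2.0     # assumed throttled streaming rate so queries survive

to_move_tb = TOTAL_DATA_TB / NODES_AFTER    # new node's fair share: 25 TB
seconds = to_move_tb * 8_000 / STREAM_GBPS  # TB -> gigabits, then / Gbps
print(f"Move {to_move_tb:.0f} TB at {STREAM_GBPS} Gbps: "
      f"{seconds / 3600:.0f} hours")
# ~28 hours of sustained streaming, all of it hammering nodes 1-3's disks.
```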
4. Virtual Node (vnode) Tuning
How many vnodes should each physical node own?
| vnode Count | Pros | Cons | Use Case |
|---|---|---|---|
| 32-64 | Fast rebalancing (fewer streaming sources) | Higher data skew risk | Small clusters (<10 nodes) |
| 128-256 (Recommended) | Balanced distribution | Moderate gossip/metadata overhead | Standard choice for production (10-100 nodes) |
| 512+ | Minimal skew, fine-grained control | Higher gossip/metadata overhead | Large clusters (100+ nodes) |
> [!WARNING]
> DynamoDB Default: 256 vnodes per node. Cassandra Default: num_tokens was 256 for years, lowered to 16 in v4.0 for faster bootstrap. Setting it too high (e.g., 1024) increases gossip traffic and slows membership changes.
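You can sanity-check the table by measuring the hottest node's ownership at different vnode counts. The 12-node cluster and the counts below are illustrative:

```python
# Measure worst-case ownership skew for a 12-node cluster at several
# vnode counts. Results come from hash math, not vendor benchmarks.
import hashlib

SPACE = 2**64

def h(v: str) -> int:
    return int.from_bytes(hashlib.md5(v.encode()).digest()[:8], "big")

def max_share(n_nodes: int, vnodes: int) -> float:
    points = sorted(
        (h(f"node-{n}#{v}"), n) for n in range(n_nodes) for v in range(vnodes)
    )
    share = [0] * n_nodes
    prev = points[-1][0]  # start from the last point so the wrap arc counts
    for pos, node in points:
        share[node] += (pos - prev) % SPACE
        prev = pos
    return max(share) / SPACE

for vnodes in (16, 64, 256, 1024):
    print(f"{vnodes:>5} vnodes: hottest node owns "
          f"{max_share(12, vnodes):.1%} (ideal {1 / 12:.1%})")
```

The skew shrinks as the vnode count grows, but past a few hundred the improvement is marginal while the metadata cost keeps climbing, which is why the 128-256 band is the usual default.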
Summary
- Range: Good for scans, bad for hotspots.
- Hash: Good for distribution, bad for scans.
- Consistent Hashing: Minimizes movement during resizing.
- Virtual Nodes: Essential for avoiding “Data Skew” (where one node holds 40% of data).