Database Sharding

Replication gives you Read Scalability. Partitioning (Sharding) gives you Write Scalability and effectively unbounded storage.

But it comes at a cost. “Resharding” a live 100TB database is one of the most dangerous operations in engineering.


1. Partitioning Strategies

How do we decide which node holds User:101?

A. Range Partitioning (Google BigTable, HBase, TiKV)

Assign keys by range: [A-C] -> Node 1, [D-F] -> Node 2.

  • Pros: Efficient Range Scans (SELECT * WHERE name LIKE 'A%'); see the routing sketch after this list.
  • Cons: Hot Spots. If everyone registers with ‘A’, Node 1 melts.
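
A minimal sketch of range-based routing in Python. The shard boundaries and node names are made up for illustration; the point is that a lookup is just a search over sorted boundaries, and a prefix scan stays on a single node.

```python
import bisect

# Each shard owns a contiguous key range, identified by its exclusive
# upper bound. Boundaries and node names are hypothetical.
SHARD_BOUNDS = ["d", "g", "m", "t"]   # node5 owns everything >= "t"
SHARD_NODES  = ["node1", "node2", "node3", "node4", "node5"]

def route(key: str) -> str:
    # First shard whose upper bound is greater than the key.
    return SHARD_NODES[bisect.bisect_right(SHARD_BOUNDS, key.lower())]

print(route("alice"))   # node1 (range [start, "d"))
print(route("gregor"))  # node3 (range ["g", "m"))
# A scan for names starting with "a" only touches node1; that locality
# is exactly what hash partitioning gives up.
```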

B. Hash Partitioning (Cassandra, DynamoDB)

Assign keys by hash: hash(key) % N.

  • Pros: Uniform distribution. No hot spots (usually).
  • Cons: Range Scans turn into scatter-gather. Adjacent keys land on different nodes, so a range query must fan out to all N nodes.
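
A quick sketch of modulo hash partitioning, using md5 purely as a stand-in hash function. It also previews the resizing problem that motivates consistent hashing in the next section.

```python
import hashlib

# Modulo hash partitioning: hash the key, then take % N.
def node_for(key: str, n_nodes: int) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_nodes

keys = [f"user:{i}" for i in range(10_000)]

# Distribution across 3 nodes is close to uniform...
before = {k: node_for(k, 3) for k in keys}

# ...but adding a 4th node remaps most keys, because hash % 3 and
# hash % 4 rarely agree on the same node.
after = {k: node_for(k, 4) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"keys moved: {moved / len(keys):.0%}")   # roughly 75%
```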

C. Consistent Hashing (The Industry Standard)

Instead of mod N, we map both Nodes and Keys onto the same ring of hash values (usually drawn as a circle, 0-360°).

  • Placement: A key belongs to the first Node found clockwise on the ring.
  • Adding a Node: Only keys between the New Node and the Previous Node move.
    • Modulo Hashing: Reshuffles nearly every key.
    • Consistent Hashing: Reshuffles only ~1/N of the keys (see the ring sketch below).
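
A minimal consistent-hash ring sketch (no virtual nodes yet). The node names are made up and md5 is just a convenient stand-in hash; the takeaway is that adding node4 only moves the keys that land on its new arc.

```python
import bisect
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Each node gets one position on the ring.
        self.points = sorted((h(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        # The key belongs to the first node clockwise from its hash.
        idx = bisect.bisect_right(self.points, (h(key), ""))
        return self.points[idx % len(self.points)][1]

keys = [f"user:{i}" for i in range(10_000)]
old = Ring(["node1", "node2", "node3"])
new = Ring(["node1", "node2", "node3", "node4"])

# Only the keys that fall on node4's new arc change owners.
moved = sum(1 for k in keys if old.owner(k) != new.owner(k))
print(f"keys moved: {moved / len(keys):.0%}")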

2. Interactive Sharding Simulator

Visualize why Virtual Nodes (VNodes) are critical. Without VNodes, if you have 3 nodes, one node might get a huge arc (50% of data) by luck. VNodes break physical nodes into many small slices to average out the variance.
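
A rough way to quantify that skew: measure how much of the ring each physical node owns, with and without virtual nodes. The node names and the 256-vnode count are arbitrary choices for illustration.

```python
import hashlib

RING = 2**32

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING

def ownership(nodes, vnodes=1):
    # Each physical node is placed on the ring `vnodes` times.
    points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    share = {n: 0 for n in nodes}
    prev = points[-1][0]                      # wrap around from the last point
    for pos, node in points:
        share[node] += (pos - prev) % RING    # arc owned by this point
        prev = pos
    return {n: s / RING for n, s in share.items()}

nodes = ["node1", "node2", "node3"]
for vn in (1, 256):
    pretty = ", ".join(f"{n}: {s:.0%}" for n, s in ownership(nodes, vn).items())
    print(f"{vn:>3} vnode(s): {pretty}")   # 1 vnode: badly skewed; 256: ~33% each
```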


3. The Resharding Storm

You have 3 nodes. You add a 4th.

  • Ideally: Node 4 takes 25% of load from Nodes 1, 2, and 3.
  • Reality: Node 4 is empty. It has to stream terabytes of data from 1, 2, and 3.
    • Network bandwidth saturates.
    • Disk I/O spikes.
    • Queries time out.

Staff Insight: Never add a new shard to a live cluster that is already running near full capacity. Over-provision beforehand, or split partitions in place first (as DynamoDB does) so the data moves in smaller, cheaper pieces.
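
One way to keep the rebalance from saturating the donors is to pace the stream. This is only a sketch of that idea: the 50 MB/s cap, the 4 MB chunk size, and the read_chunk/send_chunk callables are hypothetical, not any database's built-in mechanism.

```python
import time

# Pace the rebalance stream so donor nodes keep serving live traffic.
MAX_BYTES_PER_SEC = 50 * 1024 * 1024   # illustrative throttle
CHUNK_BYTES = 4 * 1024 * 1024

def stream_partition(read_chunk, send_chunk) -> None:
    """Copy one partition to the new node without exceeding the byte budget."""
    window_start = time.monotonic()
    sent_in_window = 0
    while chunk := read_chunk(CHUNK_BYTES):
        send_chunk(chunk)
        sent_in_window += len(chunk)
        if sent_in_window >= MAX_BYTES_PER_SEC:
            # Burned through this second's budget: sleep off the remainder.
            elapsed = time.monotonic() - window_start
            if elapsed < 1.0:
                time.sleep(1.0 - elapsed)
            window_start = time.monotonic()
            sent_in_window = 0

# Example wiring with in-memory stand-ins:
#   import io
#   stream_partition(io.BytesIO(b"x" * 10_000_000).read, lambda chunk: None)
```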


4. Virtual Node (vnode) Tuning

How many vnodes should each physical node own?

| vnode Count | Pros | Cons | Use Case |
| --- | --- | --- | --- |
| 32-64 | Fast rebalancing (fewer streaming sources) | Higher data skew risk | Small clusters (<10 nodes) |
| 128-256 (Recommended) | Balanced distribution | Moderate metadata overhead | Standard choice for production (10-100 nodes) |
| 512+ | Minimal skew, fine-grained control | Higher gossip/metadata overhead | Large clusters (100+ nodes) |

[!WARNING] DynamoDB manages partitioning internally and does not expose a vnode setting. Cassandra's num_tokens defaulted to 256 for years and was lowered to 16 in v4.0 for faster bootstrap. Setting the count too high (e.g., 1024) increases gossip traffic and slows membership changes.
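
A rough sanity check of the table above: place a cluster on the ring at several vnode counts and see how far the hottest node deviates from its fair share. The 12-node cluster size and the counts tested are arbitrary assumptions.

```python
import hashlib
import statistics

RING = 2**32

def skew(n_nodes: int, vnodes: int) -> float:
    # Sort every (node, vnode) position on the ring, then sum arc lengths per node.
    points = sorted(
        (int(hashlib.md5(f"node{n}#{i}".encode()).hexdigest(), 16) % RING, n)
        for n in range(n_nodes) for i in range(vnodes)
    )
    share = [0] * n_nodes
    prev = points[-1][0]
    for pos, node in points:
        share[node] += (pos - prev) % RING
        prev = pos
    return max(share) / statistics.mean(share)   # 1.00 == perfectly even

for v in (16, 64, 256, 1024):
    print(f"{v:>5} vnodes: hottest node holds {skew(12, v):.2f}x its fair share")
```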


Summary

  • Range: Good for scans, bad for hotspots.
  • Hash: Good for distribution, bad for scans.
  • Consistent Hashing: Minimizes movement during resizing.
  • Virtual Nodes: Essential for avoiding “Data Skew” (where one node holds 40% of data).