Resharding & Scaling

One of the main promises of Redis Cluster is the ability to add or remove nodes without downtime. This process is called Resharding.

Real-World Analogy: The Moving Truck

Imagine moving to a new house. The new house (IMPORTING) is getting ready to receive boxes. The old house (MIGRATING) is packing them up. The moving truck (MIGRATE) carries each box atomically. If someone mails a letter to the old house during the move, the old house leaves a forwarding address (ASK) saying, “Try the new house, but explicitly tell them you were forwarded (ASKING).”

1. The Resharding Process

Imagine you have 3 nodes, each holding ~5,461 slots. You want to add a 4th node. To balance the cluster, you need to move some slots from the existing nodes to the new node.
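The arithmetic behind that rebalance can be sketched in a few lines. This is only an illustration: the node names and the plan_rebalance helper are hypothetical, and it assumes the existing nodes are already roughly balanced across the 16,384 hash slots.

```python
def plan_rebalance(slot_counts: dict[str, int]) -> dict[str, int]:
    """Return how many slots each existing node should hand to one new, empty node."""
    total = sum(slot_counts.values())
    nodes_after = len(slot_counts) + 1
    keep, extra = divmod(total, nodes_after)
    moves = {}
    for i, (name, count) in enumerate(sorted(slot_counts.items())):
        # The first `extra` donors keep one additional slot so rounding loses nothing.
        keep_here = keep + (1 if i < extra else 0)
        moves[name] = max(0, count - keep_here)
    return moves


# Hypothetical 3-node layout covering all 16,384 slots.
current = {"node-a": 5461, "node-b": 5462, "node-c": 5461}
moves = plan_rebalance(current)
print(moves)
```

Each donor ends up giving away roughly a quarter of its slots, which leaves all four nodes with about 4,096 slots each.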

This process happens in 3 phases per slot:

  1. Set State:
    • Target Node is set to IMPORTING state for the slot.
    • Source Node is set to MIGRATING state for the slot.
  2. Move Keys:
    • A script (like redis-cli --cluster reshard) iterates over keys in that slot.
    • It sends MIGRATE commands to move keys atomically from Source to Target.
  3. Finalize:
    • Once all keys are moved, the slot ownership is officially transferred.
    • Gossip propagates the new config (Source Node no longer owns the slot).
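To make the three phases concrete, here is a toy in-memory model. It is a sketch, not how Redis is implemented: ToyNode and migrate_slot are hypothetical names, and plain dictionaries stand in for real nodes and network calls.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    name: str
    owned_slots: set = field(default_factory=set)
    keys_by_slot: dict = field(default_factory=dict)  # slot -> {key: value}
    migrating: dict = field(default_factory=dict)      # slot -> target node name
    importing: dict = field(default_factory=dict)      # slot -> source node name


def migrate_slot(source: ToyNode, target: ToyNode, slot: int, batch_size: int = 10) -> int:
    """Move one slot from source to target; return the number of keys moved."""
    # Phase 1: set state, target first, then source.
    target.importing[slot] = source.name
    source.migrating[slot] = target.name

    # Phase 2: move keys in batches until the slot is empty on the source.
    moved = 0
    source_keys = source.keys_by_slot.setdefault(slot, {})
    target_keys = target.keys_by_slot.setdefault(slot, {})
    while source_keys:
        for key in list(source_keys)[:batch_size]:
            # A key lives on exactly one node at any point in time.
            target_keys[key] = source_keys.pop(key)
            moved += 1

    # Phase 3: finalize ownership and clear the transitional state.
    source.owned_slots.discard(slot)
    target.owned_slots.add(slot)
    del source.migrating[slot]
    del target.importing[slot]
    return moved


node_a = ToyNode("A", {100}, {100: {"user:1": "x", "user:2": "y", "user:3": "z"}})
node_b = ToyNode("B")
moved = migrate_slot(node_a, node_b, 100, batch_size=2)
```

In a real cluster, phase 3 is followed by gossip spreading the new ownership to every other node, which this toy model leaves out.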

2. Handling Traffic During Migration

This is where the ASK redirection comes in.

  • Request to Source: If the key is still there, the source answers normally. If the key is no longer there (it may already have been migrated), the source replies with -ASK <slot> <target-ip>:<target-port>.

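  • Request to Target: Normally the target refuses commands for a slot it doesn’t officially own yet and replies with a -MOVED redirect to the current owner. If the client sends ASKING first (as it should after an ASK redirect), the target serves the next command for the importing slot. ASKING covers only that single command, and the client should not update its slot map on ASK, because ownership has not changed yet.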
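The two rules above can be sketched as a small simulation. Everything here is hypothetical (ToyNode, client_get, the tuple-shaped replies); it models a single slot and only mirrors the decision logic, not the Redis wire protocol.

```python
class ToyNode:
    """Hypothetical in-memory stand-in for one cluster node and a single slot."""

    def __init__(self, name, is_owner, owner_name):
        self.name = name
        self.is_owner = is_owner      # does this node officially own the slot?
        self.owner_name = owner_name  # who this node believes the owner is
        self.data = {}
        self.migrating_to = None      # set on the source during migration
        self.importing = False        # set on the target during migration

    def get(self, key, asking=False):
        if self.is_owner:
            if key in self.data:
                return ("OK", self.data[key])
            if self.migrating_to is not None:
                # Key is absent and the slot is migrating: it may be on the target.
                return ("ASK", self.migrating_to)
            return ("OK", None)
        if self.importing and asking:
            # ASKING lets this one command through for the importing slot.
            return ("OK", self.data.get(key))
        return ("MOVED", self.owner_name)


def client_get(nodes, start, key):
    status, payload = nodes[start].get(key)
    if status == "ASK":
        # Follow the redirect once with ASKING; the slot map is left unchanged.
        status, payload = nodes[payload].get(key, asking=True)
    return status, payload


a = ToyNode("A", is_owner=True, owner_name="A")
b = ToyNode("B", is_owner=False, owner_name="A")
a.migrating_to, b.importing = "B", True
a.data["user:1"] = "alice"   # not migrated yet
b.data["user:2"] = "bob"     # already migrated
nodes = {"A": a, "B": b}
```

With this setup, a read of user:1 is answered by Node A directly, a read of user:2 is redirected to Node B via ASK, and a direct read on Node B without ASKING is bounced back to Node A.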

Under the Hood: The Commands

To execute a resharding operation, a tool such as redis-cli issues ordinary Redis commands to the nodes involved. For Slot 100, the sequence looks like this:

  1. On Target Node: CLUSTER SETSLOT 100 IMPORTING <source-node-id>
  2. On Source Node: CLUSTER SETSLOT 100 MIGRATING <target-node-id>
  3. On Source Node: Read keys via CLUSTER GETKEYSINSLOT 100 10
  4. On Source Node: MIGRATE <target-ip> <target-port> "" 0 5000 KEYS user:1 user:2
  5. Broadcast: CLUSTER SETSLOT 100 NODE <target-node-id>
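The sequence above can be written out as a small plan builder. This is a sketch that only formats command strings and does not talk to a server; the reshard_commands helper, the node IDs, and the address are made up for illustration.

```python
def reshard_commands(slot, source_id, target_id, target_host, target_port,
                     keys, batch_size=10, timeout_ms=5000):
    """Return (where, command) pairs for moving one batch of keys in a slot."""
    steps = [
        ("target", f"CLUSTER SETSLOT {slot} IMPORTING {source_id}"),
        ("source", f"CLUSTER SETSLOT {slot} MIGRATING {target_id}"),
        ("source", f"CLUSTER GETKEYSINSLOT {slot} {batch_size}"),
    ]
    if keys:
        # The empty string and KEYS option let one MIGRATE carry several keys.
        key_list = " ".join(keys)
        steps.append(
            ("source", f'MIGRATE {target_host} {target_port} "" 0 {timeout_ms} KEYS {key_list}')
        )
    steps.append(("broadcast", f"CLUSTER SETSLOT {slot} NODE {target_id}"))
    return steps


# Placeholder IDs and address, chosen only for the example.
plan = reshard_commands(100, "src-id", "dst-id", "10.0.0.2", 7001, ["user:1", "user:2"])
for where, command in plan:
    print(f"{where:>9}: {command}")
```

A real tool repeats the GETKEYSINSLOT and MIGRATE steps until the slot is empty on the source before sending the final SETSLOT NODE.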

⚠️ War Story: The Big Key Outage

The MIGRATE command is atomic and synchronous: it blocks both the source and the target for as long as the transfer takes. If a single key in the migrating slot is massive (e.g., a Set with 10 million elements), the source node’s main thread is blocked while it serializes and sends that key, and every client talking to that node stalls. If the stall lasts longer than the cluster node timeout, other nodes can mark the source as failed and trigger an unintended failover. Always check key sizes before a large reshard.
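One way to act on that advice is to screen a slot’s keys before migrating it. The sketch below is deliberately generic: find_big_keys is a hypothetical helper, size_of is any callable that returns a key’s size in bytes (in practice it could wrap the MEMORY USAGE command), and the 10 MiB limit is an arbitrary assumption rather than a Redis recommendation.

```python
from typing import Callable, Iterable


def find_big_keys(keys: Iterable[str],
                  size_of: Callable[[str], int],
                  limit_bytes: int = 10 * 1024 * 1024) -> list:
    """Return (key, size) pairs above limit_bytes, largest first."""
    sized = ((key, size_of(key)) for key in keys)
    big = [(key, size) for key, size in sized if size > limit_bytes]
    return sorted(big, key=lambda pair: pair[1], reverse=True)


# Made-up sizes standing in for real measurements.
fake_sizes = {"user:1": 512, "user:2": 48 * 1024 * 1024, "user:3": 12 * 1024 * 1024}
big_keys = find_big_keys(fake_sizes, fake_sizes.__getitem__)
print(big_keys)
```

Keys flagged this way can be split, trimmed, or moved in a quiet period before the reshard starts.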

3. Interactive: Slot Migration

Visualize moving Slot 100 from Node A to Node B.

Initial state (Status: Idle): Node A owns Slot 100 and holds user:1, user:2, and user:3. Node B is the target and holds no keys for the slot yet.

4. Key Takeaways

  • Zero Downtime: The cluster remains available during migration.
  • Atomic Key Moves: MIGRATE blocks the source and target for the brief moment a key is in transit, so each key exists on exactly one node at any time.
  • Smart Redirection: Clients are guided to the right node via ASK if they hit the wrong one during the transition.