If you’re building a distributed cache or database, the most critical decision you make is: “Which node holds Key K?”
The naive approach `node = hash(key) % N` works fine… until N changes. When you add or remove a server, nearly all keys (all but roughly a 1/N fraction) map to a different node. In a caching layer this triggers a Cache Stampede: almost every request suddenly misses the cache, and the flood of reads hammers the database behind it. The quick sketch below puts a number on the churn.
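To make that concrete, here is a minimal, self-contained simulation (the key names and cluster sizes are illustrative) that counts how many keys change owners under modulo placement when a cluster grows from 4 to 5 servers:

```go
package main

import (
    "fmt"
    "hash/fnv"
)

// hash maps a key to a 64-bit value; FNV-1a is just an illustrative choice.
func hash(key string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key))
    return h.Sum64()
}

func main() {
    const keys = 100_000
    moved := 0
    for i := 0; i < keys; i++ {
        k := hash(fmt.Sprintf("key-%d", i))
        // Modulo placement: node = hash(key) % N, for N = 4 and N = 5.
        if k%4 != k%5 {
            moved++
        }
    }
    // For a uniform hash, only ~1/5 of keys keep their node, so ~80% move.
    fmt.Printf("moved %d of %d keys (%.1f%%)\n", moved, keys, 100*float64(moved)/keys)
}
```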
Enter Consistent Hashing: a strategy where adding or removing a node only remaps roughly 1/N of the keys.
1. The Ring Topology
Imagine mapping both your Servers and your Keys onto a circle (range $0 \dots 2^{32}-1$). To find where a key lives, you simply move clockwise until you hit a server.
2. Staff-Level: Virtual Nodes (Vnodes)
A simple ring has a fatal flaw: Uneven Distribution. If you have 3 nodes, and by random chance they land close to each other, one node might own 10% of the ring while another owns 80%. This creates a Hot Spot.
The Solution: Virtual Nodes
Instead of mapping a physical node to one point on the ring, we map it to hundreds of points (vnodes).
- Node A becomes $A_1, A_2, \dots, A_{100}$.
- Benefits:
  - Uniformity: With enough vnodes, each physical node's share of the ring evens out statistically (see the sketch after this list).
  - Heterogeneity: Give a powerful server 200 vnodes and a weak server 50, so its share of keys scales with its capacity.
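A quick way to check the uniformity claim is to measure how much of the ring each physical node owns as the vnode count grows. This is a minimal sketch (the `ownershipShare` helper and node names are illustrative); weighting for heterogeneity works the same way, since a node's share simply grows with its vnode count:

```go
package main

import (
    "fmt"
    "hash/crc32"
    "sort"
)

// ownershipShare places `vnodes` virtual nodes per physical node on a
// 32-bit ring and returns each node's fraction of the ring (by arc length).
func ownershipShare(nodes []string, vnodes int) map[string]float64 {
    type point struct {
        pos   uint32
        owner string
    }
    var ring []point
    for _, n := range nodes {
        for i := 0; i < vnodes; i++ {
            h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", n, i)))
            ring = append(ring, point{h, n})
        }
    }
    sort.Slice(ring, func(i, j int) bool { return ring[i].pos < ring[j].pos })

    share := make(map[string]float64)
    const ringSize = float64(1 << 32)
    for i, p := range ring {
        // Each vnode owns the arc from the previous vnode up to itself;
        // uint32 subtraction wraps correctly around the ring for i == 0.
        prev := ring[(i-1+len(ring))%len(ring)].pos
        share[p.owner] += float64(p.pos-prev) / ringSize
    }
    return share
}

func main() {
    nodes := []string{"node-a", "node-b", "node-c"}
    fmt.Println("1 vnode each:   ", ownershipShare(nodes, 1))   // often badly skewed
    fmt.Println("200 vnodes each:", ownershipShare(nodes, 200)) // close to 1/3 each
}
```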
3. Beyond the Ring: Maglev vs. Rendezvous
While the “Ring” is the classic visualization, Staff engineers often choose more specialized hashing algorithms depending on their constraints.
Maglev Hashing (Google)
Used in Google’s “Maglev” load balancer. Instead of a ring, it uses a Permutation Table: each backend lists its preferred slots in a fixed-size lookup table, and the slots are filled round-robin from those preference lists so every backend ends up with a nearly equal share.
- Pros: Extremely fast lookups (O(1)). Excellent load distribution.
- Cons: Higher memory usage to store the permutation table.
- Use Case: High-performance L4 load balancers handling millions of connections.
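The table-population step is the heart of Maglev: each backend derives an offset and a skip from two hashes of its name, which define its preferred order over the table slots, and the table is filled round-robin from those preferences. The sketch below follows that construction; the table size and hash functions here are illustrative choices, not the ones Google uses:

```go
package main

import (
    "fmt"
    "hash/crc32"
    "hash/fnv"
)

const tableSize = 65537 // should be prime; larger primes give finer-grained balance

func fnvHash(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

// buildTable fills a lookup table so each backend owns ~tableSize/len(backends) slots.
func buildTable(backends []string) []int {
    n := len(backends)
    offsets := make([]uint64, n)
    skips := make([]uint64, n)
    for i, b := range backends {
        offsets[i] = uint64(crc32.ChecksumIEEE([]byte(b))) % tableSize
        skips[i] = fnvHash(b)%(tableSize-1) + 1
    }

    next := make([]uint64, n) // how far each backend has walked its preference list
    entry := make([]int, tableSize)
    for i := range entry {
        entry[i] = -1
    }

    filled := 0
    for filled < tableSize {
        for i := 0; i < n && filled < tableSize; i++ {
            // Backend i's next preferred slot: (offset + j*skip) mod tableSize.
            c := (offsets[i] + next[i]*skips[i]) % tableSize
            for entry[c] >= 0 {
                next[i]++
                c = (offsets[i] + next[i]*skips[i]) % tableSize
            }
            entry[c] = i
            next[i]++
            filled++
        }
    }
    return entry
}

func main() {
    backends := []string{"b1", "b2", "b3"}
    table := buildTable(backends)
    // Lookup is O(1): hash the key and index into the table.
    idx := uint64(crc32.ChecksumIEEE([]byte("user:42"))) % tableSize
    fmt.Println("key goes to", backends[table[idx]])
}
```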
Rendezvous Hashing (HRW)
Also known as Highest Random Weight hashing. For every key, you calculate a weight for every server: $\text{score} = \text{hash}(\text{key} \parallel \text{serverId})$, where $\parallel$ denotes concatenation. The server with the highest score wins.
- Pros: Zero memory needed for a ring/table. Perfect stability (adding a node doesn’t move keys between existing nodes at all).
- Cons: Lookup is O(N) where N is the number of servers (though optimized versions exist).
- Use Case: Distributed caches where memory is at a premium, or when the number of servers is small (< 100).
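Rendezvous hashing is small enough to show in full. A minimal sketch (the hash-combination scheme below is an illustrative choice):

```go
package main

import (
    "fmt"
    "hash/fnv"
)

// score hashes the key concatenated with the server id; any decent hash works,
// as long as every client computes it the same way.
func score(key, server string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key))
    h.Write([]byte("/"))
    h.Write([]byte(server))
    return h.Sum64()
}

// pick returns the server with the highest score for this key:
// O(N) per lookup, no ring or table to store.
func pick(key string, servers []string) string {
    best, bestScore := "", uint64(0)
    for _, s := range servers {
        if sc := score(key, s); best == "" || sc > bestScore {
            best, bestScore = s, sc
        }
    }
    return best
}

func main() {
    servers := []string{"cache-1", "cache-2", "cache-3"}
    fmt.Println(pick("user:42", servers))
    // Adding "cache-4" only steals the keys for which cache-4 now scores highest;
    // assignments among cache-1..3 are untouched.
    fmt.Println(pick("user:42", append(servers, "cache-4")))
}
```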
| Feature | Ring Hashing | Maglev | Rendezvous |
|---|---|---|---|
| Lookup Time | O(log V) | O(1) | O(N) |
| Memory | O(V) | O(M) | O(1) |
| Rebalance | Minimal | Minimal | Perfect |
(V = vnodes, N = servers, M = table size)
4. Implementation Logic
Implementing this efficiently requires a data structure that supports fast floor/ceiling lookups (searching for the “next highest” hash).
Java: TreeMap
Java’s TreeMap is a Red-Black tree. We can use tailMap(key) to find the portion of the ring greater than or equal to our key.
```java
import java.util.TreeMap;
import java.util.SortedMap;

public class ConsistentHashing {
    // Ring stores: Hash -> ServerName
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int numberOfReplicas; // Vnodes per server

    public ConsistentHashing(int numberOfReplicas) {
        this.numberOfReplicas = numberOfReplicas;
    }

    public void addServer(String server) {
        for (int i = 0; i < numberOfReplicas; i++) {
            // Create vnode: "ServerA-1", "ServerA-2"...
            int hash = (server + "-" + i).hashCode();
            ring.put(hash, server);
        }
    }

    public String getServer(String key) {
        if (ring.isEmpty()) return null;
        int hash = key.hashCode();
        // Find the first node >= hash (Clockwise search)
        if (!ring.containsKey(hash)) {
            SortedMap<Integer, String> tailMap = ring.tailMap(hash);
            hash = tailMap.isEmpty() ? ring.firstKey() : tailMap.firstKey();
        }
        return ring.get(hash);
    }
}
```
Go: sort.Search
Go doesn’t have a built-in BST like TreeMap. Instead, we use a Sorted Slice and binary search.
```go
package main

import (
    "hash/crc32"
    "sort"
    "strconv"
)

type HashRing struct {
    ring       []int          // Sorted Keys
    nodes      map[int]string // Map Hash -> Node Name
    replicaCnt int
}

func New(replicaCnt int) *HashRing {
    return &HashRing{
        nodes:      make(map[int]string),
        replicaCnt: replicaCnt,
    }
}

func (h *HashRing) AddNode(node string) {
    for i := 0; i < h.replicaCnt; i++ {
        // Unique key for vnode
        key := node + strconv.Itoa(i)
        hash := int(crc32.ChecksumIEEE([]byte(key)))
        h.ring = append(h.ring, hash)
        h.nodes[hash] = node
    }
    sort.Ints(h.ring) // Keep ring sorted for O(log N) search
}

func (h *HashRing) Get(key string) string {
    if len(h.ring) == 0 {
        return ""
    }
    hash := int(crc32.ChecksumIEEE([]byte(key)))
    // Binary Search for first index where ring[i] >= hash
    idx := sort.Search(len(h.ring), func(i int) bool {
        return h.ring[i] >= hash
    })
    // Wrap around if we hit the end
    if idx == len(h.ring) {
        idx = 0
    }
    return h.nodes[h.ring[idx]]
}
```
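A quick usage sketch for the Go ring above (node names and replica count are arbitrary); it lives in the same package as the code above, with `"fmt"` added to the imports:

```go
func main() {
    ring := New(100) // 100 vnodes per physical node
    ring.AddNode("cache-a")
    ring.AddNode("cache-b")
    ring.AddNode("cache-c")

    // The same key always routes to the same node until membership changes.
    fmt.Println(ring.Get("user:42"))
    fmt.Println(ring.Get("session:7"))
}
```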
5. Replication Strategies
In systems like Cassandra/Dynamo, we don’t just want ONE node. We want N replicas.
Strategy: Walk the ring. Once you find the “Coordinator Node” (the primary owner), continue walking clockwise and pick the next $N-1$ distinct physical nodes.
> [!TIP]
> Rack Awareness: Simply picking the next nodes on the ring might put all replicas in the same rack (if their vnodes happen to land close together). Production systems skip nodes that belong to the same failure domain.
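A sketch of that walk, written as an extra method on the HashRing from section 4 (GetReplicas is illustrative, not part of the original code). It keeps moving clockwise and skips vnodes whose physical owner has already been chosen, which is what turns “next N vnodes” into “next N distinct physical nodes”:

```go
// GetReplicas walks clockwise from the key's position and returns up to
// `count` distinct physical nodes (the first one is the coordinator).
// A rack-aware version would also skip nodes that share a failure domain
// with a replica already selected.
func (h *HashRing) GetReplicas(key string, count int) []string {
    if len(h.ring) == 0 || count <= 0 {
        return nil
    }
    hash := int(crc32.ChecksumIEEE([]byte(key)))
    start := sort.Search(len(h.ring), func(i int) bool {
        return h.ring[i] >= hash
    })

    seen := make(map[string]bool)
    var replicas []string
    // Walk at most once around the full ring of vnodes.
    for i := 0; i < len(h.ring) && len(replicas) < count; i++ {
        node := h.nodes[h.ring[(start+i)%len(h.ring)]]
        if !seen[node] {
            seen[node] = true
            replicas = append(replicas, node)
        }
    }
    return replicas
}
```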
Next up: Now that we can distribute data, how do we keep it fast? Module 2: Caching & Delivery.