
When your cache grows beyond a single Redis instance, you enter the world of Distributed Caching.
At this scale, you stop worrying about eviction policies (LRU) and start worrying about Topology and Failure Domains.
1. The Routing Layer: Client vs. Proxy
How does your application know which of your 100 Redis nodes holds the key user:101?
Option A: Fat Client (Smart Client)
The application library (e.g., Jedis, EVCache Client) handles the sharding logic.
- Pros: Lowest latency (no extra hop).
- Cons:
  - Coupling: Changing the cache topology requires redeploying every application instance.
  - Connection Explosion: 1,000 App Nodes x 100 Cache Nodes = 100,000 open connections.
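To make Option A concrete, here is a minimal sketch of a fat client in Java. The node list, the CachePool interface, and the hashing choice are illustrative placeholders rather than a real client API; the point is that the routing logic, and one pool per node, live inside every app instance.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Minimal sketch of a "fat client": the app itself maps a key to a cache node
// and keeps one connection pool per node. Node names and the CachePool type
// are placeholders, not a real library API.
public class FatClient {
    private final List<String> nodes;              // e.g. ["cache-1:6379", ..., "cache-100:6379"]
    private final Map<String, CachePool> pools;    // one pool per node -> connection explosion

    public FatClient(List<String> nodes, Function<String, CachePool> poolFactory) {
        this.nodes = List.copyOf(nodes);
        this.pools = nodes.stream().collect(Collectors.toMap(n -> n, poolFactory));
    }

    // Sharding logic lives in the application: changing the node list
    // means redeploying every app instance.
    public String get(String key) {
        String node = nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        return pools.get(node).get(key);
    }

    // Placeholder for whatever client pool you actually use (Jedis, Lettuce, ...).
    public interface CachePool {
        String get(String key);
    }
}
```

With 1,000 app instances each building this client against 100 nodes, even a handful of connections per pool gets you to the six-figure connection count above.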
Option B: Sidecar / Mesh (The Modern Standard)
A proxy (like Facebook’s Mcrouter or Envoy) runs on the same host as the app (localhost).
- How it works: The app talks to localhost:11211. The Sidecar handles sharding, failover, and “Shadowing”.
- Connection Pooling: The Sidecar multiplexes thousands of app requests over a few persistent TCP connections to the cache fleet.
[!TIP] Staff Insight: Sidecars enable Shadowing. You can spin up a new “Cold” cache cluster and configure the Sidecar to asynchronously forward 1% of traffic to it. This lets you warm up new capacity without risking user latency.
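The snippet below is a hedged sketch of that shadowing idea, written as application-level Java for readability; in practice Mcrouter or Envoy do this purely through proxy configuration. The CacheCluster interface and the 1% sampling rate are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of shadowing: serve the request from the primary (warm)
// cluster, and asynchronously mirror a small sample of traffic to a cold
// cluster being warmed up. CacheCluster is a placeholder interface.
public class ShadowingRouter {
    private final CacheCluster primary;
    private final CacheCluster shadow;
    private final double shadowRate;   // e.g. 0.01 = 1% of traffic
    private final ExecutorService shadowPool = Executors.newFixedThreadPool(2);

    public ShadowingRouter(CacheCluster primary, CacheCluster shadow, double shadowRate) {
        this.primary = primary;
        this.shadow = shadow;
        this.shadowRate = shadowRate;
    }

    public String get(String key) {
        // Mirror asynchronously; the caller never waits on the shadow cluster,
        // so a slow or empty cold cluster cannot add user-facing latency.
        if (ThreadLocalRandom.current().nextDouble() < shadowRate) {
            CompletableFuture.runAsync(() -> shadow.get(key), shadowPool)
                             .exceptionally(err -> null);   // ignore shadow failures
        }
        return primary.get(key);   // user-facing path: primary only
    }

    public interface CacheCluster {
        String get(String key);
    }
}
```

Because the shadow call is fire-and-forget on a separate executor, the cold cluster's misses and slow responses never show up on the user-facing path.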
2. Sharding Logic & Hygiene
Distributed caches use Consistent Hashing to spread keys. But how you implement it matters.
The “Virtual Node” Pattern
To avoid “Hot Shards” (where one unlucky node gets all the popular keys), we assign each physical node 100-200 “Virtual Tokens” on the hash ring.
- Result: Key distribution is statistically uniform (standard deviation < 5%); nodes with more capacity can simply be assigned proportionally more tokens.
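Here is a minimal sketch of the virtual-node pattern using a sorted map as the ring. The 150-token count and MD5-based hashing are arbitrary illustrative choices; production clients typically use faster hashes and tuned token counts.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a consistent hash ring with virtual tokens. Each physical node is
// placed on the ring at ~150 points, so removing a node only remaps the keys
// that hashed to its tokens.
public class HashRing {
    private static final int VIRTUAL_TOKENS = 150;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_TOKENS; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_TOKENS; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's position to the first token on the ring.
    public String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);  // fold first 8 bytes into a long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To weight a larger node, register it with more tokens; the lookup logic stays the same.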
Failure Mode: The Rehashing Storm
What happens when a cache node dies?
- Modulo Hashing: hash(key) % N. If N changes from 10 to 9, ~90% of keys move. The database melts.
- Consistent Hashing: Only ~1/N of keys move. The database feels a small bump, not a meteor strike.
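A quick back-of-envelope simulation reproduces the ~90% figure; the user:&lt;i&gt; key naming is just for illustration.

```java
// Back-of-envelope check of the rehashing claim: with modulo sharding, shrink
// the fleet from 10 nodes to 9 and count how many keys land on a new node.
public class RehashStorm {
    public static void main(String[] args) {
        int keys = 1_000_000;
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            int h = ("user:" + i).hashCode();
            int before = Math.floorMod(h, 10);
            int after = Math.floorMod(h, 9);
            if (before != after) moved++;
        }
        // Prints roughly 90%: almost every key misses the cache at once,
        // and that miss storm lands on the database.
        System.out.printf("moved: %.1f%%%n", 100.0 * moved / keys);
    }
}
```

Run the same experiment against a consistent hash ring (like the sketch above) with one node removed and only about 10% of keys relocate.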
3. Hot Key Mitigation: “Gutter” Pools
Even with perfect sharding, a single “Celebrity Key” (e.g., Justin Bieber’s latest tweet) can overload a single shard.
The Solution: L1/L2 Tiering & Gutter Pools
- L1 Local Cache: Keep the hottest keys in the App’s Memory (Guava/Caffeine).
  - Trade-off: Eventual consistency. You need a Pub/Sub mechanism to invalidate this across the fleet.
- Gutter Pools (Facebook Pattern):
  - If the primary shard for a key fails (or is overloaded), the request is routed to a specialized “Gutter Pool”.
  - Keys in the Gutter expire very quickly (TTL = 10s).
  - This prevents a cascading failure where load shifts to the next node and kills it too.
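Below is a sketch of one way to wire L1 tiering and a gutter fallback into the read path, assuming Caffeine for the in-process cache. RemoteCache, setWithTtl, and the TTL values are placeholders for whatever client and policies you actually run.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of a read path with L1 tiering and a gutter fallback. RemoteCache is
// a stand-in for your Redis/Memcached client; sizes and TTLs are illustrative.
public class TieredCacheReader {
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(10_000)                         // hottest keys only
            .expireAfterWrite(5, TimeUnit.SECONDS)       // bound staleness
            .build();
    private final RemoteCache primaryShards;             // consistent-hashed fleet
    private final RemoteCache gutterPool;                // small, short-TTL pool

    public TieredCacheReader(RemoteCache primaryShards, RemoteCache gutterPool) {
        this.primaryShards = primaryShards;
        this.gutterPool = gutterPool;
    }

    public String get(String key, Supplier<String> loadFromDb) {
        String v = l1.getIfPresent(key);                 // L1: in-process memory
        if (v != null) return v;
        try {
            v = primaryShards.get(key);                  // L2: primary shard
        } catch (RuntimeException shardDown) {
            v = gutterPool.get(key);                     // shard failed: try gutter
            if (v == null) {
                v = loadFromDb.get();
                gutterPool.setWithTtl(key, v, 10);       // very short TTL (10s)
            }
            return v;
        }
        if (v == null) {
            v = loadFromDb.get();
            primaryShards.setWithTtl(key, v, 300);
        }
        l1.put(key, v);
        return v;
    }

    // Placeholder for a real client (Jedis, Lettuce, spymemcached, ...).
    public interface RemoteCache {
        String get(String key);
        void setWithTtl(String key, String value, long ttlSeconds);
    }
}
```

The key property: when a primary shard is down or overloaded, the database only absorbs short-TTL gutter misses instead of the shard's entire read load.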
4. Detecting Hot Keys in Real-Time
Before you can mitigate hot keys, you need to detect them.
Count-Min Sketch (Probabilistic)
A probabilistic data structure that tracks key frequency with minimal memory.
- How it works: Hash each key with several independent hash functions, each pointing into its own fixed-size row of counters, and increment those counters. To read a frequency, take the minimum of the key’s counters; collisions can only inflate the estimate, never hide a hot key.
- Memory: ~1MB of counters can track millions of distinct keys with only a few percent error on the heavy hitters you care about.
- Speed: O(1) per key access.
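A minimal Count-Min Sketch looks like the sketch below; the width, depth, and hash mixing are illustrative choices, and a production version would tune them to a specific error target.

```java
import java.util.Random;

// Minimal Count-Min Sketch: DEPTH rows of WIDTH counters, one hash per row.
// Estimates only ever over-count, never under-count.
public class CountMinSketch {
    private static final int DEPTH = 4;          // number of hash functions
    private static final int WIDTH = 1 << 16;    // counters per row (65,536)
    private final long[][] counts = new long[DEPTH][WIDTH];
    private final int[] seeds = new int[DEPTH];

    public CountMinSketch() {
        Random r = new Random(42);
        for (int i = 0; i < DEPTH; i++) seeds[i] = r.nextInt();
    }

    public void add(String key) {
        for (int row = 0; row < DEPTH; row++) {
            counts[row][index(key, row)]++;
        }
    }

    // The true count is at most this value (collisions only inflate counters).
    public long estimate(String key) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < DEPTH; row++) {
            min = Math.min(min, counts[row][index(key, row)]);
        }
        return min;
    }

    private int index(String key, int row) {
        int h = key.hashCode() * 31 + seeds[row];
        h ^= (h >>> 16);                          // mix bits before truncating
        return Math.floorMod(h, WIDTH);
    }
}
```

estimate(key) is an upper bound on how often the key was seen, so comparing it against a threshold can produce false positives but never misses a genuinely hot key.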
Heavy Hitters Algorithm
Track the top-K most frequent keys:
- Stream: For each cache request, update a frequency counter for the key (e.g., the Count-Min Sketch above).
- Threshold: If a key appears > 1000 times/second, mark it as “hot”.
- Action: Route hot keys to a dedicated Gutter Pool or replicate them to Local L1 caches.
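The sketch below implements the threshold step with plain per-second counters; in a real deployment you would sample the request stream or back it with the Count-Min Sketch above. The class and method names are hypothetical, and the 1,000 req/s threshold is workload-dependent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Count requests per key in one-second windows and flag anything that
// crosses HOT_THRESHOLD within the current window.
public class HotKeyDetector {
    private static final long HOT_THRESHOLD = 1_000;     // requests per second
    private final Map<String, LongAdder> window = new ConcurrentHashMap<>();
    private volatile long windowStartMillis = System.currentTimeMillis();

    // Returns true when the key crosses the threshold inside the current window.
    public boolean recordAndCheck(String key) {
        rollWindowIfNeeded();
        LongAdder counter = window.computeIfAbsent(key, k -> new LongAdder());
        counter.increment();
        return counter.sum() > HOT_THRESHOLD;
    }

    // Reset counters once per second so counts approximate requests/second.
    private synchronized void rollWindowIfNeeded() {
        long now = System.currentTimeMillis();
        if (now - windowStartMillis >= 1_000) {
            window.clear();
            windowStartMillis = now;
        }
    }
}
```

When recordAndCheck returns true, the router can flip that key to the Gutter Pool or replicate it into the L1 caches for the rest of the window.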
[!TIP] Production Pattern: Redis Enterprise’s Active-Active Geo-Distribution replicates data across regions (CRDT-based), so reads for popular keys are served locally in each region instead of crossing the WAN.
Staff Takeaway
The difference between a Junior and Staff design is Connection Management.
- Junior: “App connects to Redis.”
- Staff: “App connects to local Envoy, which manages a connection pool to the Sharded Redis fleet, handling health checks and gray failures.”