Write Strategies: Consistency vs Latency
[!TIP] Interview Insight: When designing a system, always ask: “Is this a read-heavy or write-heavy system?” Your choice of write strategy depends entirely on this answer.
1. The Write Problem
Reading from a cache is simple: check the cache, and on a miss, fall back to the database. Writing is harder: you now have two copies of the data (the Cache and the DB), and you must decide how to keep them in sync.
2. The Four Strategies
A. Write-Through (The “Safe” Way)
- How it works: The application writes to the Cache AND the Database synchronously. The write is confirmed only when both succeed.
- Pros:
- Strong Consistency: Cache and DB are always identical.
- Reliability: No data loss if cache crashes.
- Cons:
- High Latency: You pay the penalty of the DB write for every single request.
- Double Write: Every write hits the DB, so it doesn’t reduce write load.
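A minimal write-through sketch in Python, assuming a redis-py client and a hypothetical `db.execute` helper (the same placeholder used in the warming script later in this section):

```python
import json
import redis

cache = redis.Redis()

def write_through(user_id, profile, db):
    """Persist to the DB and the cache in one synchronous operation."""
    # 1. Write to the database first (the source of truth).
    db.execute(
        "UPDATE users SET profile = %s WHERE id = %s",
        (json.dumps(profile), user_id),
    )
    # 2. Update the cache before acknowledging the write.
    cache.setex(f"user:{user_id}", 3600, json.dumps(profile))
    # The caller sees success only after BOTH writes complete,
    # so cache and DB never diverge, at the cost of a DB round trip per write.
    return True
```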
B. Write-Back / Write-Behind (The “Fast” Way)
- Terminology: These terms are often used interchangeably. Write-Back usually refers to the CPU/Hardware cache policy, while Write-Behind refers to software (Database/Redis) patterns.
- How it works: The application writes only to the Cache. The Cache returns “Success” immediately. The Cache asynchronously syncs data to the DB later (e.g., every 5 seconds, or when the item is evicted).
- Pros:
- Low Latency: Write speed = RAM speed (~100ns).
- Write Coalescing: If you update a counter 100 times in 1 second, the DB only sees ONE write (the final value). This massively reduces DB pressure.
- Cons:
- Data Loss Risk: If the Cache crashes before syncing, that data is gone forever.
- Complexity: Implementing this correctly is hard (requires a queue or WAL).
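A minimal write-back sketch (an in-process illustration rather than a production pattern; `db.save` is a hypothetical persistence call): writes land in memory and return immediately, while a background thread flushes dirty keys on an interval. The flush loop is where coalescing happens.

```python
import threading
import time

class WriteBackCache:
    """Writes hit memory and return instantly; a background thread
    flushes dirty keys to the DB every few seconds."""

    def __init__(self, db, flush_interval=5.0):
        self.db = db               # hypothetical DB handle
        self.data = {}             # the cache itself
        self.dirty = set()         # keys written but not yet persisted
        self.lock = threading.Lock()
        t = threading.Thread(target=self._flush_loop,
                             args=(flush_interval,), daemon=True)
        t.start()

    def write(self, key, value):
        # No DB round trip on the hot path.
        with self.lock:
            self.data[key] = value
            self.dirty.add(key)

    def _flush_loop(self, interval):
        while True:
            time.sleep(interval)
            with self.lock:
                batch = {k: self.data[k] for k in self.dirty}
                self.dirty.clear()
            # 100 increments to one counter collapse into ONE write here.
            for key, value in batch.items():
                self.db.save(key, value)   # hypothetical persistence call
```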
C. Write-Around (The “Big Data” Way)
- How it works: Write directly to the DB, bypassing the cache. The cache is only populated when the data is read (on a Miss).
- Use Case: Writing massive data (e.g., Log files, Video uploads) that won’t be read immediately. Prevents the cache from being flooded with useless data (Cache Pollution).
- Trade-off: Read latency for recently written data is high (Cache Miss).
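A write-around sketch, again with hypothetical `db.save` / `db.load` helpers: writes skip the cache entirely, and the cache is only filled on the read path.

```python
import json
import redis

cache = redis.Redis()

def write_around(key, value, db):
    # Write goes straight to the DB; the cache is deliberately skipped,
    # so bulk writes don't evict hot entries (no cache pollution).
    db.save(key, value)                        # hypothetical persistence call

def read(key, db):
    # The cache is populated only here, on a read miss.
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = db.load(key)                       # hypothetical lookup
    cache.setex(key, 3600, json.dumps(value))  # first read pays the miss
    return value
```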
D. Refresh-Ahead
- How it works: If a cached item is accessed and is close to expiring (e.g., within 10 seconds of TTL), the cache automatically refreshes the data from the DB in the background.
- Benefit: The next user never sees a cache miss or latency spike. Excellent for preventing Thundering Herd.
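A refresh-ahead sketch using redis-py's `TTL` command: on a hit that is close to expiring, the value is reloaded in the background while the current (still valid) value is returned. `db.load` is a hypothetical lookup.

```python
import json
import threading
import redis

cache = redis.Redis()
TTL = 300            # seconds
REFRESH_WINDOW = 10  # refresh if fewer than 10s of TTL remain

def get_with_refresh_ahead(key, db):
    cached = cache.get(key)
    if cached is None:
        return _load_and_cache(key, db)   # normal miss path
    # If the entry is about to expire, refresh it in the background
    # so the next reader never sees a miss or a latency spike.
    if cache.ttl(key) < REFRESH_WINDOW:
        threading.Thread(target=_load_and_cache, args=(key, db),
                         daemon=True).start()
    return json.loads(cached)

def _load_and_cache(key, db):
    value = db.load(key)                  # hypothetical DB lookup
    cache.setex(key, TTL, json.dumps(value))
    return value
```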
Visual Data Flow
- Write-Through: App → Cache + DB (synchronous); success is returned only after both writes complete.
- Write-Back: App → Cache → immediate success; Cache → DB is flushed asynchronously later.
- Write-Around: App → DB directly; the Cache is populated only on a later read miss.
3. Decision Tree: Which one to choose?
- Is data loss acceptable?
  - No → Write-Through
  - Yes → Is the system write-heavy?
    - Yes → Write-Back
    - No → Write-Around
4. Interactive Demo: The Power of Write Coalescing
This demo visualizes the massive advantage of Write-Back: Coalescing.
- Select Write-Back.
- Rapidly click WRITE (+1). Notice the “Pending Updates” count increases in RAM.
- Observe that the Database is NOT touched for every click.
- Wait for the Async Flush to see one single big write to the DB.
- Try Write-Through to see the pain of slow DB writes.
5. Cache Warming: The Cold Start Problem
When you deploy a new cache server (or restart an existing one), the cache is empty. Every request is a MISS, and your database gets hammered. This is called a Cold Start.
Strategies
A. Lazy Loading (Default - Reactive)
- How: Cache is empty. Wait for users to request data. Populate cache on MISS.
- ✅ Simple: No special code needed
- ❌ Slow Start: First users experience high latency
- ❌ DB Spike: Database gets hit hard during cold start
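For reference, the lazy-loading (cache-aside) read path looks like this; `db.load_user` is a hypothetical DB lookup:

```python
import json
import redis

cache = redis.Redis()

def get_user(user_id, db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)         # warm path: cache hit
    # Cold path: after a restart every first request lands here,
    # which is exactly what hammers the DB during a cold start.
    user = db.load_user(user_id)          # hypothetical DB lookup
    cache.setex(key, 3600, json.dumps(user))
    return user
```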
B. Eager Loading (Proactive)
- How: Preload the cache with predictable data before taking traffic
- ✅ Fast Start: Users get instant cache hits
- ❌ Complexity: Requires knowing what to preload
- ❌ Wasted Memory: May load data that never gets accessed
Implementation Example
Bulk Warming Script (Python + Redis):
```python
import redis
import json
from concurrent.futures import ThreadPoolExecutor

cache = redis.Redis()
db = get_database_connection()  # Your DB

def warm_user_profiles():
    """Load top 10K active users into cache"""
    users = db.execute("""
        SELECT id, name, avatar
        FROM users
        ORDER BY last_active DESC
        LIMIT 10000
    """)

    # Batch insert with pipeline (reduces RTT)
    pipe = cache.pipeline()
    for user in users:
        key = f"user:{user['id']}"
        pipe.setex(key, 3600, json.dumps(user))  # 1hr TTL
    pipe.execute()
    print("✅ Warmed 10K user profiles")

def warm_product_catalog():
    """Load entire product catalog (static data)"""
    products = db.execute("SELECT * FROM products WHERE active = true")

    pipe = cache.pipeline()
    for product in products:
        key = f"product:{product['id']}"
        pipe.set(key, json.dumps(product))  # No TTL (static)
    pipe.execute()
    print("✅ Warmed product catalog")

# Run in parallel
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.submit(warm_user_profiles)
    executor.submit(warm_product_catalog)
```
Production Deployment Pattern
Blue-Green Deployment with Cache Warming:
1. Deploy new server (Green)
2. Run warming script on Green (while still OFF traffic)
3. Wait for cache to populate (monitor hit ratio)
4. Flip traffic from Blue → Green
5. Keep Blue alive for 5min (rollback safety)
6. Decommission Blue
Validation:

```bash
# Check cache hit ratio before going live
redis-cli INFO stats | grep -E "keyspace_(hits|misses)"
# Target: hits / (hits + misses) > 0.80 before routing production traffic
```
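If you prefer to gate the cutover in code rather than eyeballing `redis-cli` output, here is a small sketch using redis-py's `INFO stats` counters:

```python
import redis

cache = redis.Redis()

def cache_hit_ratio():
    """Compute the hit ratio from Redis' own keyspace counters."""
    stats = cache.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0

# Only flip traffic to the warmed (Green) server once the cache is effective.
if cache_hit_ratio() < 0.80:
    raise SystemExit("Cache not warm enough for production traffic")
```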
What to Warm?
| Data Type | Strategy | Example |
|---|---|---|
| User Sessions | Don’t warm (short-lived) | Login tokens expire quickly |
| Product Catalog | Warm fully | Static data, predictable access |
| Top Users | Warm top 1% | Celebrities, power users (80/20 rule) |
| Homepage Content | Warm fully | Everyone sees this |
| Old Articles | Don’t warm | Low traffic, unpredictable |
Interview Insight: Facebook warms their edge caches with “trending posts” before routing traffic. This prevents cold-start slowness during global events.
6. Summary
| Strategy | Latency | Data Safety | Best For… |
|---|---|---|---|
| Write-Through | High (Slow) | High | Financial Data, User Profiles (Consistency Critical) |
| Write-Back | Low (Fast) | Low (Risk) | Likes, Views, Analytics, Heavy Write Loads |
| Write-Around | High (Slow) | High | Archival Data, Large Media Uploads |