Eviction Policies: LRU vs LFU

In 2019, a major video streaming platform added nightly full-table PostgreSQL scans to its recommendation engine for A/B test reports. The job pulled every row in the events table — 150 million rows. The next morning, the database cache hit rate had dropped from 94% to 31%. The cause: the buffer pool used LRU, so the scan loaded 150M cold rows into cache and evicted every hot user-session row. This is the Scan Attack problem — one-time-use data polluting your most valuable cache space. The same bug existed at GitHub until they switched their metadata cache from LRU to an ARC-based policy. Your choice of eviction algorithm directly determines your cache hit ratio under adversarial workloads.

[!IMPORTANT] In this lesson, you will master:

  1. O(1) Mechanics: Why the HashMap + Doubly Linked List combo is the “Swiss Army Knife” of caching.
  2. Scan Resistance: Understanding why LRU fails during a database backup and how TinyLFU survives.
  3. Hardware Intuition: Calculating the physical RAM tax of pointers vs. bits.

[!TIP] Interview Insight: If asked to “Design an LRU Cache”, do not just explain it. Write the code (or pseudo-code) for the get and put operations using a HashMap and Doubly Linked List to achieve O(1) time complexity.

1. Why O(1) Matters

When your cache has 10 items, an O(N) lookup (looping through a list) is fast: 10 operations. When your cache has 10 million items, O(N) means 10 million operations per lookup — at roughly 100 million simple operations per second, that's a ~100ms latency spike for every single request.

We need O(1) (Constant Time): whether you have 10 items or 10 billion, the operation takes the same amount of time.
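You can feel the difference with a toy benchmark (illustrative only; absolute timings vary by machine). A membership test on a Python list scans linearly, while a dict lookup is hash-based:

```python
import time

def bench(n):
    """Time one membership test in a list (O(N) scan) vs a dict (O(1) hash lookup)."""
    items = list(range(n))
    table = {i: True for i in range(n)}
    target = n - 1                      # worst case for the list: last element
    t0 = time.perf_counter()
    target in items
    t1 = time.perf_counter()
    t2 = time.perf_counter()
    target in table
    t3 = time.perf_counter()
    return t1 - t0, t3 - t2

for n in (10, 10_000, 1_000_000):
    linear, constant = bench(n)
    print(f"N={n:>9,}  list: {linear * 1e6:10.1f} us   dict: {constant * 1e6:6.1f} us")
```

The list timings grow with N; the dict timings stay flat. That flat line is what the rest of this lesson is built on.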

[!TIP] Try it yourself: Hover over the chart to see the latency difference as the number of items (N) grows.

Complexity: O(N) vs O(1)

[Interactive chart: latency (ms) vs number of items (N) for O(N) and O(1) lookups]

2. LRU (Least Recently Used)

The most common policy. It assumes that if you used it recently, you will use it again soon (Temporal Locality).

  • On Get: Look up the node via the HashMap (O(1)), unlink it, and re-insert it at the Head (Most Recently Used).
  • On Put: Create a node, insert it at the Head, and add it to the Map (evicting first if the cache is full).
  • On Evict: Remove the node at the Tail (Least Recently Used). Remove its key from the Map.

[!NOTE] Hardware-First Intuition: The “Metadata Tax”. A standard LRU implementation on a 64-bit system is memory-hungry. Each entry in the Doubly Linked List costs you 16 bytes just for pointers (prev + next, 8 bytes each). Add in the HashMap overhead, and you can be spending 50-80 bytes of RAM per cache entry before you even store the actual data. This is why high-performance caches like Memcached use slab allocation and custom memory allocators to reduce fragmentation.
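The interview tip at the top asks you to actually write this. Here is a minimal Python sketch of the HashMap + Doubly Linked List design, using sentinel head/tail nodes to avoid edge cases (an illustration, not a production cache):

```python
class Node:
    """List node: on a 64-bit VM, prev + next alone cost ~16 bytes of pointers."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}                       # key -> Node, gives O(1) lookup
        self.head = Node()                  # sentinel: MRU side
        self.tail = Node()                  # sentinel: LRU side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._unlink(node)                  # note: every read mutates the list
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            victim = self.tail.prev         # least recently used
            self._unlink(victim)
            del self.map[victim.key]
        node = Node(key, value)
        self.map[key] = node
        self._push_front(node)
```

Both `get` and `put` are O(1): the map gives direct node access, and the list gives constant-time unlink and insert. Note that `get` has to move the node — this is the "every read is a write" property the lock-contention tip below builds on.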


3. Interactive Demo: LRU Internals & Hit Ratio

Visualizing the HashMap + Linked List structure.

  • Head (Left): Most Recently Used (Safe).
  • Tail (Right): Least Recently Used (Danger Zone).
  • Scan Attack: Simulate a database scan (e.g., SELECT * FROM big_table) that floods the cache with one-time-use keys, wiping out your useful data.

[!TIP] Try it yourself: Click “ACCESS / ADD” to add items. Then click “SCAN ATTACK” to see how a flood wipes out your hot data.

[Interactive demo: cache slots from HEAD (MRU) to TAIL (LRU), with live Hits / Misses / Ratio counters]

4. Modern Policies: Beyond LRU

LRU is great, but it has a flaw: Cache Pollution. If you scan a database (reading every item exactly once), LRU replaces everything with data that will never be used again.

A. W-TinyLFU (Window TinyLFU) - The Solution

Used in Caffeine (Java) and Ristretto (Go). It solves LFU's memory problem (a full counter per key) by combining a Count-Min Sketch with a Window Admission Policy.

1. The Count-Min Sketch (Probabilistic Frequency)

Think of it as a Bloom Filter for Counting. Instead of 1 counter per key (Expensive), we have a small 2D array of counters (Cheap).

  1. Hash: We hash the key using multiple hash functions (e.g., 3 functions).
  2. Increment: We increment the counters at those 3 positions.
  3. Read: To get the frequency, we check the 3 positions and take the Minimum value.

Why the Minimum? Because of collisions. A cell might be shared by “Apple” and “Banana”: if “Apple” was counted 10 times and “Banana” 2 times, that cell reads 12. But collisions can only inflate a counter, never decrease it, and because we use multiple independent hashes, it is unlikely that all 3 cells collide. The Minimum of the 3 values is therefore the estimate closest to the truth.

Combined with the admission policy described below, this cheap frequency history is what protects the cache from Scan Attacks.
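The three steps above (hash, increment, read the minimum) fit in a few lines of Python. This is a toy illustration: `blake2b` with a per-row salt stands in for independent hash functions, and real implementations (e.g., Caffeine) typically use 4-bit counters with periodic aging to bound memory further.

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width=256, depth=3):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Salt the key with the row number to simulate independent hash functions.
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def increment(self, key):
        # Step 1 + 2: hash with `depth` functions, bump each counter.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += 1

    def estimate(self, key):
        # Step 3: collisions only inflate counters, so the minimum is
        # the estimate closest to the true frequency (never an undercount).
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))
```

Memory is fixed at `width × depth` counters no matter how many distinct keys pass through — that is the whole point versus one counter per key.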

Staff Engineer Tip: In multi-threaded environments, LRU has a hidden bottleneck: Lock Contention. Every “Read” is actually a “Write” to the Doubly Linked List (moving the node to the head). This requires a lock. Modern caches like Caffeine use a Ring Buffer to record hits asynchronously, allowing “Read” operations to stay truly parallel without blocking on a global LRU lock.

Staff Engineer Tip: Admission vs. Eviction. Most engineers focus only on Eviction (who to kick out). But Admission (who to let in) is often more powerful. TinyLFU’s main strength is its “Admission Filter”. When the cache is full and a new item arrives, TinyLFU compares the frequency of the new item against the victim. If the new item is a “one-hit wonder”, it is rejected immediately, saving the hot data from being evicted.
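A minimal sketch of that admission decision. For clarity, a plain `Counter` stands in for the Count-Min Sketch; the class and method names here are illustrative, not Caffeine's actual API:

```python
from collections import Counter

class AdmissionFilter:
    """TinyLFU-style admission filter. A real implementation backs this
    with a Count-Min Sketch; a Counter stands in here for readability."""

    def __init__(self):
        self.freq = Counter()

    def record_access(self, key):
        # Every access (hit or miss) feeds the frequency history.
        self.freq[key] += 1

    def admit(self, candidate, victim):
        # Reject one-hit wonders: the newcomer must beat the victim's
        # historical frequency to earn a slot in the main cache.
        return self.freq[candidate] > self.freq[victim]
```

During a scan, each scanned key has frequency 1 while the hot keys have high counts, so `admit` keeps rejecting the scan traffic and the hot data survives.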

[!TIP] Try it yourself: Run the “Database Scan” and compare how LRU (Red) gets wiped out while TinyLFU (Green) protects the hot items.

Scan Resistance: LRU vs TinyLFU

[Interactive demo: an LRU cache (vulnerable to scans) side by side with a TinyLFU cache (protects hot items); frequent items: A, B, C]

B. ARC (Adaptive Replacement Cache) - The Gold Standard

ARC is one of the most intelligent caching algorithms ever invented (used by ZFS; PostgreSQL briefly shipped it before replacing it over IBM patent concerns). It beats LRU by maintaining Four Lists: two real lists (recently used T1, frequently used T2) and two “ghost” lists of recently evicted keys (B1, B2), balancing Recency vs Frequency.

It dynamically tunes the cache size between “Recent” items and “Frequent” items based on workload.

C. The Clock Algorithm (Second Chance)

Common in Operating System Page Replacement. It approximates LRU with very low overhead, since a read only needs to set a bit instead of moving a list node.

  • Structure: Pages are arranged in a Circle (Clock).
  • Reference Bit (R): Every page has a bit (0 or 1).
  • Clock Hand: A pointer moves around the circle.

Algorithm:

  1. On Page Fault: Look at the page the Hand points to.
  2. If R=1: Give it a “Second Chance”: set R=0, move the Hand to the next page, and repeat.
  3. If R=0: Evict this page, replace it with the new page, and move the Hand.
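The three steps above translate almost directly into code. A minimal Python sketch (this variant sets the new page's reference bit on load; some variants load with R=0):

```python
class ClockCache:
    """Second-chance (Clock) page replacement over a fixed ring of frames."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = [None] * capacity   # the clock face
        self.ref = [0] * capacity        # reference bits
        self.hand = 0
        self.index = {}                  # page -> frame slot

    def access(self, page):
        if page in self.index:           # hit: just set the reference bit
            self.ref[self.index[page]] = 1
            return "hit"
        # Page fault: advance the hand until a frame with R=0 is found.
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0      # second chance: clear the bit, move on
            self.hand = (self.hand + 1) % self.capacity
        victim = self.pages[self.hand]   # R=0 here: evict whatever lives in it
        if victim is not None:
            del self.index[victim]
        self.pages[self.hand] = page
        self.index[page] = self.hand
        self.ref[self.hand] = 1          # mark as referenced on load (this variant)
        self.hand = (self.hand + 1) % self.capacity
        return "miss"
```

Notice the overhead: a hit touches one bit, with no list surgery and no lock on a shared list head — this is why operating systems favor Clock over exact LRU.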

[!TIP] Try it yourself: Add a page (e.g. “A”) to give it a second chance (R=1). Add a new page (“E”) to force an eviction.


5. Eviction Policy Decision Matrix

Choosing the right policy depends on your access pattern.

| Access Pattern | Best Policy | Why / Examples |
| --- | --- | --- |
| Temporal locality (recently used → used again) | LRU | News feeds, user sessions, recent posts |
| Frequency-based (popular items stay popular) | LFU / TinyLFU | Top 100 songs, trending videos, product catalogs |
| Scanning workloads (full table scans) | ARC / Clock-Pro | Database buffer pools, large reports |
| Mixed workload | W-TinyLFU | CDN edge caches (mix of hot items + one-hit wonders) |
| Small cache (< 1,000 items) | LRU | Simple, O(1), minimal overhead |
| Large cache (> 1M items) | W-TinyLFU | Memory-efficient, still O(1) |

[!TIP] Summary: Use LRU for general cases. Use TinyLFU or ARC for high-throughput or scan-heavy workloads (e.g., a database cache).