Bitmaps and HyperLogLog
Note: This module explores the core principles of Bitmaps and HyperLogLog, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. Bitmaps: The Power of Bits
A Bitmap is not a data structure; it’s a set of bit-level operations on a String. Since a String can be 512MB, you can store 2^{32} bits.
- Space: 1 million users take ~128KB of RAM.
- Speed: O(1) to set/get a bit.
- Use Case: Real-time Analytics (DAU - Daily Active Users).
Tracking Daily Active Users
Assign each user an integer ID (offset). If User 100 logs in, set bit 100 to 1.
// User 100 logged in today
jedis.setbit("visits:2023-10-01", 100, true);
// Count active users
long dau = jedis.bitcount("visits:2023-10-01");
// Check if User 100 was active
boolean active = jedis.getbit("visits:2023-10-01", 100);
// User 100 logged in
rdb.SetBit(ctx, "visits:2023-10-01", 100, 1)
// Count active users
dau, _ := rdb.BitCount(ctx, "visits:2023-10-01", nil).Result()
// Check status
active, _ := rdb.GetBit(ctx, "visits:2023-10-01", 100).Result()
2. HyperLogLog: Counting Billions with 12KB
Counting unique items (Cardinality) usually requires O(N) memory (like a Set). But what if you have 1 billion users? A Set would take gigabytes.
HyperLogLog (HLL) is a probabilistic data structure.
- Space: Fixed 12KB (maximum).
- Accuracy: Standard error of 0.81%.
- Use Case: Unique IP addresses, Unique Search Queries.
How it Works (Intuition)
It hashes data and counts the leading zeros in the binary representation. The more leading zeros, the more unique items you likely have (statistically).
3. Interactive: Probabilistic Counter
Simulate adding unique items to a HyperLogLog vs a Perfect Set. Watch the error rate.
Simulation
Adding random unique IDs...
4. Code Example: Unique Visitors
// Log unique IP address
jedis.pfadd("unique_visitors", "192.168.1.1");
jedis.pfadd("unique_visitors", "192.168.1.2");
// Get estimated count
long count = jedis.pfcount("unique_visitors");
// Returns 2
// Log unique IP
rdb.PFAdd(ctx, "unique_visitors", "192.168.1.1")
rdb.PFAdd(ctx, "unique_visitors", "192.168.1.2")
// Get count
count, _ := rdb.PFCount(ctx, "unique_visitors").Result()
// Returns 2