The Problem: Expensive lookups

Imagine you have 1 Billion users. When a new user tries to sign up with a username, you need to check if it’s already taken.

  • Checking the DB for every keystroke is too slow.
  • Storing 1 Billion strings in a Hash Set in memory takes too much RAM.

Bloom Filters solve this by using a small bit array and some hash functions.


Interactive Bloom Filter

Add items to the filter and then test if they “exist”.

Try adding "apple" then checking it!

How It Works

  1. Bit Array: Start with an array of $m$ bits, all set to 0.
  2. Hashing: When adding an element, run it through $k$ different hash functions.
  3. Flipping: Take the output of those hashes and set those positions in the bit array to 1.
  4. Checking: To see if an item exists, hash it again. If all those bits are 1, the item might exist. If any bit is 0, it definitely does not.

The “Rules” of Bloom Filters:

  • False Positives are possible: Two different strings might happen to hash to the same bits.
  • False Negatives are impossible: If the bit is 0, the item was definitely never added.
  • Deletions are impossible: You can’t turn a bit back to 0 without potentially deleting other items.

Real World Usage

  • Medium: Prevents recommending an article you’ve already read.
  • Google Chrome: Checks if a URL is in a malicious site list before you visit.
  • Cassandra / RocksDB: Checks if a key exists in an SSTable file before doing an expensive disk read.