If Arrays are the “bread” of programming, HashMaps are the “butter”. They are the default choice for storing data when you need fast lookups.
The Two Flavors
- HashMap (Dictionary in Python): Stores
Key → Valuepairs.- Keys must be unique.
- Values can be duplicated.
- Think: A phone book (Name → Number).
- HashSet (Set in Python): Stores
Keyonly.- Keys must be unique.
- Think: A guest list (Names only, no duplicates allowed).
Interactive Playground
Experiment with how a Map handles data. Notice what happens when you add a duplicate key!
HashMap Simulator
Implementation Details
Under the hood, both Python’s dict and Java’s HashMap use a Hash Table.
Python Example
Python dictionaries are highly optimized hash tables.
# 1. Create
grades = {}
# 2. Add (Put) - O(1)
grades["Alice"] = 95
grades["Bob"] = 82
# 3. Access (Get) - O(1)
print(grades["Alice"]) # 95
# 4. Check Key - O(1)
if "Charlie" not in grades:
print("Charlie is missing")
# 5. Remove - O(1)
del grades["Bob"]
Set Example
Use a set when you only care about uniqueness, not the associated value.
# 1. Create
unique_ids = set()
# 2. Add - O(1)
unique_ids.add(101)
unique_ids.add(102)
unique_ids.add(101) # Duplicate ignored!
print(len(unique_ids)) # Output: 2
# 3. Check Membership - O(1)
if 101 in unique_ids:
print("ID 101 exists")
Amortized Complexity
Why do we say Amortized O(1)? Occasionally, the hash table fills up. When the Load Factor (items / capacity) exceeds a threshold (usually 0.75), the table must resize (usually double in size).
- Resizing: Takes O(N) time because every item must be re-hashed to a new index.
- Amortized: Since resizing happens rarely (exponentially less often), the average cost per operation remains O(1).
[!IMPORTANT] Always prefer
HashMap/HashSetoverListfor lookups. Searching a list is O(N). Searching a map is O(1). This is the single most common optimization pattern in algorithmic interviews.
Deep Dive Strategy Lab: Hashmap Hashset
Intuition Through Analogy
Think of this chapter like library card catalog lookup. The goal is not to memorize a fixed trick, but to repeatedly answer:
- What is the bottleneck?
- Which constraint dominates (time, memory, latency, correctness)?
- Which representation makes the bottleneck easier to eliminate?