If Arrays are the “bread” of programming, HashMaps are the “butter”. They are the default choice for storing data when you need fast lookups.

The Two Flavors

  1. HashMap (Dictionary in Python): Stores Key → Value pairs.
    • Keys must be unique.
    • Values can be duplicated.
    • Think: A phone book (Name → Number).
  2. HashSet (Set in Python): Stores Key only.
    • Keys must be unique.
    • Think: A guest list (Names only, no duplicates allowed).

Interactive Playground

Experiment with how a Map handles data. Notice what happens when you add a duplicate key!

HashMap Simulator

Ready.
KEY
VALUE
Map is empty

Implementation Details

Under the hood, both Python’s dict and Java’s HashMap use a Hash Table.

Python Example

Python dictionaries are highly optimized hash tables.

# 1. Create
grades = {}

# 2. Add (Put) - O(1)
grades["Alice"] = 95
grades["Bob"] = 82

# 3. Access (Get) - O(1)
print(grades["Alice"]) # 95

# 4. Check Key - O(1)
if "Charlie" not in grades:
  print("Charlie is missing")

# 5. Remove - O(1)
del grades["Bob"]

Set Example

Use a set when you only care about uniqueness, not the associated value.

# 1. Create
unique_ids = set()

# 2. Add - O(1)
unique_ids.add(101)
unique_ids.add(102)
unique_ids.add(101) # Duplicate ignored!

print(len(unique_ids)) # Output: 2

# 3. Check Membership - O(1)
if 101 in unique_ids:
  print("ID 101 exists")

Amortized Complexity

Why do we say Amortized O(1)? Occasionally, the hash table fills up. When the Load Factor (items / capacity) exceeds a threshold (usually 0.75), the table must resize (usually double in size).

  • Resizing: Takes O(N) time because every item must be re-hashed to a new index.
  • Amortized: Since resizing happens rarely (exponentially less often), the average cost per operation remains O(1).

[!IMPORTANT] Always prefer HashMap/HashSet over List for lookups. Searching a list is O(N). Searching a map is O(1). This is the single most common optimization pattern in algorithmic interviews.


Deep Dive Strategy Lab: Hashmap Hashset

Intuition Through Analogy

Think of this chapter like library card catalog lookup. The goal is not to memorize a fixed trick, but to repeatedly answer:

  1. What is the bottleneck?
  2. Which constraint dominates (time, memory, latency, correctness)?
  3. Which representation makes the bottleneck easier to eliminate?