The Autocomplete Engine
How does Google know you are typing “React” after just “Re…”? If you’ve ever wondered how search engines, code editors, or your phone’s keyboard suggest words as you type, the secret is a Trie (pronounced “Try”, from the word retrieval).
While Hash Maps are excellent for exact matches (like looking up a user by ID), they fall short when you need to search for prefixes. If you want to find all words starting with “cat” in a dictionary, a Hash Map would force you to scan every single entry. A Trie solves this elegantly.
A Trie is a specialized tree data structure where:
- Nodes are characters: Each node represents a single letter.
- Paths represent words: Tracing a path from the root down to a node spells out a prefix or a complete word.
- Root is empty: The top of the tree serves as the starting point and contains no character.
1. The PEDALS Framework: Designing Autocomplete
Let’s apply the PEDALS framework to understand why a Trie is the optimal choice for an Autocomplete Engine.
- Process Requirements: We need to store a dictionary of words and, given a prefix, quickly find if it exists or suggest words starting with it.
- Estimate: We might store millions of words. Fast lookup is critical (millisecond response time).
- Data Model:
- Hash Map:
O(1)for exact match, butO(N)to find all keys starting with a prefix. Too slow for autocomplete. - Binary Search Tree:
O(log N)for search, but string comparisons takeO(L). Total timeO(L log N). - Trie:
O(L)whereLis the length of the word. The lookup time depends only on the length of the prefix, not the number of words in the dictionary!
- Hash Map:
- Architecture: A tree structure where shared prefixes merge into the same path, saving space and lookup time.
- Localized Details: We use a boolean flag
isEndat each node to distinguish between a prefix (like “app”) and a complete word (like “apple”). - Scale: Tries can consume significant memory. At scale, we might compress paths (Radix Tree) or use specialized structures like DAWG (Directed Acyclic Word Graph).
Complexity Analysis
| Data Structure | Search Time | Prefix Search Time | Space Complexity |
|---|---|---|---|
| Hash Map | O(L) |
O(N * L) |
O(N * L) |
| BST | O(L log N) |
O(L log N) |
O(N * L) |
| Trie | O(L) |
O(L) |
O(N * L * 26) |
Note: L = word length, N = number of words.
The space complexity O(N * L * 26) assumes each node has an array of 26 pointers (for lowercase English letters). While it shares prefixes, the array of pointers can cause memory overhead if many nodes have few children.
2. Interactive: Trie Builder
Insert words and see how they share branches. Notice how inserting “cat” and “car” creates a shared path for “c” and “a”, then branches at “t” and “r”. Green Node = End of Word.
3. Implementation Details and Pitfalls
Common Pitfalls
- Memory Overhead: Each node storing an array of 26 pointers (or a Hash Map) can be wasteful if the tree is sparse. Many words share few prefixes in practice outside of dense dictionaries.
- Unicode Support: If you need to support emojis, accents, or non-English characters, an array of size 26 is insufficient. You must use a Hash Map
Map<Character, TrieNode>, which increases access time slightly but saves vast amounts of empty array space. - Deletion: Deleting a word from a Trie requires recursively cleaning up nodes that are no longer part of any other word. It’s often easier to just flip
isEnd = false, but this leaves “ghost” nodes.
Code Implementations
Java
class TrieNode {
// Array approach (faster but uses more memory).
// Use Map<Character, TrieNode> for Unicode support.
TrieNode[] children = new TrieNode[26];
boolean isEnd = false;
}
class Trie {
private TrieNode root;
public Trie() {
root = new TrieNode();
}
// O(L) time, O(L) space
public void insert(String word) {
TrieNode curr = root;
for (char c : word.toCharArray()) {
int idx = c - 'a';
if (curr.children[idx] == null) {
curr.children[idx] = new TrieNode();
}
curr = curr.children[idx];
}
curr.isEnd = true;
}
// O(L) time, O(1) space
public boolean search(String word) {
TrieNode curr = root;
for (char c : word.toCharArray()) {
int idx = c - 'a';
if (curr.children[idx] == null) {
return false;
}
curr = curr.children[idx];
}
return curr.isEnd;
}
// O(L) time, O(1) space
public boolean startsWith(String prefix) {
TrieNode curr = root;
for (char c : prefix.toCharArray()) {
int idx = c - 'a';
if (curr.children[idx] == null) {
return false;
}
curr = curr.children[idx];
}
return true;
}
}
Go
type TrieNode struct {
// Using a map saves space if the tree is sparse
// and easily supports Unicode (runes).
Children map[rune]*TrieNode
IsEnd bool
}
type Trie struct {
Root *TrieNode
}
func Constructor() Trie {
return Trie{Root: &TrieNode{Children: make(map[rune]*TrieNode)}}
}
func (this *Trie) Insert(word string) {
curr := this.Root
for _, char := range word {
if _, ok := curr.Children[char]; !ok {
curr.Children[char] = &TrieNode{Children: make(map[rune]*TrieNode)}
}
curr = curr.Children[char]
}
curr.IsEnd = true
}
func (this *Trie) Search(word string) bool {
curr := this.Root
for _, char := range word {
if _, ok := curr.Children[char]; !ok {
return false
}
curr = curr.Children[char]
}
return curr.IsEnd
}
func (this *Trie) StartsWith(prefix string) bool {
curr := this.Root
for _, char := range prefix {
if _, ok := curr.Children[char]; !ok {
return false
}
curr = curr.Children[char]
}
return true
}