Design Typeahead (Autocomplete)

1. What is Typeahead?

Typeahead (or Autocomplete) is the feature where a system suggests completions for a query as the user types.

  • Examples: Google Search, Amazon Product Search, VS Code IntelliSense.
  • Goal: Save user keystrokes and guide them to popular queries.

Try it yourself: Go to Google. Type ‘sys’ and watch ‘system of a down’ and ‘system design’ appear. Type ‘xyz’ and see how the suggestions change instantly. Open the Network tab in your browser’s developer tools to see the JSON response.


2. Requirements & Goals

Functional Requirements

  • Suggest: Return top 5 terms starting with the prefix.
  • Ranking: Suggestions must be ranked by popularity (frequency) or personalization.

Non-Functional Requirements

  • Ultra Low Latency: < 100ms. The user is typing fast; results must appear instantly.
  • High Availability: Critical for UX.
  • Eventual Consistency: It’s okay if a new trending term takes 5 minutes to appear.

3. Capacity Estimation

  • DAU: 500 Million.
  • Searches: 10 searches/user/day = 5 Billion searches/day.
  • Keystrokes: average query length ≈ 4 chars, so each search fires ~4 suggestion requests (one per keystroke).
  • Requests: 5 Billion searches × 4 = 20 Billion requests/day.
  • Peak QPS: 20 Billion / 86,400 s ≈ 230,000 average QPS; at a ~2× peak, plan for ~400,000-500,000 QPS.
  • This is a Read-Heavy system (Ratio 20:1 or higher).

RAM Estimation (Trie Size)

  • 100 Million unique terms.
  • Avg term length: 10 chars.
  • Raw Data: 100 Million × 10 bytes ≈ 1 GB.
  • Trie Overhead: Nodes, Pointers, Top-K Lists.
  • Estimate: ~15 GB to 30 GB. This fits in the RAM of a single modern server!
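
A quick back-of-the-envelope check of these numbers, as a Python sketch (the 4-requests-per-search, 2× peak factor, and 20× Trie overhead are the rough assumptions from the estimates above):

DAU = 500_000_000
SEARCHES_PER_USER_PER_DAY = 10
REQUESTS_PER_SEARCH = 4              # ~1 suggestion request per keystroke of a 4-char average query

requests_per_day = DAU * SEARCHES_PER_USER_PER_DAY * REQUESTS_PER_SEARCH   # 20 Billion
avg_qps = requests_per_day / 86_400                                        # ~231,000
peak_qps = 2 * avg_qps                                                     # ~460,000 -> plan for 400-500k

UNIQUE_TERMS = 100_000_000
AVG_TERM_BYTES = 10
raw_gb = UNIQUE_TERMS * AVG_TERM_BYTES / 1e9   # ~1 GB of raw strings
trie_gb = raw_gb * 20                          # 15-30x overhead: nodes, pointers, Top-K lists

print(f"avg QPS ~{avg_qps:,.0f}, peak ~{peak_qps:,.0f}, Trie ~{trie_gb:.0f} GB")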

4. System APIs

GET /suggestions?prefix=ama&limit=5

Response:

{
  "prefix": "ama",
  "results": [
    { "term": "amazon", "score": 98 },
    { "term": "amazon prime", "score": 95 },
    { "term": "amazing spiderman", "score": 88 },
    { "term": "amazon music", "score": 85 },
    { "term": "amanda seyfried", "score": 80 }
  ]
}
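
A minimal sketch of the read path behind this endpoint (Flask is used purely for illustration, and the suggest() stub stands in for the in-memory Trie lookup described in Component Design below):

from flask import Flask, request, jsonify

app = Flask(__name__)

def suggest(prefix, limit):
    """Placeholder: in production this reads the cached Top-K list from the in-memory Trie (section 7)."""
    sample = [("amazon", 98), ("amazon prime", 95), ("amazing spiderman", 88)]
    return [(t, s) for t, s in sample if t.startswith(prefix)][:limit]

@app.get("/suggestions")
def suggestions():
    prefix = request.args.get("prefix", "").lower()
    limit = request.args.get("limit", default=5, type=int)
    results = [{"term": t, "score": s} for t, s in suggest(prefix, limit)]
    resp = jsonify({"prefix": prefix, "results": results})
    resp.headers["Cache-Control"] = "public, max-age=3600"   # lets the browser act as an L1 cache
    return resp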

5. Database Design

We cannot use a standard SQL DB (SELECT * FROM queries WHERE text LIKE 'ama%' ORDER BY freq DESC). Even with a B-tree index on the text column, ranking every matching row by frequency on each keystroke is far too slow for ~400k QPS.

We need a specialized data structure: The Trie (Prefix Tree).

Storage Hierarchy

  1. L1 Cache (Browser): Cache results for “ama” for 1 hour.
  2. L2 Cache (In-Memory Trie): the Trie held in RAM inside the Trie service processes; an in-process structure avoids a network round trip on every keystroke (see the Interview Gauntlet on why plain Redis is not enough).
  3. L3 Persistence (Cassandra / DynamoDB): Stores the raw search logs and aggregated frequencies for offline processing.

6. High-Level Design

[Diagram: Real-Time Typeahead system architecture (L1/L2 caching, in-memory Trie, async data pipeline). Query flow: Browser with L1 cache -> API Gateway -> Trie Service Cluster (Nodes A/B/C, each holding the full Trie in RAM) -> Spell Checker (Levenshtein distance) when a prefix has no result. Data pipeline: async query logs -> Kafka -> Aggregator (MapReduce/Spark frequency count) -> Trie Builder (weekly serialization job) -> Trie DB (S3 snapshots), which the Trie servers load at deployment.]
  1. Client: Sends keystrokes to API Gateway.
  2. Trie Service Cluster: In-memory Trie nodes return Top-K results.
  3. Spell Checker: Fallback if no prefix match found.
  4. Async Data Pipeline:
    • Kafka Logs: Buffers query usage.
    • Spark Aggregator: Calculates frequencies.
    • Trie Builder: Serializes Trie to Trie DB (S3).

7. Component Design (Deep Dive)

The Trie Data Structure

  • Root -> A -> M -> A.
  • Node A-M-A stores:
    1. Children: ‘z’ (continuing toward amazon and amazing), ‘n’ (toward amanda).
    2. Cached Top-K: A list of the top 5 most popular queries that start with “AMA”.

Why Cache Top-K at Every Node?

If we don’t cache, we have to traverse the entire subtree of “AMA” to find the most popular ones. This is too slow.

  • Write Time: Expensive (Updating frequency means bubbling up changes).
  • Read Time: O(1). Just read the list at the node.
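
A minimal Python sketch of this layout, assuming the Trie is built offline from (term, frequency) pairs; a production build would use a compact binary/array representation rather than Python objects:

class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.top_k = []      # cached [(score, term)] for the prefix that ends at this node

class Trie:
    def __init__(self, k=5):
        self.root = TrieNode()
        self.k = k

    def insert(self, term, score):
        """Offline build (write time): bubble the term into the Top-K cache of every prefix node."""
        node = self.root
        for ch in term.lower():
            node = node.children.setdefault(ch, TrieNode())
            node.top_k.append((score, term))
            node.top_k.sort(reverse=True)    # tiny list: O(k log k) per node
            del node.top_k[self.k:]          # keep only the Top-K

    def suggest(self, prefix):
        """Read time: walk the prefix in O(L), then return the pre-computed list in O(1)."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []                    # no match: fall back to the spell checker
        return [(term, score) for score, term in node.top_k]

# Example: after insert("amazon", 98), insert("amazon prime", 95), ...,
# suggest("ama") is answered straight from the cached list at the a->m->a node.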

Trie Serialization

How do we save the Trie to disk or send it over the network?

  1. DFS Traversal: Save as Node(Char, List<Children>, TopK).
  2. Protocol Buffers: Very compact binary format.
  3. Optimization: Since the Trie fits in RAM, we can serialize the whole object graph to a file and load it on startup.
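
A hedged sketch combining options 1 and 3, reusing TrieNode/Trie from the previous snippet: a pre-order DFS flattens the Trie into (char, top_k, child_count) records, and pickle stands in for a more compact format such as Protocol Buffers:

import pickle

def serialize_dfs(node, char=""):
    """Pre-order DFS: emit (char, top_k, child_count); each child's records follow its parent's."""
    yield (char, node.top_k, len(node.children))
    for ch, child in sorted(node.children.items()):
        yield from serialize_dfs(child, ch)

def deserialize_dfs(records):
    """Rebuild the node graph from the flat pre-order stream."""
    it = iter(records)
    def build():
        char, top_k, child_count = next(it)
        node = TrieNode()
        node.top_k = top_k
        for _ in range(child_count):
            child_char, child = build()
            node.children[child_char] = child
        return char, node
    return build()[1]

def save_snapshot(trie, path="trie_v2.bin"):
    with open(path, "wb") as f:
        pickle.dump(list(serialize_dfs(trie.root)), f, protocol=pickle.HIGHEST_PROTOCOL)

def load_snapshot(path="trie_v2.bin"):
    with open(path, "rb") as f:
        records = pickle.load(f)
    trie = Trie()
    trie.root = deserialize_dfs(records)
    return trie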

Trie Optimization: Radix Tree

A standard Trie can be memory-inefficient when it contains long chains of single-child nodes (e.g., b->e->s->t).

  • Radix Tree (Compressed Trie): We merge nodes with single children.
  • Instead of b -> e -> s -> t, we store best in a single node.
  • This significantly reduces the number of nodes and pointer overhead.
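
A sketch of the compression pass, assuming each character-Trie node also tracks an is_end flag so complete terms are never merged away (the Top-K caches then live only on surviving nodes, and lookups must handle prefixes that end in the middle of an edge label):

class RadixNode:
    def __init__(self):
        self.children = {}   # edge label (a whole substring) -> RadixNode
        self.is_end = False
        self.top_k = []

def compress(trie_node):
    """Convert a character Trie into a Radix Tree by merging chains of single-child nodes."""
    radix = RadixNode()
    radix.is_end = trie_node.is_end
    radix.top_k = trie_node.top_k
    for ch, child in trie_node.children.items():
        label, cur = ch, child
        # Absorb the chain while the node has exactly one child and is not a complete term itself.
        while len(cur.children) == 1 and not cur.is_end:
            (next_ch, next_child), = cur.children.items()
            label += next_ch
            cur = next_child
        radix.children[label] = compress(cur)   # e.g. b->e->s->t collapses into a single "best" edge
    return radix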

What if the user types “amzon” (missing ‘a’)?

  • We can use Levenshtein Distance (Edit Distance) to find prefixes that are “1 edit away” from the user’s input.
  • However, computing this online is expensive (O(L × M) per candidate). In practice, we usually rely on a separate Spell Check Service to correct the query before it reaches the Typeahead service.
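
For reference, the edit-distance computation the spell checker would run; this is the standard dynamic-programming formulation, O(L × M) time per candidate pair:

def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if chars match)
        prev = cur
    return prev[-1]

# levenshtein("amzon", "amazon") == 1, so "amzon" can be rewritten before it hits the Trie.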

Data Collection Pipeline

Visualized in the Async Data Pipeline zone of the diagram. How do we update the Trie?

  1. Log Service: Buffers search logs to Kafka.
  2. Aggregator (Spark): Every hour, calculate frequencies of all terms.
  3. Trie Builder: Rebuilds the Trie from the aggregated data.
  4. Deployment: Push new Trie to Trie Servers (Load Snapshot).
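
A hypothetical PySpark job for the Aggregator step (the log location, the query field name, and the output path are assumptions, not a fixed schema):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("typeahead-aggregator").getOrCreate()

# One record per submitted query, landed from Kafka into object storage.
logs = spark.read.json("s3://search-logs/dt=2024-01-01/")

freqs = (logs
         .groupBy(F.lower(F.col("query")).alias("term"))   # normalize case before counting
         .count()
         .withColumnRenamed("count", "freq")
         .orderBy(F.desc("freq")))

# The Trie Builder consumes this table to rebuild the Trie offline.
freqs.write.mode("overwrite").parquet("s3://typeahead/freqs/dt=2024-01-01/")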

8. Requirements Traceability

Requirement | Design Decision | Justification
Ultra Low Latency | In-Memory Trie | RAM access (~100 ns) vs disk (~10 ms). Ensures < 20 ms processing time.
Availability | Replication | Replicate the full Trie on 100+ servers. No sharding complexity needed.
Scalability | L1 Browser Cache | Offloads 30-50% of requests (backspaces/repeats) to the client.
Freshness | Async Pipeline | Decouples the write path (logs) from the read path (Trie). Eventual consistency is fine.

9. Observability & Metrics

We optimize for p99 Latency.

Key Metrics

  • End-to-End Latency: Time from keypress to UI render. Target < 100ms.
  • Server Processing Time: Time to traverse Trie. Target < 5ms.
  • Cache Hit Ratio: L1 (Browser) and L2 (Trie).
  • Trie Memory Usage: Alert if RAM > 80% (Need more servers or compression).

Logs

  • Sample 1% of queries for analysis. Do not log PII (Personally Identifiable Information) unless it is sanitized.

10. Deployment Strategy

Snapshot Rollout (Atomic Switch)

We rebuild the Trie offline every week/day.

  1. Build: Trie Builder creates trie_v2.bin and uploads to S3.
  2. Download: All Typeahead servers download trie_v2.bin to a temp folder.
  3. Hot Swap: The server loads v2 into RAM (double memory spike). Once loaded, it flips a pointer current_trie = trie_v2 and frees trie_v1.
  4. No Downtime: Requests are never dropped.
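
A sketch of the swap itself, reusing load_snapshot from the serialization sketch; the lock simply makes the pointer flip atomic from the request threads' point of view:

import threading

class TrieHolder:
    """Double buffering: every request reads whatever self.current points to."""
    def __init__(self, snapshot_path="trie_v1.bin"):
        self._lock = threading.Lock()
        self.current = load_snapshot(snapshot_path)

    def hot_swap(self, snapshot_path="trie_v2.bin"):
        new_trie = load_snapshot(snapshot_path)   # RAM briefly doubles while both versions are loaded
        with self._lock:
            old, self.current = self.current, new_trie   # atomic flip; no request is dropped
        del old   # v1 is reclaimed by the GC once in-flight requests that grabbed it finish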

11. Interview Gauntlet

Rapid Fire Questions

  1. Why not use a database like Redis? Redis Keys are O(1), but we need Prefix matching. Redis SCAN is too slow. We need a custom Trie in process memory to avoid network RTT for every keystroke.
  2. How do you support ‘Personalization’? We can’t store a Trie per user. We store a small “User History List” in a separate DB (or cookie) and merge User History + Global Top-K at runtime (see the merge sketch after this list).
  3. What if the Trie grows too big for RAM? 1. Radix Tree (Compress chains). 2. Sharding (A-M, N-Z) - complex but necessary for billions of terms. 3. Quantization (Store top-k scores as 1 byte instead of 4).
  4. How do you handle ‘profanity’? A Blocklist filter during the Trie Builder phase. We drop terms that match the blocklist so they never enter the production Trie.
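
A minimal sketch of the runtime merge from question 2 (the history_boost knob and the assumption that user_history is already filtered by the current prefix are illustrative choices):

def merge_suggestions(global_top_k, user_history, limit=5, history_boost=1000):
    """Blend the user's own recent queries with the global Top-K; personal matches outrank global ones."""
    scores = {term: score for term, score in global_top_k}
    for term in user_history:
        scores[term] = scores.get(term, 0) + history_boost
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:limit]]

# merge_suggestions([("amazon", 98), ("amazon prime", 95)], ["amazon music"])
# -> ["amazon music", "amazon", "amazon prime"]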

12. Interactive Decision Visualizer: The Trie & Top-K

Visualize how the Trie stores paths. As you type, the system traverses the nodes. Notice how each node already “knows” its most popular completions (Cached Top-K), enabling O(1) retrieval.

[Interactive widget: a sample Trie (e.g., ROOT -> B -> E -> S -> T for ‘best’) where typing a prefix highlights the traversal path and each visited node displays its O(1) cached Top-K results.]

13. Summary (Whiteboard)

Scale & Latency

  • < 100ms End-to-End Latency
  • In-Memory Trie (RAM) is mandatory
  • Replication (Not Sharding) fits in RAM

Data Structures

  • Trie: Prefix Tree O(L)
  • Radix Tree: Compressed Trie
  • Top-K Cache: Store results at each node

Architecture

  • Write Path: Async (Logs -> Spark -> Trie)
  • Read Path: Sync (Browser -> Trie)
  • Deployment: Hot Swap (Double Buffering)

Challenges

  • Memory: 32GB RAM limit? (Use Sharding)
  • Updates: Real-time trends (Use separate 'Trending' Trie)
  • Typo: Spell check fallback