Design an API Rate Limiter
1. What is a Rate Limiter?
A Rate Limiter restricts the number of requests a user can make to a service within a given time window. It is the first line of defense against DoS attacks, Resource Starvation, and Cost Overruns.
Real-World Examples:
- Twitter API: “You have exceeded your limit of 300 tweets per 3 hours.”
- Login Forms: “Too many failed attempts. Try again in 15 minutes.”
[!TIP] Throttling vs Rate Limiting:
- Rate Limiting: “You can make 10 requests per minute.” (Hard Limit, returns 429).
- Throttling: “You are making too many requests, I will slow you down.” (Soft Limit, queues requests). In interviews, these terms are often used interchangeably, but usually refer to the “Hard Limit” scenario.
2. Requirements & Goals
Functional Requirements
- Limit: Accurately limit requests (e.g., 10 req/sec) based on rules.
- Feedback: Return clear HTTP headers (`X-Ratelimit-Remaining`, `Retry-After`) and an error code (HTTP 429 Too Many Requests).
- Granularity: Limit by User ID, IP Address, or API Key.
Non-Functional Requirements
- Low Latency: The check must be super fast (< 2ms). It is in the hot path of every request.
- Distributed: Works across multiple servers.
- Fail-Open: If the rate limiter crashes, it should allow traffic (don’t break the site) rather than blocking legitimate users.
3. Capacity Estimation
If we have 1 Million Active Users and each makes 100 requests/day:
- Total Requests: 100M/day $\approx$ 1,150 QPS.
- Peak QPS might be 10k+.
- Storage: We need to store counters in memory.
- UserID (8 bytes) + Count (4 bytes) + Timestamp (4 bytes) = 16 bytes; call it $\approx$ 20 bytes with overhead.
- For 1M users, memory usage is tiny (~20MB). We can easily fit this in Redis.
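The back-of-the-envelope numbers above can be checked in a few lines (a sketch; the 20-byte per-user record size is the assumption stated above):

```python
users = 1_000_000
requests_per_user_per_day = 100

total_per_day = users * requests_per_user_per_day  # 100M requests/day
avg_qps = total_per_day / 86_400                   # seconds in a day
print(round(avg_qps))                              # ~1157 average QPS

bytes_per_user = 20                                # assumed record size with overhead
memory_mb = users * bytes_per_user / 1_000_000
print(memory_mb)                                   # 20.0 MB -- fits easily in Redis
```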
4. System Interface
The Rate Limiter usually sits as Middleware (e.g., inside the API Gateway shown in the diagram) or as a Sidecar.
Response Headers:
HTTP 429 Too Many Requests
X-Ratelimit-Limit: 100
X-Ratelimit-Remaining: 5
X-Ratelimit-Reset: 1640995200 (Unix timestamp)
5. Database Design: Redis Counters
We don’t use a SQL database for rate limiting because disk I/O is too slow. We use an In-Memory Store (Redis).
Data Model (Hash)
- Key: `rate_limit:{user_id}`
- Fields:
  - `tokens`: Integer (current tokens left)
  - `last_refill`: Timestamp (last time we added tokens)
Alternative: Sorted Sets (For Sliding Window Log)
- Key: `rate_limit:{user_id}`
- Score: Timestamp
- Member: Request ID
- Note: This approach is memory-heavy and typically avoided for high-scale systems.
6. High-Level Design (Distributed)
The middleware (API Gateway/Sidecar) intercepts requests before they hit the backend.
Flow: Client → API Gateway (middleware intercept) → Redis (atomic Lua script check) → Backend service (only reached if allowed).
7. Component Design: Algorithms & Lua
This is the core interview question. Which algorithm do you choose?
Algorithms
A. Token Bucket (Amazon/Stripe)
- Concept: A bucket holds `N` tokens. Tokens refill at rate `R`. Each request consumes 1 token.
- Pros: Allows bursts (if the bucket is full, you can send `N` requests instantly). Memory efficient.
- Cons: Complex to tune two parameters (Bucket Size + Refill Rate).
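The refill-then-consume logic can be sketched as a minimal in-memory class (single-process only, not the distributed Redis version discussed later):

```python
import time

class TokenBucket:
    """Minimal Token Bucket sketch: N-token bucket refilled at R tokens/sec."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # N: max tokens (also the burst size)
        self.refill_rate = refill_rate    # R: tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token for this request
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)
results = [bucket.allow() for _ in range(7)]  # instant burst of 7 requests
print(results)  # first 5 pass (the burst), the next 2 are rejected
```

Note how a full bucket admits an instant burst of `capacity` requests, which is exactly the "Pros" behavior described above.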
B. Leaky Bucket (Shopify/Nginx)
- Concept: Requests enter a queue (bucket). They are processed (leak) at a constant rate. If queue full, discard.
- Pros: Smooths traffic. Stable outflow prevents overloading downstream services.
- Cons: Bursts fill the queue, adding queueing latency, and once the queue is full valid requests are dropped.
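The queue-plus-drain behavior can be sketched as a bounded queue; a fixed-rate worker (not shown) would call `leak()` on a timer:

```python
from collections import deque
from typing import Optional

class LeakyBucket:
    """Leaky Bucket sketch: bounded queue drained at a constant rate."""

    def __init__(self, queue_size: int):
        self.queue = deque()
        self.queue_size = queue_size

    def enqueue(self, request_id: str) -> bool:
        """Called on arrival: accept into the queue, or drop if full."""
        if len(self.queue) >= self.queue_size:
            return False  # queue full: discard the request
        self.queue.append(request_id)
        return True

    def leak(self) -> Optional[str]:
        """Called by a fixed-rate worker: process one queued request."""
        return self.queue.popleft() if self.queue else None

bucket = LeakyBucket(queue_size=3)
accepted = [bucket.enqueue(f"req-{i}") for i in range(5)]  # burst of 5
print(accepted)       # [True, True, True, False, False] -- burst overflows
print(bucket.leak())  # "req-0" leaves at the constant drain rate
```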
C. Fixed Window Counter
- Concept: Count requests in the window `12:00-12:01`. Reset at `12:01`.
- Pros: Simple `INCR` operation.
- Cons: Edge Case (Spike). If I send 100 requests at `12:00:59` and 100 at `12:01:01`, I sent 200 in 2 seconds, violating the limit.
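The boundary spike is easy to demonstrate with a toy counter (a sketch; timestamps are seconds past 12:00):

```python
class FixedWindowCounter:
    """Fixed window sketch: one counter per window, reset at the boundary."""

    def __init__(self, limit: int, window_secs: int = 60):
        self.limit = limit
        self.window_secs = window_secs
        self.counts = {}  # window index -> request count

    def allow(self, now: float) -> bool:
        window = int(now // self.window_secs)  # e.g. 59s -> window 0, 61s -> window 1
        self.counts[window] = self.counts.get(window, 0) + 1
        return self.counts[window] <= self.limit

limiter = FixedWindowCounter(limit=100, window_secs=60)
# 100 requests at 12:00:59, then 100 at 12:01:01:
first = [limiter.allow(59.0) for _ in range(100)]
second = [limiter.allow(61.0) for _ in range(100)]
print(all(first), all(second))  # True True -- 200 requests accepted in ~2 seconds
```

Both batches pass because each lands in a fresh window, even though together they double the intended rate.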
D. Sliding Window Counter (The Winner)
- Concept: Hybrid approach. Weighted average of the previous window count and current window count.
- Formula: `Requests = (Requests in Current Window) + (Requests in Previous Window * Overlap Percentage)`
- Pros: Accurate enough, memory efficient.
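The formula applies directly: with a 60-second window, a request arriving 15 seconds into the current window still overlaps the previous window by 75%. A sketch with illustrative numbers:

```python
def sliding_window_count(prev_count: int, curr_count: int,
                         elapsed_in_window: float, window_secs: float = 60.0) -> float:
    """Weighted estimate: current window's count plus the overlapping
    share of the previous window's count."""
    overlap = (window_secs - elapsed_in_window) / window_secs
    return curr_count + prev_count * overlap

# Previous window: 84 requests. Current window so far: 36 requests,
# measured 15s into a 60s window -> 75% overlap with the previous window.
estimate = sliding_window_count(prev_count=84, curr_count=36, elapsed_in_window=15.0)
print(estimate)  # 36 + 84 * 0.75 = 99.0 -- just under a limit of 100
```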
The Race Condition Problem (Distributed)
If two requests come at the exact same time (Concurrency), a classic Read-Modify-Write race occurs:
- Request A reads `tokens = 1`.
- Request B reads `tokens = 1`.
- Request A decrements and writes `0`.
- Request B decrements and writes `0`.

Result: Both requests passed, but only one token was consumed. We allowed more traffic than the limit.
Solution: Redis Lua Scripts (Atomicity). We use Lua scripts in Redis. A Lua script executes as a single atomic transaction; no other command can run in between its steps.
Rate Limiter Lua Script (Token Bucket): (This script runs inside the Redis Cluster node)
```lua
-- KEYS[1]: User Token Key
-- ARGV[1]: Refill Rate (tokens per second)
-- ARGV[2]: Capacity
-- ARGV[3]: Current Timestamp (Redis Time, in seconds)
local key = KEYS[1]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

-- Get current tokens and last refill time
local data = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now

-- Refill logic: add rate * elapsed tokens, capped at capacity
local delta = math.max(0, now - last_refill)
local filled_tokens = math.min(capacity, tokens + (delta * rate))

if filled_tokens >= 1 then
  -- Consume one token and persist the new state
  redis.call("HSET", key, "tokens", filled_tokens - 1, "last_refill", now)
  redis.call("EXPIRE", key, 3600) -- evict idle counters so keys don't leak
  return 1 -- Allowed
else
  return 0 -- Rejected (429)
end
```
8. Data Partitioning
When we scale to millions of users, a single Redis instance cannot hold all keys.
Redis Cluster (Sharding)
- Strategy: Shard by User ID (or API Key).
- Slot Calculation: `CRC16(user_id) % 16384`.
- Result: Each user's counter lives on a specific Redis node. Operations for a single user are atomic on that node.
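The slot calculation can be reproduced with Python's `binascii.crc_hqx`, which with an initial value of 0 computes the CRC-16/XMODEM variant the Redis Cluster spec defines (a sketch; real cluster clients also honor `{hash tag}` syntax, omitted here):

```python
import binascii

def key_slot(key: str) -> int:
    """Redis Cluster hash slot: CRC16(key) mod 16384.
    binascii.crc_hqx(data, 0) is the CRC-16/XMODEM polynomial the
    Redis Cluster spec uses (reference check value 0x31C3)."""
    return binascii.crc_hqx(key.encode(), 0) % 16384

# The spec's reference check value for CRC16("123456789"):
assert binascii.crc_hqx(b"123456789", 0) == 0x31C3

slot = key_slot("rate_limit:user_42")
print(slot)  # a stable slot in [0, 16384); all ops for this user hit one node
```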
Multi-Region Partitioning
If you have users in US and EU:
- Option A: Centralized Redis Cluster in US. (High Latency for EU users).
- Option B: Local Redis in each region. (Limits are not shared globally. A user can hit 100 reqs in US + 100 reqs in EU).
- Option C: CRDTs (Conflict-free Replicated Data Types) or Geo-Replicated Redis (Enterprise).
- Recommendation: For most rate limiters, Option B (Local) is acceptable. It is better to allow slight over-usage than to add 100ms latency to every request.
9. Reliability & Fault Tolerance
Fail-Open Strategy
What if Redis crashes?
- Fail-Closed: Reject all requests. (Bad UX - Site goes down).
- Fail-Open: Allow all requests. (Risky - Backend might get overloaded).
- Decision: Fail-Open. It is better to let some spam through than to block legitimate users during an outage.
Clock Skew (NTP)
In the Lua script above, we pass ARGV[3]: Current Timestamp.
- Issue: If Application Server A is 5 seconds behind Server B, users might get extra tokens or be blocked unfairly.
- Solution: Use Redis Time (the `TIME` command) inside the Lua script instead of passing it from the App Server. This ensures all servers rely on the "Redis Clock" as the single source of truth.
10. Token Bucket vs Leaky Bucket Under a Burst
Under a traffic burst the two algorithms diverge: the Token Bucket admits the whole burst immediately (bursty output, up to bucket capacity), while the Leaky Bucket queues it and drains at a constant rate (smooth output).
11. System Walkthrough (Dry Run)
Scenario: User exceeds the limit
- Request: User sends request #11 in the same second (Limit is 10/s).
- Gateway: Intercepts request, calls Rate Limiter Middleware.
- Redis Check: Middleware executes Lua script on Redis.
  - Reads `tokens` (0).
  - Reads `last_refill`.
  - Calculates refill (0 tokens added).
- Result: Script returns `0` (Rejected).
- Response: Gateway returns HTTP 429 Too Many Requests with header `Retry-After: 1` (wait 1 second).
12. Interview Gauntlet
- Why use Redis? Why not a local counter in the App Server?
- Ans: If you have 10 App Servers, a local counter only limits per server. A user could hit 10 different servers and get 10x the limit. We need a shared, distributed counter.
- How do you handle race conditions?
- Ans: Redis Lua scripts. They guarantee atomicity for the Read-Refill-Decrement logic.
- What if Redis goes down?
- Ans: Fail-Open. We bypass the limiter to ensure availability, even if it means risking some overload.
- How to handle Clock Skew?
- Ans: Do not trust App Server time. Use `redis.call('TIME')` in the Lua script to use the Redis server's clock as the source of truth.
- How do you test a rate limiter?
- Ans: Integration tests with a mock Redis. Simulate bursts using parallel threads and verify that exactly N requests pass.
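A sketch of such a burst test, using a toy thread-safe counter in place of Redis (the `AtomicCounter` stand-in and `burst_test` helper are hypothetical names, not part of any library):

```python
import threading

class AtomicCounter:
    """Stand-in for the Redis-side atomic check when testing locally."""
    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:  # the lock plays the role of Lua-script atomicity
            if self.count < self.limit:
                self.count += 1
                return True
            return False

def burst_test(limit: int = 10, burst: int = 100) -> int:
    """Fire `burst` concurrent requests; return how many passed."""
    limiter = AtomicCounter(limit)
    passed = []
    def worker():
        if limiter.allow():
            passed.append(1)
    threads = [threading.Thread(target=worker) for _ in range(burst)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(passed)

print(burst_test())  # exactly 10 of 100 concurrent requests pass
```

The assertion to verify is that exactly `limit` requests succeed, never `limit + 1`; without the lock (or the Lua script in production) the race condition from Section 7 lets extras through.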
13. Whiteboard Summary
1. Core Requirements
- Limit: 1000/s
- Low Latency: < 2ms
- Distributed & Accurate
2. Key Architecture
- Middleware: In Gateway/Sidecar
- Store: Redis (In-Memory)
- Logic: Lua Script (Atomic)
3. Data Models
- Hash: `tokens`, `last_refill`
- Key: `lim:{user_id}`
- Sharding: By User ID
4. Bottlenecks & Fixes
- Race Condition: Lua Scripts.
- Latency: Local Redis in each region.
- Failure: Fail-Open strategy.