Flash Sale System (Ticketmaster)

[!NOTE] This module explores the core principles of Flash Sale System (Ticketmaster), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. What is a Flash Sale System?

In 2022, Ticketmaster crashed when millions of fans tried to buy Taylor Swift tickets simultaneously. The problem wasn’t just volume; it was Contention.

A Flash Sale is a Distributed Systems nightmare:

  1. Extreme Contention: 1 Million users want 100 items.
  2. Zero Error Margin: You cannot sell 101 items (Overselling).
  3. Short Duration: The event lasts seconds. Auto-scaling groups don’t have time to boot up.

[!TIP] Real-World Examples:

  • Ticketmaster: Concert tickets.
  • Amazon Prime Day: Limited deals.
  • Sneaker Drops: Nike SNKRS app.

2. Requirements & Goals

2.1 Functional Requirements

  1. Purchase: Users can buy an item.
  2. Inventory: System must track available stock strictly.
  3. Limits: 1 item per user.

2.2 Non-Functional Requirements

  1. High Consistency: Absolutely NO overselling. (CP over AP).
  2. High Availability: The system must not crash under surge.
  3. Low Latency: Responses must be fast (< 200ms).
  4. Fairness: First come, first served.

2.3 Extended Requirements

  1. Bot Protection: Real humans must get priority over scalper bots.

3. Capacity Estimation

3.1 Traffic Analysis

  • Traffic: 1 Million Active Users at 10:00 AM.
  • QPS: Peak traffic 500,000 requests per second.
  • Inventory: 100 iPhones.

3.2 The Bottleneck

A standard PostgreSQL database handles ~1,000 TPS on a single row lock. We need 500,000 TPS.

  • Conclusion: We cannot touch the database for the initial “Buy” action. We need an In-Memory solution (Redis).
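Back-of-envelope check: pushing the 500,000 req/s peak through a single hot row at ~1,000 TPS would mean roughly 500,000 / 1,000 = 500 seconds of queueing, while the sale itself is over in seconds.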

4. System APIs

Minimal API surface area to reduce attack vectors.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/flash-sale/buy | Attempts to reserve an item. Payload: { itemId: "iphone15", userId: "u1" } |
| GET | /v1/flash-sale/status | Checks whether the sale is active or sold out. |
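A minimal client-side sketch of the buy call, assuming Python's `requests` library and a placeholder host:

```python
# Hypothetical call against the buy endpoint above.
import requests

resp = requests.post(
    "https://api.example.com/v1/flash-sale/buy",   # placeholder host
    json={"itemId": "iphone15", "userId": "u1"},
    timeout=1,  # the hot path must answer fast; fail quickly rather than hang
)
print(resp.status_code, resp.json())
```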

5. Database Design

5.1 Redis (Hot State)

  • Counter: item:{id}:stock → Integer
  • User Set: item:{id}:users → Set<UserId> (to prevent double booking)
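The two keys above can be seeded before the sale opens. A minimal sketch, assuming redis-py and an illustrative item id:

```python
# Seed the Redis hot state before the sale opens (illustrative item id).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

item_id = "iphone15"
r.set(f"item:{item_id}:stock", 100)   # Counter: item:{id}:stock -> Integer
r.delete(f"item:{item_id}:users")     # User Set: item:{id}:users -> Set<UserId>
```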

5.2 SQL (Cold State)

  • Table: orders
  • Columns: order_id, user_id, item_id, status.

6. High-Level Architecture

We use a Funnel Architecture to filter traffic at every layer.

System Architecture: Flash Sale Traffic Funnel
Multi-Layer Bot Filtering | Redis Atomic Inventory | Async Queueing

  • Zone 1: The Surge (1M+ Users) → CDN / WAF / Bot Protection. Bots are rejected at the edge.
  • Zone 2: Rate Limiting (~100k req/s) → Token Bucket / Auth Validation. Roughly 90% of requests are rejected here.
  • Zone 3: Atomic Lock (~10k req/s) → Redis cache (Stock: 100), Lua script decrements the stock. Requests for sold-out items are rejected.
  • Zone 4: Persistence → Kafka → SQL. Exactly 100 orders are created.

Flow: High-Volume Traffic → Filtered Requests → Valid Orders.

7. Component Design (Deep Dive)

7.1 Redis Atomic Counter (Lua)

Why Lua? Redis commands (GET, DECR) are atomic individually, but a sequence of them is not: a race condition can occur between checking the stock and decrementing it. A Lua script executes atomically on the Redis server; no other command can run while it is executing. (See ACID Transactions for DB atomicity details.)

```lua
-- KEYS[1] = inventory_key ("item:ps5:stock")
-- KEYS[2] = user_history_key ("item:ps5:users")
-- ARGV[1] = user_id

local stock = tonumber(redis.call("GET", KEYS[1]))
if not stock or stock <= 0 then
  return -1 -- Sold Out (or stock key missing)
end

if redis.call("SISMEMBER", KEYS[2], ARGV[1]) == 1 then
  return -2 -- Already Bought
end

redis.call("DECR", KEYS[1])
redis.call("SADD", KEYS[2], ARGV[1])
return 1 -- Success
```
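On the application server, the script can be loaded once and then invoked per purchase attempt. A minimal sketch, assuming redis-py and that the script above is saved as buy.lua (a hypothetical filename):

```python
# Register the Lua script once; redis-py invokes it via EVALSHA under the hood.
import redis

r = redis.Redis(decode_responses=True)

with open("buy.lua") as f:            # hypothetical path to the script above
    BUY_SCRIPT = r.register_script(f.read())

def try_buy(item_id: str, user_id: str) -> int:
    # Returns 1 = reserved, -1 = sold out, -2 = user already bought.
    return BUY_SCRIPT(
        keys=[f"item:{item_id}:stock", f"item:{item_id}:users"],
        args=[user_id],
    )
```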

7.2 Queue-Based Load Leveling

Once Redis returns 1 (Success), we don’t write to SQL immediately. Instead, we push a message to Kafka: { "user_id": "u1", "item_id": "ps5", "timestamp": 1620000000 }. A separate worker process reads from Kafka and inserts into the SQL DB at a safe pace (e.g., 500 inserts/sec).
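A minimal sketch of both sides of the queue, assuming kafka-python and a hypothetical insert_order helper that writes the SQL row:

```python
# Producer (app server): publish the reservation event after Redis says "Success".
# Consumer (worker): drain the topic into SQL at a controlled pace.
import json
import time
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode("utf-8"))
producer.send("orders-stream", {"user_id": "u1", "item_id": "ps5", "timestamp": 1620000000})

consumer = KafkaConsumer("orders-stream",
                         value_deserializer=lambda v: json.loads(v.decode("utf-8")))
for msg in consumer:
    insert_order(msg.value)   # hypothetical helper: INSERT INTO orders (...)
    time.sleep(0.002)         # crude pacing: ~500 inserts/sec per worker
```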


8. Data Partitioning & Sharding

For a single item Flash Sale (e.g., 100 iPhones), we cannot shard the Item Stock key easily without complex coordination. However, we partition the Traffic using the Load Balancer and Rate Limiters.

  • Request Sharding: Users are hashed by ID to different API Gateways.
  • Inventory Sharding: If we have 10,000 items, we can split them into 10 Redis partitions (1,000 items each). The Load Balancer routes users to a specific partition based on hash(UserID) % 10 (see the sketch after this list).
  • Trade-off: One partition might sell out while others still have stock. We might need a “Work Stealing” algorithm to rebalance, but that adds complexity. For small inventory (100 items), a single Redis node is fine.
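A minimal routing sketch under those assumptions (10 partitions, illustrative host names), using a stable hash so every gateway agrees on the mapping:

```python
# Route each user to a fixed Redis partition via a stable hash of the user id.
import zlib

REDIS_PARTITIONS = [f"redis-{i}" for i in range(10)]  # illustrative host names

def partition_for(user_id: str) -> str:
    # zlib.crc32 is deterministic across processes, unlike Python's built-in hash().
    return REDIS_PARTITIONS[zlib.crc32(user_id.encode("utf-8")) % len(REDIS_PARTITIONS)]

print(partition_for("user_A"))
```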

9. Reliability, Caching, & Load Balancing

9.1 Preventing Overselling

  • Pessimistic Locking (SQL): SELECT ... FOR UPDATE. Safe but slow.
  • Optimistic Locking (SQL): UPDATE ... WHERE version = 1. Fast but causes retry storms (sketched after this list).
  • Redis Atomic: Best for high contention.
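A minimal sketch of the optimistic-locking variant, assuming a DB-API cursor and a hypothetical inventory table with stock and version columns:

```python
# Optimistic locking: the UPDATE only wins if nobody bumped the version in between.
def buy_optimistic(cur, item_id: str) -> bool:
    cur.execute("SELECT stock, version FROM inventory WHERE item_id = %s", (item_id,))
    stock, version = cur.fetchone()
    if stock <= 0:
        return False  # sold out
    cur.execute(
        "UPDATE inventory SET stock = stock - 1, version = version + 1 "
        "WHERE item_id = %s AND version = %s",
        (item_id, version),
    )
    return cur.rowcount == 1  # 0 rows => lost the race; caller retries (retry storm risk)
```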

9.2 Redis Persistence

If Redis crashes, we lose the stock count.

  • AOF (Append Only File): We must set appendfsync always or everysec to minimize data loss.
  • Trade-off: always is slow (disk I/O). everysec risks losing 1 second of sales (overselling 1 sec worth of inventory). For Ticketmaster, we usually accept everysec and have a small “buffer” inventory or handle refunds.

10. Interactive Decision Visualizer

Compare Pessimistic Locking (SQL) vs. Redis Atomic Decrement.

  • SQL Lock: Traffic jams. High Latency.
  • Redis: High-speed highway. Low Latency.


11. System Walkthrough

Let’s trace a successful purchase request during the peak load.

Scenario: User A Buys an iPhone 15

  1. User Action: User A clicks “Buy Now” at 10:00:01 AM.
  2. Edge Layer (CDN/WAF): Checks IP reputation. Passes request.
  3. API Gateway: Token Bucket Rate Limiter allows the request.
  4. Application Server: Executes Redis Lua Script.
    • Input: KEYS=["item:iphone15:stock", "item:iphone15:users"], ARGV=["user_A"].
    • Logic: Stock is 50. User not in set. DECR to 49. SADD user.
    • Output: 1 (Success).
  5. Event Publishing: App Server pushes message to Kafka Topic orders-stream.
      { "order_id": "uuid-123", "user_id": "user_A", "item": "iphone15", "status": "PENDING_PAYMENT" }
    
  6. Response: App Server returns 200 OK to User A. “Reservation Confirmed! Redirecting to Payment…”.
  7. Async Worker: Pulls from Kafka, inserts row into SQL orders table.
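Putting the hot path together, a minimal sketch of the app-server handler (function names reuse the earlier sketches and are illustrative):

```python
# Map the Lua return code to an HTTP-style response; publish to Kafka only on success.
def handle_buy(item_id: str, user_id: str):
    code = try_buy(item_id, user_id)        # Lua script call from the Section 7.1 sketch
    if code == -1:
        return 409, {"error": "SOLD_OUT"}
    if code == -2:
        return 409, {"error": "ALREADY_PURCHASED"}
    publish_order_event(item_id, user_id)   # Kafka producer from the Section 7.2 sketch
    return 200, {"status": "RESERVATION_CONFIRMED"}
```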

12. Low-Level Optimizations

12.1 Redis Lazy Free

Deleting a large key (e.g., the Set of 1 Million users after the sale ends) is a blocking operation.

  • Problem: DEL item:iphone:users can block Redis for 2 seconds.
  • Solution: Use UNLINK (non-blocking delete) or configure lazyfree-lazy-user-del yes. This moves the memory reclamation to a background thread.
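A minimal sketch of the non-blocking cleanup, assuming redis-py:

```python
# UNLINK returns immediately; memory is reclaimed by a background thread.
import redis

r = redis.Redis()
r.unlink("item:iphone:users")
```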

12.2 False Sharing & Padding

In highly concurrent in-memory systems (like if you wrote a custom C++ engine instead of Redis), False Sharing occurs when two atomic counters sit on the same CPU Cache Line (64 bytes).

  • Scenario: Core A updates Stock_Item_1. Core B updates Stock_Item_2.
  • Issue: If both variables share a cache line, Cores invalidate each other’s L1 cache, killing performance.
  • Solution: Cache Line Padding. Add dummy bytes between variables to ensure they sit on different cache lines.

13. Interview Gauntlet

Q1: How do you prevent bots from buying everything?

  • Answer: Multi-layered approach. 1. CAPTCHA at checkout. 2. WAF (IP rate limits). 3. Post-analysis (cancel orders delivering to the same address). 4. “Waiting Room” queue.

Q2: Why not use a Database with Row Locking?

  • Answer: Row locking serializes requests. Postgres can handle ~1k TPS on a single row, but even after edge filtering the hot path still sees on the order of 100k requests per second. Serializing that many requests through one row lock would cause massive timeouts.

Q3: What if the user reserves an item but doesn’t pay?

  • Answer: The reservation has a TTL (e.g., 5 mins). We use a Delayed Queue (or Redis Keyspace Notification). If payment is not received in 5 mins, we increment the stock back (INCR).
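A minimal sketch of the release path, assuming redis-py; the reservation key's expiry is what the delayed queue or keyspace-notification listener reacts to:

```python
# On reserve: record a payment deadline. On expiry: return the unit to stock.
import redis

r = redis.Redis(decode_responses=True)

def reserve(item_id: str, user_id: str, ttl_s: int = 300):
    # Written alongside the Lua reservation; expires if payment never arrives.
    r.set(f"reservation:{item_id}:{user_id}", "PENDING_PAYMENT", ex=ttl_s)

def release_expired(item_id: str, user_id: str):
    # Invoked by the delayed-queue worker / keyspace-notification listener.
    r.incr(f"item:{item_id}:stock")             # put the unit back on sale
    r.srem(f"item:{item_id}:users", user_id)    # let the user attempt to buy again
```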

Q4: How do you test this system?

  • Answer: Distributed Load Testing (k6 / JMeter) running on multiple cloud instances to simulate 500k concurrent users. We verify “Sold Count” == “Inventory” exactly.

Q5: Can we use a Distributed Lock (Redlock)?

  • Answer: Redlock is too slow for high-frequency counting. It requires multiple network round trips to acquire/release. Atomic Lua scripts are much faster (single RTT).

14. Summary: The Whiteboard Strategy

1. Requirements

  • Func: Buy, Inventory.
  • Non-Func: No Overselling, High Load.

2. Architecture

[Client] → [Rate Limit] → [Redis Lua] | (Async) → [Kafka] → [SQL]

  • Redis: Hot State (Stock).
  • SQL: Cold State (Orders).

3. Data & API

  • POST /buy → { itemId, userId }
  • Redis: DECR item:stock
  • SQL: orders table

4. Deep Dives

  • Lua: Ensures atomicity.
  • Funnel: Filter traffic early.