Design Slack/Discord (Real-Time Messaging)
[!NOTE] This module explores the core principles of Design Slack/Discord (Real-Time Messaging), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. What is a Real-Time Messaging System?
Building a chat app for 10 users is easy: INSERT into a database and SELECT * every second.
Building Slack (Enterprise) or Discord (Gaming Communities) for 10 Million concurrent users is a Distributed Systems masterpiece.
The challenge is not just storing messages; it’s Synchronization.
- Real-Time: When I type “Hello”, 50,000 people in the `#general` channel must see it in < 50ms.
- Presence: Knowing exactly who is “Online”, “Idle”, or “Typing…” among millions of users.
- Statefulness: Unlike a REST API, the server must maintain a persistent TCP connection (WebSocket) with the client.
[!TIP] Real-World Examples:
- Slack: Workplace communication (High reliability, structured channels).
- Discord: Voice/Text for communities (Massive scale, ephemeral voice channels).
- WhatsApp: Mobile-first, End-to-End Encryption (different architecture: a persistent connection while the app is open, plus mobile Push when it is not).
2. Requirements & Goals
2.1 Functional Requirements
- 1-on-1 & Group Chat: Send/Receive messages instantly.
- Channels: Support for large channels (e.g., Discord servers with 500k members).
- Presence: Show Online/Offline status in real-time.
- History: Infinite scroll of message history.
- Multi-Device: Sync state between Phone and Laptop.
2.2 Non-Functional Requirements
- Low Latency: Message delivery < 50ms (within the same region).
- High Availability: 99.99%. Chat is often business-critical.
- Scalability: Handle 10 Million concurrent connections.
- Consistency: Messages must appear in the correct order (Total Ordering within a channel).
3. Capacity Estimation
3.1 Traffic Analysis
- DAU: 20 Million.
- Concurrent Users: 10 Million (Peak).
- Messages: 50 msg/user/day → 1 Billion msg/day.
- Write QPS: 10^9 / 86,400 ≈ 11,600 msg/sec.
- Peak QPS: 5x Average → ~60,000 msg/sec.
3.2 Bandwidth & Storage
- Avg Message Size: 100 Bytes.
- Ingress Bandwidth: 60k × 100 Bytes = 6 MB/s (Trivial).
- Egress Bandwidth (Fanout):
- If a user posts to a channel with 10k online users: 100 Bytes × 10,000 = 1 MB for a single message.
- This Fanout is the bottleneck.
- Storage: 1 Billion msg/day × 100 Bytes = 100 GB/day.
- 5 Years: 100 GB × 365 × 5 ≈ 180 TB.
- Conclusion: We need a sharded NoSQL store (Cassandra/ScyllaDB) for history.
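The estimates above can be re-derived in a few lines. This is only a sanity-check sketch: the constants are the figures quoted in this section, and the variable names are illustrative.

```python
# Back-of-envelope capacity numbers, using the figures quoted in this section.
DAU = 20_000_000
MSGS_PER_USER_PER_DAY = 50
AVG_MSG_BYTES = 100
PEAK_FACTOR = 5
SECONDS_PER_DAY = 86_400
ONLINE_IN_BIG_CHANNEL = 10_000

msgs_per_day = DAU * MSGS_PER_USER_PER_DAY               # 1 billion
avg_write_qps = msgs_per_day / SECONDS_PER_DAY           # about 11.6k
peak_write_qps = avg_write_qps * PEAK_FACTOR             # about 58k, rounded to 60k in the text

ingress_bytes_per_sec = 60_000 * AVG_MSG_BYTES           # 6 MB/s at the rounded peak
fanout_bytes_per_msg = AVG_MSG_BYTES * ONLINE_IN_BIG_CHANNEL  # 1 MB for one message

storage_gb_per_day = msgs_per_day * AVG_MSG_BYTES / 1e9  # 100 GB/day
storage_tb_5_years = storage_gb_per_day * 365 * 5 / 1000 # about 180 TB

print(f"avg write QPS : {avg_write_qps:,.0f}")
print(f"peak write QPS: {peak_write_qps:,.0f}")
print(f"fanout per msg: {fanout_bytes_per_msg / 1e6:.1f} MB")
print(f"5-year storage: {storage_tb_5_years:,.1f} TB")
```

Note how ingress stays in the single-digit MB/s range, while one message to a 10k-member channel already costs 1 MB of egress; that asymmetry is why the fanout path gets the design attention.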
4. System APIs
We use a hybrid approach: REST for actions (Login, Join Channel, Upload File) and WebSockets for real-time events.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/login` | Authenticates user, returns `auth_token` and `gateway_url`. |
| POST | `/v1/channels/{id}/messages` | Sends a message. Payload: `{ content: "Hello" }` |
| GET | `/v1/channels/{id}/history` | Fetches old messages. Params: `before_id=...` |
| WS | `/gateway` | WebSocket Handshake. Params: `token=...` |
5. Database Design
5.1 Cassandra (Message History)
We need massive write throughput and range queries (get messages by time).
- Partition Key: `channel_id` (Groups all messages for a channel together).
- Clustering Key: `message_id` (Snowflake ID, time-sorted).
CREATE TABLE channel_messages (
channel_id BIGINT,
message_id BIGINT,
user_id BIGINT,
content TEXT,
created_at TIMESTAMP,
PRIMARY KEY (channel_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
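To make the access pattern concrete, here is a minimal in-memory sketch of how the history endpoint's `before_id` cursor walks one channel partition. The `partitions` dict and `fetch_history` function are hypothetical stand-ins, not a Cassandra client; against the table above the equivalent read would be roughly `SELECT * FROM channel_messages WHERE channel_id = ? AND message_id < ? LIMIT ?`.

```python
from bisect import bisect_left

# Hypothetical stand-in for one channel partition: message IDs kept sorted,
# as a time-sorted clustering key would keep them on disk.
partitions = {42: [100, 101, 105, 110, 120, 121, 130]}

def fetch_history(channel_id, before_id=None, limit=3):
    """Return up to `limit` message IDs older than `before_id`, newest first."""
    ids = partitions.get(channel_id, [])
    # Everything left of `end` is strictly older than the cursor.
    end = len(ids) if before_id is None else bisect_left(ids, before_id)
    start = max(0, end - limit)
    return ids[start:end][::-1]

page1 = fetch_history(42)                        # newest messages
page2 = fetch_history(42, before_id=page1[-1])   # next page, older
page3 = fetch_history(42, before_id=page2[-1])   # last, partial page
print(page1, page2, page3)
```

Using the oldest ID of the previous page as the cursor keeps pagination stable even while new messages arrive, which an offset-based scheme would not.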
5.2 Redis (State & Presence)
- User Session: `user:{id}:gateway` → `10.0.0.5` (Which server holds the TCP connection?)
- Presence: `user:{id}:status` → `online` (TTL 40s, refreshed by the Gateway roughly every 30s while heartbeats keep arriving).
6. High-Level Architecture
We move from “Request-Response” to a Stateful Gateway Architecture.
7. Component Design (Deep Dive)
7.1 Gateway Aggregation
A user might belong to 100 channels. If we subscribe the User’s Gateway connection to 100 Redis channels, Redis will be overwhelmed by the number of subscriptions.
- Naive Approach: 10M Users × 100 Channels = 1 Billion Redis Subscriptions. Too slow.
- Optimized Approach: The Gateway subscribes to Redis channels, not the user.
- If `User A` (on GW-1) and `User B` (on GW-1) are both in `#general`, GW-1 subscribes to `#general` once.
- When GW-1 receives a message for `#general`, it looks up its local `Channel → [Socket]` map and fans out locally in memory.
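The optimized approach can be sketched with an in-memory broker standing in for Redis Pub/Sub. `Broker` and `Gateway` are hypothetical classes written for illustration, and "sockets" are just user names here.

```python
from collections import defaultdict

class Broker:
    """Hypothetical in-memory stand-in for Redis Pub/Sub."""
    def __init__(self):
        self.subscribers = defaultdict(list)    # channel -> [gateway]

    def subscribe(self, channel, gateway):
        self.subscribers[channel].append(gateway)

    def publish(self, channel, message):
        for gateway in self.subscribers[channel]:
            gateway.on_message(channel, message)

class Gateway:
    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.local = defaultdict(set)           # channel -> {local user sockets}
        self.delivered = []                     # (user, channel, message), for inspection

    def join(self, user, channel):
        # Subscribe upstream only when the first local user joins the channel.
        if not self.local[channel]:
            self.broker.subscribe(channel, self)
        self.local[channel].add(user)

    def on_message(self, channel, message):
        # Local, in-memory fanout to every socket in the channel.
        for user in sorted(self.local[channel]):
            self.delivered.append((user, channel, message))

broker = Broker()
gw1, gw2 = Gateway("GW-1", broker), Gateway("GW-2", broker)
gw1.join("alice", "#general")
gw1.join("bob", "#general")
gw2.join("charlie", "#general")
broker.publish("#general", "Hello")
print(len(broker.subscribers["#general"]), gw1.delivered, gw2.delivered)
```

Three users produce only two upstream subscriptions, one per gateway. The subscription count scales with gateways × active channels rather than users × channels.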
7.2 Presence (Heartbeats)
Presence is a “Heavy Write” problem. 10M users sending “I’m alive” every 5 seconds = 2M writes/sec.
- Optimization: Do not write to DB on every heartbeat.
- Client: Sends heartbeat to Gateway (WebSocket Ping).
- Gateway: Holds state in memory. Only updates Redis if status changes or TTL is about to expire (e.g., every 30s).
- Redis: Keys expire automatically (`SETEX user:1:status 40 "online"`). If Gateway crashes, key expires, user appears offline.
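A minimal sketch of the heartbeat coalescing described above. `FakeRedis` is a hypothetical in-memory stand-in that takes an explicit `now` argument so the example is deterministic (a real Redis client would not), and the 30s refresh / 40s TTL values are the ones used in this section.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for Redis keys with a TTL."""
    def __init__(self):
        self.store = {}       # key -> (value, expires_at)
        self.writes = 0

    def setex(self, key, ttl, value, now):
        self.store[key] = (value, now + ttl)
        self.writes += 1

    def get(self, key, now):
        entry = self.store.get(key)
        if entry is None or entry[1] <= now:
            return None       # expired keys read as missing: user is offline
        return entry[0]

class PresenceTracker:
    REFRESH_EVERY = 30        # seconds between Redis refreshes
    TTL = 40                  # key lifetime, slightly longer than the refresh

    def __init__(self, redis):
        self.redis = redis
        self.last_write = {}  # user_id -> (status, time of last Redis write)

    def heartbeat(self, user_id, status, now):
        prev = self.last_write.get(user_id)
        if prev and prev[0] == status and now - prev[1] < self.REFRESH_EVERY:
            return            # absorbed in gateway memory, no Redis write
        self.redis.setex(f"user:{user_id}:status", self.TTL, status, now)
        self.last_write[user_id] = (status, now)

redis = FakeRedis()
tracker = PresenceTracker(redis)
for t in range(0, 61, 5):     # 13 heartbeats, one every 5 seconds
    tracker.heartbeat(1, "online", now=t)

print(redis.writes)                            # Redis writes actually issued
print(redis.get("user:1:status", now=99))      # still within the TTL
print(redis.get("user:1:status", now=101))     # heartbeats stopped, key expired
```

In this run 13 heartbeats collapse into 3 Redis writes. The TTL is deliberately longer than the refresh interval so that one late refresh does not flip the user to offline.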
8. Data Partitioning & Sharding
8.1 Sharding Messages (Cassandra)
We shard by channel_id.
- Pros: All messages for a channel live in the same partition, so reading recent history is a single-partition, sequential read.
- Cons: The Celebrity Problem. If `#general` has 1B messages, the partition gets too big.
- Fix: Bucket the partition by time. `Partition Key = (channel_id, month_year)`.
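One way to compute such a bucketed key is sketched below. The `YYYY-MM` bucket format and both function names are assumptions made for illustration; the source only specifies a month/year bucket.

```python
from datetime import datetime, timezone

def partition_key(channel_id: int, created_at: datetime) -> tuple:
    """Bucket a channel's messages by calendar month (UTC)."""
    bucket = created_at.astimezone(timezone.utc).strftime("%Y-%m")
    return (channel_id, bucket)

def previous_bucket(bucket: str) -> str:
    """Bucket to query next when a reader scrolls past the start of a month."""
    year, month = map(int, bucket.split("-"))
    return f"{year - 1}-12" if month == 1 else f"{year}-{month - 1:02d}"

may = partition_key(7, datetime(2024, 5, 31, 23, 59, tzinfo=timezone.utc))
june = partition_key(7, datetime(2024, 6, 1, 0, 0, tzinfo=timezone.utc))
print(may, june, previous_bucket("2024-01"))
```

The trade-off is that infinite scroll may now touch several partitions: when a page comes back short, the reader steps to `previous_bucket` and continues.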
8.2 Service Discovery
How does User A know to connect to Gateway-52?
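- Naive Hashing: `hash(user_id) % N_Gateways`.
- Problem: If we add or remove gateways, N changes and almost every user maps to a different gateway, so connections break. A consistent-hash ring limits the remapping to roughly 1/N of users.
- Service Discovery (ZooKeeper/Etcd): Gateways register themselves. The Load Balancer asks ZK for an available node and assigns it to the user.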
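`hash(user_id) % N_Gateways` is plain modulo hashing. The sketch below compares it with a consistent-hash ring when an 11th gateway is added. The `Ring` class, the virtual-node count, and the use of MD5 as the hash are illustrative assumptions, not something the design above prescribes.

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def modulo_assign(user_id: int, n_gateways: int) -> int:
    return h(f"user:{user_id}") % n_gateways

class Ring:
    """Consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self.points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.keys = [point for point, _ in self.points]

    def assign(self, user_id: int) -> str:
        # First ring point clockwise from the user's hash, wrapping around.
        idx = bisect_right(self.keys, h(f"user:{user_id}")) % len(self.points)
        return self.points[idx][1]

users = range(10_000)
moved_modulo = sum(modulo_assign(u, 10) != modulo_assign(u, 11) for u in users)

old = Ring([f"gw-{i}" for i in range(10)])
new = Ring([f"gw-{i}" for i in range(11)])
moved_ring = sum(old.assign(u) != new.assign(u) for u in users)

print(f"modulo: {moved_modulo / len(users):.0%} of users remapped")
print(f"ring:   {moved_ring / len(users):.0%} of users remapped")
```

With modulo hashing the large majority of users land on a different gateway. With the ring only a small fraction move, and all of them move to the new gateway. For stateful WebSocket connections either scheme still drops the remapped sockets, which is one reason a registry-based assignment is attractive.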
9. Reliability, Caching, & Load Balancing
9.1 The “Unread Count” badge
Calculating unread counts (SELECT count(*) WHERE id > last_read_id) is expensive.
- Optimization: Store `unread_count` in Redis. Increment it when a message arrives. Reset to 0 when user opens the channel.
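The counter logic is small enough to sketch in memory. `UnreadCounters` is a hypothetical stand-in for per-(user, channel) keys in Redis; skipping the sender is an assumption, since the text does not say whether a user's own messages count as unread.

```python
from collections import defaultdict

class UnreadCounters:
    """Hypothetical in-memory stand-in for per-(user, channel) Redis counters."""
    def __init__(self):
        self.counts = defaultdict(int)

    def on_message(self, channel_id, member_ids, sender_id):
        for user_id in member_ids:
            if user_id != sender_id:                      # assumption: own messages are read
                self.counts[(user_id, channel_id)] += 1   # an increment in Redis

    def on_channel_opened(self, user_id, channel_id):
        self.counts[(user_id, channel_id)] = 0            # reset on read

    def badge(self, user_id, channel_id):
        return self.counts[(user_id, channel_id)]

counters = UnreadCounters()
counters.on_message("general", ["alice", "bob", "carol"], sender_id="alice")
counters.on_message("general", ["alice", "bob", "carol"], sender_id="carol")
counters.on_channel_opened("bob", "general")
print(counters.badge("alice", "general"), counters.badge("bob", "general"))
```

Note that a per-member increment is itself a fanout, so for very large channels this write path deserves the same care as message delivery.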
9.2 Mobile Push Notifications
If the WebSocket is disconnected (App closed), the Gateway cannot push.
- Fallback: The Notification Service detects the missing WebSocket connection and sends a payload to APNS (iOS) or FCM (Android).
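The fallback decision can be sketched as a single routing function. Everything here is hypothetical: `sessions` mimics the `user:{id}:gateway` session keys from the Redis section, and `send_ws` / `send_push` are placeholder callbacks where a real system would talk to a gateway or to APNS/FCM.

```python
def route_message(user_id, message, sessions, send_ws, send_push):
    """Deliver over WebSocket if a gateway session exists, else fall back to push."""
    gateway = sessions.get(f"user:{user_id}:gateway")
    if gateway is not None:
        send_ws(gateway, user_id, message)
        return "websocket"
    send_push(user_id, message)   # placeholder for an APNS/FCM call
    return "push"

sessions = {"user:1:gateway": "10.0.0.5"}   # user 2 has no live session
ws_log, push_log = [], []

r1 = route_message(1, "Hello", sessions,
                   send_ws=lambda *args: ws_log.append(args),
                   send_push=lambda *args: push_log.append(args))
r2 = route_message(2, "Hello", sessions,
                   send_ws=lambda *args: ws_log.append(args),
                   send_push=lambda *args: push_log.append(args))
print(r1, r2, ws_log, push_log)
```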
10. Interactive Decision Visualizer: Pub/Sub Propagation
Visualize how a single message fans out through Redis to multiple Gateways and Users.
11. Interview Gauntlet
Q1: How do you handle “Typing…” indicators?
- Answer: Typing indicators are ephemeral. Do not store them in the DB. Use a lightweight Redis Pub/Sub channel. Rate-limit on the client (often loosely called “Debouncing”) so it sends a signal at most once every 2 seconds while typing, not on every keystroke.
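The client-side rate limiting from this answer can be sketched as follows. Strictly speaking, "at most one signal per interval" is throttling rather than debouncing. `TypingThrottle` is a hypothetical helper, and timestamps are passed in explicitly to keep the example deterministic.

```python
class TypingThrottle:
    """Allow at most one typing event per `interval` seconds."""
    def __init__(self, interval=2.0):
        self.interval = interval
        self.last_sent = None

    def on_keystroke(self, now):
        """Return True if a typing event should be sent for this keystroke."""
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False

throttle = TypingThrottle(interval=2.0)
keystrokes = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.5]
sent = [t for t in keystrokes if throttle.on_keystroke(t)]
print(sent)
```

Seven keystrokes produce three typing events. Receivers would typically clear the indicator on their own after a short timeout, so no explicit "stopped typing" message is needed.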
Q2: What happens if a user is in 500 channels? Do they keep 500 WebSocket connections?
- Answer: No. One WebSocket connection per device. The Gateway multiplexes messages from all 500 channels down that single pipe.
Q3: How do you sync messages across devices (Phone + Laptop)?
- Answer: Each device has a unique `device_id`. When a message is sent, the server pushes it to all `device_id`s associated with the `user_id` (except the device that sent it).
Q4: Why not use HTTP Long Polling?
- Answer: Long polling is inefficient for chat because of the header overhead and latency in re-establishing connections. WebSockets are preferred for bi-directional, low-latency comms.
Q5: How do you sort messages if two people send at the exact same millisecond?
- Answer: Use Snowflake IDs (Twitter’s ID generator), which are roughly time-ordered. If two timestamps are identical, the `worker_id` and `sequence_id` bits embedded in the Snowflake break the tie, so sorting by the full ID gives a deterministic order.
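A minimal sketch of the bit layout makes the tie-break visible. It assumes the commonly described Snowflake split of 41 timestamp bits, 10 worker bits, and 12 sequence bits; `EPOCH_MS` is an arbitrary custom epoch chosen for this example.

```python
EPOCH_MS = 1_600_000_000_000   # arbitrary custom epoch (assumption)
WORKER_BITS = 10
SEQUENCE_BITS = 12

def make_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack timestamp | worker | sequence into one sortable integer."""
    assert 0 <= worker_id < (1 << WORKER_BITS)
    assert 0 <= sequence < (1 << SEQUENCE_BITS)
    return (
        ((timestamp_ms - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )

def unpack(snowflake: int):
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = (snowflake >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, worker_id, sequence

t = 1_700_000_000_000
a = make_id(t, worker_id=1, sequence=0)       # same millisecond, worker 1
b = make_id(t, worker_id=2, sequence=0)       # same millisecond, worker 2
c = make_id(t + 1, worker_id=0, sequence=0)   # one millisecond later

print(sorted([c, b, a]) == [a, b, c], unpack(a))
```

Because the timestamp occupies the high bits, a plain integer sort orders by time first and falls back to worker and sequence only within the same millisecond. The tie-break is arbitrary but identical on every client, which is what consistent ordering within a channel requires.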
12. Summary: The Whiteboard Strategy
1. Requirements
- Func: Chat, History, Presence.
- Scale: 10M Concurrent, < 50ms Latency.
2. Architecture
- Gateway: Stateful WebSocket holder.
- Redis: Pub/Sub for routing.
3. Data & API
4. Deep Dives
- Fanout: Gateway subscribes, not User.
- Presence: Heartbeats to Redis with TTL.