Design TikTok’s Short Video Feed
[!NOTE] This module explores the core principles behind designing TikTok’s short video feed, deriving solutions from first principles and hardware constraints.
1. Problem Statement
Designing TikTok’s feed is a massive undertaking. Unlike Instagram or Twitter, which often rely on a “Follower Graph,” TikTok’s “For You” page is almost entirely driven by a Personalization Loop. It must serve high-quality video with near-instant playback start and minimal buffering to more than a billion users.
The Key Difference: In a follower-based feed, you show what friends posted. On TikTok, you show what the algorithm predicts the user will like next, based on every swipe, like, and second watched.
2. Requirements & Goals
Functional Requirements
- Personalized Feed: Serve a continuous stream of short videos (< 60s) tailored to user interests.
- Video Upload: Users can upload videos; the system must transcode them for all devices.
- Interactions: Track likes, shares, and watch time (most important) for the recommendation engine.
- Content Moderation: Automatically filter illegal or harmful content before it goes viral.
Non-Functional Requirements
- Low Latency: Video playback must start instantly (< 200ms).
- Scalability: Support 1 Billion+ DAU.
- High Availability: The feed is the core product; it cannot go down.
- Data Freshness: Recommendations should reflect recent user behavior within minutes; the feed tolerates eventual consistency rather than requiring strong consistency.
3. High-Level Architecture
The system is split into three main pipelines: Ingestion, Personalization, and Serving.
4. Deep Dive: Content Ingestion & Serving
When a video is uploaded, it’s not simply stored. It must be prepared for Adaptive Bitrate Streaming.
The Flow:
- Upload: Video is uploaded to Object Storage (e.g., S3).
- DAG Workers: A set of workers, organized as a Directed Acyclic Graph (DAG) of tasks, handles:
  - Transcoding: Convert the source upload (e.g., 4K) into 1080p, 720p, and 480p renditions using codecs such as H.264 or H.265.
  - Chunking: Break each rendition into 2-5 second segments, so the player can start playing the first segment while downloading the rest.
  - Thumbnail Extraction: Generate the preview image shown in the feed.
- Global Edge: The video chunks are cached on CDNs (Content Delivery Networks) near the user to minimize “Time to First Byte” (TTFB).
[!TIP] Analogy: The Fast Food Kitchen. Instead of cooking one giant burger (full video) when ordered, the kitchen pre-slices the tomatoes and patties (chunking). When a user swipes, you have the “first bite” ready immediately.
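The ingestion flow above can be sketched as a small task graph. This is a minimal illustration, not TikTok's actual pipeline: the task names, the `PIPELINE` dictionary, and the 4-second default segment length are assumptions chosen to match the steps described. It uses Python's standard `graphlib` module to group tasks into waves that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
PIPELINE = {
    "transcode_1080p": {"upload"},
    "transcode_720p": {"upload"},
    "transcode_480p": {"upload"},
    "thumbnail": {"upload"},
    "chunk_1080p": {"transcode_1080p"},
    "chunk_720p": {"transcode_720p"},
    "chunk_480p": {"transcode_480p"},
    "publish_to_cdn": {"chunk_1080p", "chunk_720p", "chunk_480p", "thumbnail"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return waves


def segment_bounds(duration_s, segment_s=4.0):
    """Split a timeline into (start, end) segments of at most segment_s seconds."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(PIPELINE)):
        print(f"wave {i}: {wave}")
    print(segment_bounds(10.0))
```

In a real system each wave would be dispatched to a worker pool, and a failed task would be retried without re-running its completed dependencies.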
5. The Personalization Loop (The Secret Sauce)
TikTok’s feed doesn’t just show “new” videos. It shows the videos with the highest predicted affinity for each user.
1. Feature Engineering
We track:
- Positive Signals: Like, Share, Re-watch, Watch-to-end.
- Negative Signals: Quick swipe, “Not Interested” feedback.
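As a rough sketch of how these signals might be folded into a single training label, the snippet below scores one viewing session. The `WatchEvent` type, the signal weights, and the thresholds (a 2-second "quick swipe", 1.5x length for a re-watch) are illustrative assumptions, not values from any production system.

```python
from dataclasses import dataclass


@dataclass
class WatchEvent:
    video_length_s: float
    watched_s: float  # total seconds played; exceeds the length on a re-watch
    liked: bool = False
    shared: bool = False
    not_interested: bool = False


# Hypothetical thresholds for this sketch.
QUICK_SWIPE_S = 2.0
REWATCH_RATIO = 1.5


def engagement_score(event: WatchEvent) -> float:
    """Combine positive and negative signals into one signed score."""
    if event.video_length_s <= 0:
        raise ValueError("video_length_s must be positive")
    completion = event.watched_s / event.video_length_s
    score = 0.0
    if event.liked:
        score += 1.0
    if event.shared:
        score += 2.0
    if completion >= 1.0:  # watch-to-end
        score += 1.0
    if completion >= REWATCH_RATIO:  # re-watch
        score += 1.0
    if event.watched_s < QUICK_SWIPE_S:  # quick swipe
        score -= 1.0
    if event.not_interested:  # explicit negative feedback
        score -= 3.0
    return score


if __name__ == "__main__":
    print(engagement_score(WatchEvent(15.0, 15.0, liked=True)))
    print(engagement_score(WatchEvent(15.0, 1.0, not_interested=True)))
```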
2. Candidate Generation (Filtering)
Out of 100M videos, we pick ~1,000 using simple filters (language, location, popular trends).
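A minimal sketch of this filtering stage, assuming each video carries a language, a region, and a precomputed trend score. The `Video` fields and the `generate_candidates` function are hypothetical names for illustration; a production system would query an index rather than scan a list.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    language: str
    region: str
    trend_score: float  # assumed to be computed upstream


def generate_candidates(videos, language, region, limit=1000):
    """Keep videos matching the user's language and region, then take the most trending."""
    eligible = (v for v in videos if v.language == language and v.region == region)
    # nlargest avoids sorting the full corpus when only `limit` items are needed.
    return heapq.nlargest(limit, eligible, key=lambda v: v.trend_score)


if __name__ == "__main__":
    corpus = [
        Video("a", "en", "US", 0.9),
        Video("b", "en", "US", 0.4),
        Video("c", "fr", "FR", 0.99),
        Video("d", "en", "US", 0.7),
    ]
    print([v.video_id for v in generate_candidates(corpus, "en", "US", limit=2)])
```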
3. Ranking (Deep Learning)
A neural network ranks these 1,000 videos by the probability that this specific user will watch this specific video for more than 10 seconds.
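To make the ranking step concrete, here is a deliberately simplified stand-in: a logistic model instead of a neural network, scoring each candidate by the estimated probability of a watch longer than 10 seconds. The feature names (`topic_affinity`, `avg_completion`), the weights, and the bias are all assumptions made up for this sketch.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_watch_probability(features, weights, bias):
    """Estimate P(user watches > 10 s) from a weighted sum of features."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def rank(candidates, weights, bias, top_k=10):
    """Return the top_k video ids ordered by predicted watch probability."""
    scored = [
        (predict_watch_probability(features, weights, bias), video_id)
        for video_id, features in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


if __name__ == "__main__":
    weights = {"topic_affinity": 2.0, "avg_completion": 1.0}  # hypothetical
    candidates = {
        "cat": {"topic_affinity": 1.0, "avg_completion": 0.5},
        "news": {"topic_affinity": 0.0, "avg_completion": 0.5},
        "dance": {"topic_affinity": 0.5, "avg_completion": 1.0},
    }
    print(rank(candidates, weights, bias=-1.0))
```

The two-stage shape is the point here: the cheap filter narrows the corpus so that the expensive per-user model only has to score about a thousand items.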
[!IMPORTANT] Real-Time Updates: To stay fresh, these signals are streamed through Kafka and fed back into the ML models within minutes. If you watch 3 cat videos, the 10th video in your feed is likely already a cat video.
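One way to picture the feedback loop is an interest profile that is updated on every event, with older interests decaying over time. The sketch below is an in-memory illustration only: a plain list stands in for the Kafka stream, and the `InterestProfile` class and its `decay` value are assumptions.

```python
from collections import defaultdict


class InterestProfile:
    """Per-user topic scores with exponential decay applied on each new event."""

    def __init__(self, decay: float = 0.8):
        self.decay = decay
        self.scores = defaultdict(float)

    def update(self, topic: str, signal: float) -> None:
        # Fade every existing interest, then reinforce the topic just watched.
        for key in self.scores:
            self.scores[key] *= self.decay
        self.scores[topic] += signal

    def top_topic(self):
        if not self.scores:
            return None
        return max(self.scores, key=self.scores.get)


if __name__ == "__main__":
    # Stand-in for events consumed from a stream.
    events = [("cooking", 1.0), ("cats", 1.0), ("cats", 1.0), ("cats", 1.0)]
    profile = InterestProfile()
    for topic, signal in events:
        profile.update(topic, signal)
    print(profile.top_topic(), dict(profile.scores))
```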
6. Optimization for User Experience
- Prefetching: Feed Service sends the next 5 video URLs to the mobile app. The app downloads the first 2 seconds of the next video while you are still watching the current one.
- Edge Compute: Some ranking logic can happen at the CDN level (Edge) to reduce Round-Trip Time (RTT).
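The prefetching rule can be sketched as a small planning function on the client. This is an illustration under assumptions: the `prefetch_plan` name, the 1500 kbps bitrate, and the example URLs are hypothetical; only the "next 5 URLs" and "first 2 seconds of the next video" figures come from the text above.

```python
def prefetch_plan(feed_urls, current_index, lookahead=5, warm_seconds=2.0, bitrate_kbps=1500):
    """Return (url, bytes_to_prefetch) for the upcoming videos.

    Only the very next video gets its first `warm_seconds` downloaded;
    the rest are listed with 0 bytes so the client knows their URLs in advance.
    """
    upcoming = feed_urls[current_index + 1 : current_index + 1 + lookahead]
    warm_bytes = int(warm_seconds * bitrate_kbps * 1000 / 8)  # kilobits/s -> bytes
    return [(url, warm_bytes if offset == 0 else 0) for offset, url in enumerate(upcoming)]


if __name__ == "__main__":
    urls = [f"https://cdn.example.com/v{i}.m3u8" for i in range(8)]
    for url, size in prefetch_plan(urls, current_index=0):
        print(url, size)
```

Limiting the warm-up to the first couple of seconds keeps wasted bandwidth small when the user swipes past a video without watching it.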
7. Interview Gauntlet
- How do you handle “Hot” videos (Viral content)?
  - Ans: Viral videos are cached on the CDN edge nodes and in local Redis clusters within data centers to avoid hitting the main database or object store repeatedly.
- How do you prevent “Echo Chambers”?
  - Ans: The algorithm includes a Diversity Factor. Every X videos, we inject a “High Potential” trending video outside your usual interests to discover new preferences.
- What if the user is offline?
  - Ans: The app keeps a local cache of ~10 videos downloaded during the last session.
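The Diversity Factor from the echo-chamber answer can be sketched as a simple interleaving step applied after ranking. The `inject_diversity` function and the default interval of 5 are assumptions for illustration; the source leaves the interval as "X".

```python
def inject_diversity(personalized, exploratory, every=5):
    """After each run of `every` personalized videos, insert one exploratory video."""
    if every < 1:
        raise ValueError("every must be at least 1")
    feed = []
    explore = iter(exploratory)
    for position, video in enumerate(personalized, start=1):
        feed.append(video)
        if position % every == 0:
            extra = next(explore, None)  # stop injecting once exploratory videos run out
            if extra is not None:
                feed.append(extra)
    return feed


if __name__ == "__main__":
    ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
    trending = ["e1", "e2"]
    print(inject_diversity(ranked, trending, every=3))
```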
8. Summary
- Serving: Use Adaptive Bitrate (HLS/DASH) and CDN Prefetching for near-instant playback start.
- Ingestion: Distributed Transcoding via DAG workers.
- Algorithm: Multi-stage (Filtering → ML Ranking) with real-time feedback loops.
- Scalability: Shard user metadata and video metadata; use global CDNs for the media.
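As one possible illustration of the sharding point, the sketch below maps a user or video id to a shard with a stable hash. The `shard_for` function and the shard count of 64 are assumptions; plain modulo hashing is used for brevity, although it forces large data movement when the shard count changes, which is why consistent hashing is often preferred in practice.

```python
import hashlib


def shard_for(key: str, num_shards: int = 64) -> int:
    """Map a key to a shard index using a hash that is stable across processes."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for key in ("user:42", "video:abc123"):
        print(key, "->", shard_for(key))
```

Python's built-in `hash()` is avoided here because string hashing is randomized per process, which would route the same key to different shards on different servers.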