Design Dropbox / Google Drive
[!NOTE] This module explores the core principles of Design Dropbox / Google Drive, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. What is Dropbox?
Dropbox is a cloud-based file hosting service that offers file synchronization, personal cloud, and client software. It allows users to create a special folder on their computers, which Dropbox synchronizes so that it appears to be the same folder (with the same contents) regardless of which device is used to view it.
Why is this hard?
- Scale: Billions of files. Exabytes of data.
- Consistency: If I edit a file on my laptop, my phone must see the change immediately.
- Bandwidth: Uploading a 1GB file every time I change one character is wasteful.
- Reliability: Data loss is unacceptable (11 9s durability).
[!TIP] In an interview, clarify if you are designing for Collaborative Editing (Google Docs - OT/CRDT) or File Storage (Dropbox - Whole file locking/versions). We are focusing on File Storage.
Try it yourself: Open Dropbox or Google Drive on two devices. Upload a file on one. Watch how fast it appears on the other. Now disconnect the internet on one device, edit the file on both devices, and reconnect. The resulting "conflicted copy" demonstrates conflict resolution in action.
2. Requirements & Goals
Functional Requirements
- Add/Delete/Update File: Users can perform standard file operations.
- Sync: Changes propagate to all devices automatically.
- Versioning: Users can restore previous versions of a file.
- Sharing: Users can share files with permissions.
Non-Functional Requirements
- Reliability: Data must never be lost. We need Strong Consistency for metadata.
- Availability: The service should always be up, but when forced to choose, Consistency > Availability (ACID for metadata).
- Bandwidth Efficiency: Minimize network usage for large files using Block-Level Deduplication.
3. Capacity Estimation
Let’s assume 500 Million Users and 100 Million DAU.
Storage
- Avg user has 10 GB of files.
- Total Storage = 500M × 10 GB = 5 Exabytes (EB).
- We need cold storage strategies (e.g., AWS Glacier) for old versions.
Throughput
- Assume 1 file change per active user per day.
- QPS = 100M / 86400 ≈ 1,200 QPS (Write).
- Read QPS might be equal or slightly higher (syncing to multiple devices).
- Peak QPS: ~2,500 QPS.
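These estimates are easy to sanity-check in code. A quick back-of-envelope script using the numbers above (the ~2x peak-to-average factor is my assumption; the text rounds to ~1,200 and ~2,500):

```python
# Back-of-envelope capacity check for the numbers used in this section.
USERS = 500_000_000          # total users
DAU = 100_000_000            # daily active users
AVG_STORAGE_GB = 10          # average data per user

total_storage_eb = USERS * AVG_STORAGE_GB / 1e9   # GB -> EB (10^9 GB per EB)
write_qps = DAU * 1 / 86_400                      # 1 change per active user per day
peak_qps = write_qps * 2                          # assumed ~2x daily average at peak

print(f"Total storage: {total_storage_eb:.0f} EB")   # 5 EB
print(f"Average write QPS: {write_qps:.0f}")         # ~1157 (rounds to ~1,200)
print(f"Peak write QPS: {peak_qps:.0f}")             # ~2315 (rounds to ~2,500)
```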
4. System APIs
We need a RESTful API for the client. The client is “thick” (smart), handling chunking logic.
4.1 Upload Flow (Presigned URL)
The client first calculates hashes of all chunks (e.g., 4MB each) and asks the server which ones are new.
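The client-side hashing step can be sketched in a few lines (`chunk_hashes` is an illustrative helper name; the 10MB input matches the `size_bytes` in the example below):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, as used throughout this design

def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a file into fixed-size chunks and return the SHA-256 of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

# A 10MB file yields three chunks: 4MB + 4MB + 2MB.
file_bytes = b"\x00" * (10 * 1024 * 1024)
hashes = chunk_hashes(file_bytes)
print(len(hashes))              # 3
print(hashes[0] == hashes[1])   # True: identical chunks hash identically (dedup!)
```

Because identical chunks produce identical hashes, the server can recognize content it already stores before a single byte is uploaded.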
Step 1: Initiate Upload & Deduplication Check
```
POST /files/upload_init
Authorization: Bearer <token>

{
  "filename": "video.mp4",
  "path": "/photos/vacation",
  "size_bytes": 10485760,
  "chunk_hashes": ["hashA", "hashB", "hashC"]
}
```
Response:
```
{
  "upload_id": "up_12345",
  "missing_chunks": ["hashC"],  // Server already has A and B!
  "presigned_urls": {
    "hashC": "https://s3.aws.com/bucket/hashC?signature=..."
  }
}
```
Step 2: Upload Raw Chunk (Direct to S3)
```
PUT https://s3.aws.com/bucket/hashC?signature=...
Body: <binary_data_of_chunk_C>
```
Step 3: Commit File Metadata
```
POST /files/commit
Authorization: Bearer <token>

{
  "upload_id": "up_12345",
  "chunk_list": ["hashA", "hashB", "hashC"]
}
```
4.2 Download Flow
```
GET /files/download?file_id=file_987
Authorization: Bearer <token>

// Response: 302 Redirect to Presigned S3 URL
```
4.3 Sync Flow (Long Polling)
The client waits for changes.
```
POST /sync/poll
Authorization: Bearer <token>

{
  "cursor_id": 5021  // Last seen change ID
}
```
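The client's side of this loop can be sketched with the transport stubbed out (`poll` stands in for the `POST /sync/poll` call; all names here are illustrative):

```python
def sync_loop(poll, apply_change, cursor, max_rounds=10):
    """Long-poll for changes, applying each batch and advancing the cursor.

    `poll(cursor)` stands in for POST /sync/poll: it blocks until there are
    changes newer than `cursor`, then returns (changes, new_cursor).
    """
    for _ in range(max_rounds):  # a real client loops forever
        changes, cursor = poll(cursor)
        for change in changes:
            apply_change(change)  # e.g. fetch new metadata + missing chunks
    return cursor

# Stubbed server: two batches of changes arriving after cursor 5021.
batches = iter([([{"file_id": "f1"}], 5022),
                ([{"file_id": "f2"}], 5023)])
applied = []
final = sync_loop(lambda c: next(batches), applied.append, 5021, max_rounds=2)
print(final, [c["file_id"] for c in applied])  # 5023 ['f1', 'f2']
```

Persisting the cursor means a client that crashes or goes offline simply resumes from its last seen change ID.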
5. Database Design
We separate Metadata from Block Data.
Metadata DB (SQL/NewSQL)
Stores file hierarchy, permissions, and mapping to blocks (visualized as Metadata DB). We need ACID transactions.
- Table `Files`: `file_id` (PK), `user_id`, `latest_version`, `is_directory`, `created_at`.
- Table `Versions`: `version_id` (PK), `file_id`, `block_list` (array of hashes), `size`.
- Table `Blocks`: `hash` (PK), `s3_url`, `ref_count` (for garbage collection).
Block Storage (Object Store)
Stores the actual raw data chunks (visualized as Block Storage / S3).
- S3 / Azure Blob: Immutable. Key is the SHA-256 hash of the content.
- Why Separation?: Metadata requires fast updates and complex queries (SQL). Block data is massive and write-once-read-many (Blob Store).
6. High-Level Design
File Sync Architecture. Separating Metadata from Block Data.
6.1 System Walkthrough (Happy Path)
Let’s trace the lifecycle of a file upload:
- Chunking: The Client splits the file (e.g., `video.mp4`) into 4MB chunks and calculates the SHA-256 hash for each chunk.
- Deduplication Check: The Client sends the list of hashes to the Metadata Service.
- Server Response: The server checks the Metadata DB and responds: “I already have chunk A and B. Only upload chunk C.”
- Upload: The Client uploads the raw bytes of chunk C directly to the Block Service (S3).
- Commit: Once the upload is confirmed, the Client sends a `commit` request to the Metadata Service, linking the file `video.mp4` to the list of hashes `[A, B, C]`.
- Notification: The Metadata Service commits the transaction and pushes a “New Version” event to the Notification Service.
- Sync: Other active devices (listening via Long Polling/WebSocket) receive the event and begin downloading the new metadata and missing chunks.
7. Component Design (Deep Dive)
Chunking & Block-Level Deduplication
Instead of uploading the whole file, we split it into 4MB Chunks.
- Calculate SHA-256 of each chunk.
- Check Existence: Client sends list of hashes to server.
- Deduplication: Server responds: “I already have Hash A and B (from other users!). Only upload Hash C.”
- Benefit: Saves bandwidth and storage. If 100 people upload the same movie, we store it once.
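The server side of this check can be sketched as an in-memory content-addressed store with a `ref_count`, mirroring the `Blocks` table (class and method names are illustrative):

```python
import hashlib

class BlockStore:
    """Content-addressed store: one copy per unique chunk, plus a ref count."""
    def __init__(self):
        self.blocks = {}      # hash -> bytes (stands in for S3)
        self.ref_count = {}   # hash -> int  (for garbage collection)

    def missing(self, hashes):
        """Deduplication check: which of these chunks must the client upload?"""
        return [h for h in hashes if h not in self.blocks]

    def put(self, chunk: bytes) -> str:
        h = hashlib.sha256(chunk).hexdigest()
        self.blocks[h] = chunk
        return h

    def commit(self, hashes):
        """Link a file version to its chunks, bumping ref counts."""
        for h in hashes:
            self.ref_count[h] = self.ref_count.get(h, 0) + 1

store = BlockStore()
movie = [b"chunk-A", b"chunk-B", b"chunk-C"]
hashes = [hashlib.sha256(c).hexdigest() for c in movie]

missing_first = store.missing(hashes)   # first uploader must send all three
for c in movie:
    store.put(c)
store.commit(hashes)

missing_second = store.missing(hashes)  # second uploader of the same movie: []
store.commit(hashes)
print(len(store.blocks), store.ref_count[hashes[0]])  # 3 2
```

Two uploads of the same movie consume storage once; the `ref_count` of 2 prevents garbage collection from deleting chunks while any file version still references them.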
Delta Sync & Rolling Hash
What if I insert one character at the beginning of a file?
- Fixed-Size Chunking (The Problem): Every 4MB chunk shifts by 1 byte. Even though the content is 99.9% identical, ALL hashes change because the boundaries have shifted. This breaks deduplication and forces a full re-upload.
- Rolling Hash (Rabin-Karp): Instead of splitting by byte offset (0, 4MB, 8MB), we use a Sliding Window algorithm to calculate a checksum for every byte window (e.g., 32 bytes).
- The Rule: If `Checksum % M == 0`, create a new chunk boundary.
- The Result: Boundaries are determined by the content itself. If you insert a word at the start, only the first chunk changes. The subsequent chunks retain their original boundaries and hashes, allowing 99% deduplication.
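A toy version of content-defined chunking can be sketched end to end. Parameters are shrunk so the effect is visible in a few kilobytes (16-byte window, ~64-byte average chunks instead of ~4MB); `cdc_chunks` and the hash constants are illustrative, and real systems also enforce min/max chunk bounds:

```python
import random

def cdc_chunks(data: bytes, window=16, mask=0x3F, min_len=32):
    """Cut a chunk wherever the rolling hash of the last `window` bytes
    has its low 6 bits zero (hash % 64 == 0 -> ~64-byte average chunks)."""
    base, mod = 257, 1 << 32
    pw = pow(base, window, mod)
    h, start, chunks = 0, 0, []
    for i, byte in enumerate(data):
        h = (h * base + byte) % mod                 # slide the new byte in
        if i >= window:
            h = (h - data[i - window] * pw) % mod   # slide the old byte out
        if i + 1 - start >= min_len and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks

def fixed_chunks(data: bytes, size=64):
    """Baseline: split purely by byte offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]

random.seed(42)
original = bytes(random.getrandbits(8) for _ in range(4000))
edited = b"X" + original            # insert one byte at the very start

cdc_old, cdc_new = cdc_chunks(original), cdc_chunks(edited)
# Fixed-size: every boundary shifts by one byte, so (almost) nothing matches.
print(len(set(fixed_chunks(original)) & set(fixed_chunks(edited))), "fixed chunks reused")
# Content-defined: only the chunk(s) around the edit change.
print(len(set(cdc_old) & set(cdc_new)), "of", len(cdc_old), "CDC chunks reused")
```

Because the rolling hash depends only on the last 16 bytes of content, the boundaries "resynchronize" right after the insertion point, which is exactly why only the first chunk needs re-uploading.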
Conflict Resolution Strategy
When two users edit the same file offline and come online:
- Last Write Wins (LWW): Simple but dangerous. The server keeps the version with the latest timestamp. User A’s work is overwritten/lost.
- Conflicted Copy (Dropbox Approach): The server detects that the `base_version` of User B’s upload is older than the `current_version` in the DB.
  - Instead of overwriting, the server accepts User B’s file but renames it: `Resume (Conflicted Copy 2023-10-27).docx`.
  - Both versions are kept. The user must manually merge them. This preserves data integrity (Zero Data Loss).
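The version check behind the Conflicted Copy approach can be sketched as follows (the in-memory `files` dict stands in for the Metadata DB; the rename format mimics Dropbox's):

```python
import datetime

files = {}  # name -> {"version": int, "content": bytes}

def commit(name: str, base_version: int, content: bytes) -> str:
    """Accept an upload; fork a conflicted copy if the base version is stale."""
    current = files.get(name)
    if current is not None and base_version < current["version"]:
        # Stale base: keep BOTH versions instead of overwriting (zero data loss).
        stem, dot, ext = name.rpartition(".")
        today = datetime.date.today().isoformat()
        name = (f"{stem} (Conflicted Copy {today}){dot}{ext}" if dot
                else f"{name} (Conflicted Copy {today})")
        files[name] = {"version": 1, "content": content}
        return name
    files[name] = {"version": base_version + 1, "content": content}
    return name

# Users A and B both sync version 1 of the file, then edit offline.
commit("Resume.docx", 0, b"v1")                       # initial upload -> version 1
commit("Resume.docx", 1, b"user A's edits")           # A syncs first  -> version 2
forked = commit("Resume.docx", 1, b"user B's edits")  # B's base (1) is now stale
print(forked)       # e.g. Resume (Conflicted Copy 2023-10-27).docx
print(len(files))   # 2 -> both versions survive
```

Note that the server never inspects file contents; the decision is made purely on version numbers in metadata, which keeps the check cheap.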
Metadata Sharding Strategy
The Metadata DB grows to billions of rows. We must shard.
- Shard by file_id: Good distribution, but listing a folder requires querying ALL shards (Scatter-Gather). Slow for “List Folder”.
- Shard by user_id: All files for `User A` are on Shard 1.
  - Pros: Fast “List Folder”. ACID transactions within a user account are easy.
  - Cons: Hot Shard problem (if one user has 10M files). Sharing files across users (Cross-Shard) is complex.
- Solution: We typically shard by `user_id` but use a separate “Sharing Service” to handle cross-user pointers.
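Routing by `user_id` can be sketched with simple hash-mod routing (a real system would use consistent hashing or a lookup table so shards can be rebalanced; all names here are illustrative):

```python
import hashlib

NUM_SHARDS = 16
shards = [dict() for _ in range(NUM_SHARDS)]  # shard -> {user_id: [file paths]}

def shard_for(user_id: str) -> int:
    """All of a user's metadata lives on one shard."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def add_file(user_id: str, path: str):
    shards[shard_for(user_id)].setdefault(user_id, []).append(path)

def list_files(user_id: str):
    # Touches exactly ONE shard -- no scatter-gather across all 16.
    return shards[shard_for(user_id)].get(user_id, [])

add_file("alice", "/photos/a.jpg")
add_file("alice", "/photos/b.jpg")
add_file("bob", "/docs/cv.pdf")
print(list_files("alice"))  # ['/photos/a.jpg', '/photos/b.jpg']
```

Contrast this with sharding by `file_id`, where the same listing would have to fan out to every shard and merge the results.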
Namespace Management (Flattening)
How do we store the file structure /User/Documents/Project/File.txt?
- Naive Approach: Store absolute paths as strings.
  - Problem: Renaming the folder `Documents` to `Docs` requires updating strings for every single file inside (millions). This is O(N) write amplification.
- Optimized Approach (Flattening): Store files by `file_id` and `parent_id` (Adjacency List).
- Renaming: Just update the `name` of the folder row. The children point to the folder’s `id`, which never changes. O(1) operation.
```sql
CREATE TABLE file_metadata (
  id        BIGINT PRIMARY KEY,
  parent_id BIGINT,          -- Points to the parent folder's ID
  name      VARCHAR(255),
  is_folder BOOLEAN,
  -- Renaming 'Documents' only updates THIS single row.
  -- All children still point to this ID.
  INDEX(parent_id)
);
```
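The same adjacency-list idea in a few lines of Python, assuming the structure of the `file_metadata` table above (the `nodes` dict stands in for the table):

```python
# Adjacency list: each row stores its own name plus its parent's id.
nodes = {
    1: {"name": "User",      "parent_id": None},
    2: {"name": "Documents", "parent_id": 1},
    3: {"name": "File.txt",  "parent_id": 2},
}

def full_path(node_id: int) -> str:
    """Resolve a path by walking parent pointers (paths are derived, not stored)."""
    parts = []
    while node_id is not None:
        parts.append(nodes[node_id]["name"])
        node_id = nodes[node_id]["parent_id"]
    return "/" + "/".join(reversed(parts))

print(full_path(3))          # /User/Documents/File.txt

# Renaming the folder touches ONE row; the child row is untouched.
nodes[2]["name"] = "Docs"
print(full_path(3))          # /User/Docs/File.txt
```

The trade-off is that resolving a full path now costs O(depth) parent lookups, which is cheap and cacheable compared to O(N) rename write amplification.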
Metadata ACID Properties vs Block Storage
We have a hybrid consistency model:
- Metadata (ACID): The file hierarchy must be strongly consistent. If I move a file, it must disappear from the old folder immediately. We use MySQL or Spanner.
- Block Storage (Eventual): The actual bytes in S3 are immutable. We rely on the Metadata DB as the source of truth. If the metadata says “File A is at Hash X”, the client fetches Hash X.
- We cannot use eventually consistent NoSQL (like Cassandra) for the file hierarchy, or a user might upload a file and not see it immediately.
8. Requirements Traceability
| Requirement | Design Decision | Justification |
|---|---|---|
| Consistency | ACID (SQL) for Metadata | File hierarchy must be strongly consistent to prevent sync conflicts. |
| Bandwidth | Block-Level Deduplication | Saves 40-60% of bandwidth by not re-uploading duplicate content. |
| Reliability | S3 (11 9s) | Object stores are designed for massive durability. |
| Sync Speed | Long Polling / WebSocket | Push model is faster than Poll, ensuring near real-time updates. |
| Renaming | Namespace Flattening | Allows O(1) folder renaming by updating parent reference only. |
9. Observability & Metrics
We use the RED Method (Rate, Errors, Duration) to monitor health.
Key Metrics
- Upload Success Rate: (Successful Uploads / Total Attempts). Alert if < 99.9%.
- Sync Latency: Time from “Client A Commit” to “Client B Notification”.
- Deduplication Ratio: (Bytes Saved / Total Bytes). Measures algorithm efficiency.
- S3 Throughput: Monitor ingress/egress to predict costs.
Distributed Tracing
Each file operation generates a TraceID.
- Span 1: Client Upload (Chunk 1)
- Span 2: Metadata Service (Commit)
- Span 3: Notification Service (Fanout)
- We can visualize this in Jaeger/Zipkin to find bottlenecks.
10. Deployment Strategy
Ring Deployment
We cannot risk breaking sync for 500M users.
- Ring 0 (Internal): Dropbox Employees.
- Ring 1 (Canary): 1% of users (Early Adopters).
- Ring 2 (Global): 100% Rollout.
Client Updates
The desktop client is harder to update than the server.
- Auto-Update: Background process checks for new versions.
- Backward Compatibility: The server must support older client versions (API Versioning).
11. Interview Gauntlet
Rapid Fire Questions
- Why not use a NoSQL DB for metadata? NoSQL is often eventually consistent. We need ACID to ensure file moves/renames are atomic and visible instantly to prevent sync conflicts.
- How do you handle two users editing the same file offline? Conflict Resolution. Usually “Last Write Wins” or creating a “Conflicted Copy” (both files kept) for the user to merge manually.
- What happens if the client crashes during upload? The client should retry. Since we use unique Hash IDs for chunks, re-uploading the same chunk is idempotent.
- Why use Long Polling instead of WebSockets? WebSockets require maintaining stateful connections, which is expensive for millions of idle clients. Long Polling is stateless and easier to scale for notifications.
- How does ‘Delta Sync’ differ from ‘Deduplication’? Deduplication works on the whole block level across users. Delta Sync works on the byte level for a single file using Rolling Hashes.
12. Interactive Decision Visualizer: Rolling Hash & Deduplication
This simulator demonstrates the difference between Fixed-Size Chunking and Rolling Hash (Variable-Size).
- Sync: Upload the initial text.
- Edit: Insert a character at the start (e.g., “A”).
- Compare:
- Fixed: Every chunk boundary shifts. Hashes change. Zero Deduplication.
- Rolling: Only the first chunk changes. The rest match. High Deduplication.
[!TIP] Try it yourself: Click “Sync Initial”. Then type “X ” at the very start of the text box. Click “Sync Edit”. Watch the Rolling Hash catch up!
Fixed-Size Chunking (Every 5 chars)
Rolling Hash (Space Boundary)
13. Summary (Whiteboard)
Scale & Capacity
- 5 Exabytes Storage (Cold Storage Critical)
- Write-Heavy (Sync frequent small changes)
- Metadata: ACID is Mandatory
- Block Data: Immutable, Content-Addressed
Core Algorithms
- Chunking: 4MB blocks (Deduplication)
- Rabin-Karp: Rolling Hash (Delta Sync)
- Flattening: O(1) Folder Rename
- Merkle Tree: Fast Sync Verification
Trade-offs
- Consistency > Availability for Metadata
- Sync Speed vs Battery (Long Poll vs WS)
- Storage Cost vs Latency (Tiered Storage)
Bottlenecks
- Metadata DB (Sharding is hard)
- Client Bandwidth (Upload speed)
- Thundering Herd on Sync Notification