Design LeetCode (Remote Code Execution)

[!NOTE] This module explores the core principles of Design LeetCode (Remote Code Execution), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. What is a Code Execution Service?

A Remote Code Execution (RCE) service lets users submit code in multiple languages (Python, C++, Java), executes it against a set of test cases, and returns the results (Pass/Fail, runtime, memory usage).

[!WARNING] The Danger Zone: This is one of the most dangerous systems to build. You are literally inviting strangers to run arbitrary code on your servers. Without proper Sandboxing, a user could rm -rf /, mine crypto, or scan your internal network.

Real-World Examples

  • LeetCode / HackerRank: Competitive programming platforms.
  • Judge0: Open-source RCE API.
  • Replit / CodeSandbox: Online IDEs (stateful environments).

2. Requirements & Goals

Functional Requirements

  1. Code Submission: User submits source code and language.
  2. Execution: System runs code against test cases.
  3. Feedback: Returns Standard Output (STDOUT), Standard Error (STDERR), and Verdict (Accepted, Wrong Answer, TLE).
  4. Limits: Enforce strict Time Limits (e.g., 2s) and Memory Limits (e.g., 256MB).

Non-Functional Requirements

  1. Security (Critical): Zero Trust. Code must run in total isolation.
  2. Performance: Low overhead for container startup. Users expect results in seconds.
  3. Concurrency: Handle thousands of simultaneous submissions.

3. Capacity Estimation

  • Daily Submissions: 1 Million.
  • Peak Traffic: 100 submissions/sec (during contests).
  • Execution Time: Average 2 seconds per task.
  • Compute Resources: Since tasks are CPU-bound, we need scalable worker nodes.
  • 100 tasks/sec × 2 sec/task = 200 concurrent containers running.
  • If each core handles 1 container, we need ~200 vCPUs (e.g., 25-50 large instances).
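The arithmetic above is Little's law (average concurrency = arrival rate × average service time); a quick sanity check, assuming a hypothetical 8-vCPU instance size:

```python
import math

# Little's law: average concurrency = arrival rate * average service time.
peak_rate = 100      # submissions/sec during contests (from the estimate above)
avg_exec_s = 2       # average seconds per task

concurrent = peak_rate * avg_exec_s       # sandboxes running at once
vcpus_needed = concurrent                 # assuming 1 vCPU per sandbox
instances = math.ceil(vcpus_needed / 8)   # assuming 8 vCPUs per instance

print(concurrent, vcpus_needed, instances)  # 200 200 25
```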

4. System APIs

Submit Code

```
POST /v1/submissions
{
  "code": "print('hello')",
  "language": "python3",
  "problem_id": "123"
}
```

Response: `{ "submission_id": "sub_abc123" }`

Get Status (Polling)

```
GET /v1/submissions/sub_abc123
```

Response (`status` becomes `COMPLETED` once the worker finishes):

```json
{
  "status": "PROCESSING",
  "result": { "verdict": "Accepted", "runtime_ms": 45 }
}
```
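A minimal client-side polling loop can be sketched as follows; `fetch` stands in for any HTTP call that returns the parsed JSON body of `GET /v1/submissions/{id}` (the function name and retry parameters are illustrative, not part of the API):

```python
import time

def poll_submission(fetch, submission_id, interval_s=0.5, max_attempts=60):
    """Poll until the submission leaves PROCESSING; `fetch(id)` returns the
    parsed JSON response of GET /v1/submissions/{id}."""
    for _ in range(max_attempts):
        resp = fetch(submission_id)
        if resp["status"] == "COMPLETED":
            return resp["result"]
        time.sleep(interval_s)
    raise TimeoutError(f"submission {submission_id} did not complete in time")
```

In production, WebSockets or server-sent events avoid the polling overhead, but polling is the simplest correct client.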

5. Database Design

We need to store submission history and problem data. For more on relational schema design, see Module 04: Database Basics.

1. Submissions Table (Postgres)

| Column | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| user_id | UUID | The submitter |
| problem_id | UUID | The problem solved |
| code | TEXT | The source code (stored in S3 if large) |
| status | ENUM | PENDING, PROCESSING, AC, WA, TLE, RE |
| runtime | INT | Execution time in ms |
| memory | INT | Memory usage in KB |

2. Test Cases (Object Store / S3)

  • Test cases are often large files.
  • Structure: s3://problems/{problem_id}/input/1.txt, s3://problems/{problem_id}/output/1.txt.
  • Worker nodes download these during execution.

6. High-Level Design

High-Level Architecture: The Secure Execution Pipeline.

Diagram: System Architecture, LeetCode Execution Engine (async job queue, multilingual workers, gVisor & Firecracker isolation). The submission path runs Client → API Layer (Auth Svc, Submission Svc) → Redis `JOB_QUEUE` (LPUSH / RPOP) → Worker Node (vCPU-heavy). Inside the security boundary, the worker executes `./compiled_binary < input.txt` in a gVisor/Firecracker sandbox with a read-only RootFS, cgroup vCPU/RAM limits, and an empty network namespace; syscalls are intercepted by the gVisor Sentry. Test cases are mounted read-only from S3, and results are written back to PostgreSQL (`POST /submit` → `RPOP job` → update status).

The system follows an Asynchronous Worker Pattern to handle long-running code execution without blocking the API:

  1. API Gateway / Submissions Svc: Validates the request, saves initial metadata to PostgreSQL, and pushes the job into a Redis Job Queue (See Module 08: Messaging for more).
  2. Job Queue: Decouples the API from execution. This buffers traffic bursts and allows for easy scaling of worker nodes.
  3. Worker Nodes: Independent compute nodes that poll the queue via RPOP.
  4. Sandbox Runtime (The Isolation Zone):
    • Ephemeral Sandbox: The worker creates a secure environment (e.g., gVisor or Firecracker).
    • Test Case Mounting: Mounts test cases from S3 as Read-Only.
    • Execution: Runs the untrusted code while intercepting syscalls.
  5. Result Store: Once execution finishes, the worker updates the submission status in PostgreSQL (e.g., AC, WA, TLE).

[!TIP] Why Async? Code execution takes time (1-10s). Keeping an HTTP connection open is brittle. Polling or WebSockets is better for the client.
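The pipeline above can be sketched in-process; here `collections.deque` stands in for the Redis list and a dict for the PostgreSQL table (in production these would be `LPUSH`/`RPOP` against Redis and SQL updates):

```python
import json
import uuid
from collections import deque

queue = deque()   # stand-in for the Redis job queue
db = {}           # stand-in for the submissions table

def submit(code: str, language: str) -> str:
    """API path: persist initial metadata, then enqueue the job (LPUSH)."""
    sub_id = str(uuid.uuid4())
    db[sub_id] = {"status": "PENDING"}
    queue.appendleft(json.dumps({"id": sub_id, "code": code, "lang": language}))
    return sub_id

def worker_step(execute) -> str:
    """Worker path: dequeue (RPOP), run in the sandbox, store the verdict.
    `execute` abstracts the sandboxed run and returns a verdict string."""
    job = json.loads(queue.pop())
    db[job["id"]]["status"] = "PROCESSING"
    verdict = execute(job["code"], job["lang"])
    db[job["id"]].update(status="COMPLETED", verdict=verdict)
    return job["id"]
```

The API returns as soon as `submit` finishes; the client learns the verdict later by polling, exactly as the tip above suggests.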


7. Deep Dive: Isolation Strategy (The Sandbox)

How do we prevent system("rm -rf /") or kernel exploits? This is handled by the Security Boundary shown in our architecture.

Level 1: Standard Containers (Docker)

  • Tool: Docker, LXC.
  • Pros: Fast boot (milliseconds).
  • Cons: Shared Kernel. All containers share the host OS kernel. If a hacker finds a kernel vulnerability (e.g., Dirty COW), they can escape the container and take over the host.
  • Verdict: Not secure enough for untrusted public code.

Level 2: Virtual Machines (VMs)

  • Tool: AWS EC2, VMware.
  • Pros: Hardware-level virtualization (Hypervisor). Very secure.
  • Cons: Slow boot time (minutes). Too heavy for a 2-second script.

Level 3: MicroVMs / User-Space Kernels (The Gold Standard)

This bridges the gap between VM security and Container speed.

  1. gVisor (Google):
    • Intercepts syscalls in User Space.
    • The “Guest” application talks to gVisor (Sentry), not the Host Kernel.
    • Acts as a “security proxy” for syscalls.
  2. Firecracker (AWS Lambda):
    • Lightweight KVM-based microVMs.
    • Boots in < 125ms.
    • Used by AWS Lambda and Fargate.

8. Defense in Depth (Specific Mitigations)

Even with MicroVMs, apply these Linux primitives:

A. Preventing “Fork Bombs” (cgroups)

A Fork Bomb (while(1) fork()) crashes a server by exhausting the Process ID (PID) table.

  • Solution: Control Groups (cgroups).
  • Configure pids.max = 64. If the code tries to spawn the 65th process, the kernel blocks it.
  • Also limit CPU shares and Memory (OOM Killer).
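On a cgroup v2 host, the limits above translate to a few writes under `/sys/fs/cgroup` (a conceptual sketch; paths and values are illustrative and require root):

```shell
# Create a cgroup for one sandbox and cap its resources (cgroup v2)
mkdir /sys/fs/cgroup/sandbox_abc123
echo 64 > /sys/fs/cgroup/sandbox_abc123/pids.max            # fork-bomb cap: 64 PIDs
echo 268435456 > /sys/fs/cgroup/sandbox_abc123/memory.max   # 256 MB hard limit (OOM kill)
echo "100000 100000" > /sys/fs/cgroup/sandbox_abc123/cpu.max  # 1 full CPU of quota
echo "$SANDBOX_PID" > /sys/fs/cgroup/sandbox_abc123/cgroup.procs  # enroll the process
```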

B. Preventing Network Scans (Namespaces)

Users shouldn’t scan your internal AWS VPC.

  • Solution: Network Namespaces.
  • Run the container with No Network access (--network none).
  • Only map STDIN/STDOUT.
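With plain containers this is a one-flag change (shown with Docker for illustration; the same effect comes from creating an empty network namespace directly):

```shell
# Only the loopback interface exists inside the namespace, so any outbound
# connection fails at the routing layer before a single packet leaves the host.
docker run --rm --network none python:3.12 \
    python3 -c "import socket; socket.create_connection(('1.1.1.1', 80), timeout=2)"
# -> OSError: [Errno 101] Network is unreachable
```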

C. Preventing File System Damage (Seccomp)

Users shouldn’t read /etc/passwd.

  • Solution:
    1. Mount Root FS as Read-Only.
    2. Use Seccomp (Secure Computing Mode) to whitelist only necessary syscalls (read, write, exit). Block socket, execve (except strictly controlled paths).
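As a concrete illustration, a Docker-style seccomp profile that denies everything by default and whitelists a minimal set (a sketch, not a complete working profile; the exact syscall list varies by language runtime):

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "exit", "exit_group", "brk", "mmap", "munmap"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

With `SCMP_ACT_ERRNO`, a blocked syscall like `socket()` fails with an error instead of killing the process, which produces friendlier diagnostics for users.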

9. Data Partitioning & Sharding

We generate millions of submissions. A single DB won’t hold up.

Sharding Strategy: Shard by submission_id

  • Shard by user_id: Good for “Show me all my submissions”. Bad for global analytics or if one user spams.
  • Shard by submission_id: Even distribution. But “Show me my submissions” requires Scatter-Gather.
  • Decision: LeetCode is Write-Heavy during contests. We likely prioritize Write Throughput, so Sharding by submission_id (or using a dedicated high-write store like Cassandra/DynamoDB) is preferred. For user history, we can maintain a secondary index.
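Routing by `submission_id` reduces to a stable hash mod shard count (a sketch; real systems often use consistent hashing to ease resharding):

```python
import hashlib

NUM_SHARDS = 16  # illustrative shard count

def shard_for(submission_id: str) -> int:
    """Map a submission_id to a shard deterministically.
    md5 is used as a stable hash because Python's hash() is randomized
    per process and cannot be used for routing."""
    digest = hashlib.md5(submission_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```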

10. Interactive Decision Visualizer: The Secure Pipeline

This demo visualizes how different layers of defense block different types of attacks. Select an Attack Vector and see which layer catches it.

[!TIP] Try it yourself: Click “Fork Bomb” or “File Deletion” to see how Linux cgroups and Seccomp filters block these attacks in real-time.

Attack Simulator: a submission (e.g., `print("Hello World")`) passes through five stages, each able to block a different class of attack:

  1. Job Queue (pending)
  2. cgroups (`pids.max`): checks CPU/RAM/PID limits
  3. Network Namespace: checks connectivity
  4. Seccomp Filter: checks syscalls (e.g., those behind `rm` or `exec`)
  5. Verdict

11. System Walkthrough: The Life of a Submission

Let’s trace a user submitting Python code that tries to access the network.

Step 1: Submission

  • User sends code via POST /submissions (note the IP literal: with a hostname, the DNS lookup would fail before `connect()` is even reached):

    ```json
    {
      "code": "import socket; s = socket.socket(); s.connect(('1.1.1.1', 80))",
      "lang": "python3"
    }
    ```

  • API Gateway generates `submission_id: "abc-123"` and pushes the job to Redis:

    ```
    RPUSH submission_queue "{\"id\":\"abc-123\", \"lang\":\"python3\", ...}"
    ```
    

Step 2: Worker Processing

  • Worker Node (Golang) pulls the job: BLPOP submission_queue 0.
  • It launches a Firecracker MicroVM with restricted arguments:

    ```shell
    # Conceptual command; real Firecracker is configured via its HTTP API
    firecracker-run \
      --kernel vmlinux \
      --rootfs python3-rootfs.ext4 \
      --network none \
      --cpu-template T2 \
      --memory 128M
    # --network none: no virtual NIC is attached (network isolation)
    ```

Step 3: Execution & Interception

  • The Python code runs inside the MicroVM.
  • It tries to call the connect() syscall.
  • The Kernel (inside MicroVM) checks the network namespace. It sees no network interfaces (only loopback).
  • The syscall fails with ENETUNREACH (Network is unreachable).

Step 4: Result Collection

  • The worker captures STDERR: OSError: [Errno 101] Network is unreachable.
  • It writes the result to Postgres:
      UPDATE submissions SET status='RUNTIME_ERROR', stderr='Network unreachable...' WHERE id='abc-123';
    

12. Requirements Traceability Matrix

| Requirement | Architectural Solution |
| --- | --- |
| Code Isolation | gVisor / Firecracker (MicroVMs) prevent kernel sharing. |
| Resource Limits | cgroups enforce CPU, memory, and PID limits. |
| Network Security | Network namespaces (`--network none`) block internet access. |
| File System Security | Read-only RootFS + Seccomp whitelist prevents `rm -rf`. |
| Scalability | Redis job queue decouples API from workers; auto-scaling workers. |
| Concurrency | Firecracker boots in <125 ms, allowing high density (thousands per node). |

13. Follow-Up Questions: The Interview Gauntlet

I. Security & Isolation

  • Why is Docker not enough? Docker shares the Host Kernel. A kernel exploit (e.g., Dirty Pipe) allows root access to the host.
  • Explain Seccomp. It stands for Secure Computing. It’s a BPF filter that whitelists syscalls. If a process calls socket() and it’s not whitelisted, the kernel kills the process.
  • How to prevent Infinite Loops? Use setrlimit(RLIMIT_CPU) in the runner code + a hard timeout (SIGKILL) from the worker supervisor after 2 seconds.
  • How to prevent memory exhaustion? cgroups memory limit. The OOM Killer will kill the specific container, not the worker node.
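The two timeout mechanisms combine naturally in the worker's runner; a sketch using a POSIX CPU rlimit plus a wall-clock timeout (`preexec_fn` is POSIX-only, and in production these limits would live in the sandbox config instead):

```python
import resource
import subprocess

def run_limited(cmd, cpu_seconds=2, wall_seconds=4, max_output=100_000):
    """Run cmd with a CPU-time rlimit and a wall-clock timeout, truncating output."""
    def set_limits():
        # Soft limit delivers SIGXCPU, hard limit SIGKILL: handles busy loops
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
    try:
        proc = subprocess.run(cmd, capture_output=True,
                              timeout=wall_seconds,  # handles sleeping/blocked code
                              preexec_fn=set_limits)
        # Truncate STDOUT so a user printing gigabytes cannot fill the disk
        return proc.stdout[:max_output], proc.returncode
    except subprocess.TimeoutExpired:
        return b"", "TLE"
```

The CPU rlimit catches `while True: pass`, while the wall-clock timeout catches code that sleeps or blocks on I/O without burning CPU.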

II. Scalability

  • What if the Queue backs up? Auto-scale the Worker Group based on LLEN(submission_queue). If queue > 1000, add 10 nodes.
  • Handling Large Outputs: If a user prints 1GB of text, it fills the disk. Limit STDOUT capture to 100KB. Truncate the rest.
  • Shard Strategy: Shard DB by submission_id. No need for complex cross-shard joins.
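The queue-depth rule above can be stated as a pure scaling function (thresholds are illustrative):

```python
def workers_to_add(queue_len: int, threshold: int = 1000, batch: int = 10) -> int:
    """Scale out when the Redis queue depth (LLEN) backs up past the threshold.
    One batch per full threshold of backlog, so a deep backlog scales faster."""
    if queue_len <= threshold:
        return 0
    return batch * (queue_len // threshold)
```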

III. Operational Excellence

  • How to update the runtime (e.g., Python 3.9 → 3.10)? Build a new RootFS image. Rolling update the workers to use the new image.
  • Malicious Users: Rate limit by User ID. If a user triggers Security Violations repeatedly, ban the account.

14. Summary: The Whiteboard Strategy

If asked to design LeetCode, draw this 4-Quadrant Layout:

1. Requirements

  • Func: Execute Code, Feedback, Limits.
  • Non-Func: Security (Sandbox), Speed (<2s).
  • Scale: 100 QPS (Burst).

2. Architecture

[Client] → [API] → [Redis Queue] → [Worker Group] → [Firecracker VM (Seccomp/NS)] → [DB]

  • Async Worker: Decouples execution.
  • Firecracker: MicroVM Isolation.

3. Data & API

  • API: POST /submit {code, lang}
  • DB: Submissions(id, user, status, result)
  • S3: Test Cases (Read-Only)

4. Security Layers

  • Network: Namespace (`--net none`).
  • FS: Read-Only RootFS.
  • Syscalls: Seccomp Whitelist.
  • Kernel: gVisor / MicroVM.
