Disk Scheduling: The Elevator Problem
> [!NOTE]
> **The Mechanical Limit**
> In a Hard Disk Drive (HDD), the read/write head must physically move to the correct track (Seek) and wait for the sector to spin under it (Rotation). Minimizing this physical movement is the #1 performance goal for HDDs.
1. Anatomy of Latency
When an application asks for data, the time taken is:
Latency = Seek Time + Rotational Latency + Transfer Time
- Seek Time: Moving the arm. (Slowest part: ~3-10ms).
- Rotational Latency: Waiting for the target sector to spin under the head. (Avg: ~4.17ms at 7200 RPM, i.e. half a revolution).
- Transfer Time: Reading the bits. (Fast).
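To make the rotational number concrete, here is a tiny Go calculation (a sketch, using the 7200 RPM figure from the list above): on average, the target sector is half a revolution away.

```go
package main

import "fmt"

func main() {
	const rpm = 7200.0
	msPerRev := 60000.0 / rpm // one full revolution in milliseconds

	fmt.Printf("Full revolution: %.2f ms\n", msPerRev)          // 8.33 ms
	fmt.Printf("Avg rotational latency: %.2f ms\n", msPerRev/2) // 4.17 ms
}
```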
2. Scheduling Algorithms
The OS acts like an Elevator Dispatcher. If requests come for floors 10, 50, and 5, it shouldn’t travel 10 → 50 and then all the way back down to 5; a sensible elevator groups the nearby floors (10 and 5) before making the long trip to 50.
A. FCFS (First-Come, First-Served)
- Process requests in order.
- Problem: “Wild Swings” (1 → 100 → 2 → 99). Terrible seek time.
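Here is a minimal Go sketch of that cost (the function name and starting position are illustrative): it simply sums the head travel when requests are serviced in arrival order.

```go
package main

import "fmt"

// fcfsSeekDistance returns total head travel when requests
// are serviced strictly in the order they arrived.
func fcfsSeekDistance(head int, requests []int) int {
	total := 0
	for _, r := range requests {
		if r > head {
			total += r - head
		} else {
			total += head - r
		}
		head = r
	}
	return total
}

func main() {
	// The "wild swings" pattern from above, starting at track 50.
	fmt.Println(fcfsSeekDistance(50, []int{1, 100, 2, 99})) // 343 tracks
}
```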
B. SSTF (Shortest Seek Time First)
- Go to the closest request next.
- Problem: Starvation. If requests keep hitting tracks 10-20, a request for track 100 will never be served.
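A greedy SSTF pass sketched in Go (the names are mine, not from any real scheduler). Notice that the far request lands at the very end of the order, which is exactly how starvation begins:

```go
package main

import "fmt"

// sstfOrder repeatedly picks the pending request closest to the head.
func sstfOrder(head int, pending []int) []int {
	order := make([]int, 0, len(pending))
	remaining := append([]int(nil), pending...)
	for len(remaining) > 0 {
		best := 0
		for i, r := range remaining {
			if dist(head, r) < dist(head, remaining[best]) {
				best = i
			}
		}
		head = remaining[best]
		order = append(order, head)
		remaining = append(remaining[:best], remaining[best+1:]...)
	}
	return order
}

func dist(a, b int) int {
	if a > b {
		return a - b
	}
	return b - a
}

func main() {
	// Track 100 is served dead last; keep feeding in near requests
	// and it would never come up at all.
	fmt.Println(sstfOrder(15, []int{12, 18, 10, 20, 100})) // [12 10 18 20 100]
}
```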
C. SCAN (The Elevator)
- Move continuously from 0 → 100, servicing requests. Then reverse 100 → 0.
- Problem: Uneven wait times. The head sweeps past the middle tracks at roughly even intervals, but an edge track sees two passes back-to-back followed by a long gap, so requests near the edges wait longer on average.
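A SCAN pass sketched in Go, assuming the head is currently moving toward higher track numbers:

```go
package main

import (
	"fmt"
	"sort"
)

// scanOrder services everything at or above the head on the way up,
// then everything below it on the way back down.
func scanOrder(head int, pending []int) []int {
	sorted := append([]int(nil), pending...)
	sort.Ints(sorted)
	var up, down []int
	for _, r := range sorted {
		if r >= head {
			up = append(up, r)
		} else {
			down = append(down, r)
		}
	}
	order := append([]int{}, up...) // ascending sweep
	for i := len(down) - 1; i >= 0; i-- {
		order = append(order, down[i]) // descending sweep
	}
	return order
}

func main() {
	fmt.Println(scanOrder(50, []int{10, 95, 52, 30, 70})) // [52 70 95 30 10]
}
```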
D. C-SCAN (Circular SCAN)
- Move 0 → 100. Then immediately jump back to 0 without servicing.
- Pro: Every track is approached from the same direction at the same frequency, so wait times are uniform.
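C-SCAN differs from the SCAN sketch above in a single step: after the upward sweep, the head wraps to the lowest pending track and sweeps upward again (same assumptions as before):

```go
package main

import (
	"fmt"
	"sort"
)

// cscanOrder sweeps upward from the head, wraps around to the lowest
// pending track, and sweeps upward again.
func cscanOrder(head int, pending []int) []int {
	sorted := append([]int(nil), pending...)
	sort.Ints(sorted)
	var below, above []int
	for _, r := range sorted {
		if r >= head {
			above = append(above, r)
		} else {
			below = append(below, r)
		}
	}
	// Every track is always approached from below, so the gap
	// between visits is the same everywhere: uniform waits.
	return append(above, below...)
}

func main() {
	fmt.Println(cscanOrder(50, []int{10, 95, 52, 30, 70})) // [52 70 95 10 30]
}
```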
3. The Modern Era: SSDs and NVMe
Wait… SSDs don’t have moving heads. Why do we still have schedulers?
Actually, for SSDs we often disable complex scheduling entirely, selecting the noop scheduler (legacy block layer) or none (blk-mq); on Linux, the active choice is visible in /sys/block/<device>/queue/scheduler.
Why?
- Seek is Free: Reading address 0 or address 1,000,000 takes essentially the same time.
- Internal Parallelism: SSDs have multiple NAND chips. We want to send requests in parallel, not serialize them into a SCAN line.
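Here is a minimal Go sketch that reads the scheduler setting from sysfs (the device name nvme0n1 is an example, adjust for your system):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// The active scheduler is shown in [brackets],
	// e.g. "[none] mq-deadline kyber".
	data, err := os.ReadFile("/sys/block/nvme0n1/queue/scheduler")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Print(string(data))
}
```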
Multi-Queue Block Layer (blk-mq)
Modern Linux uses blk-mq. Instead of one global request queue (guarded by a single lock that every core contends on), there is one software queue per CPU core.
- Requests are pushed to the local per-core software queue with no cross-core locking.
- They are then batched and dispatched to the NVMe hardware submission queues.
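This is easiest to see as a toy model rather than kernel code. The Go sketch below (all names are mine) gives each "core" a private channel, so submissions never contend on a shared lock, and a per-core dispatcher hands batches to the hardware side:

```go
package main

import (
	"fmt"
	"sync"
)

type request struct{ core, sector int }

func main() {
	const cores = 4
	softQueues := make([]chan request, cores)
	hardware := make(chan []request, cores)

	var wg sync.WaitGroup
	for c := 0; c < cores; c++ {
		softQueues[c] = make(chan request, 8)
		wg.Add(1)
		go func(q chan request) { // per-core dispatcher
			defer wg.Done()
			var batch []request
			for r := range q {
				batch = append(batch, r)
			}
			hardware <- batch // one batched handoff, not one per request
		}(softQueues[c])
	}

	for c := 0; c < cores; c++ { // each core submits to its own queue, lock-free
		for s := 0; s < 3; s++ {
			softQueues[c] <- request{core: c, sector: c*100 + s}
		}
		close(softQueues[c])
	}
	wg.Wait()
	close(hardware)
	for batch := range hardware {
		fmt.Println("hardware queue got batch:", batch)
	}
}
```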
io_uring: Asynchronous I/O Done Right
Standard read() blocks the calling thread, and the older Linux-native aio was badly limited: it is only truly asynchronous for unbuffered (O_DIRECT) I/O and can still block during submission.
io_uring (introduced in Linux 5.1) uses two ring buffers in shared memory:
- Submission Queue (SQ): App pushes requests.
- Completion Queue (CQ): Kernel pushes results.

Because both rings sit in memory shared between the app and the kernel, and in polling mode (IORING_SETUP_SQPOLL) a kernel thread watches the SQ itself, steady-state I/O can be submitted with Zero Syscall Overhead.
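Go has no standard io_uring bindings, so the sketch below is an in-memory toy of the two-ring idea, not the real kernel ABI: two fixed-size rings indexed by head/tail counters, with no syscall in the hot path.

```go
package main

import "fmt"

// ring is a toy fixed-size ring buffer indexed by monotonically
// increasing head/tail counters, like the SQ/CQ rings share.
type ring struct {
	entries    []int // stand-in for real SQE/CQE structs
	head, tail uint32
}

func (r *ring) push(v int) {
	r.entries[r.tail%uint32(len(r.entries))] = v
	r.tail++
}

func (r *ring) pop() int {
	v := r.entries[r.head%uint32(len(r.entries))]
	r.head++
	return v
}

func (r *ring) size() uint32 { return r.tail - r.head }

func main() {
	sq := &ring{entries: make([]int, 8)} // submission queue
	cq := &ring{entries: make([]int, 8)} // completion queue

	for req := 1; req <= 3; req++ { // app: push requests, no syscall per op
		sq.push(req)
	}
	for sq.size() > 0 { // "kernel" poller: drain SQ, post results to CQ
		cq.push(sq.pop() * 10) // fake completion result
	}
	for cq.size() > 0 { // app: reap completions
		fmt.Println("completed:", cq.pop())
	}
}
```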
4. Code Example: Request Merging Simulation (Go)
The OS tries to merge adjacent requests to reduce overhead. Read sectors 1-2 and 3-4? Merge into Read 1-4.
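Here is a sketch of that merge logic in Go (the type and function names are mine, not a real kernel API): sort the pending requests, then coalesce any ranges that touch or overlap, mimicking the block layer's back/front merges.

```go
package main

import (
	"fmt"
	"sort"
)

// req is an inclusive sector range [start, end].
type req struct{ start, end int }

// mergeRequests sorts pending requests and coalesces any that
// are adjacent or overlapping into a single larger request.
func mergeRequests(reqs []req) []req {
	if len(reqs) == 0 {
		return nil
	}
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].start < reqs[j].start })
	merged := []req{reqs[0]}
	for _, r := range reqs[1:] {
		last := &merged[len(merged)-1]
		if r.start <= last.end+1 { // adjacent or overlapping: coalesce
			if r.end > last.end {
				last.end = r.end
			}
		} else {
			merged = append(merged, r)
		}
	}
	return merged
}

func main() {
	pending := []req{{1, 2}, {3, 4}, {10, 12}, {5, 5}}
	fmt.Println(mergeRequests(pending)) // [{1 5} {10 12}]
}
```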