Interrupts and DMA: Asynchronous Magic

[!NOTE] The Problem of Speed: A CPU running at 3 GHz executes 3 billion cycles per second. A hard disk seek takes about 5 ms, which is 15 million cycles. If the CPU simply waits for the disk, it spends those 15 million cycles doing nothing.

This chapter explains how the OS avoids waiting.

Real-World Analogy: Ordering Pizza

Imagine you are writing a book (the CPU doing work) and you order a pizza (requesting data from a disk).

  • Polling: You stop writing and stare out the window, waiting for the delivery driver to arrive. You get no work done.
  • Interrupts: You keep writing. When the driver rings the doorbell (IRQ), you pause your writing, go pay for the pizza (ISR), put it on the table, and resume writing.
  • DMA: You hire an assistant. You tell the assistant, “When the pizza arrives, pay for it and put it in the fridge.” You keep writing the entire time. The assistant handles the delivery and only taps you on the shoulder when everything is completely put away.

1. The Evolution of Waiting

A. Polling (Busy Waiting)

The CPU repeatedly reads the device's status register in a tight loop until the device reports that it is ready:

    while (device.status != READY);

  • Analogy: “Are we there yet? Are we there yet?”
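  • Pros: Lowest possible reaction latency, because the CPU notices the change on its very next check (good for ultra-fast devices).
  • Cons: Burns 100% of a CPU core while waiting, leaving no cycles for useful work.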
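
The busy-wait loop above can be made concrete with a small simulation. This is a minimal sketch, not real driver code: `FakeDevice`, `Status`, and `PollUntilReady` are hypothetical names, and the "status register" is just a counter that flips to ready after a fixed number of reads.

```go
package main

import "fmt"

// Status values for a hypothetical device; the names are illustrative only.
const (
	Busy  = 0
	Ready = 1
)

// FakeDevice simulates a device whose status register reports Ready
// only after a fixed number of Busy reads.
type FakeDevice struct {
	readsUntilReady int
}

// Status models one read of the device status register.
func (d *FakeDevice) Status() int {
	if d.readsUntilReady > 0 {
		d.readsUntilReady--
		return Busy
	}
	return Ready
}

// PollUntilReady spins on the status register and returns how many
// reads came back Busy, i.e. loop iterations that did no useful work.
func PollUntilReady(d *FakeDevice) int {
	wasted := 0
	for d.Status() != Ready {
		wasted++
	}
	return wasted
}

func main() {
	dev := &FakeDevice{readsUntilReady: 1000}
	fmt.Println("wasted polls:", PollUntilReady(dev))
}
```

Every iteration counted in `wasted` is a cycle budget that another process could have used, which is the cost the "Cons" bullet describes.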

B. Interrupts (Event Driven)

The CPU sends a command and goes to sleep (or runs another process). When the device is done, it sends an electrical signal (IRQ).

  • Analogy: Setting a timer on the oven and watching TV.
  • Mechanism:
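    1. Device asserts its IRQ line (modern PCIe devices instead send a message-signaled interrupt).
    2. CPU finishes the current instruction and saves the state it needs to resume later (such as the program counter and flags).
    3. CPU uses the interrupt number to index the Interrupt Vector Table (IVT), called the Interrupt Descriptor Table (IDT) on x86.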
    4. CPU jumps to the Interrupt Service Routine (ISR).
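
Steps 3 and 4 amount to a table lookup followed by an indirect call. The sketch below models that in user space, assuming a 256-entry table; `VectorTable`, `Dispatch`, and the vector numbers 33 and 46 are made up for illustration and do not correspond to any particular platform.

```go
package main

import "fmt"

// Handler stands in for an interrupt service routine (ISR).
type Handler func() string

// VectorTable maps an interrupt number to its handler, like an IVT or IDT.
type VectorTable [256]Handler

// Dispatch models steps 3 and 4: index the table, then jump to the ISR.
func (vt *VectorTable) Dispatch(vector uint8) string {
	h := vt[vector]
	if h == nil {
		return "spurious interrupt"
	}
	return h()
}

func main() {
	var vt VectorTable
	// Hypothetical vector assignments, chosen only for this example.
	vt[33] = func() string { return "keyboard ISR ran" }
	vt[46] = func() string { return "disk ISR ran" }

	fmt.Println(vt.Dispatch(46)) // a registered vector
	fmt.Println(vt.Dispatch(99)) // no handler registered
}
```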

C. DMA (Direct Memory Access)

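For large transfers, interrupt-driven I/O is still too slow if the CPU must take an interrupt for every byte or word and copy the data itself. DMA lets the device read and write RAM directly, so the CPU is interrupted only once, when the whole transfer is complete.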

  • Analogy: Hiring a moving company (DMA) to move your furniture while you work, instead of carrying every box yourself.
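
To illustrate the division of labor, the sketch below simulates a DMA engine with a goroutine: the "CPU" starts the transfer, keeps computing, and receives a single completion signal at the end. `Transfer` and `StartDMA` are hypothetical names, and a goroutine copying a slice is only an analogy for hardware moving data over the bus.

```go
package main

import "fmt"

// Transfer describes one simulated DMA request.
type Transfer struct {
	Dest []byte // destination region in "RAM"
	Src  []byte // data held by the simulated device
}

// StartDMA launches the simulated DMA engine. It copies the whole block
// without involving the caller and signals completion exactly once.
func StartDMA(t Transfer) <-chan int {
	done := make(chan int, 1)
	go func() {
		n := copy(t.Dest, t.Src)
		done <- n // the single "transfer complete" interrupt
	}()
	return done
}

func main() {
	ram := make([]byte, 4096)
	disk := make([]byte, 4096)
	for i := range disk {
		disk[i] = byte(i)
	}

	done := StartDMA(Transfer{Dest: ram, Src: disk})

	// The CPU keeps doing unrelated work while the transfer runs.
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}

	n := <-done // wait for the completion signal
	fmt.Println("useful work result:", sum)
	fmt.Println("bytes transferred by DMA:", n)
	fmt.Println("interrupts with per-byte I/O:", len(disk), "vs with DMA: 1")
}
```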

2. Interactive: CPU Timeline Visualizer

The interactive visualizer compares how much “Green” (useful work) the CPU gets done in each mode. Its timeline uses three categories: User App (useful work), Waiting/Stalled, and ISR/Overhead.
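
As a rough offline stand-in for the visualizer, the comparison can be approximated with a simple cost model. This is a sketch under stated assumptions: the 15 million cycle wait comes from the note at the top of the chapter, while the interrupt count (512) and the per-interrupt cost (2,000 cycles) are invented purely for illustration. `UsefulCycles` is a hypothetical helper.

```go
package main

import "fmt"

// Mode selects how the CPU waits for the device.
type Mode int

const (
	Polling Mode = iota
	Interrupts
	DMA
)

// UsefulCycles estimates how many cycles remain for user code while the
// device is busy for waitCycles. The model is deliberately crude:
// polling wastes everything, interrupt-driven I/O pays isrCost per
// interrupt, and DMA pays isrCost once for the completion interrupt.
func UsefulCycles(m Mode, waitCycles, interrupts, isrCost int) int {
	var overhead int
	switch m {
	case Polling:
		overhead = waitCycles
	case Interrupts:
		overhead = interrupts * isrCost
	case DMA:
		overhead = isrCost
	}
	if overhead > waitCycles {
		overhead = waitCycles
	}
	return waitCycles - overhead
}

func main() {
	const (
		waitCycles = 15_000_000 // 5 ms seek at 3 GHz
		interrupts = 512        // assumed: one interrupt per 8-byte word of a 4 KiB block
		isrCost    = 2_000      // assumed cycles per interrupt
	)
	names := []string{"Polling", "Interrupts", "DMA"}
	for m, name := range names {
		useful := UsefulCycles(Mode(m), waitCycles, interrupts, isrCost)
		fmt.Printf("%-10s useful=%8d cycles (%.2f%%)\n",
			name, useful, 100*float64(useful)/waitCycles)
	}
}
```

Changing `interrupts` or `isrCost` shows how quickly per-word interrupts eat into the useful share, while the DMA figure barely moves.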

3. Advanced Concepts

The Cost of Interrupts

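Interrupts are not free. Each one forces the CPU to save its state, run kernel code, and restore that state afterwards, much like a lightweight context switch:

  1. Flush the CPU pipeline: in-flight instructions are drained or discarded.
  2. Save registers to the stack so the interrupted code can resume later.
  3. Pollute the L1/L2 caches: the ISR's code and data evict lines the interrupted program was using.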

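Top Half vs Bottom Half: To minimize this cost, Linux splits interrupt handling into two parts:

  • Top Half: Acknowledge the hardware, copy a small amount of data, and schedule the remaining work. (Fast; runs with the interrupt masked.)
  • Bottom Half: Process the data and run it through the protocol stack, using deferred mechanisms such as softirqs, tasklets, or workqueues. (Slower; runs with interrupts enabled.)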
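
The split can be pictured as a producer that must never block and a consumer that does the heavy lifting later. The sketch below is a user-space analogy only, assuming a bounded queue between the two halves; `Packet`, `topHalf`, and `bottomHalf` are hypothetical names and not Linux kernel APIs.

```go
package main

import "fmt"

// Packet is raw data handed over by the simulated device.
type Packet struct{ Payload string }

// topHalf does the minimum: take the data and queue it for later.
// It never blocks; if the queue is full the packet is dropped.
func topHalf(queue chan<- Packet, p Packet) bool {
	select {
	case queue <- p:
		return true
	default:
		return false // queue full: drop rather than stall inside the "ISR"
	}
}

// bottomHalf drains the queue and performs the slow processing.
func bottomHalf(queue <-chan Packet) []string {
	var out []string
	for p := range queue {
		out = append(out, "processed:"+p.Payload)
	}
	return out
}

func main() {
	queue := make(chan Packet, 2) // small queue to make drops visible
	for _, s := range []string{"a", "b", "c"} {
		if !topHalf(queue, Packet{Payload: s}) {
			fmt.Println("dropped", s)
		}
	}
	close(queue)
	fmt.Println(bottomHalf(queue))
}
```

Keeping `topHalf` non-blocking mirrors the rule that the fast path must return quickly, even at the price of dropping work when the deferred queue is full.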

Scatter-Gather DMA

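Instead of copying one contiguous block, the DMA controller can read a scatter-gather list of (address, length) descriptors and fill non-contiguous memory pages in a single operation. This is crucial for zero-copy networking, where packet data is transferred to or from whatever pages the buffers already occupy.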
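
The idea of walking a descriptor list can be sketched in a few lines. This is an illustrative model, not a real DMA descriptor format: `Segment` and `Scatter` are hypothetical, and a Go slice stands in for a physical address plus length.

```go
package main

import "fmt"

// Segment is one entry of a scatter-gather list. The slice stands in
// for a (physical address, length) pair.
type Segment struct{ Buf []byte }

// Scatter fills each segment in order from src and returns the total
// number of bytes written across all segments.
func Scatter(src []byte, list []Segment) int {
	total := 0
	for _, seg := range list {
		if len(src) == 0 {
			break
		}
		n := copy(seg.Buf, src)
		src = src[n:]
		total += n
	}
	return total
}

func main() {
	// Three separately allocated buffers play the role of non-contiguous pages.
	pageA := make([]byte, 4)
	pageB := make([]byte, 3)
	pageC := make([]byte, 8)

	n := Scatter([]byte("HELLO, DMA!"), []Segment{{pageA}, {pageB}, {pageC}})
	fmt.Printf("%d bytes: %q %q %q\n", n, pageA, pageB, pageC[:4])
}
```

The caller issues one request and the data still ends up spread across three separate buffers, which is what lets a NIC place packet headers and payloads into different pages without an extra copy.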

Advanced DMA Hardware Realities

  1. Bus Mastering: The CPU is usually the “master” of the system bus. For DMA to work, the DMA controller must become a Bus Master, taking control of the memory bus to read/write without CPU intervention.
  2. Cycle Stealing: The DMA controller and the CPU share the same system bus. While the DMA controller transfers a word of data, it may briefly block the CPU from accessing memory. This is called “Cycle Stealing”; the slowdown is small compared with having the CPU perform the entire transfer itself.
  3. Interrupt Storms: Under extreme load (e.g., a 10 Gbps NIC receiving about 14 million minimum-size packets per second), the sheer volume of interrupts can livelock the CPU, which then spends nearly 100% of its time in ISRs. Linux network drivers mitigate this with NAPI (New API): under high load the driver masks the NIC's receive interrupts and polls the device instead, re-enabling interrupts once the queue drains.
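
The interrupt-to-polling switch described in item 3 can be modeled with a toy receive queue. The sketch below is a simplified simulation in the spirit of NAPI, not the kernel's actual interface: the `NIC` type, its fields, and the budget of 64 packets per poll are assumptions made for this example.

```go
package main

import "fmt"

// NIC is a simulated network card with a receive queue.
type NIC struct {
	pending    int  // packets waiting in the receive ring
	irqEnabled bool // whether receive interrupts are currently unmasked
	Interrupts int  // number of interrupts actually delivered
}

// Receive models packets arriving. An interrupt fires only if receive
// interrupts are enabled; the handler then masks them and polling takes over.
func (n *NIC) Receive(count int) {
	n.pending += count
	if n.irqEnabled && n.pending > 0 {
		n.Interrupts++
		n.irqEnabled = false
	}
}

// Poll processes up to budget packets. If the ring drains before the
// budget is used up, interrupts are re-enabled.
func (n *NIC) Poll(budget int) int {
	done := n.pending
	if done > budget {
		done = budget
	}
	n.pending -= done
	if done < budget {
		n.irqEnabled = true
	}
	return done
}

func main() {
	nic := &NIC{irqEnabled: true}
	nic.Receive(100) // first burst raises one interrupt and masks the rest
	nic.Receive(100) // arrives during polling: no new interrupt

	polls := 0
	for !nic.irqEnabled {
		nic.Poll(64)
		polls++
	}
	fmt.Println("packets: 200, interrupts:", nic.Interrupts, "polls:", polls)
}
```

In this model 200 packets cost one interrupt instead of 200, because everything that arrives while the driver is already polling is picked up by the next poll.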

4. Code Example: Java NIO (DMA Abstraction)

In high-level languages, we don’t program the DMA controller directly. However, APIs like Java NIO’s ByteBuffer let us allocate direct (off-heap) memory, which the JVM can pass straight to the OS for I/O, avoiding an extra copy through the JVM heap.

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DMAExample {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the channel and file even if read() throws.
        try (RandomAccessFile file = new RandomAccessFile("data.bin", "rw");
             FileChannel channel = file.getChannel()) {

            // allocateDirect reserves off-heap (native) memory that the
            // channel can hand straight to the OS read call.
            // With allocate(), the JVM typically reads into a temporary
            // direct buffer first and then copies the data onto the heap.
            ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024); // 1 MiB

            System.out.println("Starting read...");

            // The disk controller typically moves the data into the kernel's
            // page cache via DMA; the kernel then copies it into our buffer.
            // This thread blocks meanwhile, but the CPU can run other threads.
            int bytesRead = channel.read(buffer); // -1 means end of file

            System.out.println("Read complete. Bytes: " + bytesRead);

            // flip() switches the buffer from being filled to being drained.
            buffer.flip();
            System.out.println("Bytes available to consume: " + buffer.remaining());
        }
    }
}
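
The Go version below takes a different route and maps the file into memory with mmap (Unix-like systems only):

package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	file, err := os.OpenFile("data.bin", os.O_RDWR, 0666)
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Map no more bytes than the file holds: touching mapped pages
	// beyond the end of the file raises SIGBUS.
	info, err := file.Stat()
	if err != nil {
		panic(err)
	}
	length := int(info.Size())
	if length == 0 {
		panic("data.bin is empty; nothing to map")
	}
	if length > 1024*1024 {
		length = 1024 * 1024 // map at most the first 1 MiB
	}

	// mmap maps the file's page-cache pages into our address space,
	// so the OS can page data in and out (via DMA) transparently.
	data, err := syscall.Mmap(int(file.Fd()), 0, length, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(data)

	fmt.Println("File mapped to memory. The kernel pages data in on demand.")

	// Reading a byte triggers a page fault if the page is not resident;
	// the kernel then reads it from disk, typically via DMA.
	fmt.Printf("First byte: %#x\n", data[0])
}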