Buffering and Caching: The Art of Flow

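[!NOTE] Why Buffer? A producer (CPU) generates data at 10 GB/s. A consumer (network) accepts data at 1 GB/s. Without a buffer, the CPU stalls on every write and effectively runs at 1 GB/s. With a buffer, the CPU can dump a burst of data and move on while the network drains it. A buffer only absorbs bursts: if the producer stays faster indefinitely, the buffer fills up and the producer has to wait anyway.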


1. Buffering Strategies

A. Single Buffer

The OS reads a block from disk into a single kernel buffer, then copies it to user space.

  • Problem: While the copy is in progress, the disk is idle, because the only buffer is still in use.

B. Double Buffering (Ping-Pong)

Two buffers. While the CPU consumes Buffer A, the disk fills Buffer B. When both are done, the roles swap.

  • Result: CPU and Disk work in parallel.
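
The ping-pong handoff can be sketched with two threads that swap buffers. This is a minimal illustration, not how a kernel implements it: the class name `DoubleBufferDemo`, the 4-byte block size, and the use of `java.util.concurrent.Exchanger` as the swap point are all assumptions made for the example, and filling the array with 1s stands in for a real disk read.

```java
import java.util.Arrays;
import java.util.concurrent.Exchanger;

final class DoubleBufferDemo {
    static final int BLOCK_SIZE = 4; // hypothetical block size, kept tiny for the demo

    /** Producer fills blocks, consumer sums them; returns the consumer's total. */
    static long run(int blocks) throws InterruptedException {
        Exchanger<byte[]> exchanger = new Exchanger<>();
        long[] total = new long[1];

        Thread producer = new Thread(() -> {
            byte[] buffer = new byte[BLOCK_SIZE];
            try {
                for (int i = 0; i < blocks; i++) {
                    Arrays.fill(buffer, (byte) 1);       // stand-in for "disk fills the buffer"
                    buffer = exchanger.exchange(buffer); // hand over the full one, get an empty one back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            byte[] buffer = new byte[BLOCK_SIZE];
            try {
                for (int i = 0; i < blocks; i++) {
                    buffer = exchanger.exchange(buffer); // hand over the empty one, get a full one back
                    for (byte b : buffer) {
                        total[0] += b;                   // "CPU consumes the buffer"
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return total[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Bytes consumed: " + run(5));
    }
}
```

While the consumer sums one array, the producer is already filling the other, which is the overlap that a single buffer cannot provide.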

C. Circular Buffer (Ring Buffer)

Used for data streams (Keyboard, Network Cards).

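  • Head Pointer: Where data is written. It wraps back to slot 0 after the last slot.
  • Tail Pointer: Where data is read. It wraps the same way.
  • Full: Head catches up to Tail. The producer must wait or drop data.
  • Empty: Tail catches up to Head. The consumer must wait.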
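
The head and tail pointers above can be sketched as a small single-threaded class. The name `ByteRingBuffer` and its methods are hypothetical, chosen for this example. Because head == tail on its own cannot tell a full buffer from an empty one, this sketch also tracks an item count; other implementations leave one slot unused instead.

```java
import java.util.NoSuchElementException;

/** Fixed-capacity ring buffer of bytes. Single-threaded sketch, not thread-safe. */
final class ByteRingBuffer {
    private final byte[] slots;
    private int head;  // next slot to write
    private int tail;  // next slot to read
    private int count; // items currently stored; distinguishes full from empty

    ByteRingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new byte[capacity];
    }

    /** Stores a value, or returns false (without overwriting) when the buffer is full. */
    boolean offer(byte value) {
        if (count == slots.length) {
            return false;
        }
        slots[head] = value;
        head = (head + 1) % slots.length; // wrap around to slot 0
        count++;
        return true;
    }

    /** Removes and returns the oldest value. */
    byte poll() {
        if (count == 0) {
            throw new NoSuchElementException("buffer is empty");
        }
        byte value = slots[tail];
        tail = (tail + 1) % slots.length; // wrap around to slot 0
        count--;
        return value;
    }

    boolean isEmpty() { return count == 0; }

    boolean isFull() { return count == slots.length; }

    public static void main(String[] args) {
        ByteRingBuffer ring = new ByteRingBuffer(3);
        ring.offer((byte) 10);
        ring.offer((byte) 20);
        ring.offer((byte) 30);
        System.out.println("Accepted a 4th item? " + ring.offer((byte) 40)); // false: buffer is full
        System.out.println("Oldest item: " + ring.poll());                   // 10
    }
}
```

Returning false on a full buffer is one policy among several. A keyboard driver might drop the new keystroke as shown here, while a logging ring might overwrite the oldest entry instead.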

2. Interactive: The Ring Buffer

Visualize how producers and consumers chase each other in a fixed-size buffer.


3. The Page Cache

The OS uses otherwise free RAM to cache disk blocks, and shrinks the cache when applications need the memory back.

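  • Read: Check the cache first. On a hit, return the data from RAM without touching the disk (microseconds or less, versus milliseconds for a spinning disk).
  • Write: Write to the cache (mark the page as Dirty) and return success. The data is not on disk yet, so a crash or power loss at this point can lose it.
  • Writeback: Background kernel flusher threads (pdflush in Linux kernels before 2.6.32) write dirty pages to disk once they have been dirty for ~30 seconds by default.
  • fsync(): Forces writeback of one file's dirty pages immediately and blocks until the device reports them written.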
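
In Java, the closest counterpart to fsync() is `FileChannel.force`. The sketch below writes a record and forces it out of the page cache before returning. The class name `DurableWrite`, the method `writeDurably`, and the temp-file name are hypothetical, and how durable the result really is still depends on the file system and the storage device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableWrite {
    /** Writes text to path and asks the OS to flush it to the storage device. */
    static void writeDurably(Path path, String text) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer data = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (data.hasRemaining()) {
                channel.write(data); // lands in the page cache; pages become dirty
            }
            channel.force(true);     // flush file content and metadata before returning
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("journal", ".log");
        writeDurably(path, "commit 42\n");
        System.out.print(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    }
}
```

Passing `false` to `force` skips the metadata flush where the platform allows it, which can be cheaper when only the content matters.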

4. Zero Copy I/O

Standard File Serving (read + write):

  1. Disk → Kernel Buffer
  2. Kernel Buffer → User Buffer (CPU Copy)
  3. User Buffer → Socket Buffer (CPU Copy)
  4. Socket Buffer → NIC

Zero Copy (sendfile):

  1. Disk → Kernel Buffer
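  2. Kernel Buffer → NIC (via pointers)

Result: The data never passes through user space. When the NIC supports scatter-gather DMA, the CPU does not copy the payload at all; otherwise one kernel-to-socket-buffer copy remains. Gains of 2x-3x are sometimes quoted for serving large static files, but the actual benefit depends on workload and hardware.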
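
Java exposes this path through `FileChannel.transferTo`, which the JDK may implement with sendfile or a similar mechanism where the platform supports it. The sketch below is a minimal illustration: `FileSender` and `send` are hypothetical names, the target is assumed to be a blocking channel, and the demo copies to a second file only so it runs without a network peer.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileSender {
    /** Sends the whole file to target without staging it in a user-space buffer. */
    static long send(Path source, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            // Assumes a blocking target; a non-blocking one could return 0 repeatedly.
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("payload", ".bin");
        Files.write(source, new byte[] {1, 2, 3, 4, 5});
        Path copy = Files.createTempFile("copy", ".bin");
        // In a file server the target would be a SocketChannel for the client connection.
        try (FileChannel out = FileChannel.open(copy, StandardOpenOption.WRITE)) {
            System.out.println("Bytes sent: " + send(source, out));
        }
    }
}
```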

5. Code Example: Memory Mapped Files (Zero Copy in Java)

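Java’s MappedByteBuffer uses the OS mmap syscall to map a file directly into the process's address space. The mapped pages are the page cache pages themselves, so reads and writes bypass the Java heap and the extra kernel-to-user copy, and the OS manages caching.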

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ZeroCopy {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes the channel and the file even if an exception is thrown
        try (RandomAccessFile file = new RandomAccessFile("large_db.dat", "rw");
             FileChannel channel = file.getChannel()) {

            // Map 100MB of the file into memory.
            // READ_WRITE mode implies changes are written back to disk eventually.
            // Note: on common JDKs a shorter file is grown to the mapped size.
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_WRITE, 0, 100 * 1024 * 1024);

            // We can treat the file like a byte array!
            // No read()/write() system calls.

            // Write a value at byte 0
            map.put(0, (byte) 123);

            // Read a value at byte 50
            byte b = map.get(50);
            System.out.println("Read: " + b);

            // Force changes to disk (like fsync)
            map.force();
        }
    }
}
```