Thrashing & Working Sets

[!NOTE] This module explores the core principles of Thrashing & Working Sets, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Spiral of Death: Thrashing

Imagine you have 16GB of RAM. You open a game (8GB), a browser (4GB), and a compiler (6GB). Total demand: 18GB.

The OS tries to satisfy everyone by Swapping pages to disk.

  1. CPU switches to Game. Game needs Page A.
  2. Page Fault. OS evicts Browser Page B. Loads Game Page A.
  3. CPU switches to Browser. Browser needs Page B.
  4. Page Fault. OS evicts Game Page A. Loads Browser Page B.

The CPU spends 99% of its time waiting for the Disk (I/O) and 1% executing instructions.

  • CPU Utilization: Approaching 0%.
  • Disk Utilization: Approaching 100%. This state is called Thrashing.

2. The Solution: Working Set Model

We cannot just look at “Total Memory Requested” (18GB). Processes don’t use all their memory at once.

  • Initialization Phase: Uses setup code.
  • Loop Phase: Uses specific data arrays.

The Working Set (W) of a process is the set of pages it is actively using at time t. If the sum of all Working Sets is less than Physical RAM, the system is stable. &Sigma; W(Process_i) < RAM_Size

Locality of Reference

This model works because of Locality:

  1. Temporal Locality: If you access address X, you will likely access X again soon (e.g., loop variables).
  2. Spatial Locality: If you access address X, you will likely access X+1 soon (e.g., array traversal).

3. Interactive: Thrashing Monitor

Simulate a system under load. Add processes and watch the metrics.

  • Green: Working Set fits in RAM.
  • Yellow: Swapping starts (Graceful degradation).
  • Red: Thrashing (System collapse).
100%
CPU Efficiency
Disk I/O: 0%
Physical Limit
Memory Demand
System Healthy

4. Code Example: Spatial Locality Benchmark

We can prove the existence of hardware caching and the importance of locality by traversing a 2D array in two different ways.

  • Row-Major (Fast): Access memory sequentially (a[0], a[1], a[2]).
  • Column-Major (Slow): Jump huge strides (a[0], a[1000], a[2000]). This causes Cache Misses and potentially Page Faults.
package main

import (
	"fmt"
	"time"
)

const N = 5000

func main() {
	// Allocate 5000x5000 int array (100MB)
	matrix := make([][]int, N)
	for i := range matrix {
		matrix[i] = make([]int, N)
	}

	// 1. Fast: Row-Major (Spatial Locality)
	start := time.Now()
	for i := 0; i < N; i++ {
		for j := 0; j < N; j++ {
			matrix[i][j] = 1
		}
	}
	fmt.Printf("Row-Major (Fast): %v\n", time.Since(start))

	// 2. Slow: Column-Major (Cache Thrashing)
	start = time.Now()
	for j := 0; j < N; j++ {
		for i := 0; i < N; i++ {
			matrix[i][j] = 1
		}
	}
	fmt.Printf("Col-Major (Slow): %v\n", time.Since(start))
}
public class Locality {
    static final int N = 5000;
    static int[][] matrix = new int[N][N];

    public static void main(String[] args) {
        // 1. Fast: Row-Major
        long start = System.nanoTime();
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                matrix[i][j] = 1;
            }
        }
        long dur = System.nanoTime() - start;
        System.out.printf("Row-Major (Fast): %.2f ms%n", dur / 1e6);

        // 2. Slow: Column-Major
        start = System.nanoTime();
        for (int j = 0; j < N; j++) {
            for (int i = 0; i < N; i++) {
                matrix[i][j] = 1;
            }
        }
        dur = System.nanoTime() - start;
        System.out.printf("Col-Major (Slow): %.2f ms%n", dur / 1e6);
    }
}