Thrashing & Working Sets
[!NOTE] This module explores the core principles of Thrashing & Working Sets, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Spiral of Death: Thrashing
Imagine you have 16GB of RAM. You open a game (8GB), a browser (4GB), and a compiler (6GB). Total demand: 18GB.
The OS tries to satisfy everyone by Swapping pages to disk.
- CPU switches to Game. Game needs Page A.
- Page Fault. OS evicts Browser Page B. Loads Game Page A.
- CPU switches to Browser. Browser needs Page B.
- Page Fault. OS evicts Game Page A. Loads Browser Page B.
The CPU spends 99% of its time waiting for the Disk (I/O) and 1% executing instructions.
- CPU Utilization: Approaching 0%.
- Disk Utilization: Approaching 100%. This state is called Thrashing.
2. The Solution: Working Set Model
We cannot just look at “Total Memory Requested” (18GB). Processes don’t use all their memory at once.
- Initialization Phase: Uses setup code.
- Loop Phase: Uses specific data arrays.
The Working Set (W) of a process is the set of pages it is actively using at time t.
If the sum of all Working Sets is less than Physical RAM, the system is stable.
Σ W(Process_i) < RAM_Size
Locality of Reference
This model works because of Locality:
- Temporal Locality: If you access address
X, you will likely accessXagain soon (e.g., loop variables). - Spatial Locality: If you access address
X, you will likely accessX+1soon (e.g., array traversal).
3. Interactive: Thrashing Monitor
Simulate a system under load. Add processes and watch the metrics.
- Green: Working Set fits in RAM.
- Yellow: Swapping starts (Graceful degradation).
- Red: Thrashing (System collapse).
4. Code Example: Spatial Locality Benchmark
We can prove the existence of hardware caching and the importance of locality by traversing a 2D array in two different ways.
- Row-Major (Fast): Access memory sequentially (
a[0],a[1],a[2]). - Column-Major (Slow): Jump huge strides (
a[0],a[1000],a[2000]). This causes Cache Misses and potentially Page Faults.
package main
import (
"fmt"
"time"
)
const N = 5000
func main() {
// Allocate 5000x5000 int array (100MB)
matrix := make([][]int, N)
for i := range matrix {
matrix[i] = make([]int, N)
}
// 1. Fast: Row-Major (Spatial Locality)
start := time.Now()
for i := 0; i < N; i++ {
for j := 0; j < N; j++ {
matrix[i][j] = 1
}
}
fmt.Printf("Row-Major (Fast): %v\n", time.Since(start))
// 2. Slow: Column-Major (Cache Thrashing)
start = time.Now()
for j := 0; j < N; j++ {
for i := 0; i < N; i++ {
matrix[i][j] = 1
}
}
fmt.Printf("Col-Major (Slow): %v\n", time.Since(start))
}
public class Locality {
static final int N = 5000;
static int[][] matrix = new int[N][N];
public static void main(String[] args) {
// 1. Fast: Row-Major
long start = System.nanoTime();
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
matrix[i][j] = 1;
}
}
long dur = System.nanoTime() - start;
System.out.printf("Row-Major (Fast): %.2f ms%n", dur / 1e6);
// 2. Slow: Column-Major
start = System.nanoTime();
for (int j = 0; j < N; j++) {
for (int i = 0; i < N; i++) {
matrix[i][j] = 1;
}
}
dur = System.nanoTime() - start;
System.out.printf("Col-Major (Slow): %.2f ms%n", dur / 1e6);
}
}