Fork/Join Framework

Imagine you have a massive pile of 10,000 unsorted documents to organize. You could do it all yourself (sequential processing), or you could split the pile in half, give one half to a friend, and keep splitting until everyone has a manageable stack. Once everyone is done, you merge the sorted stacks back together. This is the essence of the Fork/Join framework.

In this chapter, we’ll explore how Java’s ForkJoinPool leverages multi-core processors to solve complex problems efficiently using the work-stealing algorithm.

1. The Fork/Join Framework

Introduced in Java 7, the ForkJoinPool is a specialized implementation of ExecutorService designed for tasks that can be broken down into smaller pieces recursively.

Core Components

  1. ForkJoinPool: The engine that manages worker threads and executes tasks.
  2. ForkJoinTask: The base class for tasks running within the pool.
    • **RecursiveTask**: A task that returns a result.
    • RecursiveAction: A task that does not return a result (side-effect only).

2. The Algorithm: Work-Stealing

The secret sauce of ForkJoinPool is Work-Stealing.

  • Standard Thread Pools: Typically use a single shared queue for tasks. Threads contend for the lock on this queue.
  • ForkJoinPool: Each worker thread has its own double-ended queue (deque).
  • Threads push new tasks to the head of their own deque.
  • Threads pop tasks from the head of their own deque (LIFO - Last In, First Out).
  • When a thread is empty, it steals a task from the tail of another thread’s deque (FIFO - First In, First Out).

This minimizes contention because threads operate on opposite ends of the deques!

Interactive: Work-Stealing Simulator

Visualize how idle threads steal work from busy threads to keep the CPU fully utilized.

Work-Stealing Simulator

3. Code Implementation

Let’s implement a Merge Sort algorithm using the Fork/Join framework. Merge Sort is a classic “divide and conquer” algorithm perfectly suited for recursive task decomposition.

Java Implementation

We extend RecursiveAction because sorting an array happens in-place (no return value needed). If we were summing numbers, we’d use RecursiveTask<Integer>.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.Arrays;

public class ParallelMergeSort {

    private static class MergeSortTask extends RecursiveAction {
        private final int[] array;
        private final int left;
        private final int right;
        private static final int THRESHOLD = 100; // Granularity control

        public MergeSortTask(int[] array, int left, int right) {
            this.array = array;
            this.left = left;
            this.right = right;
        }

        @Override
        protected void compute() {
            if (right - left < THRESHOLD) {
                // Base case: Sort small chunk sequentially
                Arrays.sort(array, left, right + 1);
            } else {
                // Recursive case: Split task
                int mid = (left + right) / 2;
                MergeSortTask leftTask = new MergeSortTask(array, left, mid);
                MergeSortTask rightTask = new MergeSortTask(array, mid + 1, right);

                // Fork: Push tasks to deque
                invokeAll(leftTask, rightTask);

                // Join: Wait for completion (implicit in invokeAll)
                merge(mid);
            }
        }

        private void merge(int mid) {
            // Standard merge logic combining sorted halves
            int[] temp = new int[right - left + 1];
            int i = left, j = mid + 1, k = 0;
            while (i <= mid && j <= right) {
                if (array[i] <= array[j]) temp[k++] = array[i++];
                else temp[k++] = array[j++];
            }
            while (i <= mid) temp[k++] = array[i++];
            while (j <= right) temp[k++] = array[j++];
            System.arraycopy(temp, 0, array, left, temp.length);
        }
    }

    public static void main(String[] args) {
        int[] data = new int[10000];
        // fill random...

        // Use common pool
        ForkJoinPool pool = ForkJoinPool.commonPool();
        pool.invoke(new MergeSortTask(data, 0, data.length - 1));
    }
}

[!TIP] Threshold Matters: Setting the threshold too low causes overhead from creating too many small objects. Setting it too high reduces parallelism. It requires tuning.

Go Implementation

Go doesn’t have a direct ForkJoinPool equivalent because its Goroutines are already lightweight and managed by a work-stealing runtime scheduler. However, we can simulate the “Divide and Conquer” pattern using sync.WaitGroup.

package main

import (
	"fmt"
	"sort"
	"sync"
)

const threshold = 100

func parallelMergeSort(arr []int) {
	if len(arr) < threshold {
		sort.Ints(arr)
		return
	}

	mid := len(arr) / 2
	var wg sync.WaitGroup
	wg.Add(2)

	// Fork Left: Spawn a goroutine
	go func() {
		defer wg.Done()
		parallelMergeSort(arr[:mid])
	}()

	// Fork Right: Spawn a goroutine
	go func() {
		defer wg.Done()
		parallelMergeSort(arr[mid:])
	}()

	// Join: Wait for both halves
	wg.Wait()
	merge(arr, mid)
}

func merge(arr []int, mid int) {
	// Standard merge logic
	left := make([]int, mid)
	right := make([]int, len(arr)-mid)
	copy(left, arr[:mid])
	copy(right, arr[mid:])

	i, j, k := 0, 0, 0
	for i < len(left) && j < len(right) {
		if left[i] <= right[j] {
			arr[k] = left[i]
			i++
		} else {
			arr[k] = right[j]
			j++
		}
		k++
	}
	for i < len(left) {
		arr[k] = left[i]
		i++
		k++
	}
	for j < len(right) {
		arr[k] = right[j]
		j++
		k++
	}
}

func main() {
	data := []int{9, 3, 7, 1, 5, 2, 8, 4, 6, 0}
	parallelMergeSort(data)
	fmt.Println("Sorted:", data)
}

4. Key Differences

Feature Java ForkJoinPool Go Runtime
Abstraction Explicit API (RecursiveTask) Implicit in Language (go keyword)
Control Fine-grained (custom pools) Runtime managed (GOMAXPROCS)
State Shared memory (careful with mutation) Shared memory or Channels
Philosophy “Framework for parallelism” “Concurrency is built-in”

[!NOTE] In Go, the runtime scheduler is a work-stealing scheduler. When a generic goroutine (P) runs out of work, it attempts to steal goroutines from other Ps, exactly like Java’s ForkJoinPool.

5. When to Use Fork/Join?

  1. Recursive Problems: Matrix multiplication, sorting, tree traversal.
  2. Processor Intensive: Tasks that are CPU-bound, not IO-bound.
  3. Independent Subtasks: Tasks should not block waiting for each other (except at the join point).

6. Summary

The Fork/Join framework provides a structured way to parallelize recursive algorithms. By using Work-Stealing, it ensures that all CPU cores remain busy even if task sizes are uneven. While Java requires explicit setup, Go’s runtime handles similar scheduling dynamics automatically for all goroutines.