Parallel Streams
One of the most powerful features of the Stream API is the ability to parallelize operations with a single method call: parallelStream() on a collection (or parallel() on an existing stream). This lets you use multi-core processors without writing a single line of thread-management code.
However, parallel streams are not a “magic turbo button.” Misusing them can lead to slower performance or, worse, incorrect results due to race conditions.
1. How It Works: The Fork/Join Framework
Under the hood, parallel streams use the Fork/Join Framework (introduced in Java 7).
- Splitting: The source data is recursively split into smaller chunks (using a Spliterator).
- Processing: Each chunk is processed by a thread in the ForkJoinPool.commonPool().
- Work Stealing: If a thread finishes its task early, it “steals” work from other busy threads (from the tail of their deque) to maximize CPU utilization.
- Combining: Partial results are merged back together.
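The four steps above can be sketched directly against the Fork/Join API. This is a minimal illustration, not what the stream implementation does internally: the class name ForkJoinSumSketch and the THRESHOLD cutoff are assumptions made for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums a long[] by recursive splitting.
class ForkJoinSumSketch extends RecursiveTask<Long> {
    // Assumed cutoff: chunks at or below this size are summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSumSketch(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        int length = to - from;
        if (length <= THRESHOLD) {
            // Processing: the chunk is small enough, so sum it directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Splitting: divide the range into two halves.
        int mid = from + length / 2;
        ForkJoinSumSketch left = new ForkJoinSumSketch(data, from, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(data, mid, to);
        left.fork();                     // queue the left half; an idle worker may steal it
        long rightSum = right.compute(); // work on the right half in this thread
        long leftSum = left.join();      // wait for the left half
        // Combining: merge the partial results.
        return leftSum + rightSum;
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSumSketch(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // sum of 1..10000 = 50005000
    }
}
```

A parallel stream spares you this boilerplate, but the split, process, and combine shape is the same.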
The Work Stealing Algorithm
// Sequential
long sequentialCount = list.stream().filter(e -> e > 10).count();
// Parallel: may use multiple threads
long parallelCount = list.parallelStream().filter(e -> e > 10).count();
2. When to Use Parallel Streams? (The NQ Model)
Parallelism has overhead: splitting tasks, scheduling threads, and merging results. For small tasks, this overhead outweighs the benefit.
A good heuristic is the NQ Model:
- N: Number of elements.
- Q: Cost per element (CPU cycles).
You generally need N × Q > 10,000 to see a benefit.
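As a rough illustration, the rule of thumb can be written as a small check. The class and method names here are hypothetical, and Q is only ever a coarse estimate, so treat the result as a hint to measure rather than a decision.

```java
// Hypothetical helper illustrating the N x Q rule of thumb.
final class NqHeuristic {
    // Threshold taken from the heuristic above; it is a guideline, not a guarantee.
    static final long THRESHOLD = 10_000;

    // n = number of elements, q = estimated cost per element.
    // Assumes n * q does not overflow a long.
    static boolean worthTryingParallel(long n, long q) {
        return n * q > THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(worthTryingParallel(500, 1));       // false: small list, cheap op
        System.out.println(worthTryingParallel(1_000_000, 1)); // true: cheap op, but N is huge
        System.out.println(worthTryingParallel(100, 500));     // true: small N, heavy op
    }
}
```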
| Scenario | Recommendation |
|---|---|
| Small List (N < 1000) | Sequential (Overhead > Benefit) |
| Simple Op (e.g., sum) | Sequential (Unless N is huge) |
| Heavy Op (e.g., encryption) | Parallel (Even for small N) |
| IO Bound (Network/Disk) | Avoid Parallel Stream (Blocks common pool) |
[!WARNING] Avoid parallel streams for I/O operations. Parallel streams use the shared ForkJoinPool.commonPool(), which has a small, fixed parallelism by default (usually CPU cores - 1). Blocking these threads with network or disk calls can starve every other parallel stream and common-pool task in your application.
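One common alternative for blocking work is a dedicated executor, so that the common pool stays free for CPU-bound tasks. The sketch below assumes a hypothetical fetch method that stands in for a network or disk call (simulated here with a short sleep); the class name and pool size are illustrative as well.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

final class BlockingIoSketch {
    // Hypothetical stand-in for a blocking network or disk call.
    static String fetch(int id) {
        try {
            Thread.sleep(50); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    // Runs every fetch on a dedicated pool instead of the common pool.
    static List<String> fetchAll(List<Integer> ids, int poolSize) {
        ExecutorService ioPool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit all tasks first, so they can run concurrently...
            List<CompletableFuture<String>> futures = ids.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> fetch(id), ioPool))
                    .collect(Collectors.toList());
            // ...then wait for each result, preserving the input order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Target parallelism of the shared pool used by parallel streams.
        System.out.println(ForkJoinPool.getCommonPoolParallelism());
        System.out.println(fetchAll(List.of(1, 2, 3), 3));
    }
}
```

Sizing the I/O pool is a separate decision from the CPU core count, since blocked threads spend most of their time waiting.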
3. Interactive: The Race
Visualize the performance difference between Sequential and Parallel execution on different dataset sizes.
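One rough way to see the difference on your own machine is a simple timing loop. The sketch below is an assumption-laden illustration: work is a hypothetical CPU-bound function, the dataset sizes are arbitrary, and System.nanoTime() timings without warm-up are noisy, so a harness such as JMH is the better tool for real measurements.

```java
import java.util.stream.LongStream;

final class RaceSketch {
    // Hypothetical CPU-bound work per element.
    static long work(long n) {
        long x = n;
        for (int i = 0; i < 100; i++) {
            x = x * 31 + i;
        }
        return x & 1; // 0 or 1, so the final sum stays small
    }

    static long sequential(long n) {
        return LongStream.range(0, n).map(RaceSketch::work).sum();
    }

    static long parallel(long n) {
        return LongStream.range(0, n).parallel().map(RaceSketch::work).sum();
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000, 100_000, 1_000_000}) {
            long t0 = System.nanoTime();
            long s = sequential(n);
            long t1 = System.nanoTime();
            long p = parallel(n);
            long t2 = System.nanoTime();
            System.out.printf("N=%d sequential=%d us parallel=%d us (results match: %b)%n",
                    n, (t1 - t0) / 1_000, (t2 - t1) / 1_000, s == p);
        }
    }
}
```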
4. Thread Safety & Pitfalls
The most dangerous pitfall is side effects. If your lambda expressions modify shared mutable state, your parallel stream can produce unpredictable results.
The “Race Condition” Trap
// BAD CODE: Race Condition!
List<Integer> numbers = IntStream.range(0, 1000).boxed().toList();
List<Integer> result = new ArrayList<>(); // Not thread-safe!
numbers.parallelStream().forEach(n -> {
    // Concurrent add() calls can lose updates, leave null entries,
    // or throw ArrayIndexOutOfBoundsException
    result.add(n);
});
The Fix
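Use reduction operations like collect() or reduce() instead of mutating shared state. Each thread accumulates into its own partial result, and the framework merges the partial results for you, so no manual synchronization is needed.
// CORRECT: Use collect
List<Integer> result = numbers.parallelStream()
    .collect(Collectors.toList()); // Safe and efficient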
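A slightly fuller sketch of both safe options follows. The class and method names are illustrative; the stream calls themselves are standard library methods.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SafeAggregationSketch {
    // collect(): each worker fills its own list; the lists are merged in encounter order.
    static List<Integer> collectSquares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    // reduce(): combines elements with an associative function and an identity value.
    static int sumWithReduce(int n) {
        return IntStream.range(0, n)
                .parallel()
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> squares = collectSquares(1000);
        System.out.println(squares.size());      // 1000, no lost updates
        System.out.println(squares.get(3));      // 9, encounter order is preserved
        System.out.println(sumWithReduce(1000)); // 499500
    }
}
```

Note that reduce() is only correct in parallel when the combining function is associative, as Integer::sum is here.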
5. Comparisons: Java vs. Go Concurrency
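Java's parallel streams express data parallelism: the same operation is applied to chunks of one dataset. Go's goroutines and channels are geared toward task parallelism: you coordinate independent concurrent workers yourself.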
Java Parallel Stream
// Declarative: "Process this list in parallel"
long count = list.parallelStream()
.filter(this::expensiveOp)
.count();
// Java handles thread management automatically
Go Worker Pool
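// Imperative: "Launch workers, send jobs"
// worker filters jobs with expensiveOp (assumed defined elsewhere as func(int) bool)
func worker(jobs <-chan int, results chan<- bool) {
    for n := range jobs {
        results <- expensiveOp(n)
    }
}
jobs := make(chan int, len(list))     // buffered so sends never block
results := make(chan bool, len(list)) // buffered so workers never block
// Start 4 workers
for w := 1; w <= 4; w++ {
    go worker(jobs, results)
}
// Send work
for _, n := range list {
    jobs <- n
}
close(jobs)
// Collect one result per job
count := 0
for range list {
    if <-results {
        count++
    }
}
// Go requires explicit orchestration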
Module Review
You’ve mastered the functional side of Java! Let’s review everything with some flashcards and a cheat sheet.