Tracking Connected Components
This module explores the core principles of Union-Find (Disjoint Set Union), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. Concept Definition
Problem: Given a set of elements, we want to:
- Union(A, B): Connect element A and element B.
- Find(A): Determine which group (component) A belongs to.
Edge Cases & Clarifying Questions
- What if the elements are already connected?
Unionshould returnfalseor do nothing, which is often used for cycle detection. - Are node IDs 0-indexed or 1-indexed? This affects the initialization of the parent array. We assume 0-indexed for simplicity.
- What if we union an element with itself? It should have no effect, as both
Findcalls return the same root.
Applications: Kruskal’s MST Algorithm, Cycle Detection in Undirected Graphs, Grid Connectivity (Number of Islands).
Optimization:
- Path Compression: Point nodes directly to the root during
find. - Union by Rank/Size: Attach the smaller tree to the larger tree to keep height low.
- Result: Time complexity becomes O(alpha(N)) (Inverse Ackermann function), which is effectively O(1) for all practical N.
2. Intuition (The “Genesis”)
At its core, a Disjoint Set is a forest (a collection of trees). When we start, every element is its own tree with itself as the root.
Brute Force Approach
- Array approach: We could use an array where
arr[i]stores the component ID of nodei.Findis O(1), butUnion(A, B)requires changing the component ID for all nodes in A’s component to B’s component, taking O(N) time. - Tree approach (Naive): We use an array
parentwhereparent[i]is the parent of nodei.Union(A, B)just setsparent[root(A)] = root(B). But trees can become linear chains, makingFindO(N).
Optimized Approach (The Pattern)
To achieve near O(1) performance, we apply two simple but powerful optimizations:
- Path Compression (during Find): Every time we traverse from a node to the root, we update the parent pointer of all nodes along the path to point directly to the root. This flattens the tree dynamically.
- Union by Rank / Size (during Union): When merging two trees, always attach the shorter/smaller tree under the root of the taller/larger tree. This minimizes the maximum depth.
Common Pitfalls
- Forgetting Path Compression: If you just do
parent[i] = find(parent[i])but don’t return it, or forget it entirely,Finddegrades to O(N). - Unioning elements instead of roots: You must connect
Find(A)toFind(B). Doingparent[A] = parent[B]breaks the tree structure! - Path Compression without recursion: While possible iteratively, the recursive one-liner
parent[i] = Find(parent[i])is much cleaner and less error-prone.
3. Interactive Analysis
Union nodes and watch them merge. Find highlights the path to the representative (Root).
4. Implementation
class DSU {
int[] parent;
int[] rank;
public DSU(int n) {
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
public int find(int i) {
if (parent[i] != i) {
parent[i] = find(parent[i]); // Path Compression
}
return parent[i];
}
public boolean union(int i, int j) {
int rootI = find(i);
int rootJ = find(j);
if (rootI != rootJ) {
// Union by Rank
if (rank[rootI] > rank[rootJ]) {
parent[rootJ] = rootI;
} else if (rank[rootI] < rank[rootJ]) {
parent[rootI] = rootJ;
} else {
parent[rootJ] = rootI;
rank[rootI]++;
}
return true;
}
return false;
}
}
type DSU struct {
parent []int
rank []int
}
func NewDSU(n int) *DSU {
p := make([]int, n)
for i := range p { p[i] = i }
return &DSU{parent: p, rank: make([]int, n)}
}
func (d *DSU) Find(i int) int {
if d.parent[i] != i {
d.parent[i] = d.Find(d.parent[i]) // Path Compression
}
return d.parent[i]
}
func (d *DSU) Union(i, j int) bool {
rootI := d.Find(i)
rootJ := d.Find(j)
if rootI != rootJ {
// Union by Rank
if d.rank[rootI] > d.rank[rootJ] {
d.parent[rootJ] = rootI
} else if d.rank[rootI] < d.rank[rootJ] {
d.parent[rootI] = rootJ
} else {
d.parent[rootJ] = rootI
d.rank[rootI]++
}
return true
}
return false
}
5. Complexity Analysis
| Operation | Time | Space | Notes |
|---|---|---|---|
| Union | O(α(N)) | O(N) | Nearly constant time. |
| Find | O(α(N)) | O(N) | Amortized. |
α(N) is the Inverse Ackermann function. For all practical values of N (even universe size), α(N) ≤ 4.