Tracking Connected Components

Note

This module explores the core principles of Union-Find (Disjoint Set Union), deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. Concept Definition

Problem: Given a set of elements, we want to:

  1. Union(A, B): Connect element A and element B.
  2. Find(A): Determine which group (component) A belongs to.

Edge Cases & Clarifying Questions

  • What if the elements are already connected? Union should return false or do nothing, which is often used for cycle detection.
  • Are node IDs 0-indexed or 1-indexed? This affects the initialization of the parent array. We assume 0-indexed for simplicity.
  • What if we union an element with itself? It should have no effect, as both Find calls return the same root.

Applications: Kruskal’s MST Algorithm, Cycle Detection in Undirected Graphs, Grid Connectivity (Number of Islands).

Optimization:

  • Path Compression: Point nodes directly to the root during find.
  • Union by Rank/Size: Attach the smaller tree to the larger tree to keep height low.
  • Result: Time complexity becomes O(alpha(N)) (Inverse Ackermann function), which is effectively O(1) for all practical N.

2. Intuition (The “Genesis”)

At its core, a Disjoint Set is a forest (a collection of trees). When we start, every element is its own tree with itself as the root.

Brute Force Approach

  • Array approach: We could use an array where arr[i] stores the component ID of node i. Find is O(1), but Union(A, B) requires changing the component ID for all nodes in A’s component to B’s component, taking O(N) time.
  • Tree approach (Naive): We use an array parent where parent[i] is the parent of node i. Union(A, B) just sets parent[root(A)] = root(B). But trees can become linear chains, making Find O(N).

Optimized Approach (The Pattern)

To achieve near O(1) performance, we apply two simple but powerful optimizations:

  1. Path Compression (during Find): Every time we traverse from a node to the root, we update the parent pointer of all nodes along the path to point directly to the root. This flattens the tree dynamically.
  2. Union by Rank / Size (during Union): When merging two trees, always attach the shorter/smaller tree under the root of the taller/larger tree. This minimizes the maximum depth.

Common Pitfalls

  • Forgetting Path Compression: If you just do parent[i] = find(parent[i]) but don’t return it, or forget it entirely, Find degrades to O(N).
  • Unioning elements instead of roots: You must connect Find(A) to Find(B). Doing parent[A] = parent[B] breaks the tree structure!
  • Path Compression without recursion: While possible iteratively, the recursive one-liner parent[i] = Find(parent[i]) is much cleaner and less error-prone.

3. Interactive Analysis

Union nodes and watch them merge. Find highlights the path to the representative (Root).

Parents: [0, 1, 2, 3, 4, 5]

4. Implementation

Java
Go
class DSU {
  int[] parent;
  int[] rank;

  public DSU(int n) {
    parent = new int[n];
    rank = new int[n];
    for (int i = 0; i < n; i++) parent[i] = i;
  }

  public int find(int i) {
    if (parent[i] != i) {
      parent[i] = find(parent[i]); // Path Compression
    }
    return parent[i];
  }

  public boolean union(int i, int j) {
    int rootI = find(i);
    int rootJ = find(j);
    if (rootI != rootJ) {
      // Union by Rank
      if (rank[rootI] > rank[rootJ]) {
        parent[rootJ] = rootI;
      } else if (rank[rootI] < rank[rootJ]) {
        parent[rootI] = rootJ;
      } else {
        parent[rootJ] = rootI;
        rank[rootI]++;
      }
      return true;
    }
    return false;
  }
}
type DSU struct {
  parent []int
  rank   []int
}

func NewDSU(n int) *DSU {
  p := make([]int, n)
  for i := range p { p[i] = i }
  return &DSU{parent: p, rank: make([]int, n)}
}

func (d *DSU) Find(i int) int {
  if d.parent[i] != i {
    d.parent[i] = d.Find(d.parent[i]) // Path Compression
  }
  return d.parent[i]
}

func (d *DSU) Union(i, j int) bool {
  rootI := d.Find(i)
  rootJ := d.Find(j)
  if rootI != rootJ {
    // Union by Rank
    if d.rank[rootI] > d.rank[rootJ] {
      d.parent[rootJ] = rootI
    } else if d.rank[rootI] < d.rank[rootJ] {
      d.parent[rootI] = rootJ
    } else {
      d.parent[rootJ] = rootI
      d.rank[rootI]++
    }
    return true
  }
  return false
}

5. Complexity Analysis

Operation Time Space Notes
Union O(α(N)) O(N) Nearly constant time.
Find O(α(N)) O(N) Amortized.

α(N) is the Inverse Ackermann function. For all practical values of N (even universe size), α(N) ≤ 4.