Track connected components efficiently with near-constant time union and find operations using path compression and union by rank.
A Union-Find (also called Disjoint Set Union) is a data structure that tracks a set of elements partitioned into non-overlapping groups. It supports two operations: find (which group does this element belong to?) and union (merge two groups into one). The magic: both operations run in nearly O(1) amortized time.
Each element points to a parent. The root of each tree is the representative (or "leader") of that group. To check if two elements are connected, we find their roots. To merge groups, we make one root point to the other.
Why does this work? Each group forms a tree rooted at its representative. The find operation walks up parent pointers to the root. Two elements are in the same group if and only if they share the same root. The union operation just connects one root to the other, merging the two trees into one.
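The mechanism above can be sketched in a few lines. This is a deliberately unoptimized version (no rank, no compression; the class name is illustrative) to show just the parent-pointer idea before the refinements discussed below:

```java
// Minimal, unoptimized Union-Find sketch: parent pointers only.
class NaiveUnionFind {
    int[] parent;

    NaiveUnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // each node starts as its own root
    }

    // Walk up parent pointers until we reach a root (a node that is its own parent)
    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Merge two groups by pointing one root at the other
    void union(int x, int y) {
        parent[find(x)] = find(y);
    }

    boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}
```

With `union(0, 1)` and `union(1, 2)`, nodes 0, 1, and 2 end up sharing a root, so `connected(0, 2)` returns true even though 0 and 2 were never unioned directly.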
Look for these recognition signals in the problem statement. If you spot one, Union-Find is very likely the intended approach.
"Connected components" or "number of groups"
"Are X and Y connected?"
"Redundant connection" or "cycle detection"
"Merge accounts" or "equivalence classes"
The key clue: You need to dynamically merge groups and query connectivity. A brute-force adjacency traversal would be O(n) per query. Union-Find answers both "union" and "find" in nearly O(1) amortized time using two optimizations: path compression and union by rank.
Without optimization, union can create tall, skinny trees (like a linked list), making find O(n). Union by rank fixes this by always attaching the shorter tree under the taller tree, keeping the tree height logarithmic.
We have 6 nodes {0..5}. Let us process union(0,1), union(2,3), union(0,2).
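Those three unions can be replayed with a condensed version of this section's template (trimmed so the snippet runs standalone):

```java
import java.util.Arrays;

// Trace of union(0,1), union(2,3), union(0,2) on 6 nodes,
// using a condensed version of the template from this section.
class RankTrace {
    static int[] parent = {0, 1, 2, 3, 4, 5};
    static int[] rank = new int[6];

    static int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    static void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank[rx] < rank[ry]) parent[rx] = ry;
        else if (rank[rx] > rank[ry]) parent[ry] = rx;
        else { parent[ry] = rx; rank[rx]++; } // equal ranks: the new root's rank grows
    }

    public static void main(String[] args) {
        union(0, 1); // 1 attaches under 0; rank of 0 becomes 1
        union(2, 3); // 3 attaches under 2; rank of 2 becomes 1
        union(0, 2); // roots 0 and 2 tie on rank; 2 attaches under 0, rank of 0 becomes 2
        System.out.println(Arrays.toString(parent)); // [0, 0, 0, 2, 4, 5]
    }
}
```

Nodes 4 and 5 stay as singletons, and both merged trees hang off root 0 with height 2.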
Why rank matters: Without union by rank, a sequence of unions can produce a degenerate chain of length n, making find O(n). With union by rank, the tree height is bounded by O(log n), because a tree of rank k has at least 2^k nodes.
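The degenerate case is easy to reproduce. The sketch below (helper names are illustrative) links a chain of sequential unions two ways, naively always pointing the first root at the second versus linking by rank, and then measures node depth:

```java
// Contrast: naive linking builds a chain of depth n-1,
// while union by rank keeps every node shallow (here, depth 1).
class DepthDemo {
    static int depth(int[] parent, int x) {
        int d = 0;
        while (parent[x] != x) { x = parent[x]; d++; }
        return d;
    }

    static int root(int[] parent, int x) { // plain find, no compression, so depths stay visible
        while (parent[x] != x) x = parent[x];
        return x;
    }

    static int[] naiveChain(int n) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i + 1 < n; i++) {
            // naive union: always point the first root at the second
            parent[root(parent, i)] = root(parent, i + 1);
        }
        return parent;
    }

    static int[] byRank(int n) {
        int[] parent = new int[n], rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i + 1 < n; i++) {
            int rx = root(parent, i), ry = root(parent, i + 1);
            if (rx == ry) continue;
            if (rank[rx] < rank[ry]) parent[rx] = ry;
            else if (rank[rx] > rank[ry]) parent[ry] = rx;
            else { parent[ry] = rx; rank[rx]++; }
        }
        return parent;
    }

    public static void main(String[] args) {
        int n = 16;
        System.out.println(depth(naiveChain(n), 0));   // 15: a degenerate chain
        System.out.println(depth(byRank(n), n - 1));   // 1: every node hangs off root 0
    }
}
```

After 15 naive unions, node 0 sits at depth 15; with union by rank the same operations leave every node at depth at most 1.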
Path compression makes every node on the find path point directly to the root. This flattens the tree, so future find calls on the same nodes are O(1). Combined with union by rank, this gives the optimal inverse Ackermann amortized complexity.
Suppose the parent chain is: 5 → 4 → 2 → 0 (root). After find(5) runs with path compression, every visited node (5, 4, and 2) points directly to 0.
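That flattening can be checked directly. Below, the chain 5 → 4 → 2 → 0 is set up by hand (nodes 1 and 3 are left as singletons), and a single find(5) rewires every visited node to the root:

```java
// Demonstrate path compression on the chain 5 -> 4 -> 2 -> 0.
class CompressionDemo {
    static int[] parent = {0, 1, 0, 3, 2, 4}; // parent[5]=4, parent[4]=2, parent[2]=0

    static int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]); // every node on the path is re-pointed to the root
        }
        return parent[x];
    }

    public static void main(String[] args) {
        System.out.println(find(5));   // 0: the root of the chain
        System.out.println(parent[5]); // 0 (was 4 before compression)
        System.out.println(parent[4]); // 0 (was 2 before compression)
    }
}
```

The next find(5) or find(4) now follows a single pointer instead of re-walking the chain.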
The two optimizations together: Union by rank keeps trees shallow (O(log n) height). Path compression flattens paths during find. Together, they achieve O(α(n)) amortized time per operation, where α is the inverse Ackermann function -- effectively constant for all practical inputs (up to 2^65536 elements).
Here is the annotated Java template with both optimizations: path compression in find, union by rank in union.
```java
class UnionFind {
    int[] parent;
    int[] rank;
    int components;

    UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        components = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each node is its own root
            rank[i] = 0;   // initial rank is 0
        }
    }

    int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]); // path compression
        }
        return parent[x];
    }

    boolean union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false; // already connected
        // Union by rank: attach smaller tree under larger
        if (rank[rx] < rank[ry]) {
            parent[rx] = ry;
        } else if (rank[rx] > rank[ry]) {
            parent[ry] = rx;
        } else {
            parent[ry] = rx;
            rank[rx]++; // only increase when ranks are equal
        }
        components--;
        return true; // successfully merged
    }

    boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}
```
```java
// Detect if adding edge (u, v) creates a cycle
UnionFind uf = new UnionFind(n);
for (int[] edge : edges) {
    if (!uf.union(edge[0], edge[1])) {
        // edge is redundant -- it creates a cycle
        return edge;
    }
}
```
Return value of union. Making union return a boolean (true if merge happened, false if already connected) is extremely useful. It lets you detect cycles, count components, and track redundant edges without extra bookkeeping.
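Component counting is the other common use of that boolean. A condensed sketch (method names are illustrative): start the count at n and decrement only when a union actually merges two groups:

```java
// Count connected components: start with n, decrement on each successful union.
class ComponentCount {
    static int count(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int components = n;
        for (int[] e : edges) {
            int rx = find(parent, e[0]), ry = find(parent, e[1]);
            if (rx != ry) {   // union succeeded: two groups became one
                parent[rx] = ry;
                components--;
            }                 // else: redundant edge, count unchanged
        }
        return components;
    }

    static int find(int[] parent, int x) {
        if (parent[x] != x) parent[x] = find(parent, parent[x]); // path compression
        return parent[x];
    }
}
```

For 5 nodes with edges (0,1), (1,2), (0,2), (3,4), the redundant edge (0,2) is skipped and the count lands at 2.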
Let us trace through a series of union and find operations on 7 nodes, showing the parent array and connected components at every stage.
Operations: union(0,1), union(2,3), union(4,5), union(0,2), union(4,6), find(3), union(0,4)
| OPERATION | ACTION | PARENT[] | COMPONENTS |
|---|---|---|---|
| init | Each node is its own root | [0,1,2,3,4,5,6] | 7 |
| union(0,1) | Attach 1 under 0 | [0,0,2,3,4,5,6] | 6 |
| union(2,3) | Attach 3 under 2 | [0,0,2,2,4,5,6] | 5 |
| union(4,5) | Attach 5 under 4 | [0,0,2,2,4,4,6] | 4 |
| union(0,2) | Roots 0,2 same rank; attach 2 under 0 | [0,0,0,2,4,4,6] | 3 |
| union(4,6) | Attach 6 under 4 | [0,0,0,2,4,4,4] | 2 |
| find(3) | 3→2→0. Path compress: parent[3]=0 | [0,0,0,0,4,4,4] | 2 |
| union(0,4) | rank[0]=2 > rank[4]=1; attach 4 under 0 | [0,0,0,0,0,4,4] | 1 |
Notice the flattening: After find(3), node 3 points directly to root 0 instead of through node 2. Future find(3) calls are now O(1). This is path compression at work, progressively flattening the tree with each find operation.
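The entire trace can be replayed in code. A condensed version of the template (trimmed so it runs standalone) reproduces the final row of the table:

```java
import java.util.Arrays;

// Replay the trace above: 7 nodes, seven operations.
class TraceReplay {
    static int[] parent = {0, 1, 2, 3, 4, 5, 6};
    static int[] rank = new int[7];
    static int components = 7;

    static int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    static void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank[rx] < rank[ry]) parent[rx] = ry;
        else if (rank[rx] > rank[ry]) parent[ry] = rx;
        else { parent[ry] = rx; rank[rx]++; }
        components--;
    }

    public static void main(String[] args) {
        union(0, 1); union(2, 3); union(4, 5);
        union(0, 2); union(4, 6);
        find(3);     // compresses: parent[3] jumps from 2 to 0
        union(0, 4); // rank 2 beats rank 1, so 4 attaches under 0
        System.out.println(Arrays.toString(parent)); // [0, 0, 0, 0, 0, 4, 4]
        System.out.println(components);              // 1
    }
}
```

The printed state matches the last row of the table: one component, with 5 and 6 still pointing at 4 until a future find compresses them.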
These are the classic LeetCode problems that use the Union-Find pattern, listed in rough order of difficulty.
Practice tip: Start with #547 (direct application of the template -- count components in an adjacency matrix) and #684 (cycle detection). Once comfortable, tackle #721 which requires mapping strings to indices and collecting results by component, and #990 which cleverly models equation constraints as union operations.
With path compression and union by rank combined, each find and union operation takes O(α(n)) amortized time, where α is the inverse Ackermann function. For any practical value of n (up to 2^65536), α(n) ≤ 4. This is effectively constant.
| OPTIMIZATION | FIND | UNION |
|---|---|---|
| None (naive) | O(n) | O(n) |
| Union by rank only | O(log n) | O(log n) |
| Path compression only | O(log n) amortized | O(log n) amortized |
| Both (optimal) | O(α(n)) | O(α(n)) |
Space: The parent and rank arrays each require n elements. Total: O(n).
For m operations on n elements: The total time is O(m · α(n)), which is effectively O(m) in practice. This makes Union-Find one of the most efficient data structures for dynamic connectivity problems, far outperforming BFS/DFS-based approaches when queries are interleaved with modifications.
Two arrays: parent[] and rank[]. The parent array tracks the tree structure. The rank array tracks tree height bounds. Together they form the complete Union-Find data structure. Initialize parent[i] = i and rank[i] = 0.
Path compression in find, union by rank in union. These two optimizations are what turn an O(n) naive structure into an effectively O(1) amortized one. Always use both together in interviews.
Union returns a boolean for cycle detection. If find(x) == find(y) before union, the elements are already connected. Adding this edge would create a cycle. This pattern appears in redundant connection and MST problems.
Track the component count. Start with n components. Each successful union decreases the count by 1. This gives you the number of connected components at any point without extra traversal.
Extends to grids and string equivalence. For grid problems like Number of Islands, treat each cell as a node and union adjacent land cells. For equivalence problems like Accounts Merge, map strings to indices and union them. The Union-Find abstraction works anywhere you have transitive connectivity.
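The grid mapping can be sketched as follows (a minimal illustration in the spirit of Number of Islands; names are assumptions, and rank is omitted for brevity). Cell (r, c) becomes id r * cols + c, and each land cell is unioned with its right and down neighbors:

```java
// Count islands in a 0/1 grid by unioning adjacent land cells.
// Cell (r, c) maps to id r * cols + c.
class GridIslands {
    static int countIslands(int[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        int[] parent = new int[rows * cols];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        int islands = 0;
        for (int[] row : grid)
            for (int cell : row)
                if (cell == 1) islands++; // each land cell starts as its own island
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (grid[r][c] != 1) continue;
                // union with right and down neighbors that are also land
                if (c + 1 < cols && grid[r][c + 1] == 1
                        && union(parent, r * cols + c, r * cols + c + 1)) islands--;
                if (r + 1 < rows && grid[r + 1][c] == 1
                        && union(parent, r * cols + c, (r + 1) * cols + c)) islands--;
            }
        }
        return islands;
    }

    static int find(int[] parent, int x) {
        if (parent[x] != x) parent[x] = find(parent, parent[x]); // path compression
        return parent[x];
    }

    static boolean union(int[] parent, int x, int y) {
        int rx = find(parent, x), ry = find(parent, y);
        if (rx == ry) return false;
        parent[rx] = ry; // compression alone keeps this fast enough for a sketch
        return true;
    }
}
```

For the grid {{1,1,0},{0,1,0},{0,0,1}}, the three connected land cells in the top-left merge into one island and the lone cell at (2,2) stays separate, giving 2.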