Track connected components efficiently with near-constant time union and find operations using path compression and union by rank.
A Union-Find (also called Disjoint Set Union) is a data structure that tracks a set of elements partitioned into non-overlapping groups. It supports two operations: find (which group does this element belong to?) and union (merge two groups into one). The magic: both operations run in nearly O(1) amortized time.
Each element points to a parent. The root of each tree is the representative (or "leader") of that group. To check if two elements are connected, we find their roots. To merge groups, we make one root point to the other.
Why does this work? Each group forms a tree rooted at its representative. The find operation walks up parent pointers to the root. Two elements are in the same group if and only if they share the same root. The union operation just connects one root to the other, merging the two trees into one.
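The mechanism above can be sketched in a few lines. This is a deliberately unoptimized version (no rank, no compression; the class name is illustrative) to show just the parent-pointer idea before the refinements discussed below:

```java
// Minimal, unoptimized Union-Find sketch: parent pointers only.
class NaiveUnionFind {
    int[] parent;

    NaiveUnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // each node starts as its own root
    }

    // Walk up parent pointers until we reach a root (a node that is its own parent)
    int find(int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Merge two groups by pointing one root at the other
    void union(int x, int y) {
        parent[find(x)] = find(y);
    }

    boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}
```

With `union(0, 1)` and `union(1, 2)`, nodes 0, 1, and 2 end up sharing a root, so `connected(0, 2)` returns true even though 0 and 2 were never unioned directly.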
Look for these recognition signals in the problem statement. If you spot one, Union-Find is very likely the intended approach.
"Connected components" or "number of groups"
"Are X and Y connected?"
"Redundant connection" or "cycle detection"
"Merge accounts" or "equivalence classes"
The key clue: You need to dynamically merge groups and query connectivity. A brute-force adjacency traversal would be O(n) per query. Union-Find answers both "union" and "find" in nearly O(1) amortized time using two optimizations: path compression and union by rank.
Without optimization, union can create tall, skinny trees (like a linked list), making find O(n). Union by rank fixes this by always attaching the shorter tree under the taller tree, keeping the tree height logarithmic.
We have 6 nodes {0..5}. Let us process union(0,1), union(2,3), union(0,2).
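Those three unions can be replayed with a condensed version of this section's template (trimmed so the snippet runs standalone):

```java
import java.util.Arrays;

// Trace of union(0,1), union(2,3), union(0,2) on 6 nodes,
// using a condensed version of the template from this section.
class RankTrace {
    static int[] parent = {0, 1, 2, 3, 4, 5};
    static int[] rank = new int[6];

    static int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    static void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank[rx] < rank[ry]) parent[rx] = ry;
        else if (rank[rx] > rank[ry]) parent[ry] = rx;
        else { parent[ry] = rx; rank[rx]++; } // equal ranks: the new root's rank grows
    }

    public static void main(String[] args) {
        union(0, 1); // 1 attaches under 0; rank of 0 becomes 1
        union(2, 3); // 3 attaches under 2; rank of 2 becomes 1
        union(0, 2); // roots 0 and 2 tie on rank; 2 attaches under 0, rank of 0 becomes 2
        System.out.println(Arrays.toString(parent)); // [0, 0, 0, 2, 4, 5]
    }
}
```

Nodes 4 and 5 stay as singletons, and both merged trees hang off root 0 with height 2.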
Why rank matters: Without union by rank, a sequence of unions can produce a degenerate chain of length n, making find O(n). With union by rank, the tree height is bounded by O(log n), because a tree of rank k has at least 2^k nodes.
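The degenerate case is easy to reproduce. The sketch below (helper names are illustrative) links a chain of sequential unions two ways, naively always pointing the first root at the second versus linking by rank, and then measures node depth:

```java
// Contrast: naive linking builds a chain of depth n-1,
// while union by rank keeps every node shallow (here, depth 1).
class DepthDemo {
    static int depth(int[] parent, int x) {
        int d = 0;
        while (parent[x] != x) { x = parent[x]; d++; }
        return d;
    }

    static int root(int[] parent, int x) { // plain find, no compression, so depths stay visible
        while (parent[x] != x) x = parent[x];
        return x;
    }

    static int[] naiveChain(int n) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i + 1 < n; i++) {
            // naive union: always point the first root at the second
            parent[root(parent, i)] = root(parent, i + 1);
        }
        return parent;
    }

    static int[] byRank(int n) {
        int[] parent = new int[n], rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i + 1 < n; i++) {
            int rx = root(parent, i), ry = root(parent, i + 1);
            if (rx == ry) continue;
            if (rank[rx] < rank[ry]) parent[rx] = ry;
            else if (rank[rx] > rank[ry]) parent[ry] = rx;
            else { parent[ry] = rx; rank[rx]++; }
        }
        return parent;
    }

    public static void main(String[] args) {
        int n = 16;
        System.out.println(depth(naiveChain(n), 0));   // 15: a degenerate chain
        System.out.println(depth(byRank(n), n - 1));   // 1: every node hangs off root 0
    }
}
```

After 15 naive unions, node 0 sits at depth 15; with union by rank the same operations leave every node at depth at most 1.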
Path compression makes every node on the find path point directly to the root. This flattens the tree, so future find calls on the same nodes are O(1). Combined with union by rank, this gives the optimal inverse Ackermann amortized complexity.
Suppose the parent chain is: 5 → 4 → 2 → 0 (root). After find(5) runs with path compression, every visited node (5, 4, and 2) points directly to 0.
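That flattening can be checked directly. Below, the chain 5 → 4 → 2 → 0 is set up by hand (nodes 1 and 3 are left as singletons), and a single find(5) rewires every visited node to the root:

```java
// Demonstrate path compression on the chain 5 -> 4 -> 2 -> 0.
class CompressionDemo {
    static int[] parent = {0, 1, 0, 3, 2, 4}; // parent[5]=4, parent[4]=2, parent[2]=0

    static int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]); // every node on the path is re-pointed to the root
        }
        return parent[x];
    }

    public static void main(String[] args) {
        System.out.println(find(5));   // 0: the root of the chain
        System.out.println(parent[5]); // 0 (was 4 before compression)
        System.out.println(parent[4]); // 0 (was 2 before compression)
    }
}
```

The next find(5) or find(4) now follows a single pointer instead of re-walking the chain.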
The two optimizations together: Union by rank keeps trees shallow (O(log n) height). Path compression flattens paths during find. Together, they achieve O(α(n)) amortized time per operation, where α is the inverse Ackermann function -- effectively constant for all practical inputs (up to 2^65536 elements).
Here is the annotated Java template with both optimizations: path compression in find, union by rank in union.
```java
class UnionFind {
    int[] parent;
    int[] rank;
    int components;

    UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        components = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each node is its own root
            rank[i] = 0;   // initial rank is 0
        }
    }

    int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]); // path compression
        }
        return parent[x];
    }

    boolean union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false; // already connected
        // Union by rank: attach smaller tree under larger
        if (rank[rx] < rank[ry]) {
            parent[rx] = ry;
        } else if (rank[rx] > rank[ry]) {
            parent[ry] = rx;
        } else {
            parent[ry] = rx;
            rank[rx]++; // only increase when ranks are equal
        }
        components--;
        return true; // successfully merged
    }

    boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}
```
```java
// Detect if adding edge (u, v) creates a cycle
UnionFind uf = new UnionFind(n);
for (int[] edge : edges) {
    if (!uf.union(edge[0], edge[1])) {
        // edge is redundant -- it creates a cycle
        return edge;
    }
}
```
Return value of union. Making union return a boolean (true if merge happened, false if already connected) is extremely useful. It lets you detect cycles, count components, and track redundant edges without extra bookkeeping.
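Component counting is the other common use of that boolean. A condensed sketch (method names are illustrative): start the count at n and decrement only when a union actually merges two groups:

```java
// Count connected components: start with n, decrement on each successful union.
class ComponentCount {
    static int count(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int components = n;
        for (int[] e : edges) {
            int rx = find(parent, e[0]), ry = find(parent, e[1]);
            if (rx != ry) {   // union succeeded: two groups became one
                parent[rx] = ry;
                components--;
            }                 // else: redundant edge, count unchanged
        }
        return components;
    }

    static int find(int[] parent, int x) {
        if (parent[x] != x) parent[x] = find(parent, parent[x]); // path compression
        return parent[x];
    }
}
```

For 5 nodes with edges (0,1), (1,2), (0,2), (3,4), the redundant edge (0,2) is skipped and the count lands at 2.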
Let us trace through a series of union and find operations on 7 nodes, showing the parent array and connected components at every stage.
Operations: union(0,1), union(2,3), union(4,5), union(0,2), union(4,6), find(3), union(0,4)
| OPERATION | ACTION | PARENT[] | COMPONENTS |
|---|---|---|---|
| init | Each node is its own root | [0,1,2,3,4,5,6] | 7 |
| union(0,1) | Attach 1 under 0 | [0,0,2,3,4,5,6] | 6 |
| union(2,3) | Attach 3 under 2 | [0,0,2,2,4,5,6] | 5 |
| union(4,5) | Attach 5 under 4 | [0,0,2,2,4,4,6] | 4 |
| union(0,2) | Roots 0,2 same rank; attach 2 under 0 | [0,0,0,2,4,4,6] | 3 |
| union(4,6) | Attach 6 under 4 | [0,0,0,2,4,4,4] | 2 |
| find(3) | 3→2→0. Path compress: parent[3]=0 | [0,0,0,0,4,4,4] | 2 |
| union(0,4) | rank[0]=2 > rank[4]=1; attach 4 under 0 | [0,0,0,0,0,4,4] | 1 |
Notice the flattening: After find(3), node 3 points directly to root 0 instead of through node 2. Future find(3) calls are now O(1). This is path compression at work, progressively flattening the tree with each find operation.
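The entire trace can be replayed in code. A condensed version of the template (trimmed so it runs standalone) reproduces the final row of the table:

```java
import java.util.Arrays;

// Replay the trace above: 7 nodes, seven operations.
class TraceReplay {
    static int[] parent = {0, 1, 2, 3, 4, 5, 6};
    static int[] rank = new int[7];
    static int components = 7;

    static int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]); // path compression
        return parent[x];
    }

    static void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank[rx] < rank[ry]) parent[rx] = ry;
        else if (rank[rx] > rank[ry]) parent[ry] = rx;
        else { parent[ry] = rx; rank[rx]++; }
        components--;
    }

    public static void main(String[] args) {
        union(0, 1); union(2, 3); union(4, 5);
        union(0, 2); union(4, 6);
        find(3);     // compresses: parent[3] jumps from 2 to 0
        union(0, 4); // rank 2 beats rank 1, so 4 attaches under 0
        System.out.println(Arrays.toString(parent)); // [0, 0, 0, 0, 0, 4, 4]
        System.out.println(components);              // 1
    }
}
```

The printed state matches the last row of the table: one component, with 5 and 6 still pointing at 4 until a future find compresses them.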
These are the classic LeetCode problems that use the Union-Find pattern, listed in rough order of difficulty.
Practice tip: Start with #547 (direct application of the template -- count components in an adjacency matrix) and #684 (cycle detection). Once comfortable, tackle #721 which requires mapping strings to indices and collecting results by component, and #990 which cleverly models equation constraints as union operations.
With path compression and union by rank combined, each find and union operation takes O(α(n)) amortized time, where α is the inverse Ackermann function. For any practical value of n (up to 2^65536), α(n) ≤ 4. This is effectively constant.
| OPTIMIZATION | FIND | UNION |
|---|---|---|
| None (naive) | O(n) | O(n) |
| Union by rank only | O(log n) | O(log n) |
| Path compression only | O(log n) amortized | O(log n) amortized |
| Both (optimal) | O(α(n)) | O(α(n)) |
Space: The parent and rank arrays each require n elements. Total: O(n).
For m operations on n elements: The total time is O(m · α(n)), which is effectively O(m) in practice. This makes Union-Find one of the most efficient data structures for dynamic connectivity problems, far outperforming BFS/DFS-based approaches when queries are interleaved with modifications.
Two arrays: parent[] and rank[]. The parent array tracks the tree structure. The rank array tracks tree height bounds. Together they form the complete Union-Find data structure. Initialize parent[i] = i and rank[i] = 0.
Path compression in find, union by rank in union. These two optimizations are what turn an O(n) naive structure into an effectively O(1) amortized one. Always use both together in interviews.
Union returns a boolean for cycle detection. If find(x) == find(y) before union, the elements are already connected. Adding this edge would create a cycle. This pattern appears in redundant connection and MST problems.
Track the component count. Start with n components. Each successful union decreases the count by 1. This gives you the number of connected components at any point without extra traversal.
Extends to grids and string equivalence. For grid problems like Number of Islands, treat each cell as a node and union adjacent land cells. For equivalence problems like Accounts Merge, map strings to indices and union them. The Union-Find abstraction works anywhere you have transitive connectivity.
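The grid mapping can be sketched as follows (a minimal illustration in the spirit of Number of Islands; names are assumptions, and rank is omitted for brevity). Cell (r, c) becomes id r * cols + c, and each land cell is unioned with its right and down neighbors:

```java
// Count islands in a 0/1 grid by unioning adjacent land cells.
// Cell (r, c) maps to id r * cols + c.
class GridIslands {
    static int countIslands(int[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        int[] parent = new int[rows * cols];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        int islands = 0;
        for (int[] row : grid)
            for (int cell : row)
                if (cell == 1) islands++; // each land cell starts as its own island
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (grid[r][c] != 1) continue;
                // union with right and down neighbors that are also land
                if (c + 1 < cols && grid[r][c + 1] == 1
                        && union(parent, r * cols + c, r * cols + c + 1)) islands--;
                if (r + 1 < rows && grid[r + 1][c] == 1
                        && union(parent, r * cols + c, (r + 1) * cols + c)) islands--;
            }
        }
        return islands;
    }

    static int find(int[] parent, int x) {
        if (parent[x] != x) parent[x] = find(parent, parent[x]); // path compression
        return parent[x];
    }

    static boolean union(int[] parent, int x, int y) {
        int rx = find(parent, x), ry = find(parent, y);
        if (rx == ry) return false;
        parent[rx] = ry; // compression alone keeps this fast enough for a sketch
        return true;
    }
}
```

For the grid {{1,1,0},{0,1,0},{0,0,1}}, the three connected land cells in the top-left merge into one island and the lone cell at (2,2) stays separate, giving 2.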