Find a pattern inside a text efficiently using KMP or Rabin-Karp, avoiding the brute-force O(n*m) trap.
String matching is the problem of finding all occurrences of a short pattern string inside a longer text string. The naive approach slides the pattern one character at a time and compares every character, giving O(n*m) time. Efficient algorithms like KMP and Rabin-Karp reduce this to O(n + m) by reusing information from previous comparisons.
Align the pattern under the text and compare character by character. When a mismatch occurs, instead of restarting from scratch, use precomputed information to skip ahead intelligently.
Why does this matter? In the brute force, after a mismatch you throw away everything you learned about the text. KMP never moves the text pointer backward, so no text character is re-scanned; Rabin-Karp compares an entire window against the pattern in O(1) using hashes. This is the difference between O(n*m) and O(n+m).
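For contrast, here is a minimal brute-force baseline (the class and method names `Naive` / `naiveSearch` are placeholders, not from the original). Each mismatch discards everything learned and restarts at the next alignment — exactly the behavior the efficient algorithms avoid:

```java
// Brute force: try every alignment of the pattern, O(n*m) worst case.
class Naive {
    static int naiveSearch(String text, String pat) {
        int n = text.length(), m = pat.length();
        for (int i = 0; i + m <= n; i++) {  // each possible alignment
            int j = 0;
            while (j < m && text.charAt(i + j) == pat.charAt(j)) j++;
            if (j == m) return i;           // full match at offset i
        }
        return -1;                          // no alignment matched
    }

    public static void main(String[] args) {
        System.out.println(naiveSearch("ABABABABC", "ABABC")); // prints 4
    }
}
```

On a text like "AAAA…AB" with pattern "AAAB", almost every alignment matches m-1 characters before failing, which is where the O(n*m) worst case comes from.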
Look for these recognition signals in the problem statement. If you spot one, a string matching algorithm is very likely the intended approach.
"Find the first occurrence" or "indexOf"
"Repeated pattern" or "periodic string"
"Longest prefix which is also a suffix"
"Subtree matching" or "subarray matching"
The key clue: You need to check if one string (or serialized structure) appears inside another. A brute-force nested loop would be O(n*m). KMP solves it in O(n+m) by precomputing a failure function. Rabin-Karp matches that bound on average with a rolling hash, and is easier to extend to multi-pattern or 2D matching.
KMP precomputes a failure function (also called the "partial match table" or "prefix function") for the pattern. For each position j in the pattern, fail[j] stores the length of the longest proper prefix of pattern[0..j] that is also a suffix. When a mismatch occurs at position j, we jump to fail[j-1] instead of restarting at 0.
We compare the pattern against itself to find prefix-suffix overlaps.
Why this works: When text[i] mismatches pattern[j], we know that pattern[0..j-1] matched the text. The failure function tells us the longest prefix of the pattern that is also a suffix of what we already matched — so we can jump j = fail[j-1] and continue without re-scanning any text character.
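To make the failure function concrete, this sketch (the class name `FailDemo` is a placeholder) builds it with the same comparison-against-itself loop described above and prints the table for a sample pattern:

```java
// Build the prefix (failure) function by comparing the pattern against itself.
class FailDemo {
    static int[] buildFail(String pat) {
        int[] fail = new int[pat.length()];
        int len = 0;                       // length of current prefix-suffix match
        for (int i = 1; i < pat.length(); ) {
            if (pat.charAt(i) == pat.charAt(len)) {
                fail[i++] = ++len;         // extend the prefix-suffix
            } else if (len > 0) {
                len = fail[len - 1];       // fall back without advancing i
            } else {
                fail[i++] = 0;             // no prefix-suffix at this position
            }
        }
        return fail;
    }

    public static void main(String[] args) {
        // For "ABABC": prefixes "A" and "AB" reappear as suffixes at i=2 and i=3.
        System.out.println(java.util.Arrays.toString(buildFail("ABABC"))); // [0, 0, 1, 2, 0]
    }
}
```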
Instead of comparing characters one by one, Rabin-Karp computes a hash of the pattern and a rolling hash of each text window. If the hashes match, we verify character by character (to handle collisions). The rolling hash updates in O(1) as the window slides.
Example: for a pattern like "ABD", compute its hash hp once, then slide a window of length 3 across the text, comparing each window's rolling hash against hp.
When to choose Rabin-Karp over KMP: Rabin-Karp shines when you need to search for multiple patterns simultaneously (each has its own hash), when doing 2D pattern matching, or when the problem involves finding duplicate substrings (binary search on length + rolling hash). KMP is preferred for single-pattern search with guaranteed O(n+m) worst case.
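A minimal sketch of the multi-pattern idea, assuming all patterns share one length L (the class name `MultiRK` and the `"pat@index"` result format are illustrative choices, not from the original). One rolling window serves every pattern at once via a hash map, with character-by-character verification on each hash hit:

```java
import java.util.*;

// Multi-pattern Rabin-Karp: one rolling hash checked against a map of
// pattern hashes; verify on hits to rule out collisions.
class MultiRK {
    static List<String> searchAll(String text, List<String> pats, int L) {
        long B = 31, MOD = 1_000_000_007L;
        long power = 1;                                 // B^(L-1) for rolling out
        for (int k = 1; k < L; k++) power = power * B % MOD;

        Map<Long, List<String>> byHash = new HashMap<>();
        for (String p : pats) {                         // hash each pattern once
            long h = 0;
            for (int k = 0; k < L; k++) h = (h * B + p.charAt(k)) % MOD;
            byHash.computeIfAbsent(h, x -> new ArrayList<>()).add(p);
        }

        List<String> hits = new ArrayList<>();
        long win = 0;                                   // hash of first window
        for (int k = 0; k < L; k++) win = (win * B + text.charAt(k)) % MOD;
        for (int i = 0; i + L <= text.length(); i++) {
            for (String p : byHash.getOrDefault(win, Collections.emptyList()))
                if (text.regionMatches(i, p, 0, L))     // verify, hashes can collide
                    hits.add(p + "@" + i);
            if (i + L < text.length()) {                // roll the window forward
                win = (win - text.charAt(i) * power % MOD + MOD) % MOD;
                win = (win * B + text.charAt(i + L)) % MOD;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(searchAll("ABCABDABE", Arrays.asList("ABD", "ABE"), 3));
        // [ABD@3, ABE@6]
    }
}
```

With KMP you would need one pass (or one automaton) per pattern; here the extra patterns cost only extra hash-map entries.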
Here are the annotated Java templates for both KMP and Rabin-Karp. KMP is the go-to for interview problems that need guaranteed linear time.
```java
// Build the failure (prefix) function for the pattern
int[] buildFail(String pat) {
    int m = pat.length();
    int[] fail = new int[m];
    int len = 0;                     // length of prev longest prefix-suffix
    int i = 1;
    while (i < m) {
        if (pat.charAt(i) == pat.charAt(len)) {
            len++;
            fail[i] = len;
            i++;
        } else {
            if (len > 0) {
                len = fail[len - 1]; // fall back, don't increment i
            } else {
                fail[i] = 0;
                i++;
            }
        }
    }
    return fail;
}
```
```java
// Return first index where pattern appears in text, or -1
int kmpSearch(String text, String pat) {
    int[] fail = buildFail(pat);
    int i = 0, j = 0;                // i for text, j for pattern
    while (i < text.length()) {
        if (text.charAt(i) == pat.charAt(j)) {
            i++;
            j++;
            if (j == pat.length()) {
                return i - j;        // match found
            }
        } else {
            if (j > 0) {
                j = fail[j - 1];     // use failure function
            } else {
                i++;                 // no fallback, advance text
            }
        }
    }
    return -1;                       // no match
}
```
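A common interview extension is reporting every occurrence, including overlapping ones. This self-contained sketch (the class name `KmpAll` and method `findAll` are illustrative) reuses the same failure function and, after each full match, continues with j = fail[j-1] instead of resetting to 0:

```java
import java.util.ArrayList;
import java.util.List;

// KMP variant that reports every (possibly overlapping) occurrence.
class KmpAll {
    static int[] buildFail(String pat) {
        int[] fail = new int[pat.length()];
        int len = 0;
        for (int i = 1; i < pat.length(); ) {
            if (pat.charAt(i) == pat.charAt(len)) fail[i++] = ++len;
            else if (len > 0) len = fail[len - 1];  // fall back, don't advance i
            else fail[i++] = 0;
        }
        return fail;
    }

    static List<Integer> findAll(String text, String pat) {
        int[] fail = buildFail(pat);
        List<Integer> out = new ArrayList<>();
        for (int i = 0, j = 0; i < text.length(); ) {
            if (text.charAt(i) == pat.charAt(j)) {
                i++; j++;
                if (j == pat.length()) {
                    out.add(i - j);
                    j = fail[j - 1];   // keep scanning for overlapping matches
                }
            } else if (j > 0) {
                j = fail[j - 1];
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AABAACAADAABAABA", "AABA")); // [0, 9, 12]
    }
}
```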
```java
// Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions
int rabinKarp(String text, String pat) {
    int n = text.length(), m = pat.length();
    long B = 31, MOD = 1_000_000_007L;   // base and modulus
    long patHash = 0, winHash = 0, power = 1;
    for (int i = m - 1; i >= 0; i--) {   // compute hash + highest power
        patHash = (patHash + pat.charAt(i) * power) % MOD;
        winHash = (winHash + text.charAt(i) * power) % MOD;
        if (i > 0) power = power * B % MOD;
    }
    for (int i = 0; i <= n - m; i++) {
        if (winHash == patHash && text.substring(i, i + m).equals(pat))
            return i;                    // verify on hash match
        if (i < n - m) {                 // roll the hash forward
            winHash = (winHash - text.charAt(i) * power % MOD + MOD) % MOD;
            winHash = (winHash * B + text.charAt(i + m)) % MOD;
        }
    }
    return -1;
}
```
KMP guarantees O(n+m) worst case. Rabin-Karp is O(n+m) on average but can degrade to O(n*m) with hash collisions. In interviews, KMP is the safer choice for single-pattern search. Use Rabin-Karp when you need hashing flexibility (multi-pattern, duplicate substrings).
Let us trace KMP on text = "ABABABABC" and pattern = "ABABC", showing how the failure function prevents re-scanning text characters.
Failure function: fail = [0, 0, 1, 2, 0]
| STEP | COMPARE | ACTION | i, j |
|---|---|---|---|
| 1 | text[0]='A' == pat[0]='A' | Match. i++, j++ | 1, 1 |
| 2 | text[1]='B' == pat[1]='B' | Match. i++, j++ | 2, 2 |
| 3 | text[2]='A' == pat[2]='A' | Match. i++, j++ | 3, 3 |
| 4 | text[3]='B' == pat[3]='B' | Match. i++, j++ | 4, 4 |
| 5 | text[4]='A' ≠ pat[4]='C' | Mismatch! j = fail[3] = 2 | 4, 2 |
| 6 | text[4]='A' == pat[2]='A' | Match. i++, j++ | 5, 3 |
| 7 | text[5]='B' == pat[3]='B' | Match. i++, j++ | 6, 4 |
| 8 | text[6]='A' ≠ pat[4]='C' | Mismatch! j = fail[3] = 2 | 6, 2 |
| 9 | text[6]='A' == pat[2]='A' | Match. i++, j++ | 7, 3 |
| 10 | text[7]='B' == pat[3]='B' | Match. i++, j++ | 8, 4 |
| 11 | text[8]='C' == pat[4]='C' | Match! j == m. Found at index 4. | 9, 5 |
Notice: At step 5, i stays at 4 while j falls back from 4 to 2. The text pointer never moves backward. This is the fundamental guarantee of KMP — it processes each text character at most twice (once on match, once triggering a fallback), giving O(n+m) total comparisons.
These are classic LeetCode problems that use string matching algorithms, listed in rough order of difficulty.
Practice tip: Start with #28 (direct KMP application). Then try #459, which uses a clever property of the failure function: a string of length n has a repeating unit if and only if fail[n-1] > 0 and n % (n - fail[n-1]) == 0. For #214 (Shortest Palindrome), concatenate pattern + "#" + reverse(pattern) and compute the failure function to find the longest palindromic prefix.
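The #459 property translates directly into code. A sketch (the class name `RepeatedPattern` is a placeholder; the method name matches the LeetCode problem title):

```java
// Repeated Substring Pattern (#459): s is a repetition of some unit iff
// fail[n-1] > 0 and n % (n - fail[n-1]) == 0, where n - fail[n-1] is the
// candidate unit length.
class RepeatedPattern {
    static int[] buildFail(String s) {
        int[] fail = new int[s.length()];
        int len = 0;
        for (int i = 1; i < s.length(); ) {
            if (s.charAt(i) == s.charAt(len)) fail[i++] = ++len;
            else if (len > 0) len = fail[len - 1]; // fall back, don't advance i
            else fail[i++] = 0;
        }
        return fail;
    }

    static boolean repeatedSubstringPattern(String s) {
        int n = s.length();
        int k = buildFail(s)[n - 1];   // longest proper prefix-suffix of s
        return k > 0 && n % (n - k) == 0;
    }

    public static void main(String[] args) {
        System.out.println(repeatedSubstringPattern("ABAB")); // true  (unit "AB")
        System.out.println(repeatedSubstringPattern("ABA"));  // false
    }
}
```

Note the k > 0 guard: without it, a string with no prefix-suffix overlap (k = 0) would pass trivially because n % n == 0.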
Building the failure function: O(m). The pointer i advances at most m times, and len decreases at most as many times as it increased. Total operations bounded by 2m.
Searching: O(n). The text pointer i advances at most n times and never moves backward. The pattern pointer j can fall back via the failure function, but each fallback corresponds to a previous advance, so total fallbacks are bounded by n.
Space: The failure function array uses O(m) space. Rabin-Karp uses O(1) extra space (just hash values) but has worst-case O(n*m) due to hash collisions.
The key insight: Both algorithms achieve linear time by ensuring the text pointer never backtracks. KMP does this via the failure function; Rabin-Karp does it via O(1) hash updates. The brute force backtracks the text pointer on every mismatch, which is why it degrades to O(n*m).
The failure function is the heart of KMP. It precomputes, for each position in the pattern, the longest proper prefix that is also a suffix. This single array enables O(n+m) matching by telling you exactly where to continue after a mismatch.
The text pointer never moves backward in KMP. This is the fundamental property that guarantees O(n+m). Each text character is examined at most a constant number of times. Contrast this with brute force, which re-scans text characters.
Rabin-Karp trades guarantees for flexibility. Average O(n+m) with O(1) space. Excels at multi-pattern search, 2D matching, and duplicate substring detection via binary search + rolling hash.
The failure function reveals string structure. Beyond matching, it detects repeated patterns (n % (n - fail[n-1]) == 0), finds the longest palindromic prefix (via reverse concatenation), and computes the longest happy prefix directly.
Many "non-obvious" problems reduce to string matching. Subtree matching (serialize trees as strings), repeated string match (repeat + KMP), and shortest palindrome (reverse + failure function) all become straightforward once you recognize the pattern.
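The shortest-palindrome reduction mentioned above can be sketched as follows (the class name `ShortestPalindrome` is a placeholder; the method name matches LeetCode #214). The failure function of s + "#" + reverse(s) gives the length of the longest palindromic prefix of s; the "#" separator keeps matches from spilling past the original string:

```java
// Shortest Palindrome (#214): prepend the fewest characters to make s a
// palindrome, via the failure function of s + "#" + reverse(s).
class ShortestPalindrome {
    static int[] buildFail(String s) {
        int[] fail = new int[s.length()];
        int len = 0;
        for (int i = 1; i < s.length(); ) {
            if (s.charAt(i) == s.charAt(len)) fail[i++] = ++len;
            else if (len > 0) len = fail[len - 1]; // fall back, don't advance i
            else fail[i++] = 0;
        }
        return fail;
    }

    static String shortestPalindrome(String s) {
        String combined = s + "#" + new StringBuilder(s).reverse();
        int[] fail = buildFail(combined);
        int palPrefix = fail[combined.length() - 1]; // longest palindromic prefix of s
        String toAdd = new StringBuilder(s.substring(palPrefix)).reverse().toString();
        return toAdd + s;                            // prepend reversed remainder
    }

    public static void main(String[] args) {
        System.out.println(shortestPalindrome("aacecaaa")); // aaacecaaa
        System.out.println(shortestPalindrome("abcd"));     // dcbabcd
    }
}
```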