Find a pattern inside a text efficiently using KMP or Rabin-Karp, avoiding the brute-force O(n*m) trap.
String matching is the problem of finding all occurrences of a short pattern string inside a longer text string. The naive approach slides the pattern one character at a time and compares every character, giving O(n*m) time. Efficient algorithms like KMP and Rabin-Karp reduce this to O(n + m) by reusing information from previous comparisons.
Align the pattern under the text and compare character by character. When a mismatch occurs, instead of restarting from scratch, use precomputed information to skip ahead intelligently.
Why does this matter? In the brute force, after a mismatch you throw away everything you learned about the text. KMP never moves the text pointer backward, so no text character is re-scanned; Rabin-Karp compares an entire window against the pattern in O(1) using hashes. This is the difference between O(n*m) and O(n+m).
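For contrast, here is a minimal brute-force baseline (the class and method names `Naive` / `naiveSearch` are placeholders, not from the original). Each mismatch discards everything learned and restarts at the next alignment — exactly the behavior the efficient algorithms avoid:

```java
// Brute force: try every alignment of the pattern, O(n*m) worst case.
class Naive {
    static int naiveSearch(String text, String pat) {
        int n = text.length(), m = pat.length();
        for (int i = 0; i + m <= n; i++) {  // each possible alignment
            int j = 0;
            while (j < m && text.charAt(i + j) == pat.charAt(j)) j++;
            if (j == m) return i;           // full match at offset i
        }
        return -1;                          // no alignment matched
    }

    public static void main(String[] args) {
        System.out.println(naiveSearch("ABABABABC", "ABABC")); // prints 4
    }
}
```

On a text like "AAAA…AB" with pattern "AAAB", almost every alignment matches m-1 characters before failing, which is where the O(n*m) worst case comes from.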
Look for these recognition signals in the problem statement. If you spot one, a string matching algorithm is very likely the intended approach.
"Find the first occurrence" or "indexOf"
"Repeated pattern" or "periodic string"
"Longest prefix which is also a suffix"
"Subtree matching" or "subarray matching"
The key clue: You need to check if one string (or serialized structure) appears inside another. A brute-force nested loop would be O(n*m). KMP solves it in O(n+m) by precomputing a failure function. Rabin-Karp matches that bound on average with a rolling hash, and is easier to extend to multi-pattern or 2D matching.
KMP precomputes a failure function (also called the "partial match table" or "prefix function") for the pattern. For each position j in the pattern, fail[j] stores the length of the longest proper prefix of pattern[0..j] that is also a suffix. When a mismatch occurs at position j, we jump to fail[j-1] instead of restarting at 0.
We compare the pattern against itself to find prefix-suffix overlaps.
Why this works: When text[i] mismatches pattern[j], we know that pattern[0..j-1] matched the text. The failure function tells us the longest prefix of the pattern that is also a suffix of what we already matched — so we can jump j = fail[j-1] and continue without re-scanning any text character.
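To make the failure function concrete, this sketch (the class name `FailDemo` is a placeholder) builds it with the same comparison-against-itself loop described above and prints the table for a sample pattern:

```java
// Build the prefix (failure) function by comparing the pattern against itself.
class FailDemo {
    static int[] buildFail(String pat) {
        int[] fail = new int[pat.length()];
        int len = 0;                       // length of current prefix-suffix match
        for (int i = 1; i < pat.length(); ) {
            if (pat.charAt(i) == pat.charAt(len)) {
                fail[i++] = ++len;         // extend the prefix-suffix
            } else if (len > 0) {
                len = fail[len - 1];       // fall back without advancing i
            } else {
                fail[i++] = 0;             // no prefix-suffix at this position
            }
        }
        return fail;
    }

    public static void main(String[] args) {
        // For "ABABC": prefixes "A" and "AB" reappear as suffixes at i=2 and i=3.
        System.out.println(java.util.Arrays.toString(buildFail("ABABC"))); // [0, 0, 1, 2, 0]
    }
}
```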
Instead of comparing characters one by one, Rabin-Karp computes a hash of the pattern and a rolling hash of each text window. If the hashes match, we verify character by character (to handle collisions). The rolling hash updates in O(1) as the window slides.
Example: for a pattern like "ABD", compute its hash hp once, then slide a window of length 3 across the text, comparing each window's rolling hash against hp.
When to choose Rabin-Karp over KMP: Rabin-Karp shines when you need to search for multiple patterns simultaneously (each has its own hash), when doing 2D pattern matching, or when the problem involves finding duplicate substrings (binary search on length + rolling hash). KMP is preferred for single-pattern search with guaranteed O(n+m) worst case.
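A minimal sketch of the multi-pattern idea, assuming all patterns share one length L (the class name `MultiRK` and the `"pat@index"` result format are illustrative choices, not from the original). One rolling window serves every pattern at once via a hash map, with character-by-character verification on each hash hit:

```java
import java.util.*;

// Multi-pattern Rabin-Karp: one rolling hash checked against a map of
// pattern hashes; verify on hits to rule out collisions.
class MultiRK {
    static List<String> searchAll(String text, List<String> pats, int L) {
        long B = 31, MOD = 1_000_000_007L;
        long power = 1;                                 // B^(L-1) for rolling out
        for (int k = 1; k < L; k++) power = power * B % MOD;

        Map<Long, List<String>> byHash = new HashMap<>();
        for (String p : pats) {                         // hash each pattern once
            long h = 0;
            for (int k = 0; k < L; k++) h = (h * B + p.charAt(k)) % MOD;
            byHash.computeIfAbsent(h, x -> new ArrayList<>()).add(p);
        }

        List<String> hits = new ArrayList<>();
        long win = 0;                                   // hash of first window
        for (int k = 0; k < L; k++) win = (win * B + text.charAt(k)) % MOD;
        for (int i = 0; i + L <= text.length(); i++) {
            for (String p : byHash.getOrDefault(win, Collections.emptyList()))
                if (text.regionMatches(i, p, 0, L))     // verify, hashes can collide
                    hits.add(p + "@" + i);
            if (i + L < text.length()) {                // roll the window forward
                win = (win - text.charAt(i) * power % MOD + MOD) % MOD;
                win = (win * B + text.charAt(i + L)) % MOD;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(searchAll("ABCABDABE", Arrays.asList("ABD", "ABE"), 3));
        // [ABD@3, ABE@6]
    }
}
```

With KMP you would need one pass (or one automaton) per pattern; here the extra patterns cost only extra hash-map entries.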
Here are the annotated Java templates for both KMP and Rabin-Karp. KMP is the go-to for interview problems that need guaranteed linear time.
```java
// Build the failure (prefix) function for the pattern
int[] buildFail(String pat) {
    int m = pat.length();
    int[] fail = new int[m];
    int len = 0;                     // length of prev longest prefix-suffix
    int i = 1;
    while (i < m) {
        if (pat.charAt(i) == pat.charAt(len)) {
            len++;
            fail[i] = len;
            i++;
        } else {
            if (len > 0) {
                len = fail[len - 1]; // fall back, don't increment i
            } else {
                fail[i] = 0;
                i++;
            }
        }
    }
    return fail;
}
```
```java
// Return first index where pattern appears in text, or -1
int kmpSearch(String text, String pat) {
    int[] fail = buildFail(pat);
    int i = 0, j = 0;                // i for text, j for pattern
    while (i < text.length()) {
        if (text.charAt(i) == pat.charAt(j)) {
            i++;
            j++;
            if (j == pat.length()) {
                return i - j;        // match found
            }
        } else {
            if (j > 0) {
                j = fail[j - 1];     // use failure function
            } else {
                i++;                 // no fallback, advance text
            }
        }
    }
    return -1;                       // no match
}
```
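A common interview extension is reporting every occurrence, including overlapping ones. This self-contained sketch (the class name `KmpAll` and method `findAll` are illustrative) reuses the same failure function and, after each full match, continues with j = fail[j-1] instead of resetting to 0:

```java
import java.util.ArrayList;
import java.util.List;

// KMP variant that reports every (possibly overlapping) occurrence.
class KmpAll {
    static int[] buildFail(String pat) {
        int[] fail = new int[pat.length()];
        int len = 0;
        for (int i = 1; i < pat.length(); ) {
            if (pat.charAt(i) == pat.charAt(len)) fail[i++] = ++len;
            else if (len > 0) len = fail[len - 1];  // fall back, don't advance i
            else fail[i++] = 0;
        }
        return fail;
    }

    static List<Integer> findAll(String text, String pat) {
        int[] fail = buildFail(pat);
        List<Integer> out = new ArrayList<>();
        for (int i = 0, j = 0; i < text.length(); ) {
            if (text.charAt(i) == pat.charAt(j)) {
                i++; j++;
                if (j == pat.length()) {
                    out.add(i - j);
                    j = fail[j - 1];   // keep scanning for overlapping matches
                }
            } else if (j > 0) {
                j = fail[j - 1];
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("AABAACAADAABAABA", "AABA")); // [0, 9, 12]
    }
}
```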
```java
// Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions
int rabinKarp(String text, String pat) {
    int n = text.length(), m = pat.length();
    long B = 31, MOD = 1_000_000_007L;   // base and modulus
    long patHash = 0, winHash = 0, power = 1;
    for (int i = m - 1; i >= 0; i--) {   // compute hash + highest power
        patHash = (patHash + pat.charAt(i) * power) % MOD;
        winHash = (winHash + text.charAt(i) * power) % MOD;
        if (i > 0) power = power * B % MOD;
    }
    for (int i = 0; i <= n - m; i++) {
        if (winHash == patHash && text.substring(i, i + m).equals(pat))
            return i;                    // verify on hash match
        if (i < n - m) {                 // roll the hash forward
            winHash = (winHash - text.charAt(i) * power % MOD + MOD) % MOD;
            winHash = (winHash * B + text.charAt(i + m)) % MOD;
        }
    }
    return -1;
}
```
KMP guarantees O(n+m) worst case. Rabin-Karp is O(n+m) on average but can degrade to O(n*m) with hash collisions. In interviews, KMP is the safer choice for single-pattern search. Use Rabin-Karp when you need hashing flexibility (multi-pattern, duplicate substrings).
Let us trace KMP on text = "ABABABABC" and pattern = "ABABC", showing how the failure function prevents re-scanning text characters.
Failure function: fail = [0, 0, 1, 2, 0]
| STEP | COMPARE | ACTION | i, j |
|---|---|---|---|
| 1 | text[0]='A' == pat[0]='A' | Match. i++, j++ | 1, 1 |
| 2 | text[1]='B' == pat[1]='B' | Match. i++, j++ | 2, 2 |
| 3 | text[2]='A' == pat[2]='A' | Match. i++, j++ | 3, 3 |
| 4 | text[3]='B' == pat[3]='B' | Match. i++, j++ | 4, 4 |
| 5 | text[4]='A' ≠ pat[4]='C' | Mismatch! j = fail[3] = 2 | 4, 2 |
| 6 | text[4]='A' == pat[2]='A' | Match. i++, j++ | 5, 3 |
| 7 | text[5]='B' == pat[3]='B' | Match. i++, j++ | 6, 4 |
| 8 | text[6]='A' ≠ pat[4]='C' | Mismatch! j = fail[3] = 2 | 6, 2 |
| 9 | text[6]='A' == pat[2]='A' | Match. i++, j++ | 7, 3 |
| 10 | text[7]='B' == pat[3]='B' | Match. i++, j++ | 8, 4 |
| 11 | text[8]='C' == pat[4]='C' | Match! j == m. Found at index 4. | 9, 5 |
Notice: At step 5, i stays at 4 while j falls back from 4 to 2. The text pointer never moves backward. This is the fundamental guarantee of KMP — it processes each text character at most twice (once on match, once triggering a fallback), giving O(n+m) total comparisons.
These are classic LeetCode problems that use string matching algorithms, listed in rough order of difficulty.
Practice tip: Start with #28 (direct KMP application). Then try #459, which uses a clever property of the failure function: a string of length n has a repeating unit if and only if fail[n-1] > 0 and n % (n - fail[n-1]) == 0. For #214 (Shortest Palindrome), concatenate pattern + "#" + reverse(pattern) and compute the failure function to find the longest palindromic prefix.
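The #459 property translates directly into code. A sketch (the class name `RepeatedPattern` is a placeholder; the method name matches the LeetCode problem title):

```java
// Repeated Substring Pattern (#459): s is a repetition of some unit iff
// fail[n-1] > 0 and n % (n - fail[n-1]) == 0, where n - fail[n-1] is the
// candidate unit length.
class RepeatedPattern {
    static int[] buildFail(String s) {
        int[] fail = new int[s.length()];
        int len = 0;
        for (int i = 1; i < s.length(); ) {
            if (s.charAt(i) == s.charAt(len)) fail[i++] = ++len;
            else if (len > 0) len = fail[len - 1]; // fall back, don't advance i
            else fail[i++] = 0;
        }
        return fail;
    }

    static boolean repeatedSubstringPattern(String s) {
        int n = s.length();
        int k = buildFail(s)[n - 1];   // longest proper prefix-suffix of s
        return k > 0 && n % (n - k) == 0;
    }

    public static void main(String[] args) {
        System.out.println(repeatedSubstringPattern("ABAB")); // true  (unit "AB")
        System.out.println(repeatedSubstringPattern("ABA"));  // false
    }
}
```

Note the k > 0 guard: without it, a string with no prefix-suffix overlap (k = 0) would pass trivially because n % n == 0.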
Building the failure function: O(m). The pointer i advances at most m times, and len decreases at most as many times as it increased. Total operations bounded by 2m.
Searching: O(n). The text pointer i advances at most n times and never moves backward. The pattern pointer j can fall back via the failure function, but each fallback corresponds to a previous advance, so total fallbacks are bounded by n.
Space: The failure function array uses O(m) space. Rabin-Karp uses O(1) extra space (just hash values) but has worst-case O(n*m) due to hash collisions.
The key insight: Both algorithms achieve linear time by ensuring the text pointer never backtracks. KMP does this via the failure function; Rabin-Karp does it via O(1) hash updates. The brute force backtracks the text pointer on every mismatch, which is why it degrades to O(n*m).
The failure function is the heart of KMP. It precomputes, for each position in the pattern, the longest proper prefix that is also a suffix. This single array enables O(n+m) matching by telling you exactly where to continue after a mismatch.
The text pointer never moves backward in KMP. This is the fundamental property that guarantees O(n+m). Each text character is examined at most a constant number of times. Contrast this with brute force, which re-scans text characters.
Rabin-Karp trades guarantees for flexibility. Average O(n+m) with O(1) space. Excels at multi-pattern search, 2D matching, and duplicate substring detection via binary search + rolling hash.
The failure function reveals string structure. Beyond matching, it detects repeated patterns (n % (n - fail[n-1]) == 0), finds the longest palindromic prefix (via reverse concatenation), and computes the longest happy prefix directly.
Many "non-obvious" problems reduce to string matching. Subtree matching (serialize trees as strings), repeated string match (repeat + KMP), and shortest palindrome (reverse + failure function) all become straightforward once you recognize the pattern.
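The shortest-palindrome reduction mentioned above can be sketched as follows (the class name `ShortestPalindrome` is a placeholder; the method name matches LeetCode #214). The failure function of s + "#" + reverse(s) gives the length of the longest palindromic prefix of s; the "#" separator keeps matches from spilling past the original string:

```java
// Shortest Palindrome (#214): prepend the fewest characters to make s a
// palindrome, via the failure function of s + "#" + reverse(s).
class ShortestPalindrome {
    static int[] buildFail(String s) {
        int[] fail = new int[s.length()];
        int len = 0;
        for (int i = 1; i < s.length(); ) {
            if (s.charAt(i) == s.charAt(len)) fail[i++] = ++len;
            else if (len > 0) len = fail[len - 1]; // fall back, don't advance i
            else fail[i++] = 0;
        }
        return fail;
    }

    static String shortestPalindrome(String s) {
        String combined = s + "#" + new StringBuilder(s).reverse();
        int[] fail = buildFail(combined);
        int palPrefix = fail[combined.length() - 1]; // longest palindromic prefix of s
        String toAdd = new StringBuilder(s.substring(palPrefix)).reverse().toString();
        return toAdd + s;                            // prepend reversed remainder
    }

    public static void main(String[] args) {
        System.out.println(shortestPalindrome("aacecaaa")); // aaacecaaa
        System.out.println(shortestPalindrome("abcd"));     // dcbabcd
    }
}
```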