Longest Duplicate Substring

Hard

70.2%

Updated 6/1/2025

Asked by 7 Companies

Goldman Sachs Meta Amazon TikTok Coupang Google Bloomberg

Topics

Suffix Array Hash Function Rolling Hash Sliding Window Binary Search String

Longest Duplicate Substring

What is this problem about?

The Longest Duplicate Substring problem challenges you to find the longest substring that occurs at least twice in a given string S. The two occurrences can overlap. For instance, in the string "banana", the longest duplicate substring is "ana". If no duplicate substring exists, the output should be an empty string.

Why is this asked in interviews?

This is considered a Hard-level problem and is typically asked by companies like Google or Amazon for senior engineering roles. It rigorously tests a candidate's grasp of advanced string searching algorithms and their ability to combine different algorithmic concepts to conquer large time complexity hurdles. A brute-force solution here will drastically fail, pushing candidates to demonstrate high-level optimization skills.

Algorithmic pattern used

The optimal solution marries Binary Search with a Rolling Hash (Rabin-Karp Algorithm). The possible length of the duplicate substring ranges from 1 to N. Instead of checking all lengths, we can binary search the length L. For a guessed length L, we use a Rolling Hash to quickly slide a window of size L across the string, hashing each substring. If we find a hash collision (and verify the string match to handle hash collisions), we know a duplicate of length L exists, so we search for a larger length.

Example explanation

Consider the string S = "banana". Length N = 6.

We binary search the length between 1 and 5. Midpoint L = 3.
We check all substrings of length 3 using a rolling hash: "ban", "ana", "nan", "ana".
As we slide the window, we record hashes in a Hash Set.
The hash for the second "ana" matches the hash stored for the first "ana". We verify the strings match. Success! Length 3 is possible.
We update our binary search to look for longer lengths (between 4 and 5).
Midpoint L = 4. Substrings: "bana", "anan", "nana". No duplicates.
The longest duplicate found was "ana".

Common mistakes candidates make

The main trap is relying on simple Hash Sets without a Rolling Hash. If you extract substrings using S.substring(i, i+L) on every step, string extraction itself takes $O(L)$ time, leading to an $O(N \times L)$ check, which is too slow. Another common mistake is implementing the Rolling Hash but ignoring hash collisions. Because of integer limits, different strings can produce the same hash; you must do an exact string comparison if hashes match to prevent false positives.

Interview preparation tip

To excel at the Longest Duplicate Substring interview pattern, you must master the Rabin-Karp Rolling Hash. Practice writing the mathematics of it: how to add a new character to the right of the hash and remove the leftmost character in $O(1)$ time using modular arithmetic. Once you combine this fast hashing technique with binary search, this daunting problem becomes a predictable template.

Title	Difficulty	Topics	LeetCode
Sum of Scores of Built Strings	Hard	Suffix Array String Matching Hash Function Rolling Hash Binary Search String	Solve
Longest Repeating Substring	Medium	Suffix Array Hash Function Rolling Hash Binary Search String Dynamic Programming	Solve
Longest Common Subpath	Hard	Array Suffix Array Hash Function Rolling Hash Binary Search	Solve
Number of Distinct Substrings in a String	Medium	Suffix Array Trie Hash Function Rolling Hash String	Solve
Equalize Strings by Adding or Removing Characters at Ends	Medium	Hash Function Sliding Window Binary Search String Dynamic Programming	Solve

Longest Duplicate Substring

Asked by 7 Companies

Topics

Longest Duplicate Substring

What is this problem about?

Why is this asked in interviews?

Algorithmic pattern used

Example explanation

Common mistakes candidates make

Interview preparation tip

Similar Questions