Magicsheet logo

Longest Duplicate Substring

Hard
70.2%
Updated 6/1/2025

Longest Duplicate Substring

What is this problem about?

The Longest Duplicate Substring problem challenges you to find the longest substring that occurs at least twice in a given string S. The two occurrences can overlap. For instance, in the string "banana", the longest duplicate substring is "ana". If no duplicate substring exists, the output should be an empty string.

Why is this asked in interviews?

This is considered a Hard-level problem and is typically asked by companies like Google or Amazon for senior engineering roles. It rigorously tests a candidate's grasp of advanced string searching algorithms and their ability to combine different algorithmic concepts to conquer large time complexity hurdles. A brute-force solution here will drastically fail, pushing candidates to demonstrate high-level optimization skills.

Algorithmic pattern used

The optimal solution marries Binary Search with a Rolling Hash (Rabin-Karp Algorithm). The possible length of the duplicate substring ranges from 1 to N. Instead of checking all lengths, we can binary search the length L. For a guessed length L, we use a Rolling Hash to quickly slide a window of size L across the string, hashing each substring. If we find a hash collision (and verify the string match to handle hash collisions), we know a duplicate of length L exists, so we search for a larger length.

Example explanation

Consider the string S = "banana". Length N = 6.

  • We binary search the length between 1 and 5. Midpoint L = 3.
  • We check all substrings of length 3 using a rolling hash: "ban", "ana", "nan", "ana".
  • As we slide the window, we record hashes in a Hash Set.
  • The hash for the second "ana" matches the hash stored for the first "ana". We verify the strings match. Success! Length 3 is possible.
  • We update our binary search to look for longer lengths (between 4 and 5).
  • Midpoint L = 4. Substrings: "bana", "anan", "nana". No duplicates.
  • The longest duplicate found was "ana".

Common mistakes candidates make

The main trap is relying on simple Hash Sets without a Rolling Hash. If you extract substrings using S.substring(i, i+L) on every step, string extraction itself takes O(L)O(L) time, leading to an O(N×L)O(N \times L) check, which is too slow. Another common mistake is implementing the Rolling Hash but ignoring hash collisions. Because of integer limits, different strings can produce the same hash; you must do an exact string comparison if hashes match to prevent false positives.

Interview preparation tip

To excel at the Longest Duplicate Substring interview pattern, you must master the Rabin-Karp Rolling Hash. Practice writing the mathematics of it: how to add a new character to the right of the hash and remove the leftmost character in O(1)O(1) time using modular arithmetic. Once you combine this fast hashing technique with binary search, this daunting problem becomes a predictable template.

Similar Questions