Magicsheet logo

Longest Repeating Substring

Medium
75.9%
Updated 6/1/2025

Longest Repeating Substring

What is this problem about?

The Longest Repeating Substring coding problem asks you to find the length of the longest substring that occurs at least twice in a given string. The two occurrences can overlap. For example, in "abcd", no substring repeats, so the answer is 0. In "abbaba", "ab" and "ba" repeat, so the answer is 2.

Why is this asked in interviews?

This problem evaluates a candidate's knowledge of advanced string search and optimization techniques. While an O(N2)O(N^2) Dynamic Programming solution is acceptable for mid-level roles, senior candidates are expected to know how to optimize this to O(NlogN)O(N \log N) or better. It tests your ability to implement a Rolling Hash or understand Suffix Arrays, which are critical for text editors and data compression.

Algorithmic pattern used

There are two main optimal approaches:

  1. Dynamic Programming: Like Longest Common Substring, but comparing the string against itself. dp[i][j] tracks matches between s[...i] and s[...j].
  2. Binary Search with Rolling Hash (Rabin-Karp): This is highly preferred. You binary search the possible length L (from 1 to N). For a given length, you compute the rolling hash of all substrings of length L. If a hash collision occurs (and the strings match), a duplicate exists, so you search for a longer length.

Example explanation

Let's use Binary Search on string "abbaba". Length = 6.

  • Binary search range: 1 to 5. Mid L = 3.
  • Hash all substrings of length 3: "abb", "bba", "bab", "aba".
  • No duplicates found. So L=3 is too long. Search range: 1 to 2.
  • Mid L = 1. Substrings: 'a','b','b','a','b','a'.
  • 'b' and 'a' both repeat. L=1 is valid. Search range: 2 to 2.
  • Mid L = 2. Substrings: "ab", "bb", "ba", "ab", "ba".
  • "ab" occurs at index 0 and 3. "ba" occurs at index 2 and 4. L=2 is valid! The longest repeating substring length is 2.

Common mistakes candidates make

Candidates who choose the DP approach often run out of memory on large strings because O(N2)O(N^2) space is significant. Candidates who choose the Rolling Hash approach often forget to handle hash collisions properly. If two substrings produce the same hash, you MUST perform an O(L)O(L) strict string comparison to ensure they actually match, otherwise you will return false positives.

Interview preparation tip

When tackling the Longest Repeating Substring interview pattern, implementing the Rabin-Karp Rolling Hash is the biggest flex. Practice writing a rolling hash function that accurately slides the window in O(1)O(1) time by subtracting the leading character's mathematical weight and adding the new trailing character. This pattern solves many "duplicate substring" problems instantly.

Similar Questions