Find Longest Special Substring That Occurs Thrice II
What is this problem about?
This is the "Medium" and optimized version of the previous problem. In Find Longest Special Substring That Occurs Thrice II, the string length can be up to 5imes105. You still need to find the length of the longest special (single-character) substring that occurs at least three times.
Why is this asked in interviews?
Rubrik and Google use this version to see if you can optimize a frequency counting approach. An O(n2) or even O(nlogn) approach that stores every possible substring in a map will fail due to memory and time limits. It evaluations your ability to use Sliding Window or Precomputation patterns to only track the top few lengths for each character.
Algorithmic pattern used
The problem is solved using Character Grouping and Top-3 tracking.
- Iterate through the string and find the lengths of all contiguous blocks of the same character.
- For each character 'a'-'z', maintain a list of the lengths of its contiguous blocks.
- For each character's list, you only need to consider the top 3 longest blocks.
- A length X occurs 3 times if:
- The longest block has length ≥X+2.
- The two longest blocks have lengths ≥X+1 and ≥X (specifically, the first can provide two and the second one).
- The three longest blocks all have length ≥X.
- By sorting the lists of lengths for each character and checking these cases, you find the global maximum in O(n).
Example explanation
String: "aaaaabaaaa"
- 'a' block lengths: [5, 4].
- Sorted lengths: [5, 4].
- Candidate from length 5: 5−2=3 (e.g., "aaa").
- Candidate from combining: min(5−1,4)=4 (e.g., "aaaa" occurs twice in the first block and once in the second).
Max length: 4.
Common mistakes candidates make
- Memory limit: Trying to store all possible substrings.
- Sorting too much: Sorting all O(n) groups instead of just keeping the top 3 for each character.
- Off-by-one: Mistakes in the L−2 or L−1 logic for frequency calculation.
Interview preparation tip
When you need to find the "Top K" of something across many categories, don't store everything. Use a fixed-size Min-Heap or a small sorted list (size 3 in this case) for each category to keep only the relevant data.