The Repeated DNA Sequences interview question gives you a DNA string composed only of 'A', 'C', 'G', and 'T'. You must find all 10-character-long substrings (subsequences of length 10) that appear more than once in the string. Return all such repeated substrings. This is a sliding window problem that can be optimized with rolling hashes or bit manipulation for efficiency.
This problem is asked at Apple, Tesla, Microsoft, Meta, Amazon, LinkedIn, and Google because it tests sliding window pattern recognition, hash table usage, and optionally, rolling hash or bit manipulation for optimization. It also has real-world relevance in bioinformatics — DNA sequence analysis and genome alignment use exactly these substring-matching techniques.
The primary pattern is sliding window with a hash set. Use a set seen and a set repeated. Slide a window of size 10 across the string. For each window, check if the 10-character substring is in seen. If yes, add it to repeated. Otherwise, add it to seen. Return all elements in repeated. For optimization, use a rolling hash or bit packing (2 bits per character × 10 = 20 bits) to represent each window as an integer, enabling O(1) hashing per window.
String: "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Sliding windows of length 10:
"AAAAACCCCC" appears at position 0 and position 10 → repeated."AAAACCCCCA" → appears once."CCCCCAAAAA" → appears once.Result: ["AAAAACCCCC", "CCCCCAAAAA"] (actual results depend on the string).
With bit packing: encode A=0, C=1, G=2, T=3. Each 10-char window = 20-bit integer. Rolling update: shift left by 2, add new character's 2 bits, mask to 20 bits.
seen vs repeated), causing the same repeated substring to appear multiple times in the result.For the Repeated DNA Sequences coding problem, the sliding window and rolling hash interview pattern is the optimal approach. Start with the simple set-based O(n*10) solution, then optimize with bit packing for O(n) time. Interviewers at Grammarly and LinkedIn appreciate when you proactively offer the bit manipulation optimization as a follow-up, even if the simple version is accepted first. Practice the rolling hash update formula: hash = ((hash << 2) | new_char_bits) & mask.
| Title | Difficulty | Topics | LeetCode |
|---|---|---|---|
| Check If a String Contains All Binary Codes of Size K | Medium | Solve | |
| Binary String With Substrings Representing 1 To N | Medium | Solve | |
| Strings Differ by One Character | Medium | Solve | |
| Unique Substrings With Equal Digit Frequency | Medium | Solve | |
| Longest Nice Substring | Easy | Solve |