Magicsheet logo

Distinct Subsequences II

Hard
12.5%
Updated 8/1/2025

Asked by 2 Companies

Distinct Subsequences II

What is this problem about?

The Distinct Subsequences II interview question asks you to find the total number of unique non-empty subsequences of a single string s. Because the string can contain duplicate characters, different sets of indices might result in the same subsequence (e.g., in "aba", deleting the first 'a' or the last 'a' results in "ba"). You need to return the count modulo 109+710^9 + 7.

Why is this asked in interviews?

Amazon and Google ask this "Hard" problem to evaluate your ability to handle duplicate counting in Dynamic Programming. It evaluation whether you can derive a formula that subtracts over-counted states. It’s a sophisticated test of combinatorial logic and modular arithmetic.

Algorithmic pattern used

This is a 1D Dynamic Programming problem with a twist.

  1. Let dp[i] be the number of distinct subsequences using the first ii characters.
  2. If character s[i] is new: dp[i] = 2 * dp[i-1] + 1 (Each existing subsequence can either include or exclude the new char, plus the char itself).
  3. If character s[i] has appeared before at index j: dp[i] = 2 * dp[i-1] - dp[j-1]. We subtract dp[j-1] because all subsequences formed by adding the previous occurrence of this character are now being duplicated by the current occurrence.
  4. Total = dp[n].

Example explanation

String: "aba"

  1. 'a': dp[1] = {a} (Count 1).
  2. 'b': dp[2] = {a, b, ab} (Count 2×1+1=32 \times 1 + 1 = 3).
  3. 'a' (Repeat):
    • New potential: {aa, ba, aba}.
    • Existing: {a, b, ab}.
    • Formula: 2×3(countbeforefirsta...whichwas0)+12 \times 3 - (count before first 'a' ... which was 0) + 1? Actually, the formula is usually dp[i]=(2×dp[i1])(modmod)dp[i] = (2 \times dp[i-1]) \pmod{mod}. If char repeated, subtract last_occurrence_dp. Final count for "aba" is 6: "a", "b", "aa", "ab", "ba", "aba".

Common mistakes candidates make

  • Negative Results: Forgetting that subtraction in modular arithmetic requires adding the modulo: (a - b + mod) % mod.
  • Overcounting: Not keeping track of the last seen position of each character.
  • Base Case: Forgetting to handle the empty subsequence or the +1 for single-character subsequences correctly.

Interview preparation tip

This problem is a variation of the "Counting Subsets" problem but with duplicates. Whenever you need to count "unique" things in a sequence with duplicates, think about storing the "last time this contribution was added" to avoid double-counting. This is a common Dynamic Programming interview pattern.

Similar Questions