Magicsheet logo

Extra Characters in a String

Medium
79.3%
Updated 6/1/2025

Extra Characters in a String

What is this problem about?

The Extra Characters in a String coding problem gives you a string s and a dictionary of words. You need to break the string into several non-overlapping substrings such that each substring is either one of the words in the dictionary or is considered "extra characters." Your goal is to minimize the total number of extra characters left over.

Why is this asked in interviews?

Companies like Amazon and Google ask this to evaluate your Dynamic Programming (DP) skills. It’s a variation of the "Word Break" problem. It tests your ability to define a state that minimizes a cost (extra characters) rather than just finding a boolean "is possible." It also evaluates whether you can optimize dictionary lookups using a Hash Set or a Trie.

Algorithmic pattern used

The problem is solved using 1D Dynamic Programming.

  1. Define dp[i] as the minimum extra characters for the suffix s[i...].
  2. Base case: dp[n] = 0 (empty string has 0 extra chars).
  3. Transition:
    • Option 1: Treat s[i] as an extra character. dp[i] = 1 + dp[i+1].
    • Option 2: Try every word in the dictionary that matches a prefix of s[i...]. If s[i...i+len-1] is a valid word, dp[i] = min(dp[i], dp[i+len]).
  4. The result is dp[0].

Example explanation

s = "leetscode", dictionary = ["leet", "code"]

  1. dp[9] = 0
  2. At "code" (index 5): "code" is in dictionary. dp[5] = dp[9] = 0.
  3. At "s" (index 4): "s" is not in dictionary. dp[4] = 1 + dp[5] = 1.
  4. At "leet" (index 0): "leet" is in dictionary. dp[0] = min(1 + dp[1], dp[4]) = min(..., 1) = 1. The minimum extra characters is 1 (the 's').

Common mistakes candidates make

  • Greedy approach: Trying to match the longest word first, which might not lead to the overall minimum extra characters.
  • Inefficient Search: Re-scanning the entire dictionary for every index instead of using a Hash Set for O(1) prefix checks or a Trie.
  • Incorrect DP state: Defining the state in a way that doesn't allow for the "skip one character" option.

Interview preparation tip

Always consider both "Forward DP" (calculating best result for prefix 0..i) and "Backward DP" (suffix i..n). For string partitioning, Backward DP often leads to slightly cleaner base case logic.

Similar Questions