Magicsheet logo

Encode String with Shortest Length

Hard
25%
Updated 8/1/2025

Asked by 1 Company

Encode String with Shortest Length

What is this problem about?

The Encode String with Shortest Length coding problem asks you to compress a given string into its shortest possible representation. You can encode repeating patterns in the format k[substring], where k is the number of repetitions. For example, "aaaaa" can be encoded as 5[a]. This encoding can be nested, like 2[3[a]2[b]]. The goal is to return any string that has the absolute minimum length after all possible encodings.

Why is this asked in interviews?

This is a "Hard" difficulty problem frequently asked by Google. It tests a candidate's proficiency with dynamic programming interview pattern and their ability to solve optimal substructure problems. It requires exploring all possible ways to split a string and all possible ways to compress it, making it an excellent test of recursive thinking and optimization.

Algorithmic pattern used

The problem is solved using 2D Dynamic Programming with String Compression.

  1. Define dp[i][j] as the shortest encoded string for the substring s[i...j].
  2. Base case: If the substring length is less than 5, no compression can make it shorter (since k[s] is at least 4 chars long).
  3. Transitions:
    • Split: Try all possible split points k between i and j, and set dp[i][j] = dp[i][k] + dp[k+1][j].
    • Compress: Check if the substring s[i...j] itself has a repeating pattern. If s[i...j] can be formed by repeating a smaller string t for count times, check if count[dp_for_t] is shorter than the current dp[i][j].
  4. Use a helper function to find the shortest repeating unit of a string.

Example explanation

String: "ababababab" (length 10).

  1. Without compression, length is 10.
  2. Try splitting: "ababab" + "abab". shortest for those might be 3[ab] and 2[ab]. Total: 3[ab]2[ab] (length 10).
  3. Try pattern detection: The string is "ab" repeated 5 times.
  4. Encoded: 5[ab] (length 5).
  5. Since 5 < 10, dp for the whole string becomes 5[ab].

Common mistakes candidates make

  • Ignoring Nested Encoding: Not realizing that the repeating unit itself should be the encoded shortest version of that unit.
  • Missing Split Points: Only looking for a single repeating pattern and forgetting that a string like "abcabcabcde" might be encoded as 3[abc]de.
  • Efficiency: Not using memoization, which leads to exponential time complexity due to overlapping subproblems.

Interview preparation tip

Practice the KMP algorithm's "prefix function" or string concatenation tricks ((s + s).find(s, 1)) to find the smallest repeating unit of a string in linear time. This is a vital sub-problem for string compression tasks.

Similar Questions