Magicsheet logo

Better Compression of String

Medium
72.1%
Updated 6/1/2025

Better Compression of String

What is this problem about?

The "Better Compression of String interview question" is a data transformation task. You are given a compressed string like "a3b2a1". In this format, each character is followed by its count. However, the same character can appear multiple times (e.g., 'a' appears twice). Your goal is to "better" compress it by summing the counts of identical characters and returning the result in alphabetical order (e.g., "a4b2").

Why is this asked in interviews?

Goldman Sachs and Riot Games use the "Better Compression of String coding problem" to test a candidate's ability to parse strings and use frequency maps. It requires correctly handling multi-digit numbers (e.g., a10) and ensuring the final output is sorted. It evaluates "Hash Table interview pattern" and "String interview pattern" skills.

Algorithmic pattern used

This problem follows the Parse and Aggregate pattern.

  1. Parsing: Iterate through the string using a pointer.
  2. Character and Count: Identify the character. Then, read all subsequent digits to form the complete count (e.g., if you see 'a' followed by '1' and '0', the count is 10).
  3. Aggregation: Store the counts in a hash map or an array of size 26 (for 'a' to 'z'). Add the current count to the existing count for that character.
  4. Final Assembly: Iterate from 'a' to 'z'. If a character's count is >0> 0, append the character and its total count to a result string.

Example explanation

Input: "c2b5c10"

  1. Read 'c', count is 2. Map: {c: 2}.
  2. Read 'b', count is 5. Map: {c: 2, b: 5}.
  3. Read 'c', count is 10. Map: {c: 12, b: 5}.
  4. Sorted assembly:
    • 'b' has 5. Result: "b5".
    • 'c' has 12. Result: "b5c12".

Common mistakes candidates make

  • Single-digit assumption: Only reading one character after the letter, which fails for counts 10\geq 10.
  • Not sorting: Returning "c12b5" because that was the order in the original string. The problem usually specifies alphabetical order.
  • Inefficient concatenation: Using string + in a loop instead of a StringBuilder or list join.

Interview preparation tip

When parsing strings with mixed characters and numbers, always use a while loop inside your main loop to capture all consecutive digits. This ensures you handle multi-digit numbers correctly. This is a vital "String interview pattern" for parsing tasks.

Similar Questions