Magicsheet logo

Short Encoding of Words

Medium
25%
Updated 8/1/2025

Asked by 1 Company

Short Encoding of Words

What is this problem about?

The Short Encoding of Words interview question asks you to find the length of the shortest reference string that encodes all words. A word can be encoded if it appears as a suffix of the reference string followed by the delimiter #. For example, "time#bell#" encodes both "time" and "bell". Words that are suffixes of other words in the list don't need their own entry. Minimize the total length of the encoding string.

Why is this asked in interviews?

Apple asks this problem because it tests Trie-based suffix deduplication — a core technique in text compression and string data structures. The key insight is that words which are suffixes of other words can be omitted (they're already covered). This models suffix trie construction and efficient string encoding, both directly applicable in database indexing, compressed tries, and suffix array construction.

Algorithmic pattern used

Two approaches: set-based deduplication and Trie. Set approach: add all words to a set. For each word, remove all its proper suffixes from the set — suffixes are already covered. The answer is the sum of (len(word) + 1) for all remaining words (the +1 for the # delimiter). Trie approach: insert all reversed words into a Trie. Only Trie leaves (words not a prefix of any other reversed word) need their own encoding. Sum lengths of leaf words plus 1 each.

Example explanation

Words: ["time", "bell", "me", "me"].

Set: {"time", "bell", "me"}.

  • "time" has suffix "me", "ime", "e" — remove "me". Set: {"time", "bell"}.
  • "bell" has no suffix in set.

Encoding: "time#bell#" → length = 5+1+4+1 = 11.

Without deduplication: "time#bell#me#" = 13.

Common mistakes candidates make

  • Not removing duplicates from the input list — the same word appearing twice shouldn't contribute twice to the encoding.
  • Checking word prefixes instead of suffixes — a word is redundant if it's a SUFFIX of another (not a prefix).
  • Not adding the +1 for the # delimiter when computing the encoding length.
  • Using O(n^2) direct suffix checking instead of a Trie or set-based approach.

Interview preparation tip

For the Short Encoding of Words coding problem, the Trie hash table string interview pattern is the efficient approach. The set-based suffix removal is simpler to implement: for each word, iterate over all suffix slices word[1:], word[2:], etc. and remove them from the set. Apple interviewers appreciate the Trie approach for its structural elegance. Practice both — the set approach is O(n × L^2) and the Trie approach is O(n × L) where L is the average word length.

Similar Questions