Magicsheet logo

Compare Strings by Frequency of the Smallest Character

Medium
25%
Updated 8/1/2025

Compare Strings by Frequency of the Smallest Character

What is this problem about?

The Compare Strings by Frequency of the Smallest Character interview question gives you two arrays of strings: queries and words. For each string, you define a function f(s) that counts the frequency of the lexicographically smallest character in s. For each query string q, you need to find how many strings w in the words array satisfy f(q) < f(w).

Why is this asked in interviews?

Google uses this String interview pattern to test a candidate's ability to combine multiple simple tasks (character counting) with efficient searching (binary search). It tests whether you can optimize an O(Q * W) brute-force solution into a much faster O((Q + W) log W) or even O(Q + W) solution by using frequency counts.

Algorithmic pattern used

This problem combines Frequency Counting and Binary Search.

  1. Precompute f(w) for every word in the words array.
  2. Sort these frequencies in ascending order.
  3. For each query, calculate f(q).
  4. Use Binary Search (upper_bound or bisect_right) on the sorted frequencies to find how many values are strictly greater than f(q).

Example explanation

Queries: ["cbd"], Words: ["zaaaz", "abcd"]

  1. f("cbd"): Smallest char is 'b'. Frequency of 'b' is 1.
  2. f("zaaaz"): Smallest char is 'a'. Frequency of 'a' is 3.
  3. f("abcd"): Smallest char is 'a'. Frequency of 'a' is 1.
  4. Frequencies of words: [3, 1]. Sorted: [1, 3].
  5. Query frequency is 1. How many values in [1, 3] are > 1? Only one value (3). Result: [1].

Common mistakes candidates make

  • Redundant Calculations: Re-calculating the frequency for every query/word pair.
  • Wrong Smallest Character: Finding the frequency of a random character or the first character instead of the lexicographically smallest one.
  • Inefficient Search: Using a linear search to count how many word frequencies are greater, which leads to TLE for large inputs.

Interview preparation tip

Since the maximum frequency of a string is often small (e.g., the max length of a string is 10), you can use a "Frequency of Frequencies" array (size 11 or 12) and a suffix sum to answer queries in O(1) after O(W) preprocessing.

Similar Questions