Magicsheet logo

Bold Words in String

Medium
25%
Updated 8/1/2025

Bold Words in String

What is this problem about?

The Bold Words in String interview question asks you to wrap specific substrings within <b> and </b> tags. You are given a list of "words" to search for and a target "string". If two bold substrings overlap or are adjacent, they should be merged into a single set of tags. This is a classic string manipulation and pattern-matching problem.

Why is this asked in interviews?

Google frequently asks this to test a candidate's string handling and ability to manage intervals. It requires multiple steps: finding all occurrences of words, marking which characters should be bold, and then formatting the output. It tests whether you can break a problem into logical sub-tasks and handle overlapping ranges correctly.

Algorithmic pattern used

This problem can be solved using String Matching (like Aho-Corasick or simple indexOf) and a boolean mask. You create a boolean array is_bold of the same length as the string. For each word in your list, find all its occurrences in the string and mark the corresponding indices in is_bold as true. Finally, iterate through the string and the boolean array to insert <b> tags whenever is_bold switches from false to true, and </b> when it switches from true to false.

Example explanation

Words: ["ab", "bc"], String: "aabcd"

  1. "ab" occurs at index 1. is_bold[1...2] = true.
  2. "bc" occurs at index 2. is_bold[2...3] = true.
  3. Merged bold range: indices 1, 2, and 3.
  4. String becomes: a + <b> + abc + </b> + d -> a<b>abc</b>d.

Common mistakes candidates make

One common mistake is trying to insert the tags immediately while finding words, which makes handling overlaps extremely difficult. Another is failing to handle the case where words are adjacent but not overlapping (e.g., "abc" and "def" in "abcdef"). Using a boolean mask or an interval-merging approach is much cleaner and less error-prone.

Interview preparation tip

For any "marking" or "tagging" problem on a string or array, think about using a secondary boolean array or a set of intervals. Separating the "identification" phase from the "output" phase is a great way to simplify your logic.

Similar Questions