Magicsheet logo

UTF-8 Validation

Medium
25%
Updated 8/1/2025

Asked by 3 Companies

UTF-8 Validation

What is this problem about?

The UTF-8 Validation interview question asks you to determine if a given array of integers represents a valid UTF-8 encoding. Each integer represents 1 byte (8 bits). UTF-8 has specific rules: a 1-byte character starts with 0, a 2-byte character starts with 110, a 3-byte character starts with 1110, and a 4-byte character starts with 11110. Any subsequent bytes in a multi-byte character must start with 10.

Why is this asked in interviews?

Companies like Google and Meta use the UTF-8 Validation coding problem to test a candidate's proficiency with Bit Manipulation. It evaluates whether you can work with low-level data representations and correctly implement a state machine or a counter to track multi-byte sequences. It’s a great test of technical accuracy.

Algorithmic pattern used

The primary Array, Bit Manipulation interview pattern involves iterating through the bytes and using a counter to track how many "continuation bytes" (starting with 10) are expected.

  1. If counter is 0, examine the current byte to determine the character length (1, 2, 3, or 4 bytes). Update the counter accordingly.
  2. If counter > 0, verify the current byte starts with 10. If not, return false. Decrement the counter.
  3. After the loop, if the counter is 0, the string is valid.

Example explanation

Input: [197, 130]

  1. 197 in binary is 11000101. It starts with 110, so it's a 2-byte character. Counter = 1.
  2. 130 in binary is 10000010. It starts with 10. This is a valid continuation byte. Counter = 0.
  3. Result: true. Input: [235, 140]
  4. 235 is 11101011 (3-byte). Counter = 2.
  5. 140 is 10001100 (valid continuation). Counter = 1.
  6. Loop ends, but Counter is 1. Result: false.

Common mistakes candidates make

A common error is not correctly masking the bits to check the prefix (e.g., using byte & 0b11100000 == 0b11000000 for 2-byte characters). Another mistake is forgetting to handle characters longer than 4 bytes, which are not allowed in standard UTF-8. Finally, forgetting to check if the counter is zero at the very end is a frequent oversight.

Interview preparation tip

Get comfortable with Bitwise Operators (&, |, <<, >>). Practice converting integers to their binary representations mentally. These skills are essential for systems-level programming and for passing specialized technical rounds at top companies.

Similar Questions