Magicsheet logo

Shortest Matching Substring

Hard
25%
Updated 8/1/2025

Shortest Matching Substring

What is this problem about?

The Shortest Matching Substring interview question gives you a text string s and a pattern p containing at most two wildcard * characters (each matching any sequence of characters, including empty). Find the length of the shortest substring of s that matches the pattern p. If no match exists, return -1. This tests multi-wildcard string matching with length minimization.

Why is this asked in interviews?

Amazon asks this HARD string matching problem because wildcard pattern matching is fundamental in search engines, file system glob operations, and text editors. With two wildcards, the pattern splits into three literal parts. Finding the shortest matching substring requires efficiently locating the leftmost occurrence of each literal part in sequence, which can be done with KMP or binary search on prefix occurrences.

Algorithmic pattern used

The pattern is split-by-wildcard then sequential prefix matching. Split p by * into at most three literal parts: prefix, middle, suffix. Find all starting positions of the suffix in s. For each suffix occurrence, search backwards for the latest middle occurrence before it, then for the latest prefix occurrence before that. The length of the current match is end_of_suffix - start_of_prefix. Track the minimum such length. Use binary search on precomputed occurrence lists for efficiency.

Example explanation

s = "aabaabaab", p = "a*b*b".

Split by *: parts = ["a", "b", "b"].

  • Find "a" at positions: 0,1,3,4,6,7.
  • Find "b" at positions: 2,5,8.
  • Find "b" (suffix) at positions: 2,5,8.

For suffix at index 8: look for middle "b" before 8 → index 5. Look for prefix "a" before 5 → index 4. Length = 8-4+1 = 5. For suffix at index 5: middle "b" at 2. Prefix "a" at 1. Length = 5-1+1 = 5. For suffix at index 2: middle at... no "b" before 2 at ≥ prefix end → invalid.

Minimum length: 5 (e.g., "abaab").

Common mistakes candidates make

  • Not handling the case where the pattern has fewer than two wildcards — adjust the three-part split accordingly.
  • Greedily picking the first occurrence of each part (leftmost) instead of the latest before the next part — for shortest match, pick the RIGHTMOST valid occurrence.
  • Not considering empty wildcard matches — * can match zero characters.
  • Not using binary search on occurrence lists — linear scan for each position pair is O(n^3) which TLEs.

Interview preparation tip

For the Shortest Matching Substring coding problem, the string matching two-pointer binary search interview pattern is the efficient approach. The key insight: to minimize match length, choose the rightmost valid occurrence of each earlier part (maximizing compression). Amazon interviewers test the split-into-parts reduction — state it explicitly before coding. Practice KMP for efficient single-pattern matching and binary search on sorted occurrence lists for multi-pattern sequential matching. These two tools combined solve most complex substring matching problems.

Similar Questions