Magicsheet logo

Sentence Similarity II

Medium
12.5%
Updated 8/1/2025

Sentence Similarity II

What is this problem about?

The Sentence Similarity II interview question extends Sentence Similarity I by making similarity transitive: if word A is similar to B and B is similar to C, then A is similar to C. Two sentences are similar if they have the same length and each corresponding word pair belongs to the same connected component of the similarity graph. This is a Union-Find or graph connectivity problem.

Why is this asked in interviews?

Amazon, Google, and Rippling ask this problem because it demonstrates when Union-Find (Disjoint Set Union) is the right tool: when you need to check component membership after building a graph through transitive closures. It tests whether candidates can translate a "group by equivalence" problem into a Union-Find structure and then perform membership queries. Transitivity models real-world synonym relationships and semantic equivalence.

Algorithmic pattern used

The pattern is Union-Find (DSU) with string keys. For each similar pair (w1, w2), call union(w1, w2). Then validate the two sentences: check lengths are equal, and for each index i, verify find(words1[i]) == find(words2[i]) (same component). Use a dictionary-based Union-Find where keys are strings instead of integers. Apply path compression and union by rank for efficiency.

Example explanation

sentence1: ["I", "am", "fine"] sentence2: ["I", "am", "great"] pairs: [["fine", "great"], ["great", "excellent"]]

Union operations: union("fine", "great"), union("great", "excellent"). Components: {"fine", "great", "excellent"} in one component; "I", "am" each alone.

Check:

  • "I" vs "I": same word ✓.
  • "am" vs "am": same word ✓.
  • "fine" vs "great": find("fine") == find("great") ✓ (same component).

Return true.

Common mistakes candidates make

  • Using a simple hash map for similarity instead of Union-Find — hash maps don't automatically handle transitivity across multiple hops.
  • Not handling the case where a word doesn't appear in the pairs list — it should be its own component (only equal to itself).
  • Forgetting that sentences must have equal lengths — a length mismatch immediately returns false.
  • Not applying path compression in the find operation — without it, deep chains cause slow lookups.

Interview preparation tip

For the Sentence Similarity II coding problem, the Union-Find graph interview pattern is the correct tool for transitive similarity. Know the dictionary-based DSU implementation — it handles string keys elegantly. Amazon and Google interviewers value the comparison between versions I and II: "Version I uses a set, Version II uses Union-Find because of transitivity." Practice implementing string-keyed DSU with path compression — it's reusable for many graph equivalence problems including accounts merge and friend groups.

Similar Questions