Magicsheet logo

Find Duplicate Subtrees

Medium
45.1%
Updated 6/1/2025

Find Duplicate Subtrees

What is this problem about?

The Find Duplicate Subtrees interview question asks you to identify all duplicate subtrees within a given binary tree. Two subtrees are considered duplicates if they have exactly the same structure and the same node values. For each set of duplicate subtrees, you only need to return the root node of one of them.

Why is this asked in interviews?

This Find Duplicate Subtrees coding problem is a favorite for companies like Salesforce and Amazon because it combines tree traversal with data serialization and frequency tracking. It tests if you can represent a complex object (a subtree) as a unique identifier (like a string or a hash) that can be stored in a hash map. It also evaluates your understanding of recursive Depth-First Search (DFS) patterns.

Algorithmic pattern used

This problem utilizes the Hash Table, Depth-First Search, Binary Tree, Tree interview pattern. The key is to perform a post-order traversal where each node returns a string representation of its entire subtree. By collecting these serialized strings in a hash map, you can keep track of how many times each specific structure has appeared.

Example explanation

Consider a tree where the root is 1. Its left child is 2, and its right child is 3. Node 2 has a left child 4, and node 3 also has a left child 4.

  1. During DFS, we reach leaf node 4. We serialize it as "4,#,#,".
  2. The map records "4,#,#," with a count of 1.
  3. We visit node 2, which serializes as "2,(4,#,#,),#,".
  4. We visit the other leaf node 4. Its serialization is "4,#,#,".
  5. The map count for "4,#,#," becomes 2. This is a duplicate! We add root 4 to our result list.
  6. The result will contain one instance of node 4.

Common mistakes candidates make

  • Incorrect Serialization: Failing to include markers for null nodes (like '#'), which can lead to different tree structures producing the same serialized string.
  • Adding Duplicates Multiple Times: Adding a subtree to the result list every time its count is greater than 1, rather than exactly equal to 2.
  • Time Complexity Issues: Building very long strings at each node can lead to O(N^2) complexity. An optimized approach involves using integer IDs for subtrees instead of full strings.

Interview preparation tip

Practice "Tree Serialization." The ability to convert a tree into a unique string or a sequence of values is a powerful technique for comparison and storage problems. Always remember that a null marker is essential to preserve the structure of the tree in string form.

Similar Questions