Delete Duplicate Folders in System

Hard
87.5%
Updated 6/1/2025

What is this problem about?

The Delete Duplicate Folders in System interview question is a complex file system challenge. You are given a list of paths representing a folder structure. You must identify every group of folders with identical subfolder structure (same names and same sub-structures, recursively) and delete all of them, along with their contents. At its core, the Delete Duplicate Folders in System coding problem is a test of tree serialization and hashing.

Why is this asked in interviews?

This is a HARD difficulty problem asked by companies like Nvidia and Google. It tests your proficiency with Tries, Tree Traversal, and Hash Functions, and evaluates whether you can serialize a complex recursive structure into a string and use a map to find identical duplicates. It’s a classic deduplication problem found in storage systems and compilers.

Algorithmic pattern used

This problem combines the Array, Hash Table, Trie, and Hash Function interview patterns.

  1. Build Trie: Insert all paths into a Trie where each node represents a folder.
  2. Serialize (Post-order): Traverse the tree from the bottom up. For each node, build a string representation of its children, sorted by name so that sibling order never affects the result: (child1_name, child1_serialization)(child2_name, child2_serialization)...
  3. Frequency Map: Store these serializations in a map. If a serialization (for a non-leaf folder) appears more than once, mark those folders for deletion.
  4. Reconstruct: Perform a final traversal to collect paths that weren't marked for deletion.
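The four steps above can be sketched in Python. This assumes paths arrive as lists of folder names (as in the LeetCode version of the problem); `TrieNode`, `serialize`, and `collect` are illustrative names, not a prescribed API:

```python
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = {}   # folder name -> TrieNode
        self.deleted = False

def delete_duplicate_folders(paths):
    # 1. Build the Trie: each node is a folder.
    root = TrieNode()
    for path in paths:
        node = root
        for name in path:
            node = node.children.setdefault(name, TrieNode())

    # 2. Post-order serialization, grouping nodes by signature.
    groups = defaultdict(list)

    def serialize(node):
        if not node.children:
            return ""  # empty folders are never duplicates on their own
        sig = "".join(f"({name}{serialize(child)})"
                      for name, child in sorted(node.children.items()))
        groups[sig].append(node)
        return sig

    serialize(root)

    # 3. Mark every folder whose signature occurs more than once.
    for nodes in groups.values():
        if len(nodes) > 1:
            for node in nodes:
                node.deleted = True

    # 4. Collect the paths of all surviving folders.
    result, stack = [], []

    def collect(node):
        for name, child in node.children.items():
            if child.deleted:
                continue  # skip duplicates and everything beneath them
            stack.append(name)
            result.append(stack[:])
            collect(child)
            stack.pop()

    collect(root)
    return result
```

Marking duplicates before the final traversal (rather than deleting Trie nodes in place) keeps step 4 a simple read-only walk.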

Example explanation

Paths: /a/x, /a/y, /b/x, /b/y

  1. Folder a contains x and y. Serialization: (x)(y).
  2. Folder b contains x and y. Serialization: (x)(y).
  3. Since (x)(y) appeared twice, both a and b (and their contents) are marked as duplicates and removed. Result: Empty (if no other unique folders exist).
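A minimal sketch of the comparison in this example, with the folder tree written as nested dicts (an illustrative representation, not part of the problem statement):

```python
# /a/x, /a/y, /b/x, /b/y as a nested-dict tree.
tree = {"a": {"x": {}, "y": {}}, "b": {"x": {}, "y": {}}}

def serialize(children):
    # Sort names so sibling order never affects the signature.
    return "".join(f"({name}{serialize(sub)})"
                   for name, sub in sorted(children.items()))

sig_a = serialize(tree["a"])  # "(x)(y)"
sig_b = serialize(tree["b"])  # "(x)(y)"
# Identical signatures, so a and b are duplicates of each other.
```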

Common mistakes candidates make

  • Ignoring folder names: Only comparing the shape of the subtrees without checking if the folder names match.
  • Leaf duplicates: The problem specifies that empty folders (leaves) are never duplicates of each other on their own; they are removed only when an ancestor folder is marked as a duplicate.
  • Efficiency: Not using a string builder or a proper hash, leading to very slow string concatenations in large systems.
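One common way to address the efficiency pitfall is to intern each signature as a small integer id, so a parent's signature references child ids rather than full child strings. A sketch of that idea (the `intern` helper is an assumed technique, not from the problem statement):

```python
ids = {}

def intern(sig):
    # Map each distinct signature to a small integer id.
    return ids.setdefault(sig, len(ids))

def serialize(children):
    # The parent signature embeds child *ids*, not full child strings,
    # so every signature stays short even for deep, wide trees.
    return "".join(f"({name},{intern(serialize(sub))})"
                   for name, sub in sorted(children.items()))
```

Two folders still get equal signatures exactly when their subtrees match, but the total string work drops from potentially quadratic to roughly linear in the number of nodes.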

Interview preparation tip

Master "Tree Serialization." Whether it's binary trees or file systems, the ability to convert a recursive structure into a unique string is a vital skill for comparison and caching problems.
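The same serialization pattern applies to binary trees, for instance when finding duplicate subtrees. A sketch (with an illustrative `Node` class; this mirrors the post-order approach above):

```python
from collections import defaultdict

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def find_duplicate_subtrees(root):
    # Post-order: a subtree's signature embeds its children's signatures,
    # so equal signatures mean identical subtrees.
    seen = defaultdict(int)
    duplicates = []

    def serialize(node):
        if node is None:
            return "#"
        sig = f"({node.val},{serialize(node.left)},{serialize(node.right)})"
        seen[sig] += 1
        if seen[sig] == 2:  # report each duplicate shape exactly once
            duplicates.append(node)
        return sig

    serialize(root)
    return duplicates
```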

Similar Questions