Delete Duplicate Emails
What is this problem about?
The Delete Duplicate Emails interview question is a classic SQL data cleaning task. You are given a Person table with Id and Email. You need to delete all duplicate email entries, keeping only the one with the smallest Id. This Delete Duplicate Emails coding problem is a test of your ability to write DELETE statements with subqueries or self-joins.
Why is this asked in interviews?
Microsoft and Amazon use this to test basic database maintenance skills. It evaluates if you know how to use a table alias to compare a table with itself and whether you can write a WHERE clause that targets specific rows for removal based on an aggregate condition.
Algorithmic pattern used
This follows the Database interview pattern.
- Self-Join approach: DELETE p1 FROM Person p1, Person p2 WHERE p1.Email = p2.Email AND p1.Id > p2.Id. This deletes the row with the larger ID if a smaller ID with the same email exists.
- Subquery approach: Identify the MIN(Id) for each email and delete any row whose ID is not in that list.
Example explanation
Table:
- The query finds that ID 1 and ID 3 have the same email.
- Since 3 > 1, ID 3 is targeted for deletion.
- ID 2 is unique and stays.
Final Table: IDs 1 and 2.
Common mistakes candidates make
- SELECT instead of DELETE: Forgetting that the problem asks to modify the table, not just return a view.
- Keeping the wrong ID: Keeping the largest ID instead of the smallest.
- Subquery Limitations: In MySQL, you cannot delete from a table if you are selecting from it in a subquery without using an intermediate temporary table or a join.
Interview preparation tip
Always clarify which database dialect is being used (MySQL, PostgreSQL, SQL Server). The syntax for DELETE with joins varies slightly, and knowing these nuances shows professional experience.