Magicsheet logo

Duplicate Emails

Easy
56.5%
Updated 6/1/2025

Duplicate Emails

What is this problem about?

The Duplicate Emails interview question is a fundamental database query task. You are given a table named Person with columns Id and Email. Your goal is to write a SQL query that identifies all email addresses that appear more than once in the table. This is a common data validation task used to ensure data integrity or identify user accounts with multiple entries.

Why is this asked in interviews?

This question is frequently used by companies like Microsoft and Meta to test basic SQL knowledge. It evaluates a candidate's understanding of database interview pattern concepts like grouping, aggregation, and filtering. Specifically, it checks if you know how to use the GROUP BY and HAVING clauses together, which is essential for any backend or data engineering role.

Algorithmic pattern used

This is a standard SQL Aggregation problem.

  1. Use GROUP BY Email to group rows with the same email.
  2. Use the COUNT() aggregate function to count the number of occurrences in each group.
  3. Use the HAVING clause to filter for groups where the count is strictly greater than 1. SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) > 1;

Example explanation

Suppose the Person table looks like this:

IdEmail
1a@b.com
2c@d.com
3a@b.com
  1. Grouping by Email creates two groups: a@b.com (2 rows) and c@d.com (1 row).
  2. The HAVING COUNT(Email) > 1 filter excludes c@d.com.
  3. The result is a@b.com.

Common mistakes candidates make

  • Using WHERE instead of HAVING: Trying to use WHERE COUNT(Email) > 1, which fails because WHERE filters rows before grouping, while HAVING filters groups after aggregation.
  • Incorrect Column Selection: Selecting columns that are not part of the GROUP BY clause without using an aggregate function.
  • Complexity: Writing complex subqueries or self-joins when a simple GROUP BY is sufficient.

Interview preparation tip

Always remember the order of operations in SQL: FROM -> JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY. Understanding that HAVING is specifically for aggregated results is key to passing database-focused interviews.

Similar Questions