The Game Play Analysis V interview question is the final and hardest iteration of the player retention SQL series. You are asked to report for each "install date" (the date when a player first logged in), the number of players that installed on that date, and the "Day 1 retention rate." The retention rate is the number of players who logged in the day after their install date, divided by the number of players who installed on that date. The result should be rounded to 2 decimal places.
Companies like GSN Games use this Database coding problem to evaluate advanced SQL skills. It tests a candidate's ability to combine cohort definition (finding the first login date per user) with self-joins and conditional aggregation. It evaluates whether you can calculate ratios within specific groupings (cohorts) rather than just a single global ratio like in Analysis IV.
This problem relies on a CTE (Common Table Expression), Self-Join, and Conditional Aggregation.
InstallDates) that finds player_id and their MIN(event_date) as install_dt.LEFT JOIN the Activity table to this CTE. The join condition should be the player_id matching AND the Activity.event_date being exactly 1 day after the install_dt.GROUP BY install_dt.COUNT(DISTINCT player_id) from the CTE.COUNT(Activity.player_id) (since the LEFT JOIN will have NULLs if they didn't log in on day 2).Activity:
Grouping by Install Date:
INNER JOIN to find retained players, you will completely exclude install dates where zero players were retained, which is incorrect.player_ids from the initial cohort definition.Cohort analysis is a staple of data science interviews. Practice structuring your queries cleanly using CTEs. Step 1: Define the cohort (who and when). Step 2: Join the activity you want to measure. Step 3: Aggregate by the cohort dimension.