The Game Play Analysis II coding problem builds on the first one. Given the same Activity table, you now need to report the device_id that each player used on their very first login. This is slightly trickier because you can't just use MIN(device_id)—you need the specific device associated with the MIN(event_date).
Amazon asks this Database interview pattern to test your ability to retrieve entire rows based on an aggregated subquery or using Window Functions. It separates beginners who only know basic GROUP BY from intermediate SQL users who can link an aggregated result back to the original dataset.
There are two main ways to solve this:
player_id and MIN(event_date) for each player (exactly like Game Play Analysis I).Activity table on both player_id and event_date to retrieve the corresponding device_id.FIRST_VALUE(device_id) OVER(PARTITION BY player_id ORDER BY event_date) to directly grab the first device.ROW_NUMBER() OVER(PARTITION BY player_id ORDER BY event_date) as rn and filter for rn = 1.Activity:
| player_id | device_id | event_date |
|---|---|---|
| 1 | 2 | 2016-03-01 |
| 1 | 3 | 2016-05-02 |
Using ROW_NUMBER:
rn = 1.rn = 2.WHERE rn = 1, resulting in player_id = 1, device_id = 2.SELECT player_id, MIN(event_date), device_id GROUP BY player_id. This is invalid SQL in most databases because device_id is not aggregated. In MySQL (without strict mode), it might return a random device_id, which is incorrect.IN approach, it can be slow on very large tables compared to Window Functions.Window functions like ROW_NUMBER() are the modern, clean way to solve "find the row holding the min/max value" problems. Practice the PARTITION BY and ORDER BY syntax within the OVER() clause.