High Five
What is this problem about?
The High Five coding problem asks you to calculate the "Top Five" average score for each student in a class. You are given a list of scores, where each item contains a student ID and their score on a particular test. You need to find the average of the five highest scores for each student and return the results as a list of [student_id, top_five_average], sorted by student ID.
Why is this asked in interviews?
Amazon and Goldman Sachs use this "Easy" question to test basic data grouping and sorting skills. It evaluates how you use Hash Table interview patterns to organize data by ID and Heap (Priority Queue) interview patterns to maintain a top-K set. It also checks your ability to handle integer division requirements (usually the floor of the average).
Algorithmic pattern used
This problem uses a Hash Map of Min-Heaps.
- Create a map where the key is the student ID and the value is a Min-Heap of size 5.
- Iterate through the scores. For each score:
- Add it to the student's Min-Heap.
- If the heap size exceeds 5, remove the smallest element. This ensures the heap always contains the 5 largest scores seen so far.
- After processing all scores, iterate through the map. For each student, sum the 5 scores in their heap and divide by 5.
- Sort the resulting list by student ID and return.
Example explanation
Input: [[1, 100], [1, 90], [1, 80], [1, 70], [1, 60], [1, 50]]
- Student 1 scores: [100, 90, 80, 70, 60, 50].
- Min-Heap keeps the 5 largest: {60, 70, 80, 90, 100}. (50 is discarded).
- Sum = 400.
- Average = 400/5=80.
Result:
[[1, 80]].
Common mistakes candidates make
- Full Sort: Sorting every score for every student when you only need the top 5. While sorting works for small data, a Heap is more efficient for large streams.
- Integer Division: Forgetting to floor the average if the sum isn't perfectly divisible by 5.
- Tie-breaking: Not sorting the final output by student ID as required.
Interview preparation tip
When asked for "Top K" elements in a stream or group, always consider a Priority Queue (Min-Heap). It allows you to keep track of the largest elements in O(logK) time per insertion, which is much better than sorting the whole list.