Magicsheet logo

Design an Array Statistics Tracker

Hard
25%
Updated 8/1/2025

Design an Array Statistics Tracker

1. What is this problem about?

The Design an Array Statistics Tracker interview question involves building a system that processes a stream of numbers and provides real-time statistics. You need to support adding elements and querying the mean, median, and mode. This Design an Array Statistics Tracker coding problem is particularly challenging because it requires maintaining three different properties that respond differently to new data.

2. Why is this asked in interviews?

Amazon asks this to test your ability to balance the computational cost of different operations. It evaluates your knowledge of Heap (Priority Queue) interview patterns for medians and Hash Table interview patterns for modes. It's a test of complex state synchronization.

3. Algorithmic pattern used

  • Mean: Keep a runningSum and a count. Mean is sum / count (O(1)O(1)).
  • Median: Use the Two-Heap Pattern. A max-heap for the lower half and a min-heap for the upper half. Keep them balanced so the median is always at the tops (O(logN)O(\log N) update, O(1)O(1) query).
  • Mode: Use a Map<Integer, Integer> for frequencies and a secondary structure (like an Ordered Map or another Map of Frequency -> Set of Numbers) to track which number has the highest count (O(1)O(1) or O(logN)O(\log N) update).

4. Example explanation

Stream: [1, 2, 2, 3].

  1. sum=8, count=4. Mean = 2.
  2. Heaps: [1, 2] and [2, 3]. Median = (2+2)/2=2(2+2)/2 = 2.
  3. Frequencies: {1:1, 2:2, 3:1}. Mode = 2.
  4. Add 3: sum=11, count=5. Mean = 2.2. Mode could be 2 or 3 (usually return the smallest or first).

5. Common mistakes candidates make

  • Recalculating everything: Sorting the entire list for every median query (O(NlogN)O(N \log N)), which is too slow for a data stream.
  • Mode updates: Failing to efficiently update the current mode when a number's frequency increases.
  • Precision: Not using double/float for the mean calculation.

6. Interview preparation tip

This problem is a "composite" challenge. Solve the mean, median, and mode parts independently in your mind first. The Two-Heap pattern for medians is a must-know for any "Stream" or "Statistics" interview question.

Similar Questions