Magicsheet logo

H-Index

Medium
48.5%
Updated 6/1/2025

H-Index

What is this problem about?

The H-Index coding problem asks you to calculate a researcher's h-index based on an array of citation counts for their papers. The h-index is defined as the maximum value h such that the researcher has published at least h papers that have each been cited at least h times.

Why is this asked in interviews?

This problem is widely used by companies like Bloomberg, Amazon, and Meta to evaluate a candidate's understanding of Sorting and Counting Sort optimization. It tests your ability to translate a real-world definition (which can sound confusing) into a concrete mathematical algorithm.

Algorithmic pattern used

There are two main ways to solve this:

  1. Sorting (O(NlogN)O(N \log N)): Sort the array in descending order. Iterate through the sorted array. The h-index is the number of elements you can count before the citation value becomes smaller than your current count.
  2. Counting Sort / Bucket Sort (O(N)O(N)): Since the h-index cannot exceed the total number of papers NN, you can create an array of "buckets" from 00 to NN. Count the papers with exactly cc citations. Any paper with >N> N citations goes into bucket NN. Iterate backwards from NN, keeping a running total of papers. When the running total \ge the bucket index, that index is the h-index.

Example explanation

Citations: [3, 0, 6, 1, 5] Sorting Method:

  1. Sort descending: [6, 5, 3, 1, 0].
  2. i=0, val=6. 616 \ge 1 (1 paper).
  3. i=1, val=5. 525 \ge 2 (2 papers).
  4. i=2, val=3. 333 \ge 3 (3 papers).
  5. i=3, val=1. 141 \ge 4 (False). Result: 3.

Bucket Sort Method: N = 5. Buckets: [0, 0, 0, 0, 0, 0].

  1. Counts: 0->b[0]=1, 1->b[1]=1, 3->b[3]=1, 5->b[5]=1, 6->b[5]=2.
  2. Buckets: [1, 1, 0, 1, 0, 2].
  3. Backwards loop:
    • i=5: total = 2. 252 \ge 5 (False).
    • i=4: total = 2. 242 \ge 4 (False).
    • i=3: total = 3. 333 \ge 3 (True!). Result: 3.

Common mistakes candidates make

  • Misunderstanding the Definition: Returning the maximum citations instead of the maximum h count, or confusing the index with the value.
  • Sorting Ascending: Sorting ascending makes the logic slightly less intuitive (you have to check citations[i] >= N - i), leading to off-by-one errors.
  • Missing the O(N)O(N) optimization: Failing to recognize that the maximum possible h-index is NN, which is the trigger for the bucket sort optimization.

Interview preparation tip

Always clarify definitions. The h-index sounds tricky: "h papers with at least h citations." Break it down manually with an example before writing any code. Implementing the O(N)O(N) bucket sort approach is a strong signal of algorithm optimization skills.

Similar Questions