
Fill Missing Data

Easy
100%
Updated 6/1/2025

Asked by 1 Company

What is this problem about?

The Fill Missing Data interview question is a practical data engineering task. You are given a dataset (usually a Pandas DataFrame) that contains missing values (NaN or None) in certain columns. Your task is to fill these gaps using a specific strategy: replacing them with a constant value, with the mean or median of the column, or with the last valid value ("forward-fill").

Why is this asked in interviews?

Companies like Acko ask this to test your proficiency with Data Preprocessing tools like Pandas. In real-world software engineering and data science, data is rarely clean. Knowing how to handle missing information without introducing bias or breaking the schema is a fundamental skill. It evaluates your knowledge of library APIs and your understanding of data integrity.

Algorithmic pattern used

This follows a Library API usage pattern (specifically Pandas in Python).

  1. Constant Fill: Use df.fillna(value).
  2. Mean/Median Fill: Use df['col'].fillna(df['col'].mean()), or .median() for the median.
  3. Forward/Backward Fill: Use df.ffill() or df.bfill(). The older df.fillna(method='ffill') and df.fillna(method='bfill') forms are deprecated as of pandas 2.1.

The choice of pattern depends on the context of the data (e.g., time-series data usually uses forward-fill).
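The three patterns above can be sketched side by side. This is a minimal illustration, not a reference solution: the DataFrame, its column names ("price", "city"), and the fill values are hypothetical and chosen only to show each call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "price": [10.0, np.nan, 30.0, np.nan],
    "city": ["Pune", None, "Delhi", "Pune"],
})

# 1. Constant fill: a per-column dict keeps each fill value type-appropriate.
constant_filled = df.fillna({"price": 0.0, "city": "unknown"})

# 2. Mean fill on a numeric column (mean() skips NaN by default).
mean_filled = df["price"].fillna(df["price"].mean())

# 3. Forward and backward fill via the dedicated methods.
forward_filled = df["price"].ffill()
backward_filled = df["price"].bfill()
```

Note that backward fill leaves the last row missing here, because no later valid value exists to copy from. Forward fill has the same limitation for a leading NaN.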

Example explanation

Suppose you have a table of temperatures:

Day   Temp
1     25
2     NaN
3     27

  1. If you use Mean fill: The mean is (25+27)/2 = 26. Day 2 becomes 26.
  2. If you use Forward fill: Day 2 takes the value from Day 1, which is 25.
  3. If you use Constant fill (e.g., 0): Day 2 becomes 0.
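The temperature example can be reproduced in a few lines. This is a small sketch; the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# The three-day temperature table from the example, with Day 2 missing.
temps = pd.DataFrame({"Day": [1, 2, 3], "Temp": [25.0, np.nan, 27.0]})

mean_fill = temps["Temp"].fillna(temps["Temp"].mean())  # mean of 25 and 27
forward_fill = temps["Temp"].ffill()                    # copy Day 1's value
constant_fill = temps["Temp"].fillna(0)                 # fixed placeholder

print(mean_fill.tolist())      # [25.0, 26.0, 27.0]
print(forward_fill.tolist())   # [25.0, 25.0, 27.0]
print(constant_fill.tolist())  # [25.0, 0.0, 27.0]
```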

Common mistakes candidates make

  • In-place modification: Forgetting that fillna() returns a new DataFrame and leaves the original unchanged, so the result must be assigned back (or inplace=True passed).
  • Incorrect logic: Using the mean for categorical data, where it is undefined (the mode is the usual alternative), or using a constant that doesn't make sense for the column's unit.
  • Data Leakage: In machine learning contexts, computing the fill value (e.g., the mean) over the entire dataset instead of over the training set only, then reusing that training value for the validation and test sets.
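Each of these mistakes has a short counterpart in code. The sketch below uses a hypothetical DataFrame ("age", "color") and a simple positional split standing in for a real train/test split; neither comes from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names and the split point are illustrative only.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, np.nan, 90.0],
    "color": ["red", None, "red", "blue", None],
})

# Mistake 1: the result is discarded, so the original column is unchanged.
df["age"].fillna(0)
still_missing = int(df["age"].isna().sum())

# Mistake 2 avoided: fill a categorical column with its most frequent value.
most_common = df["color"].mode().iloc[0]
df["color"] = df["color"].fillna(most_common)

# Mistake 3 avoided: split first, then compute the statistic on training rows only.
train, test = df.iloc[:3].copy(), df.iloc[3:].copy()
train_mean = train["age"].mean()
train["age"] = train["age"].fillna(train_mean)
test["age"] = test["age"].fillna(train_mean)  # reuse the training mean
```

Computing the mean over all five rows would have let the test-set value 90 influence the fill value, which is exactly the leakage described above.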

Interview preparation tip

Be ready to discuss the trade-offs of different filling methods. For example, filling with the mean preserves the column's average but reduces its variance, while forward-fill is better for data that changes gradually over time.
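The variance trade-off is easy to demonstrate. The following is a rough sketch on a made-up series, comparing the sample standard deviation before and after mean imputation.

```python
import numpy as np
import pandas as pd

# Made-up series with two missing values.
s = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0])

mean_filled = s.fillna(s.mean())
forward_filled = s.ffill()

# std() ignores NaN, so this is the spread of the three observed values.
std_before = s.std()
# Every imputed point sits exactly at the mean, which pulls the spread down.
std_after = mean_filled.std()

print(std_after < std_before)  # True
```

Forward fill does not pull values toward the center in this way, but it repeats stale observations, which is only reasonable when the underlying quantity changes slowly.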
