Fill Missing Data

Easy

Updated 6/1/2025

Asked by 1 Company

What is this problem about?

The Fill Missing Data interview question is a practical data engineering task. You are given a dataset (usually a Pandas DataFrame) that contains missing values (NaN or Null) in certain columns. Your task is to fill these gaps using a specific strategy, such as replacing them with a constant value, the mean/median of the column, or using "forward-fill" (taking the last valid value).
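A minimal sketch of the constant-fill case follows, assuming made-up columns `quantity` and `note` and a hypothetical helper `fill_missing` (not part of pandas). Passing a dict to `fillna` targets only the named column. Other columns keep their gaps, and the original DataFrame is not modified.

```python
import numpy as np
import pandas as pd

def fill_missing(df: pd.DataFrame, column: str, value) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with gaps in `column` set to `value`."""
    # A dict maps column name -> fill value; unlisted columns are left alone.
    return df.fillna({column: value})

# Illustrative data: both columns contain missing entries
df = pd.DataFrame({
    "quantity": [32.0, np.nan, 779.0],
    "note": ["ok", None, None],
})

filled = fill_missing(df, "quantity", 0)
print(filled["quantity"].tolist())  # [32.0, 0.0, 779.0]
```

Because `fillna` returns a new object, `df` still has its NaN after the call. Only `filled` carries the replaced values.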

Why is this asked in interviews?

Companies like Acko ask this to test your proficiency with data preprocessing tools like Pandas. In real-world software engineering and data science, data is rarely clean. Knowing how to handle missing information without introducing bias or breaking the schema is a fundamental skill. The question evaluates both your knowledge of library APIs and your understanding of data integrity.

Algorithmic pattern used

This follows a Library API usage pattern (specifically Pandas in Python).

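Constant Fill: Use df.fillna(value).
Mean/Median Fill: Use df['col'].fillna(df['col'].mean()) or df['col'].fillna(df['col'].median()).
Forward/Backward Fill: Use df.ffill() or df.bfill(). Older code writes df.fillna(method='ffill') or df.fillna(method='bfill'), but the method parameter of fillna is deprecated since pandas 2.1.

The choice of strategy depends on the context of the data (e.g., time-series data usually uses forward-fill).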
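The strategies above can be compared side by side on one small, made-up Series (the name `reading` and its values are illustrative only). The sketch uses `ffill()`/`bfill()` rather than the deprecated `fillna(method=...)` form. Note that `mean()` and `median()` skip NaN when computing the fill value.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 20.0, np.nan, 60.0], name="reading")

constant = s.fillna(0)              # every gap becomes 0
mean_fill = s.fillna(s.mean())      # mean of 10, 20, 60 is 30
median_fill = s.fillna(s.median())  # median of 10, 20, 60 is 20
forward = s.ffill()                 # copy the last valid value down
backward = s.bfill()                # copy the next valid value up

print(constant.tolist())     # [10.0, 0.0, 20.0, 0.0, 60.0]
print(mean_fill.tolist())    # [10.0, 30.0, 20.0, 30.0, 60.0]
print(median_fill.tolist())  # [10.0, 20.0, 20.0, 20.0, 60.0]
print(forward.tolist())      # [10.0, 10.0, 20.0, 20.0, 60.0]
print(backward.tolist())     # [10.0, 20.0, 20.0, 60.0, 60.0]
```

The outlier 60 pulls the mean to 30 but leaves the median at 20. This is one reason the median is often preferred for skewed columns.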

Example explanation

Suppose you have a table of temperatures:

Day	Temp
1	25
2	NaN
3	27

If you use Mean fill: The mean is (25+27)/2 = 26. Day 2 becomes 26.
If you use Forward fill: Day 2 takes the value from Day 1, which is 25.
If you use Constant fill (e.g., 0): Day 2 becomes 0.
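The temperature example can be reproduced in a few lines. This is a sketch and assumes the table is loaded as a DataFrame with columns `Day` and `Temp`. `Temp` is stored as float because a column containing NaN cannot keep an integer dtype.

```python
import numpy as np
import pandas as pd

temps = pd.DataFrame({"Day": [1, 2, 3], "Temp": [25.0, np.nan, 27.0]})

mean_filled = temps["Temp"].fillna(temps["Temp"].mean())  # (25 + 27) / 2 = 26
forward_filled = temps["Temp"].ffill()                    # Day 2 copies Day 1
constant_filled = temps["Temp"].fillna(0)                 # Day 2 becomes 0

print(mean_filled.tolist())      # [25.0, 26.0, 27.0]
print(forward_filled.tolist())   # [25.0, 25.0, 27.0]
print(constant_filled.tolist())  # [25.0, 0.0, 27.0]
```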

Common mistakes candidates make

In-place modification: Not realizing that fillna() returns a new DataFrame and leaves the original unchanged unless inplace=True is specified. With inplace=True the call returns None, so its result must not be assigned back to the variable.
Incorrect logic: Using the mean for categorical data (where a mean is undefined; the mode is the usual substitute) or using a constant that doesn't make sense for the column's unit.
Data Leakage: In machine learning contexts, computing the fill value (e.g., the mean) over the entire dataset instead of over the training set alone.
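Each mistake above has a straightforward fix in pandas. The sketch below shows them on small illustrative Series and DataFrames (all column names and values are made up). It assigns the result of `fillna`, fills a categorical column with its mode, and computes the fill value from a training split only.

```python
import numpy as np
import pandas as pd

# 1. fillna returns a new object; without assignment the original is untouched
df = pd.DataFrame({"price": [10.0, np.nan, 30.0]})
df.fillna(0)                       # result discarded
print(df["price"].isna().sum())    # 1: the gap is still there
df = df.fillna(0)                  # keep the returned DataFrame

# With inplace=True the frame is mutated and the call returns None,
# so writing `df2 = df2.fillna(0, inplace=True)` would lose the data
df2 = pd.DataFrame({"price": [10.0, np.nan]})
returned = df2.fillna(0, inplace=True)

# 2. Categorical column: the mean is undefined, so fill with the mode
colors = pd.Series(["red", None, "blue", "red"])
colors_filled = colors.fillna(colors.mode()[0])  # most frequent value is "red"

# 3. Leakage: compute the fill value on the training split only
train = pd.Series([1.0, np.nan, 3.0])
test = pd.Series([np.nan, 100.0])
train_mean = train.mean()                # 2.0; test rows play no part
train_filled = train.fillna(train_mean)
test_filled = test.fillna(train_mean)    # pooling both would give (1 + 3 + 100) / 3
```

Reusing `train_mean` on the test split keeps the 100 in the test data from influencing how gaps are filled.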

Interview preparation tip

Be ready to discuss the trade-offs of different filling methods. For example, filling with the mean artificially shrinks the column's variance, because every filled entry sits exactly at the center. Forward-fill is a better fit for data that changes gradually over time.
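The variance effect of mean filling is easy to check numerically. This sketch uses a made-up Series and pandas' default sample variance (`ddof=1`), which skips NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 20.0, np.nan, 30.0])

observed_var = s.var()                       # variance of 10, 20, 30 -> 100.0
mean_filled_var = s.fillna(s.mean()).var()   # 10, 20, 20, 20, 30 -> 50.0

print(observed_var, mean_filled_var)  # 100.0 50.0
```

The two filled values add rows with zero deviation from the mean. The sum of squared deviations is unchanged, but it is now divided by a larger count, which halves the variance here.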

Similar Questions

Title	Difficulty	Topics
Invalid Tweets	Easy	Database
1-bit and 2-bit Characters	Easy	Array
A Number After a Double Reversal	Easy	Math
Actors and Directors Who Cooperated At Least Three Times	Easy	Database
Ad-Free Sessions	Easy	Database