The Fill Missing Data interview question is a practical data engineering task. You are given a dataset (usually a Pandas DataFrame) that contains missing values (NaN or Null) in certain columns. Your task is to fill these gaps using a specific strategy, such as replacing them with a constant value, the mean/median of the column, or using "forward-fill" (taking the last valid value).
Companies like Acko ask this to test your proficiency with data preprocessing tools such as Pandas. In real-world software engineering and data science, data is rarely clean, so knowing how to handle missing information without introducing bias or breaking the schema is a fundamental skill. The question evaluates both your knowledge of library APIs and your understanding of data integrity.
This follows a Library API usage pattern (specifically Pandas in Python).
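- Constant fill: df.fillna(value)
- Mean fill: df['col'].fillna(df['col'].mean())
- Forward or backward fill: df.ffill() or df.bfill(). Older code writes these as df.fillna(method='ffill') or df.fillna(method='bfill'), but the method argument of fillna() is deprecated as of pandas 2.1.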
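The three strategies above can be sketched as follows. This is a minimal illustration, not a reference solution: the DataFrame, its column names (reading, label), and the "unknown" placeholder are hypothetical and chosen only to make the behaviour visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are made up for illustration.
df = pd.DataFrame({
    "reading": [10.0, np.nan, 30.0, np.nan],
    "label": ["a", None, "c", "d"],
})

# 1. Constant fill: every missing cell in the column gets the same value.
constant_filled = df["label"].fillna("unknown")

# 2. Mean fill: mean() skips NaN by default, so this is (10 + 30) / 2 = 20.
mean_filled = df["reading"].fillna(df["reading"].mean())

# 3. Forward fill copies the previous valid value; backward fill copies the next one.
forward_filled = df["reading"].ffill()
backward_filled = df["reading"].bfill()

print(constant_filled.tolist())  # ['a', 'unknown', 'c', 'd']
print(mean_filled.tolist())      # [10.0, 20.0, 30.0, 20.0]
print(forward_filled.tolist())   # [10.0, 10.0, 30.0, 30.0]
print(backward_filled.tolist())  # [10.0, 30.0, 30.0, nan]
```

Note that the backward fill leaves the last row missing, because there is no later valid value to copy from. Forward fill has the mirror-image limitation for a gap in the first row.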
The choice of pattern depends on the context of the data (e.g., time-series data usually uses forward-fill).
Suppose you have a table of temperatures:
| Day | Temp |
|---|---|
| 1 | 25 |
| 2 | NaN |
| 3 | 27 |
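Filling with the mean: (25 + 27) / 2 = 26, so Day 2 becomes 26. With forward-fill, Day 2 would instead take the Day 1 value, 25.
A common mistake is forgetting that fillna() returns a new DataFrame unless inplace=True is specified, so the result must be assigned back or it is lost.
Be ready to discuss the trade-offs of different filling methods. For example, filling with the mean reduces variance, while forward-fill is better for data that changes gradually over time.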
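The temperature example, the copy-versus-in-place behaviour of fillna(), and the variance trade-off can all be checked with a short sketch. The variable names here are illustrative assumptions. The data is the three-row table from the example.

```python
import numpy as np
import pandas as pd

# The three-row temperature table from the example.
temps = pd.DataFrame({"Day": [1, 2, 3], "Temp": [25, np.nan, 27]})

# Mean fill: (25 + 27) / 2 = 26. Forward fill: reuse the Day 1 value.
mean_filled = temps["Temp"].fillna(temps["Temp"].mean())
forward_filled = temps["Temp"].ffill()
print(mean_filled.tolist())     # [25.0, 26.0, 27.0]
print(forward_filled.tolist())  # [25.0, 25.0, 27.0]

# Both calls returned new Series; the original column still has its gap.
missing_in_original = int(temps["Temp"].isna().sum())
print(missing_in_original)      # 1

# Sample variance (ddof=1) of the observed values vs. the mean-filled column.
var_before = temps["Temp"].var()  # computed over 25 and 27 only
var_after = mean_filled.var()     # computed over 25, 26, 27
print(var_before, var_after)      # 2.0 1.0

# Assigning the result back is one common way to keep the filled values.
temps["Temp"] = mean_filled
```

The variance drops from 2.0 to 1.0 because the imputed value sits exactly at the mean and adds no spread, which is the bias to mention when discussing mean imputation.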