Data Cleaning & Handling Missing Values Questions
Understand common data quality issues: missing values (NaN, null), duplicates, outliers, inconsistent formats, and incorrect data types. Know strategies for handling each: removing rows/columns with missing data, imputation (mean, median, forward fill), deduplication, type conversion, and validation checks. Understand the trade-offs of each approach.
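As a rough illustration of those strategies and their trade-offs, here is a minimal pandas sketch on a hypothetical toy table (the column names and values are invented for the example). It converts types, deduplicates, and then compares dropping rows with median and forward-fill imputation before running a few assert-style validation checks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a string-typed numeric column, a duplicate row, and gaps.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "price": ["10.5", "n/a", "n/a", "12", None],
    "qty": [1, np.nan, np.nan, 3, 5],
})

# Type conversion: unparseable strings become NaN instead of raising an error.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Deduplication: exact duplicate rows are dropped (pandas treats NaN == NaN here).
dedup = df.drop_duplicates().reset_index(drop=True)

# Removal: simplest option, but it discards every row with any gap (2 of 4 rows here).
dropped = dedup.dropna()

# Imputation: median is robust to outliers; forward fill assumes rows are ordered in time.
imputed = dedup.assign(
    price=dedup["price"].fillna(dedup["price"].median()),
    qty=dedup["qty"].ffill(),
)

# Validation checks: fail loudly if the cleaned data breaks basic expectations.
assert imputed["id"].is_unique
assert imputed.notna().all().all()
assert (imputed["price"] > 0).all()
```

The trade-off is visible in the row counts: dropping keeps only complete rows, while imputation keeps every row but fills gaps with values that were never observed.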
Hard · Technical
Outline the statistical tests and diagnostics you would run to distinguish missing completely at random (MCAR) from missing at random (MAR) in practice. Discuss Little's MCAR test, logistic regression of missingness indicators on observed covariates, and the limitations and interpretation of these tests in operational datasets.
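One of the diagnostics named above, regressing a missingness indicator on observed covariates, can be sketched as a likelihood-ratio test. This is a hedged sketch, not Little's test: it fits an unpenalized logistic regression by Newton-Raphson (to avoid depending on a particular stats library), compares it to an intercept-only model, and refers twice the log-likelihood gap to a chi-squared distribution. The function names and the simulated data are hypothetical. A small p-value is evidence against MCAR; a large one does not prove MCAR, and no test on observed data can rule out MNAR.

```python
import numpy as np
import pandas as pd
from scipy import stats


def _fit_logit(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def _log_lik(X, y, beta):
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def missingness_lrt(df, target, covariates):
    """Likelihood-ratio test of H0: P(target is missing) does not depend on the covariates."""
    sub = df.dropna(subset=covariates)  # covariates must be observed for this test
    y = sub[target].isna().to_numpy(dtype=float)  # 1 = missing
    Z = sub[covariates].to_numpy(dtype=float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize for stable Newton steps
    X_full = np.column_stack([np.ones(len(Z)), Z])
    X_null = np.ones((len(Z), 1))
    stat = 2.0 * (_log_lik(X_full, y, _fit_logit(X_full, y))
                  - _log_lik(X_null, y, _fit_logit(X_null, y)))
    return stat, float(stats.chi2.sf(stat, df=len(covariates)))


# Simulated example: the same data with MCAR and with MAR missingness in "y".
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
data = pd.DataFrame({"x": x, "y": 3.0 * x + rng.normal(size=n)})
mcar = data.copy()
mcar.loc[rng.random(n) < 0.3, "y"] = np.nan  # missingness ignores x
mar = data.copy()
mar.loc[rng.random(n) < 1 / (1 + np.exp(-2 * x)), "y"] = np.nan  # more missing when x is high
stat_mcar, p_mcar = missingness_lrt(mcar, "y", ["x"])
stat_mar, p_mar = missingness_lrt(mar, "y", ["x"])
```

In operational data, expect very large samples to flag tiny, practically irrelevant dependencies, and note that dropping rows with missing covariates before the test can itself bias it.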
Medium · Technical
For time-series forecasting with irregular timestamps and missing intervals, when would you choose interpolation (linear, spline) vs model-based imputation such as Kalman smoothing or state-space models? Give examples and discuss trade-offs in bias, uncertainty estimation, and computational cost.
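To make the uncertainty trade-off concrete, the sketch below contrasts linear interpolation on an irregular time axis (`np.interp`) with a hand-written Kalman filter plus Rauch-Tung-Striebel smoother for a random-walk (local level) model. It assumes the process variance `q` per unit time and measurement variance `r` are known; in practice they would be estimated, for example by maximum likelihood. The function name and the data are hypothetical. The key modelling choice is that the predicted variance grows with the elapsed time `dt`, which is how irregular spacing enters the model.

```python
import numpy as np


def local_level_smoother(t, y, q=1.0, r=0.5):
    """Kalman filter + RTS smoother for a random-walk (local level) model.

    t : sorted timestamps as floats (irregular spacing allowed)
    y : observations, with np.nan marking missing values
    q : process variance per unit time (assumed known here)
    r : measurement-noise variance (assumed known here)
    Returns smoothed means and variances of the latent level at every timestamp.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    m_pred, P_pred = np.empty(n), np.empty(n)
    m_filt, P_filt = np.empty(n), np.empty(n)
    m, P = 0.0, 1e6  # vague prior on the initial level
    for k in range(n):
        dt = t[k] - t[k - 1] if k > 0 else 0.0
        # Predict: random-walk variance grows linearly with the elapsed time.
        m_pred[k], P_pred[k] = m, P + q * dt
        if np.isnan(y[k]):
            # No observation: the filtered estimate is just the prediction.
            m, P = m_pred[k], P_pred[k]
        else:
            K = P_pred[k] / (P_pred[k] + r)  # Kalman gain
            m = m_pred[k] + K * (y[k] - m_pred[k])
            P = (1.0 - K) * P_pred[k]
        m_filt[k], P_filt[k] = m, P
    # Backward RTS pass: fold in information from later observations.
    m_s, P_s = m_filt.copy(), P_filt.copy()
    for k in range(n - 2, -1, -1):
        G = P_filt[k] / P_pred[k + 1]
        m_s[k] = m_filt[k] + G * (m_s[k + 1] - m_pred[k + 1])
        P_s[k] = P_filt[k] + G**2 * (P_s[k + 1] - P_pred[k + 1])
    return m_s, P_s


t = np.array([0, 1, 2, 6, 10, 11, 12], dtype=float)
y = np.array([1.0, 1.1, 1.2, np.nan, 2.0, 2.1, 2.2])
observed = ~np.isnan(y)

# Deterministic linear interpolation on the irregular time axis: a point estimate only.
linear = np.interp(t, t[observed], y[observed])
# Model-based estimate: smoothed mean plus a variance that widens inside the gap.
mean, var = local_level_smoother(t, y)
```

Both give a similar point estimate in the gap, but only the smoother reports that the value at `t = 6` is far less certain than at observed timestamps. The cost is an O(n) pass with parameters to estimate, versus essentially free interpolation.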
Medium · Technical
Implement a scikit-learn Pipeline in Python that imputes numeric features with median, imputes categorical features with most frequent value, one-hot encodes categoricals with handle_unknown='ignore', and finally trains a RandomForestClassifier. Provide a function build_pipeline(numeric_cols, categorical_cols) that returns the fitted pipeline when given training X and y.
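A minimal sketch of one way to meet this spec with standard scikit-learn components (`ColumnTransformer`, `SimpleImputer`, `OneHotEncoder`, `RandomForestClassifier`). Because the requested signature takes only column lists, this version adds optional `X`, `y`, and `random_state` parameters (an assumption, not part of the question): it returns the pipeline unfitted when no data is passed, and fitted when training `X` and `y` are supplied. The toy training data is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_pipeline(numeric_cols, categorical_cols, X=None, y=None, random_state=0):
    """Return the preprocessing + model pipeline; fit it first if X and y are given."""
    numeric = SimpleImputer(strategy="median")
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during fit are encoded as all zeros instead of raising.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, list(numeric_cols)),
        ("cat", categorical, list(categorical_cols)),
    ])
    pipe = Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=100, random_state=random_state)),
    ])
    if X is not None and y is not None:
        pipe.fit(X, y)
    return pipe


X_train = pd.DataFrame({
    "age": [25, np.nan, 40, 35, 50, np.nan],
    "city": ["a", "b", np.nan, "a", "b", "c"],
})
y_train = [0, 1, 0, 1, 0, 1]
model = build_pipeline(["age"], ["city"], X_train, y_train)
# A missing number and an unseen category are both handled at predict time.
pred = model.predict(pd.DataFrame({"age": [np.nan], "city": ["unseen"]}))
```

Keeping the imputers inside the pipeline means medians and modes are learned only from the training data (or each training fold under cross-validation), which avoids leaking statistics from evaluation data.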
Medium · Technical
Provide Python code to (a) plot a heatmap of missingness across features using seaborn/matplotlib and (b) compute a missingness-correlation matrix (correlation between indicators of missingness for each pair of features). Explain what strong blocks or correlations imply and what next steps you would take.
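A sketch of both parts using plain matplotlib (`seaborn.heatmap(df.isna(), cbar=False)` would draw an equivalent plot). The function names and toy data are hypothetical. The correlation matrix is the Pearson correlation of the 0/1 missingness indicators; columns that are never or always missing are dropped because their correlation is undefined. A block of strongly correlated columns (like `a` and `b` below) usually points to a shared cause, such as one optional form section, one upstream source, or one failed join, which is worth tracing before choosing an imputation strategy.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_missingness(df, ax=None):
    """Rows x columns image: dark cells are missing values."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    ax.imshow(df.isna().to_numpy(dtype=float), aspect="auto", cmap="Greys",
              interpolation="nearest")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns, rotation=90)
    ax.set_ylabel("row")
    ax.set_title("Missingness (dark = missing)")
    return ax


def missingness_correlation(df):
    """Pearson correlation between the 0/1 missingness indicators of each pair of columns."""
    ind = df.isna().astype(int)
    # Never- or always-missing columns have zero variance, so their correlation is undefined.
    ind = ind.loc[:, ind.std() > 0]
    return ind.corr()


# Toy data: "a" and "b" are missing together, "c" independently, "d" never.
df = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan, 5, 6],
    "b": [1, np.nan, 3, np.nan, 5, 6],
    "c": [np.nan, 2, 3, 4, 5, np.nan],
    "d": [1, 2, 3, 4, 5, 6],
})
ax = plot_missingness(df)
corr = missingness_correlation(df)
```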
Medium · Technical
You receive multiple CSVs with different null tokens ('', 'NA', 'n/a', '.', '-999'). Propose a standardized ingest strategy to normalize missing representations across sources, including how to discover unknown tokens, preserve provenance of original values, and avoid incorrectly normalizing legitimate sentinel values.
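One possible shape for such an ingest step, sketched with pandas under stated assumptions: every file is read as raw strings (`dtype=str, keep_default_na=False`) so pandas applies no null tokens of its own; a hypothetical `normalize_nulls` replaces configured tokens with `pd.NA` while logging each original value to a provenance table; a per-column `keep_sentinels` allowlist protects values that are legitimate in a given column; and a hypothetical `discover_candidate_tokens` flags non-numeric strings in mostly numeric columns for human review. The token list, threshold, and sample CSV are assumptions for illustration.

```python
import io
import pandas as pd

# Assumed union of null tokens seen across sources; extend as new ones are confirmed.
NULL_TOKENS = {"", "NA", "n/a", ".", "-999"}


def discover_candidate_tokens(df, known=NULL_TOKENS, min_numeric_share=0.5):
    """Flag non-numeric values in mostly numeric columns as possible undocumented null tokens.

    Numeric sentinels such as -999 parse as numbers and are NOT caught here;
    checking for one extreme value that repeats suspiciously often is a separate test.
    """
    candidates = {}
    for col in df.columns:
        s = df[col].str.strip()
        parsed = pd.to_numeric(s, errors="coerce")
        if parsed.notna().mean() < min_numeric_share:
            continue
        odd = s[parsed.isna() & ~s.isin(known)]
        if not odd.empty:
            candidates[col] = odd.value_counts().to_dict()
    return candidates


def normalize_nulls(df, source, tokens=NULL_TOKENS, keep_sentinels=None):
    """Replace null tokens with pd.NA, logging each replaced raw value for provenance.

    keep_sentinels maps column -> tokens that are legitimate values in that column
    (e.g. a code column where "-999" is a real category) and must not be nulled.
    """
    keep_sentinels = keep_sentinels or {}
    out = df.copy()
    log = []
    for col in df.columns:
        col_tokens = set(tokens) - set(keep_sentinels.get(col, ()))
        mask = df[col].str.strip().isin(col_tokens).to_numpy()
        for idx in df.index[mask]:
            log.append({"source": source, "row": idx, "column": col,
                        "raw_value": df.at[idx, col]})
        out.loc[mask, col] = pd.NA
    provenance = pd.DataFrame(log, columns=["source", "row", "column", "raw_value"])
    return out, provenance


# Read every field as a raw string so no token is silently converted on load.
csv = io.StringIO("id,temp,code\n1,20.5,A\n2,NA,-999\n3,.,B\n4,n/a,\n")
raw = pd.read_csv(csv, dtype=str, keep_default_na=False)
clean, provenance = normalize_nulls(raw, source="site_a.csv",
                                    keep_sentinels={"code": {"-999"}})
```

Keeping the provenance log as a separate table, rather than overwriting values in place, lets a later review undo a normalization that turns out to be wrong for one source.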