Scikit-learn, Pandas, and NumPy Usage Questions
Practical proficiency with these core libraries. Pandas: DataFrames, data manipulation, handling missing values. NumPy: arrays, vectorized operations, mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, pipelines. Knowing standard patterns and APIs. Writing efficient, readable code using these libraries.
Hard, Technical
Implement a scikit-learn-compatible transformer `TargetEncoder` that performs K-fold target mean encoding with smoothing to reduce overfitting. Requirements:
- Uses an internal KFold to compute out-of-fold encodings during fit.
- Encodes new data in transform using full-sample statistics computed during fit.
- Accepts parameters for smoothing strength and random_state.
Provide a code sketch (BaseEstimator + TransformerMixin) and note the critical tests you would write.
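One possible shape for such an encoder is sketched below. It is a minimal, single-column version and makes several assumptions that the question does not fix: additive smoothing of the form (sum + m * prior) / (count + m), out-of-fold values produced in `fit_transform` (the usual scikit-learn convention, since `fit` itself returns no encodings), and unseen categories falling back to the global mean. The parameter names `n_splits` and `smoothing` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import KFold


class TargetEncoder(BaseEstimator, TransformerMixin):
    """K-fold target mean encoder with additive smoothing (single-column sketch)."""

    def __init__(self, n_splits=5, smoothing=10.0, random_state=None):
        # Store constructor arguments unchanged so get_params/clone work.
        self.n_splits = n_splits
        self.smoothing = smoothing
        self.random_state = random_state

    def _smoothed_means(self, cats, y, prior):
        # Shrink each category mean toward the prior; rare categories shrink most.
        stats = pd.DataFrame({"cat": cats, "y": y}).groupby("cat")["y"].agg(["sum", "count"])
        return (stats["sum"] + self.smoothing * prior) / (stats["count"] + self.smoothing)

    def fit(self, X, y):
        cats = pd.Series(np.asarray(X).ravel())
        y = pd.Series(np.asarray(y, dtype=float))
        # Full-sample statistics, used later by transform() on new data.
        self.prior_ = y.mean()
        self.mapping_ = self._smoothed_means(cats, y, self.prior_)
        return self

    def transform(self, X):
        cats = pd.Series(np.asarray(X).ravel())
        # Unseen categories fall back to the global target mean (an assumption).
        return cats.map(self.mapping_).fillna(self.prior_).to_numpy().reshape(-1, 1)

    def fit_transform(self, X, y):
        # Fit full-sample statistics, then return out-of-fold encodings so that
        # no training row is encoded using its own target value.
        self.fit(X, y)
        cats = pd.Series(np.asarray(X).ravel())
        y = pd.Series(np.asarray(y, dtype=float))
        out = np.full(len(cats), self.prior_)
        kf = KFold(n_splits=self.n_splits, shuffle=True, random_state=self.random_state)
        for train_idx, val_idx in kf.split(cats):
            fold_prior = y.iloc[train_idx].mean()
            fold_map = self._smoothed_means(cats.iloc[train_idx], y.iloc[train_idx], fold_prior)
            out[val_idx] = cats.iloc[val_idx].map(fold_map).fillna(fold_prior).to_numpy()
        return out.reshape(-1, 1)
```

Tests worth writing for a sketch like this: output shape is (n_samples, 1); unseen categories map to the global mean; the same `random_state` gives identical out-of-fold output; `transform` on training data differs from `fit_transform` output (confirming out-of-fold behaviour); and `sklearn.base.clone` round-trips the parameters.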
Hard, Technical
Explain how feature selection inside cross-validation can leak information if done incorrectly. Show how to use sklearn Pipeline to include feature selection (e.g., SelectKBest or a model-based selector) safely inside cross-validation so the selection is performed within each fold only.
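The leak arises when the selector is fitted on the full dataset before splitting: the held-out rows then influence which features are kept, so the cross-validated score is optimistic. A minimal sketch of the safe pattern follows. The synthetic dataset, `k=10`, and the choice of `LogisticRegression` are illustrative assumptions, not part of the question.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: 50 features, 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)

# Every step that learns from data lives inside the Pipeline, so
# cross_val_score refits the scaler and the selector on each training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(scores.mean())
```

The same pipeline can be passed to `GridSearchCV` with a parameter such as `select__k`, in which case the number of selected features is tuned without the held-out folds ever reaching the selector.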
Hard, Technical
You find that a scikit-learn Pipeline you saved with joblib sometimes fails to load on a different machine with a different scikit-learn version. Describe robust strategies for model serialization and versioning (e.g., export model artifacts, save preprocessing code, use wheels or containers). What metadata would you include with the saved artifact to ensure reproducible loading?
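One way to make version mismatches fail loudly is to write a metadata file next to the artifact and check it before loading. The sketch below assumes a hypothetical layout (`model.joblib` plus `metadata.json` in one directory) and hypothetical helper names `save_with_metadata` and `load_checked`; the set of metadata fields is a suggestion, not an exhaustive list.

```python
import json
import platform
from pathlib import Path

import joblib
import numpy
import sklearn


def save_with_metadata(model, directory, feature_names):
    """Persist the model together with the environment details needed to reload it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, directory / "model.joblib")
    metadata = {
        "sklearn_version": sklearn.__version__,
        "numpy_version": numpy.__version__,
        "joblib_version": joblib.__version__,
        "python_version": platform.python_version(),
        "feature_names": list(feature_names),
    }
    (directory / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata


def load_checked(directory):
    """Refuse to unpickle when the scikit-learn version differs from the saved one."""
    directory = Path(directory)
    metadata = json.loads((directory / "metadata.json").read_text())
    if metadata["sklearn_version"] != sklearn.__version__:
        raise RuntimeError(
            f"Saved with scikit-learn {metadata['sklearn_version']}, "
            f"running {sklearn.__version__}"
        )
    return joblib.load(directory / "model.joblib")
```

In practice the metadata would likely also record a hash of the training data, the git commit of the preprocessing code, and the training date, and the pinned versions would be enforced through a lock file or container image rather than a runtime check alone.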
Medium, Technical
You have a pandas code block that triggers SettingWithCopyWarning frequently. Explain what this warning means, give examples of code that causes it (e.g., chained indexing), and refactor those examples into safe patterns that avoid the warning and unintended bugs.
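The warning signals that an assignment may be landing on a temporary copy rather than on the original DataFrame. A small sketch of the usual refactors follows; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 25.0, 40.0], "qty": [1, 0, 3]})

# Risky (left commented out): chained indexing. df[mask] may return a copy,
# so the assignment can be silently lost.
# df[df["qty"] > 0]["price"] = 0.0

# Safe: a single .loc call selects rows and column on the original frame.
df.loc[df["qty"] > 0, "price"] = 0.0

# Safe: take an explicit copy when an independent subset is intended,
# then modify the copy freely.
in_stock = df[df["qty"] > 0].copy()
in_stock["total"] = in_stock["price"] * in_stock["qty"]
```

The rule of thumb is to decide up front whether the goal is to modify the original (use one `.loc` assignment) or to work on a separate subset (call `.copy()` first).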
Medium, Technical
Explain how to integrate custom scoring metrics into scikit-learn GridSearchCV (e.g., a business metric that combines precision and throughput). Show how to write a scorer using `make_scorer` and how to pass it to GridSearchCV, including maximizing vs minimizing metrics.
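A minimal sketch of wiring custom metrics into `GridSearchCV` follows. The two metric functions (`net_value` and `error_cost`) and their weights are hypothetical stand-ins for a real business metric; the point is the `greater_is_better` flag, which tells scikit-learn whether to maximize the value or to negate it so that lower is treated as better.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV


def net_value(y_true, y_pred, gain_tp=5.0, cost_fp=1.0):
    """Hypothetical benefit per sample: reward true positives, penalize false positives."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (gain_tp * tp - cost_fp * fp) / len(y_true)


def error_cost(y_true, y_pred, cost_fp=1.0, cost_fn=3.0):
    """Hypothetical cost per sample; lower is better."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


# Extra keyword arguments to make_scorer are forwarded to the metric function.
value_scorer = make_scorer(net_value, greater_is_better=True, cost_fp=2.0)
# greater_is_better=False makes the scorer return the negated cost, so
# GridSearchCV can still pick the highest score.
cost_scorer = make_scorer(error_cost, greater_is_better=False)

X, y = make_classification(n_samples=300, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring={"value": value_scorer, "cost": cost_scorer},
    refit="value",  # with several scorers, name the one that selects the best model
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```

Because the cost scorer is negated, the entries in `search.cv_results_["mean_test_cost"]` are zero or negative; flip the sign when reporting them.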