Scikit-learn, Pandas, and NumPy Usage Questions
Practical proficiency with three core libraries. Pandas: DataFrames, data manipulation, and handling missing values. NumPy: arrays, vectorized operations, and mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, and pipelines. Candidates should know the standard patterns and APIs and write efficient, readable code with these libraries.
Medium · Technical
Design a scikit-learn pipeline using ColumnTransformer to handle a dataset with mixed features, numeric_cols = ['age','income'] and categorical_cols = ['region','plan'], that contains missing values. The pipeline should:
- Impute numeric features with the median, then scale them
- Impute categorical features with the constant 'missing', then one-hot encode them (handle_unknown='ignore')
- Fit a RandomForestClassifier on the processed features
Provide complete Python code for the pipeline.
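One possible answer, sketched below under stated assumptions: the toy DataFrame, labels, and RandomForest settings (n_estimators=100, random_state=0) are illustrative choices, not part of the question. Missing categorical values are written as np.nan because SimpleImputer's default missing_values=np.nan may not detect Python None in object columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ['age', 'income']
categorical_cols = ['region', 'plan']

# Numeric branch: median imputation, then standardization
numeric_pipe = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
])
# Categorical branch: fill gaps with the literal 'missing', then one-hot encode;
# categories unseen during fit become all-zero rows instead of raising errors
categorical_pipe = Pipeline([
    ('impute', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])
preprocess = ColumnTransformer([
    ('num', numeric_pipe, numeric_cols),
    ('cat', categorical_pipe, categorical_cols),
])
model = Pipeline([
    ('preprocess', preprocess),
    ('clf', RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Illustrative training data with missing values in every column
df = pd.DataFrame({
    'age': [25, np.nan, 47, 35, 52, 29],
    'income': [40_000, 52_000, np.nan, 61_000, 75_000, 38_000],
    'region': ['north', 'south', np.nan, 'east', 'north', 'south'],
    'plan': ['basic', 'pro', 'pro', np.nan, 'basic', 'pro'],
})
y = np.array([0, 1, 1, 0, 1, 0])
model.fit(df, y)

# 'west' was never seen during fit, so the encoder ignores it
new = pd.DataFrame({'age': [40], 'income': [np.nan],
                    'region': ['west'], 'plan': ['basic']})
pred = model.predict(new)
print(pred)
```

Because imputation and encoding live inside the Pipeline, the medians and category vocabularies are learned only from the data passed to fit, so the same object can be handed to cross_val_score or GridSearchCV without leaking validation-fold statistics.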
Easy · Technical
Given two small pandas DataFrames:
left = pd.DataFrame({'id': [1, 2, 3], 'left_val': [10, 20, 30]})
right = pd.DataFrame({'id': [2, 3, 4], 'right_val': [200, 300, 400]})
1) Show the results of inner, left, right, and outer merges on 'id'.
2) In code, perform a left merge and include an indicator column that shows each row's merge origin.
Explain when each merge type is appropriate.
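A short sketch of all four merges plus the indicator column, using only DataFrame.merge with its how and indicator parameters; the dictionary-of-results layout is just a convenience for printing.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'left_val': [10, 20, 30]})
right = pd.DataFrame({'id': [2, 3, 4], 'right_val': [200, 300, 400]})

# ids kept per join type:
# inner -> [2, 3], left -> [1, 2, 3], right -> [2, 3, 4], outer -> [1, 2, 3, 4]
results = {how: left.merge(right, on='id', how=how)
           for how in ['inner', 'left', 'right', 'outer']}
for how, df in results.items():
    print(how, df, sep='\n', end='\n\n')

# indicator=True adds a categorical '_merge' column:
# id 1 -> 'left_only', ids 2 and 3 -> 'both'
flagged = left.merge(right, on='id', how='left', indicator=True)
print(flagged)
```

Unmatched rows get NaN in the other side's columns, which is why right_val becomes float in the left merge. Inner suits analyses that need both sides present; left keeps every record of a primary table while enriching it from a lookup table; right is the mirror image; outer, usually with indicator=True, suits reconciliation and auditing where rows missing from either side matter.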
Hard · Technical
Implement nested cross-validation for model selection using scikit-learn, where the inner loop runs GridSearchCV on a pipeline (preprocessing + classifier) and the outer loop estimates generalization performance. Provide code using cross_val_score or manual loops. Discuss the computational cost and practical shortcuts.
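A minimal sketch using cross_val_score, assuming the bundled breast-cancer dataset, a StandardScaler + LogisticRegression pipeline, a four-value grid over C, and ROC AUC scoring; all of these are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling sits inside the pipeline, so it is refit on every training split
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
param_grid = {'clf__C': [0.01, 0.1, 1.0, 10.0]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: GridSearchCV picks C using only the outer fold's training data
search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc')

# Outer loop: each outer fold reruns the whole search, then scores the
# refit best model on data the search never saw
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring='roc_auc')
print(f"nested ROC AUC: {outer_scores.mean():.3f} ± {outer_scores.std():.3f}")
```

With refit=True (the default), the cost is outer_k × (n_candidates × inner_k + 1) fits, here 5 × (4 × 3 + 1) = 65. Common shortcuts: fewer folds, RandomizedSearchCV instead of an exhaustive grid, successive halving via HalvingGridSearchCV (requires `from sklearn.experimental import enable_halving_search_cv`), n_jobs for parallel fits, and Pipeline(memory=...) to cache expensive transformers. The outer scores estimate the tuning procedure's performance; the final model is obtained by fitting the search once on all data.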
Hard · Technical
You have sparse one-hot encoded features producing ~1 million columns. Describe how to represent these features as sparse CSR matrices in Python and train a logistic regression on them efficiently. Provide code to convert pandas sparse output to scipy.sparse.csr_matrix, train LogisticRegression with solver='saga', and discuss regularization and memory trade-offs.
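A scaled-down sketch of the conversion and training steps. The synthetic 'user' and 'item' columns, the parity-based label, and the L1 settings (C=0.5, max_iter=1000) are assumptions chosen so the example runs quickly; the same calls apply at ~1M columns.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2_000
# Illustrative high-cardinality categorical columns (synthetic data)
raw = pd.DataFrame({
    'user': rng.integers(0, 500, n).astype(str),
    'item': rng.integers(0, 300, n).astype(str),
})
# Hypothetical label that depends on the user id, so there is signal to learn
y = (raw['user'].astype(int) % 2).to_numpy()

# sparse=True yields SparseDtype columns with fill value 0;
# float32 halves the storage of stored values compared with float64
dummies = pd.get_dummies(raw, sparse=True, dtype=np.float32)

# .sparse.to_coo() requires every column to be sparse; CSR gives fast row access
X = dummies.sparse.to_coo().tocsr()

# saga accepts sparse input directly; L1 drives many coefficients to exactly zero
clf = LogisticRegression(solver='saga', penalty='l1', C=0.5, max_iter=1000)
clf.fit(X, y)

csr_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
dense_bytes = X.shape[0] * X.shape[1] * X.dtype.itemsize
print(f"shape={X.shape}, nnz={X.nnz}, CSR={csr_bytes:,} B, dense={dense_bytes:,} B")
print("nonzero coefficients:", np.count_nonzero(clf.coef_))
```

CSR memory grows with the number of stored entries (values plus column indices) plus one row pointer per row, independent of column count, whereas a dense matrix grows with rows × columns. At real scale, building a pandas sparse DataFrame with ~1M columns is itself slow; OneHotEncoder(handle_unknown='ignore') or FeatureHasher produce scipy sparse output directly. Avoid centering (use StandardScaler(with_mean=False) or MaxAbsScaler), since subtracting means densifies the matrix. L1 or elasticnet yields a sparse, more interpretable model; L2 is often more stable but keeps every coefficient nonzero.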
Medium · Technical
Discuss trade-offs between one-hot encoding and target encoding for a categorical feature with high cardinality (~10,000 categories). In what situations is target encoding appropriate? How do you avoid target leakage when applying target encoding in cross-validation or production?
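To illustrate leakage-safe target encoding, here is a manual out-of-fold sketch. The helper names smoothed_means and oof_target_encode, the smoothing strength m=20, and the 5-fold split are hypothetical choices for illustration. Recent scikit-learn versions (1.3+) also provide sklearn.preprocessing.TargetEncoder, whose fit_transform applies a similar cross-fitting scheme internally.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def smoothed_means(cats, target, prior, m):
    """Per-category mean shrunk toward the prior; rare categories stay near it."""
    stats = target.groupby(cats).agg(['sum', 'count'])
    return (stats['sum'] + m * prior) / (stats['count'] + m)

def oof_target_encode(cats, target, n_splits=5, m=20.0, seed=0):
    """Encode training rows out-of-fold: no row's own label shapes its value."""
    encoded = np.empty(len(cats), dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(cats):
        fit_cats, fit_y = cats.iloc[fit_idx], target.iloc[fit_idx]
        prior = fit_y.mean()
        means = smoothed_means(fit_cats, fit_y, prior, m)
        # Categories absent from the fitting folds fall back to the prior
        encoded[enc_idx] = cats.iloc[enc_idx].map(means).fillna(prior).to_numpy()
    return encoded

# Illustrative data: ~1,000 categories, binary target
rng = np.random.default_rng(0)
n = 5_000
cats = pd.Series(rng.integers(0, 1_000, n).astype(str), name='city')
y = pd.Series(rng.integers(0, 2, n), name='y')

train_enc = oof_target_encode(cats, y)   # feature used to train the model

# Production: fit one mapping on all training rows; unseen categories get the prior
prior = y.mean()
full_map = smoothed_means(cats, y, prior, m=20.0)
new_cats = pd.Series([cats.iloc[0], 'never_seen'])
test_enc = new_cats.map(full_map).fillna(prior).to_numpy()
print(train_enc[:5], test_enc)
```

Inside model-evaluation cross-validation, compute the encoding only from each training fold (for example, by placing the encoder in a Pipeline) so validation-fold labels never influence the mapping. Compared with one-hot encoding, this yields one dense column instead of ~10,000 sparse ones, at the cost of relying on noisy per-category statistics, which the smoothing term mitigates.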