InterviewStack.io

Scikit Learn, Pandas, and NumPy Usage Questions

Practical proficiency with three core Python libraries. Pandas: DataFrames, data manipulation, and handling missing values. NumPy: arrays, vectorized operations, and mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, and pipelines. Questions test knowledge of standard patterns and APIs and the ability to write efficient, readable code with these libraries.

Hard · Technical
63 practiced
Explain nested cross-validation and why it gives a nearly unbiased estimate of generalization performance when hyperparameters are tuned as part of model selection. Using scikit-learn, provide code to run nested CV comparing RandomForestClassifier and LogisticRegression, where an inner GridSearchCV tunes hyperparameters and an outer cross_val_score reports the final distribution of scores.
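One possible shape of an answer, as a minimal sketch rather than a reference solution: the inner GridSearchCV picks hyperparameters using only the outer training fold, so the outer test fold never influences tuning. The synthetic dataset, the parameter grids, the fold counts, and the choice of ROC AUC as the metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)

# Inner folds tune hyperparameters; outer folds estimate generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Illustrative grids only; real grids would be wider.
candidates = {
    "logreg": (
        Pipeline([
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ]),
        {"clf__C": [0.1, 1.0, 10.0]},
    ),
    "rf": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [3, None]},
    ),
}

nested_scores = {}
for name, (estimator, grid) in candidates.items():
    # GridSearchCV is itself an estimator, so cross_val_score refits the
    # whole search on each outer training fold.
    search = GridSearchCV(estimator, grid, cv=inner_cv, scoring="roc_auc")
    nested_scores[name] = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")
    print(name, nested_scores[name].mean(), nested_scores[name].std())
```

Each entry in nested_scores holds one score per outer fold, which is the distribution the question asks for. Reporting best_score_ from a single GridSearchCV instead would be optimistically biased, because the same folds would both select and evaluate the hyperparameters.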
Easy · Technical
54 practiced
Write Python code using scikit-learn to construct a Pipeline that scales numeric features with StandardScaler, fits a LogisticRegression with L2 penalty, and uses cross_val_score to evaluate ROC AUC. Explain why using a Pipeline is important when performing cross-validation and how it prevents data leakage.
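A minimal sketch of one way to answer, assuming a synthetic dataset and 5-fold stratified splitting (neither is specified in the question). Because the scaler sits inside the Pipeline, cross_val_score refits it on each training fold only, so the held-out fold's mean and variance never leak into the features the model is trained on.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features stand in for real data (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    # L2 is LogisticRegression's default penalty; C controls its strength.
    ("clf", LogisticRegression(C=1.0, max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# The pipeline is cloned and fit from scratch on each training fold.
auc_scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(auc_scores.mean(), auc_scores.std())
```

Scaling the full X before calling cross_val_score would compute statistics from rows that later serve as test data, which is the leakage the Pipeline avoids.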
Easy · Technical
55 practiced
Explain differences between NumPy slicing (views) and advanced integer indexing (copies). Given a = np.arange(12).reshape(3,4), demonstrate a[1:3,:], a[[0,2],[1,3]], and show how assignment to these views or copies affects the original array. Provide code and explain when a copy is created.
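The behavior can be checked directly with a short sketch like the one below. Basic slicing returns a view that shares memory with the original array, while indexing with integer lists returns a new array. The variable names view and picked are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Basic slicing: rows 1 and 2, all columns. This is a view onto a's buffer.
view = a[1:3, :]

# Advanced (integer-list) indexing: picks elements (0, 1) and (2, 3).
# The result is a copy with its own memory.
picked = a[[0, 2], [1, 3]]

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, picked))  # False

# Writing through the view changes the original array.
view[0, 0] = -1      # a[1, 0] is now -1

# Writing to the copy leaves the original untouched.
picked[0] = 99       # a[0, 1] is still 1
unchanged = int(a[0, 1])

# Assigning *through* an advanced index on the left-hand side is different:
# it writes into a directly, because no intermediate copy is kept.
a[[0, 2], [1, 3]] = 0
```

A copy is created whenever the selection cannot be described by a start, stop, and step per axis, which covers integer-array and boolean-mask indexing. An explicit .copy() on a slice gives an independent array when a view is not wanted.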
Easy · Technical
79 practiced
In Python, using pandas and numpy, write concise code to create a DataFrame from arrays user_id=[1,2,3], signup_date=['2025-01-01','2025-01-05','2025-02-10'], value=[10.5,20.0,np.nan]; convert signup_date to datetime, set it as the index, select rows between '2025-01-01' and '2025-01-31', and fill missing values in 'value' with the column mean. Include a short comment explaining each step.
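A concise sketch of one reading of the task. It assumes the fill uses the mean of the whole 'value' column and is applied before the date selection, since the only missing value falls outside January; the question does not state the order explicitly.

```python
import numpy as np
import pandas as pd

# Build the DataFrame from the three given arrays.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": ["2025-01-01", "2025-01-05", "2025-02-10"],
    "value": [10.5, 20.0, np.nan],
})

# Parse the date strings into datetime64 values.
df["signup_date"] = pd.to_datetime(df["signup_date"])

# Use the dates as a sorted DatetimeIndex so label slicing works.
df = df.set_index("signup_date").sort_index()

# Replace missing values with the column mean (NaN is skipped by mean()).
df["value"] = df["value"].fillna(df["value"].mean())

# Label-based slicing on a DatetimeIndex includes both endpoints.
january = df.loc["2025-01-01":"2025-01-31"]
print(january)
```

Assigning the result of fillna back to the column avoids inplace=True on a column selection, which may not update the parent DataFrame under pandas copy-on-write behavior.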
Medium · System Design
74 practiced
Design a reproducible training script for a scikit-learn model that sets random seeds for NumPy, Python's random module, and scikit-learn's own sources of randomness, logs hyperparameters and metrics, saves the final pipeline with joblib, and writes metadata (git SHA, data snapshot ID, package versions) to a JSON file. Sketch the file layout and provide code snippets for seeding and saving artifacts.
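A rough sketch of the seeding and artifact-saving parts, under several assumptions: the function names seed_everything, git_sha, and train_and_save, the two-file output layout (pipeline.joblib and metadata.json in one run directory), the synthetic dataset, and the placeholder snapshot ID are all hypothetical choices for illustration. scikit-learn has no global seed of its own, so the seed is passed explicitly as random_state wherever an estimator or splitter accepts one.

```python
import json
import platform
import random
import subprocess
import tempfile
from pathlib import Path

import joblib
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42


def seed_everything(seed):
    """Seed Python's random module and NumPy's legacy global RNG."""
    random.seed(seed)
    np.random.seed(seed)


def git_sha():
    """Return the current commit SHA, or 'unknown' outside a git checkout."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True, stderr=subprocess.DEVNULL
        ).strip()
    except (subprocess.CalledProcessError, OSError):
        return "unknown"


def train_and_save(out_dir, seed=SEED, data_snapshot_id="snapshot-placeholder"):
    """Train a small pipeline and write pipeline.joblib plus metadata.json."""
    seed_everything(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Synthetic data stands in for a versioned data snapshot (assumption).
    X, y = make_classification(n_samples=200, n_features=6, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    params = {"C": 1.0, "max_iter": 1000}
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(random_state=seed, **params)),
    ])
    pipe.fit(X_tr, y_tr)
    metrics = {"test_accuracy": float(pipe.score(X_te, y_te))}

    # Persist the fitted pipeline and everything needed to reproduce it.
    joblib.dump(pipe, out_dir / "pipeline.joblib")
    metadata = {
        "seed": seed,
        "params": params,
        "metrics": metrics,
        "git_sha": git_sha(),
        "data_snapshot_id": data_snapshot_id,
        "versions": {
            "python": platform.python_version(),
            "numpy": np.__version__,
            "scikit-learn": sklearn.__version__,
            "joblib": joblib.__version__,
        },
    }
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata


# Example run into a temporary directory.
run_dir = Path(tempfile.mkdtemp())
metadata = train_and_save(run_dir)
```

Keeping the metadata next to the model file means a saved pipeline can always be traced back to the code, data, and library versions that produced it. A fuller design would likely add a config file and a proper logger, which this sketch leaves out.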
