Scikit-learn, Pandas, and NumPy Usage Questions
These questions test practical proficiency with three core libraries. Pandas: DataFrames, data manipulation, handling missing values. NumPy: arrays, vectorized operations, mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, pipelines. Candidates should know the standard patterns and APIs and write efficient, readable code with these libraries.
Medium · System Design
Design a reproducible training script for an sklearn model that sets random seeds for NumPy, Python's random module, and sklearn-related randomness; logs hyperparameters and metrics; saves the final pipeline with joblib; and writes metadata (git SHA, data snapshot id, package versions) to a JSON file. Sketch the file layout and provide code snippets for seeding and saving artifacts.
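One possible shape for an answer is sketched below. It is a minimal sketch under stated assumptions, not a definitive design: the file layout, the names `SEED`, `seed_everything`, `git_sha`, and `train_and_save`, the synthetic dataset, the choice of `LogisticRegression`, and the placeholder snapshot id are all hypothetical. Scikit-learn has no global seed of its own, so the sketch passes `random_state` explicitly wherever an estimator or splitter accepts it.

```python
# Hypothetical layout (an assumption, not prescribed by the question):
#   train.py                    this script
#   artifacts/pipeline.joblib   fitted pipeline
#   artifacts/metadata.json     seed, params, metrics, git SHA, versions
import json
import logging
import platform
import random
import subprocess
from pathlib import Path

import joblib
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("train")
SEED = 42  # hypothetical default seed


def seed_everything(seed):
    """Seed Python's random module and NumPy's legacy global RNG."""
    random.seed(seed)
    np.random.seed(seed)


def git_sha():
    """Return the current commit SHA, or None outside a git checkout."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True, stderr=subprocess.DEVNULL
        )
        return out.strip()
    except (subprocess.CalledProcessError, OSError):
        return None


def train_and_save(out_dir, seed=SEED, data_snapshot_id="snapshot-placeholder"):
    seed_everything(seed)
    # Synthetic stand-in data; a real script would load the snapshot instead.
    X, y = make_classification(n_samples=200, n_features=5, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    params = {"C": 1.0, "max_iter": 200}
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(random_state=seed, **params)),
    ])
    pipe.fit(X_tr, y_tr)
    metrics = {"test_accuracy": float(accuracy_score(y_te, pipe.predict(X_te)))}
    log.info("params=%s metrics=%s", params, metrics)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(pipe, out_dir / "pipeline.joblib")

    metadata = {
        "seed": seed,
        "params": params,
        "metrics": metrics,
        "git_sha": git_sha(),
        "data_snapshot_id": data_snapshot_id,
        "packages": {
            "python": platform.python_version(),
            "numpy": np.__version__,
            "scikit-learn": sklearn.__version__,
            "joblib": joblib.__version__,
        },
    }
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metrics
```

Calling `train_and_save("artifacts")` twice with the same seed should produce the same metrics, which is a quick sanity check that the seeding covers every source of randomness in the script.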
Easy · Technical
Using scikit-learn in Python, given features X (2D numpy array) and target y (1D array), write code to split into train/test sets, standardize X with StandardScaler, fit a LinearRegression model, and evaluate test MSE with mean_squared_error. Explain why scaling is applied before fitting and whether it's required for ordinary least squares.
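A minimal sketch of one way to answer, assuming synthetic data (the generated `X`, `y`, and the coefficient values are made up for illustration). The scaler is fit on the training split only, so test-set statistics do not leak into training. The sketch also fits an unscaled model for comparison: ordinary least squares with an intercept yields the same predictions under any affine rescaling of the features, so scaling is not required for OLS itself. It matters for regularized models (Ridge, Lasso), gradient-based solvers, and comparing coefficient magnitudes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([2.0, -0.5, 0.03]) + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Fit on scaled features and evaluate test MSE.
model = LinearRegression().fit(X_train_s, y_train)
mse_scaled = mean_squared_error(y_test, model.predict(X_test_s))

# Same model on raw features: predictions, and therefore MSE, should match.
raw_model = LinearRegression().fit(X_train, y_train)
mse_raw = mean_squared_error(y_test, raw_model.predict(X_test))

print(f"test MSE (scaled): {mse_scaled:.6f}")
print(f"test MSE (raw):    {mse_raw:.6f}")
```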
Medium · Technical
Compare SimpleImputer (mean, median, most_frequent), KNNImputer, and IterativeImputer in scikit-learn. For a dataset with mixed numeric and categorical features and approximately 10% MCAR missingness, which strategy would you choose and why? Include a short code snippet showing how to apply KNNImputer on numeric columns within a pipeline.
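The requested snippet might look like the sketch below, which assumes a small made-up DataFrame with hypothetical columns `age`, `income`, and `city`. `KNNImputer` only handles numeric data, so it is restricted to the numeric columns through a `ColumnTransformer`, while the categorical column gets `SimpleImputer(strategy="most_frequent")`. Because KNN is distance-based, the numeric columns are standardized first; `StandardScaler` ignores NaNs when fitting and preserves them when transforming, so it can run before the imputer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with missing values in both numeric and categorical columns.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 33, 52, 29, np.nan, 41],
    "income": [40.0, 52.0, np.nan, 61.0, 80.0, 45.0, 58.0, 72.0],
    "city": pd.Series(["A", "B", np.nan, "A", "B", "A", "B", np.nan], dtype=object),
})
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric branch: scale first (distances are scale-sensitive), then KNN-impute.
numeric_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("impute", KNNImputer(n_neighbors=3)),
])

# Categorical branch: fill with the mode, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Imputation lives inside the pipeline, so it is re-fit on each CV training fold.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(df, y)
```

With roughly 10% MCAR missingness, a simple median or most-frequent fill is often a reasonable baseline; whether KNN or iterative imputation is worth the extra cost is best checked with cross-validation on the actual data.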
Easy · Technical
In Python, using pandas and numpy, write concise code to create a DataFrame from arrays user_id=[1,2,3], signup_date=['2025-01-01','2025-01-05','2025-02-10'], value=[10.5,20.0,np.nan]; convert signup_date to datetime, set it as the index, select rows between '2025-01-01' and '2025-01-31', and fill missing values in 'value' with the column mean. Include a short comment explaining each step.
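A possible answer is sketched below. The question leaves one point open: whether the mean fill applies to the whole frame or only to the January selection. This sketch assumes the whole frame (the only missing value falls in February) and keeps the January slice as a separate variable named `january`, a name chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Build the DataFrame from the given arrays.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "signup_date": ["2025-01-01", "2025-01-05", "2025-02-10"],
    "value": [10.5, 20.0, np.nan],
})

# Parse the date strings into datetime64 values.
df["signup_date"] = pd.to_datetime(df["signup_date"])

# Use the dates as a sorted DatetimeIndex so label slicing works.
df = df.set_index("signup_date").sort_index()

# Select rows in January 2025; .loc slicing is inclusive on both ends.
january = df.loc["2025-01-01":"2025-01-31"].copy()

# Replace missing 'value' entries with the column mean (NaN is skipped by mean()).
df["value"] = df["value"].fillna(df["value"].mean())

print(january)
print(df)
```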
Easy · Technical
Using NumPy, create a 2D array A with values 1..12 and shape (3,4), then reshape it to (4,3). Compute the column-wise mean of the reshaped array and subtract it from the array using broadcasting. Explain why the vectorized subtraction is faster than looping in Python and show minimal timing code.
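One way to work through this is sketched below. The helper `loop_center` and the larger array `big` used for timing are illustrative additions, not part of the question. The vectorized subtraction is faster because NumPy runs the loop in compiled code over a contiguous typed buffer, whereas the Python loop pays interpreter overhead and creates a Python object for every element it touches.

```python
import timeit

import numpy as np

# Values 1..12 in a (3, 4) array, then reshaped to (4, 3).
A = np.arange(1, 13).reshape(3, 4)
B = A.reshape(4, 3)

# Column-wise mean has shape (3,); broadcasting stretches it across the 4 rows.
col_mean = B.mean(axis=0)
centered = B - col_mean


def loop_center(arr):
    """Pure-Python equivalent of arr - arr.mean(axis=0), for comparison."""
    n_rows, n_cols = arr.shape
    means = [sum(arr[i, j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    out = np.empty(arr.shape, dtype=float)
    for i in range(n_rows):
        for j in range(n_cols):
            out[i, j] = arr[i, j] - means[j]
    return out


# Minimal timing on a larger array so the difference is measurable.
big = np.random.default_rng(0).normal(size=(500, 50))
t_vec = timeit.timeit(lambda: big - big.mean(axis=0), number=3)
t_loop = timeit.timeit(lambda: loop_center(big), number=3)
print(f"vectorized: {t_vec:.5f}s  loop: {t_loop:.5f}s")
```

Exact timings depend on the machine, so the sketch prints them rather than assuming a particular speedup.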