InterviewStack.io

Python Programming & ML Libraries Questions

Python programming language fundamentals (syntax, data structures, control flow, error handling) with practical usage of machine learning libraries such as NumPy, pandas, scikit-learn, TensorFlow, and PyTorch for data manipulation, model development, training, evaluation, and lightweight ML tasks.

Hard · System Design
18 practiced
Design a lightweight A/B testing framework in Python to evaluate two ML models in production. Include components for deterministic bucketing of users, routing logic, logging and instrumentation, statistical significance calculations (t-test, bootstrap), monitoring multiple metrics, and safeguards against optional stopping (peeking). Describe the API and data schema for experiment events.
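One way to start on the components listed above is the minimal, standard-library-only sketch below. It is not a production framework. The `ExperimentEvent` fields and all function names are hypothetical choices made for illustration, and the peeking guard assumes a per-variant sample size fixed before the experiment starts.

```python
import hashlib
import math
import random
from dataclasses import dataclass
from statistics import mean, variance


@dataclass(frozen=True)
class ExperimentEvent:
    """Hypothetical event schema; the field names are illustrative assumptions."""
    experiment: str
    user_id: str
    variant: str      # "control" or "treatment"
    metric: str       # e.g. "click" or "latency_ms"
    value: float
    timestamp: float  # Unix seconds


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministic bucketing: the same user and experiment always map to the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    fraction = int(digest[:8], 16) / 16**8  # value in [0, 1) derived from the hash
    return "treatment" if fraction < treatment_share else "control"


def welch_t_statistic(control, treatment) -> float:
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    return (mean(treatment) - mean(control)) / standard_error


def bootstrap_diff_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(treatment, k=len(treatment)))
        - mean(rng.choices(control, k=len(control)))
        for _ in range(n_resamples)
    )
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def ready_to_analyze(observed_per_variant: int, planned_per_variant: int) -> bool:
    """Peeking guard: run significance tests only once the planned sample size is reached."""
    return observed_per_variant >= planned_per_variant
```

Salting the hash with the experiment name keeps assignments independent across experiments. A fuller answer would also cover routing, logging, multiple-metric corrections, and sequential testing methods, none of which this sketch attempts.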
Easy · Technical
23 practiced
Given the sample DataFrame below, write concise pandas code to detect missing values in columns 'age' and 'income' and impute each missing value with the median of that column grouped by 'occupation'. Explain any assumptions your code makes and how it handles groups with all missing values.
| id | occupation | age  | income |
|----|------------|------|--------|
| 1  | teacher    | 35   | 50000  |
| 2  | engineer   | NaN  | 80000  |
| 3  | teacher    | 45   | NaN    |
Return code that modifies df in-place or returns a new df, and mention dtype considerations.
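A possible answer, sketched under two assumptions not stated in the question: the function returns a new DataFrame instead of modifying `df` in place, and a group whose values are all missing falls back to the column-wide median. The helper name `impute_group_median` is hypothetical.

```python
import numpy as np
import pandas as pd

# The sample DataFrame from the question.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "occupation": ["teacher", "engineer", "teacher"],
        "age": [35, np.nan, 45],
        "income": [50000, 80000, np.nan],
    }
)

# Detect missing values per column.
missing_counts = df[["age", "income"]].isna().sum()


def impute_group_median(frame, cols=("age", "income"), by="occupation"):
    """Return a copy with NaNs filled by the per-group median of each column."""
    out = frame.copy()
    for col in cols:
        # transform keeps the original row alignment, so no merge is needed.
        group_median = out.groupby(by)[col].transform("median")
        # Assumed fallback: a group that is entirely NaN has a NaN median,
        # so the column-wide median is used for those rows.
        out[col] = out[col].fillna(group_median).fillna(out[col].median())
    return out


imputed = impute_group_median(df)
```

On dtypes: a column containing `NaN` is stored as `float64`, and the median of an even number of integers can be fractional, so the imputed columns stay floating point unless they are explicitly rounded and cast.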
Medium · Technical
25 practiced
Explain the bias-variance tradeoff. For a supervised ML model, describe what characteristic shapes you'd expect to see on training and validation learning curves for a high-bias model and a high-variance model. Then list practical changes you can implement in scikit-learn or PyTorch to address high variance and high bias respectively.
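The learning-curve shapes can be illustrated with a small scikit-learn experiment. This is a sketch on synthetic data, where the dataset parameters and the choice of decision trees are assumptions made purely for demonstration: a depth-1 tree stands in for a high-bias model and an unconstrained tree for a high-variance one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (illustrative assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)


def mean_learning_curve(estimator):
    """Return training-set sizes with mean train and validation accuracy at each size."""
    sizes, train_scores, val_scores = learning_curve(
        estimator,
        X,
        y,
        cv=5,
        train_sizes=np.linspace(0.2, 1.0, 5),
        scoring="accuracy",
    )
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


# High bias: too simple to fit even the training data well.
_, stump_train, stump_val = mean_learning_curve(
    DecisionTreeClassifier(max_depth=1, random_state=0)
)
# High variance: memorizes the training data, leaving a gap to validation.
_, deep_train, deep_val = mean_learning_curve(
    DecisionTreeClassifier(max_depth=None, random_state=0)
)
```

For a high-bias model, the training and validation curves typically converge at a low score. For a high-variance model, the training score stays high while the validation score lags behind. Common remedies for high variance include limiting capacity (for example `max_depth`), stronger regularization, more data, or dropout and weight decay in PyTorch. For high bias, the usual remedies are more capacity, richer features, or weaker regularization.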
Medium · Technical
18 practiced
Given time-series data per user, implement rolling-window feature generation that computes mean, std, min, and max over the past 7 days per user using pandas. Show an efficient approach using groupby + rolling that avoids expanding intermediary dataframes unnecessarily and explain index requirements for groupby. Provide code and discuss performance implications.
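A minimal sketch of the groupby + rolling approach follows. The column names `user_id`, `timestamp`, and `value` are assumptions about the input, as is the helper name `add_rolling_features`. It also assumes "past 7 days" includes the current row, which is the pandas default (`closed="right"`) for time-based windows.

```python
import pandas as pd


def add_rolling_features(
    events: pd.DataFrame,
    value_col: str = "value",
    window: str = "7D",
    suffix: str = "7d",
) -> pd.DataFrame:
    """Return a sorted copy of events with per-user rolling mean, std, min, and max."""
    # Sort once so timestamps are monotonic within each user,
    # which time-based rolling windows require.
    ordered = events.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
    roller = (
        ordered.set_index("timestamp")  # offset windows need a datetime-like index
        .groupby("user_id")[value_col]
        .rolling(window)  # default closed="right": the window is (t - 7 days, t]
    )
    for name in ("mean", "std", "min", "max"):
        # Results come back grouped by user in the same sorted order as `ordered`,
        # so they can be attached positionally without a merge.
        ordered[f"{value_col}_{name}_{suffix}"] = getattr(roller, name)().to_numpy()
    return ordered
```

Attaching the results positionally avoids building and joining an intermediate MultiIndex frame. The remaining costs are a single sort plus one rolling pass per statistic. A window holding only one observation yields `NaN` for the standard deviation.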
Medium · Technical
21 practiced
Implement a scikit-learn Pipeline that imputes numeric missing values with median, scales numeric features with StandardScaler, encodes categorical features using OneHotEncoder (handle unknowns), and trains a RandomForestClassifier. Show code using ColumnTransformer and Pipeline given lists numeric_cols and cat_cols. Also include how to validate this pipeline with GridSearchCV.
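One possible shape for such a pipeline is sketched below. The helper names `build_search` and `make_toy_frame`, the toy column names, and the small parameter grid are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_search(numeric_cols, cat_cols):
    """Build a preprocessing + RandomForest pipeline wrapped in GridSearchCV."""
    numeric = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]
    )
    # handle_unknown="ignore" encodes categories unseen during fit as all zeros.
    categorical = OneHotEncoder(handle_unknown="ignore")
    preprocess = ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, cat_cols)]
    )
    model = Pipeline(
        [("preprocess", preprocess), ("clf", RandomForestClassifier(random_state=0))]
    )
    # Step-name prefixes ("clf__") route each parameter to the right pipeline step.
    param_grid = {"clf__n_estimators": [10, 20], "clf__max_depth": [None, 5]}
    return GridSearchCV(model, param_grid, cv=3, scoring="accuracy")


def make_toy_frame(n_rows=60, seed=0):
    """Hypothetical toy data, used only to exercise the pipeline."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame(
        {
            "age": rng.normal(40, 10, n_rows),
            "income": rng.normal(50_000, 10_000, n_rows),
            "city": rng.choice(["a", "b", "c"], n_rows),
        }
    )
    y = (X["income"] > 50_000).astype(int)
    X.loc[0, "age"] = np.nan  # one missing value for the imputer to fill
    return X, y
```

Because imputation, scaling, and encoding live inside the pipeline, `GridSearchCV` refits them on each training fold, which keeps validation folds from leaking into the preprocessing statistics. After `search.fit(X, y)`, `search.best_params_` and `search.best_score_` hold the selected configuration and its mean cross-validated score.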
