InterviewStack.io

Scikit-learn, Pandas, and NumPy Usage Questions

Practical proficiency with three core Python data libraries. Pandas: DataFrames, data manipulation, and handling missing values. NumPy: arrays, vectorized operations, and mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, and pipelines. Candidates are expected to know the standard patterns and APIs and to write efficient, readable code with these libraries.

Medium | Technical
65 practiced
Given two NumPy arrays A (shape (N, D)) and B (shape (M, D)), implement an efficient Python/NumPy function that computes the pairwise squared Euclidean distance matrix of shape (N, M) without explicit Python loops. Show the vectorized formula and code.
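A minimal sketch of one possible answer, assuming numeric inputs that can be cast to float. Expanding the square gives ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the full matrix can be built from two vectors of squared row norms and one matrix product. The function name `pairwise_sq_dists` is illustrative and not part of NumPy.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between rows of A (N, D) and rows of B (M, D)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Squared norm of every row, shaped so that broadcasting yields (N, M).
    a_sq = np.einsum("ij,ij->i", A, A)[:, None]  # (N, 1)
    b_sq = np.einsum("ij,ij->i", B, B)[None, :]  # (1, M)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = a_sq + b_sq - 2.0 * (A @ B.T)
    # Floating-point cancellation can leave tiny negative values; clip them to zero.
    np.maximum(d2, 0.0, out=d2)
    return d2
```

This form avoids materializing the (N, M, D) intermediate that `A[:, None, :] - B[None, :, :]` would create, at the cost of some precision when two points are nearly identical. If SciPy is available, `scipy.spatial.distance.cdist(A, B, "sqeuclidean")` is an alternative worth mentioning.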
Easy | Technical
63 practiced
Write a small Python function that accepts a pandas DataFrame `df` and a list of feature columns and returns a 2D NumPy array suitable for scikit-learn training. Ensure the function preserves the column order, coerces categorical string columns to numeric codes without introducing NaNs, and documents how missing values are handled.
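One way this could be sketched, under a missing-value policy that is an assumption of this example rather than a requirement of the question: numeric NaNs are passed through for a downstream imputer, while missing strings receive the code -1. The name `to_feature_matrix` is hypothetical.

```python
import numpy as np
import pandas as pd

def to_feature_matrix(df, feature_cols):
    """Return a float array of shape (len(df), len(feature_cols)).

    Columns appear in the order given by `feature_cols`.
    Missing-value policy (an assumption of this sketch):
      - numeric columns: NaN is kept as-is, to be imputed downstream;
      - non-numeric columns: converted to category codes, where a
        missing value becomes -1 instead of NaN.
    """
    columns = []
    for col in feature_cols:  # iterate in the caller's order
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            columns.append(s.astype(float).to_numpy())
        else:
            # cat.codes assigns 0..k-1 to the categories and -1 to missing values.
            codes = s.astype("category").cat.codes
            columns.append(codes.to_numpy(dtype=float))
    return np.column_stack(columns)
```

The codes here depend on the categories present in the frame being converted, so the same string can map to different codes in training and test data. For a mapping that is learned once and reused, a fitted encoder such as scikit-learn's `OrdinalEncoder` is the more robust choice.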
Easy | Technical
94 practiced
Compare pandas.get_dummies and sklearn.preprocessing.OneHotEncoder. Discuss differences in how they handle unseen categories at inference time, whether they preserve column order/names, and performance trade-offs. Show code snippets illustrating consistent one-hot encoding between training and test sets.
Medium | Technical
61 practiced
Show how pandas.eval and DataFrame.query can speed up and simplify boolean filtering expressions compared to using multiple chained boolean masks. Provide an example where you compute a new column `z = (a + b) / c` and then filter rows with `z > threshold` using both approaches and discuss performance trade-offs.
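The two approaches could be compared with a sketch like the following. The function names are hypothetical, and the frame is assumed to have numeric columns `a`, `b`, and `c` with no zeros in `c`.

```python
import numpy as np
import pandas as pd

def filter_with_masks(df, threshold):
    """Compute z with ordinary column arithmetic, then apply a boolean mask."""
    out = df.assign(z=(df["a"] + df["b"]) / df["c"])
    return out[out["z"] > threshold]

def filter_with_eval_query(df, threshold):
    """Same result using string expressions; @threshold refers to the local variable."""
    return df.eval("z = (a + b) / c").query("z > @threshold")

# Illustrative data; c is shifted away from zero to avoid division by zero.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.random(1000),
    "b": rng.random(1000),
    "c": rng.random(1000) + 1.0,
})
masked = filter_with_masks(df, 0.8)
queried = filter_with_eval_query(df, 0.8)
```

On the trade-offs: the string form reads closer to the formula and, when the optional `numexpr` package is installed, can evaluate the expression with fewer temporary arrays. That benefit generally shows up only on large frames; on small ones the cost of parsing the expression usually makes `eval` and `query` slower than plain masks, so timing both on realistic data is the safer answer.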
Medium | Technical
66 practiced
You have a pandas Series of timestamps, some carrying various time zones and some naive. Describe the steps in Python/pandas to convert all timestamps to UTC, handle ambiguous or non-existent times (daylight saving time shifts), and resample to hourly event counts. Provide code showing pd.to_datetime, tz_localize, tz_convert, and resample.
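A hedged sketch of one workable sequence, under several assumptions that the question leaves open: the input is a Series of consistently formatted timestamp strings, naive values are local times in a single known zone (`America/New_York` is used purely as an example), ambiguous times become `NaT`, and non-existent times are shifted forward. The helper names `to_utc` and `hourly_counts` and the offset-detecting regular expression are illustrative.

```python
import pandas as pd

def to_utc(raw, naive_tz="America/New_York"):
    """Convert a Series of timestamp strings to tz-aware UTC timestamps."""
    # Heuristic: a trailing "Z", "+HH:MM" or "-HHMM" means the string has an offset.
    has_offset = raw.str.contains(r"(?:Z|[+-]\d{2}:?\d{2})$", na=False)
    # Offset-bearing strings: utc=True converts each one to UTC directly.
    aware = pd.to_datetime(raw[has_offset], utc=True)
    # Naive strings: attach the assumed local zone, then convert to UTC.
    naive = pd.to_datetime(raw[~has_offset])
    localized = naive.dt.tz_localize(
        naive_tz,
        ambiguous="NaT",              # fall-back hour occurs twice: mark as missing
        nonexistent="shift_forward",  # spring-forward gap: move to the next valid time
    ).dt.tz_convert("UTC")
    return pd.concat([aware, localized]).sort_index()

def hourly_counts(utc_times):
    """Count events per hour; NaT entries are dropped first."""
    events = pd.Series(1, index=pd.DatetimeIndex(utc_times.dropna()))
    return events.sort_index().resample("h").sum()
```

Returning `NaT` for ambiguous times silently drops those events from the counts; passing `ambiguous="infer"` or a boolean array is an alternative when the data is ordered or the DST flag is known. Note that `pd.to_datetime(..., utc=True)` applied to naive strings would treat them as already being in UTC, which is why the naive subset is localized separately here.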
