InterviewStack.io

Scikit Learn, Pandas, and NumPy Usage Questions

These questions test practical proficiency with three core libraries. Pandas: DataFrames, data manipulation, and handling missing values. NumPy: arrays, vectorized operations, and mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, and pipelines. Candidates are expected to know the standard patterns and APIs and to write efficient, readable code with them.

Medium · Technical
60 practiced
You have a DataFrame with 10 million rows and a column 'text'. A teammate wrote df['tokens'] = df['text'].apply(lambda s: s.split()) and complains it's slow. Explain why this is slow and rewrite it more efficiently using pandas string methods or other vectorized strategies. Provide code and discuss trade-offs (memory vs CPU, readability).
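One possible direction for an answer, sketched on a tiny made-up DataFrame: `apply` with a lambda calls a Python function once per row, and the result is one Python list object per row, which is costly in both time and memory. `Series.str.split()` expresses the same operation without the lambda and propagates missing values instead of raising, although on object-dtype strings it still loops internally, so the speedup is often modest. The larger saving usually comes from not materializing token lists at all when only a derived quantity, such as a token count, is needed. The sample data and the `n_tokens` column are assumptions added for illustration.

```python
import pandas as pd

# Tiny illustrative frame; the real one in the question has 10 million rows.
df = pd.DataFrame({"text": ["the quick brown fox", "hello world", None, ""]})

# With no arguments, .str.split() splits on runs of whitespace, like str.split(),
# and leaves missing values missing instead of raising AttributeError.
df["tokens"] = df["text"].str.split()

# If only a count is needed, skip the list column entirely:
# counting non-whitespace runs avoids creating one Python list per row.
df["n_tokens"] = df["text"].str.count(r"\S+")
```

On the trade-offs: the list column is convenient but memory-hungry, while the count column is compact but discards the tokens. A string dtype backed by PyArrow may reduce memory further, though whether it helps depends on the pandas version and installed dependencies.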
Hard · Technical
58 practiced
A classifier's probabilities look poorly calibrated on validation data. Explain how to calibrate probabilities using scikit-learn's CalibratedClassifierCV with methods 'isotonic' and 'sigmoid'. Provide code that wraps an existing estimator, fits calibration, computes calibration_curve, and reports Brier score. Discuss when isotonic calibration may overfit and when to prefer sigmoid.
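A minimal sketch of one way to approach this, assuming a synthetic dataset from `make_classification` and `GaussianNB` as the estimator being wrapped (both are stand-ins chosen for illustration, not part of the question). It fits `CalibratedClassifierCV` with each method, then computes `calibration_curve` and the Brier score on held-out validation data.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with redundant features, which tends to make naive Bayes overconfident.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, n_redundant=10, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: the uncalibrated estimator.
raw_proba = GaussianNB().fit(X_train, y_train).predict_proba(X_val)[:, 1]
results = {"uncalibrated": {"brier": brier_score_loss(y_val, raw_proba)}}

for method in ("sigmoid", "isotonic"):
    # cv=5 fits the estimator on 4 folds and the calibrator on the remaining fold,
    # so the calibrator never sees predictions on data the estimator was trained on.
    calibrated = CalibratedClassifierCV(GaussianNB(), method=method, cv=5)
    calibrated.fit(X_train, y_train)
    proba = calibrated.predict_proba(X_val)[:, 1]

    # Fraction of positives vs. mean predicted probability per bin.
    prob_true, prob_pred = calibration_curve(y_val, proba, n_bins=10)
    results[method] = {
        "brier": brier_score_loss(y_val, proba),
        "prob_true": prob_true,
        "prob_pred": prob_pred,
    }
```

As a rule of thumb, isotonic regression is non-parametric and can overfit when the calibration set is small, so the sigmoid (Platt) method, which fits only two parameters, is often the safer choice there. Isotonic tends to pay off when there is plenty of calibration data and the miscalibration is not sigmoid-shaped.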
Medium · Technical
65 practiced
Implement K-fold out-of-fold target encoding for a single categorical column using pandas and scikit-learn KFold. The encoder must avoid leakage by computing the mean target for each category from training folds only. Provide code that produces the encoded column for the training set and describes how to apply the learned mapping to test data safely.
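The following is a sketch of one possible implementation rather than a reference answer. The helper name `oof_target_encode`, the column names `city` and `y`, and the choice to fall back to the mean target for unseen categories are all assumptions made for illustration. Each row is encoded with category means computed only from the other folds, and the mapping for test data is learned from the full training set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def oof_target_encode(train, col, target, n_splits=5, seed=0):
    """Return (out-of-fold encoding, full-train mapping, global mean)."""
    encoded = pd.Series(np.nan, index=train.index, dtype="float64")
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)

    for fit_idx, enc_idx in kf.split(train):
        fit_part = train.iloc[fit_idx]
        # Category means from the fitting folds only, so no row sees its own target.
        fold_means = fit_part.groupby(col)[target].mean()
        # Categories absent from the fitting folds fall back to that fold's prior.
        fold_prior = fit_part[target].mean()
        values = train[col].iloc[enc_idx].map(fold_means).fillna(fold_prior)
        encoded.iloc[enc_idx] = values.to_numpy()

    # Mapping for test data: learned once from all training rows.
    full_mapping = train.groupby(col)[target].mean()
    global_mean = train[target].mean()
    return encoded, full_mapping, global_mean


train = pd.DataFrame({
    "city": ["a", "a", "b", "b", "c", "a", "b", "a"],
    "y":    [1,   0,   1,   1,   0,   1,   0,   1],
})
test = pd.DataFrame({"city": ["a", "z"]})  # "z" never appears in training

train["city_te"], mapping, prior = oof_target_encode(train, "city", "y", n_splits=4)
# Test rows use the full-train mapping; unseen categories get the global mean.
test["city_te"] = test["city"].map(mapping).fillna(prior)
```

Test data is safe to encode with the full-train mapping because test targets are never used. Smoothing the category means toward the prior is a common refinement for rare categories, but it is left out here to keep the sketch short.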
Easy · Technical
118 practiced
Describe the difference between the legacy numpy.random.seed approach and the new numpy.random.Generator API. Show code to create a reproducible Generator, sample normal variates, and explain why the new API is preferred for modern code and parallel workflows.
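A short sketch of the Generator side of the answer, using an arbitrary seed value chosen for illustration. `numpy.random.seed` mutates a single hidden global state shared by every caller in the process, whereas `numpy.random.default_rng` returns an explicit `Generator` object that can be passed around, and `SeedSequence.spawn` derives independent child streams for parallel workers.

```python
import numpy as np

# An explicit, reproducible generator: no hidden global state is touched.
rng = np.random.default_rng(seed=12345)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

# Recreating a generator from the same seed reproduces the same stream.
rng_again = np.random.default_rng(seed=12345)
same_samples = rng_again.normal(loc=0.0, scale=1.0, size=5)

# For parallel work, spawn child seeds so each worker gets its own
# statistically independent stream instead of sharing or reusing one seed.
seed_seq = np.random.SeedSequence(12345)
child_rngs = [np.random.default_rng(child) for child in seed_seq.spawn(4)]
worker_draws = [child.normal(size=3) for child in child_rngs]
```

Passing `rng` explicitly into functions keeps results reproducible even when other code in the same process also draws random numbers, which the global-seed approach cannot guarantee.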
Medium · Technical
65 practiced
When merging two large DataFrames in pandas, what specific steps can improve merge performance? Discuss using set_index on join keys, converting keys to appropriate dtypes, converting string keys to categorical when cardinality is small, and avoiding unnecessary copies. Provide code examples showing how to prepare keys and call merge efficiently.
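The steps named in the question could be sketched as follows. The tables (`orders`, `customers`, `regions`), their columns, and the sizes are made up for illustration, and actual gains depend on the data and the pandas version, so any such change is worth timing on the real frames.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
region_names = ["north", "south", "east", "west"]

orders = pd.DataFrame({
    "customer_id": rng.integers(0, 1000, size=n).astype(str),  # ids stored as strings
    "region": rng.choice(region_names, size=n),
    "amount": rng.random(n),
})
customers = pd.DataFrame({
    "customer_id": np.arange(1000),
    "segment": rng.choice(["a", "b"], size=1000),
    "notes": "unused wide column",
})
regions = pd.DataFrame({"region": region_names, "manager": ["w", "x", "y", "z"]})

# 1. Give both sides the same compact numeric key dtype; comparing integers
#    is cheaper than comparing Python strings.
orders["customer_id"] = orders["customer_id"].astype("int32")
customers["customer_id"] = customers["customer_id"].astype("int32")

# 2. Low-cardinality string key: use one shared categorical dtype on both sides
#    so the key columns match exactly.
region_dtype = pd.CategoricalDtype(categories=region_names)
orders["region"] = orders["region"].astype(region_dtype)
regions["region"] = regions["region"].astype(region_dtype)

# 3. Merge only the columns that are needed, and let validate= catch
#    accidental duplicate keys that would blow up the row count.
merged = orders.merge(
    customers[["customer_id", "segment"]],
    on="customer_id", how="left", validate="many_to_one",
).merge(regions, on="region", how="left", validate="many_to_one")

# 4. When the same lookup table is joined repeatedly, index it once and reuse it.
customers_indexed = customers.set_index("customer_id")[["segment"]]
joined = orders.join(customers_indexed, on="customer_id", how="left")
```

Selecting columns before the merge is also the simplest way to avoid unnecessary copies, since columns that are never carried into the result never need to be duplicated.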
