Scikit-learn, Pandas, and NumPy Usage Questions

These questions test practical proficiency with three core libraries. Pandas: DataFrames, data manipulation, and handling missing values. NumPy: arrays, vectorized operations, and mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, and pipelines. Candidates should know the standard patterns and APIs and be able to write efficient, readable code with these libraries.

Hard · Technical

Describe how to reliably serialize a scikit-learn Pipeline that contains custom transformer classes and third-party objects so that it can be loaded on a different machine in production. Discuss the pitfalls of pickle/joblib, define best practices (avoid lambdas, make sure custom classes are importable from the same module path on the target machine, pin and record environment dependencies), and explain when converting to ONNX is advantageous.
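One possible shape for an answer is sketched below. It assumes joblib and scikit-learn are installed; `ClipTransformer` and the `bundle` dictionary layout are hypothetical names made up for this illustration, not part of any library. The sketch stores the library versions next to the fitted pipeline, checks them on load, and shows that a lambda inside a transformer cannot be pickled.

```python
import io
import pickle

import joblib
import numpy as np
import sklearn
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


class ClipTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer. In production this class should live
    in an installable package so the loading machine can import it from the
    same module path; pickle stores only a reference to the class."""

    def __init__(self, lower=-3.0, upper=3.0):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.clip(X, self.lower, self.upper)


# Fit a small pipeline on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0).astype(int)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clip", ClipTransformer()),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)

# Store version metadata alongside the model (assumed bundle layout).
bundle = {
    "model": pipe,
    "sklearn_version": sklearn.__version__,
    "numpy_version": np.__version__,
}
buffer = io.BytesIO()  # stands in for a file on disk
joblib.dump(bundle, buffer)

# Loading side: verify the environment before trusting the model.
buffer.seek(0)
restored = joblib.load(buffer)
if restored["sklearn_version"] != sklearn.__version__:
    raise RuntimeError("scikit-learn version differs from training time")

# Pitfall: a lambda has no importable name, so standard pickling fails.
try:
    pickle.dumps(FunctionTransformer(lambda x: x))
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

Pickle-based formats execute code on load and are tied to the Python and library versions used at training time, so they should only be loaded from trusted sources. ONNX tends to be worth the conversion effort when the serving side is not Python or when a version-independent artifact is needed, provided every step in the pipeline has a supported converter.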

Easy · Technical

Implement a Python function using NumPy that takes two 1-D arrays a and b of shape (n,) and a scalar tolerance tol, and returns a boolean array indicating, element-wise, whether a is greater than b + tol. Do this without Python loops, using broadcasting and vectorized operations. Handle NaN values so that any comparison involving NaN yields False.
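A minimal sketch of one way to answer this follows. The function name `greater_with_tol` and the default `tol=0.0` are assumptions for illustration. IEEE 754 comparisons with NaN already evaluate to False, but the explicit mask makes that intent visible.

```python
import numpy as np


def greater_with_tol(a, b, tol=0.0):
    """Return a boolean array: True where a > b + tol, False where either is NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of the same shape")
    # The scalar tol is broadcast across b; no Python loop is involved.
    valid = ~(np.isnan(a) | np.isnan(b))
    return valid & (a > b + tol)


a = np.array([1.0, 2.0, np.nan, 5.0])
b = np.array([0.5, 2.0, 1.0, np.nan])
result = greater_with_tol(a, b, tol=0.1)
# result is [True, False, False, False]
```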

Medium · Technical

Describe what a NumPy universal function (ufunc) is and how to write a simple vectorized ufunc using numpy.frompyfunc or np.vectorize. Explain why np.vectorize generally does not give performance benefits and when to use numba or write a C-extension for true speed gains. Provide a small example and describe limitations.
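The small example the question asks for might look like the sketch below. The function `clipped_relu` is a made-up scalar function used only for illustration. Both wrappers add broadcasting, but they still call the Python function once per element, so neither is expected to be faster than a plain loop.

```python
import numpy as np


def clipped_relu(x, cap):
    """Scalar Python function: clamp x to the range [0, cap]."""
    return min(max(x, 0.0), cap)


# frompyfunc builds a real ufunc object (2 inputs, 1 output),
# but its results always have dtype=object.
clipped_relu_obj = np.frompyfunc(clipped_relu, 2, 1)

# np.vectorize is a convenience wrapper, not a ufunc; otypes fixes the
# output dtype so it does not have to be inferred from the first element.
clipped_relu_vec = np.vectorize(clipped_relu, otypes=[float])

x = np.array([-1.0, 0.5, 2.0, 7.0])
out_obj = clipped_relu_obj(x, 3.0)      # object array
out_vec = clipped_relu_vec(x, 3.0)      # float64 array
out_native = np.clip(x, 0.0, 3.0)       # compiled NumPy routine, same values
```

When a built-in expression such as `np.clip` exists, it is usually the fastest and simplest choice. For logic that cannot be expressed with existing ufuncs, compiling the scalar function (for example with numba's `vectorize` decorator) or writing a C extension removes the per-element interpreter overhead that both wrappers above retain.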

Medium · Technical

Using pandas, implement a per-user 7-day rolling mean for an irregular time series. The DataFrame has columns ['user_id','timestamp','metric'], and the timestamps are unevenly spaced. Provide code that computes the 7-day lookback rolling mean per user, aligned to each timestamp, and that handles users with sparse data.
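One way to approach this is a time-based window on a per-user groupby, sketched below. The function name `rolling_7d_mean` and the output column `metric_7d_mean` are assumptions for illustration, and the sketch assumes `user_id` contains no missing values (groupby would drop those rows and the lengths would no longer match).

```python
import pandas as pd


def rolling_7d_mean(df):
    """Add a per-user 7-day lookback mean of 'metric', aligned to each row."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Time-based windows need timestamps sorted within each user.
    out = out.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
    rolled = (
        out.set_index("timestamp")
        .groupby("user_id")["metric"]
        # "7D" is a time offset, so uneven spacing is handled by the window
        # itself; min_periods=1 gives sparse users a value from a single row.
        .rolling("7D", min_periods=1)
        .mean()
    )
    # groupby sorts by user_id and keeps row order within each group,
    # which matches the sort order of `out`.
    out["metric_7d_mean"] = rolled.to_numpy()
    return out


df = pd.DataFrame({
    "user_id": [2, 1, 1, 1],
    "timestamp": ["2024-01-05", "2024-01-09", "2024-01-01", "2024-01-03"],
    "metric": [10.0, 5.0, 1.0, 3.0],
})
result = rolling_7d_mean(df)
```

With the default `closed="right"`, each window covers the interval (t - 7 days, t], so an observation exactly seven days old is excluded; pass `closed="both"` to `rolling` if it should count.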

Easy · Technical

Describe the difference between the legacy numpy.random.seed approach and the new numpy.random.Generator API. Show code to create a reproducible Generator, sample normal variates, and explain why the new API is preferred for modern code and parallel workflows.
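The contrast can be sketched as follows; the seed value 42 and the variable names are arbitrary choices for illustration. The legacy call mutates a single global state shared by every caller of `np.random`, while a Generator is an explicit object that can be passed around, and `SeedSequence.spawn` derives independent child streams for parallel workers.

```python
import numpy as np

# Legacy approach: seeds hidden global state shared across the whole process.
np.random.seed(42)
legacy_samples = np.random.normal(size=3)

# New API: an explicit, locally owned Generator.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

# The same seed reproduces the same stream.
samples_again = np.random.default_rng(42).normal(loc=0.0, scale=1.0, size=5)

# Parallel workflows: spawn statistically independent child seeds,
# one Generator per worker, instead of reusing or offsetting one seed.
seed_seq = np.random.SeedSequence(42)
worker_rngs = [np.random.default_rng(child) for child in seed_seq.spawn(4)]
worker_draws = [worker.normal(size=3) for worker in worker_rngs]
```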
