Scikit-learn, Pandas, and NumPy Usage Questions
These questions test practical proficiency with three core libraries. Pandas: DataFrames, data manipulation, and handling missing values. NumPy: arrays, vectorized operations, and mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, and pipelines. Candidates should know the standard patterns and APIs and be able to write efficient, readable code with these libraries.
Medium | Technical
You must preprocess rows pulled from a remote Postgres database but cannot fit the whole table in memory. Demonstrate using pandas.read_sql_query with chunksize to stream rows, apply transformations (filtering, simple feature creation), and write processed chunks to a Parquet file. Discuss transactional consistency and when server-side filtering is preferable.
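One possible shape for an answer is sketched below. It is not a definitive implementation: the table `events`, the column `amount`, and the helper names `transform`, `iter_processed_chunks`, and `write_parquet` are hypothetical, and an in-memory SQLite database stands in for the remote Postgres source so the sketch runs anywhere. The Parquet step assumes `pyarrow` is installed.

```python
import sqlite3

import numpy as np
import pandas as pd


def transform(chunk: pd.DataFrame) -> pd.DataFrame:
    """Filter rows and add one simple feature (hypothetical columns)."""
    out = chunk[chunk["amount"] > 0].copy()
    out["amount_log"] = np.log1p(out["amount"])
    return out


def iter_processed_chunks(conn, query: str, chunksize: int):
    """Yield transformed, non-empty chunks without holding all rows at once."""
    for chunk in pd.read_sql_query(query, conn, chunksize=chunksize):
        processed = transform(chunk)
        if not processed.empty:
            yield processed


def write_parquet(chunks, path: str) -> None:
    """Append each chunk as a row group to one Parquet file (needs pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    writer = None
    try:
        for chunk in chunks:
            table = pa.Table.from_pandas(chunk, preserve_index=False)
            if writer is None:
                # The first chunk fixes the schema; later chunks must match it.
                writer = pq.ParquetWriter(path, table.schema)
            writer.write_table(table)
    finally:
        if writer is not None:
            writer.close()


# Stand-in demo: SQLite replaces the remote Postgres database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, float(i - 2)) for i in range(10)],
)
conn.commit()

processed = list(
    iter_processed_chunks(conn, "SELECT id, amount FROM events", chunksize=3)
)
# To persist instead of collecting: write_parquet(iter_processed_chunks(...), "out.parquet")
```

Against Postgres, two points are worth raising in the discussion. First, with a default psycopg2 cursor the driver may still buffer the full result set on the client even when `chunksize` is set; a server-side cursor (for example SQLAlchemy's `execution_options(stream_results=True)`) is the usual way to get true streaming. Second, a single `SELECT` reads from one consistent snapshot, whereas paging with several separate queries can see concurrent writes between pages. Pushing the filter into the `WHERE` clause is generally preferable whenever it can be expressed in SQL, since it cuts network transfer and client work.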
Hard | Technical
Explain numpy.memmap and np.lib.stride_tricks.sliding_window_view. Given a very large 1D signal stored on disk, show how to compute sliding window features of width 100 without loading the entire signal into memory using memmap and sliding_window_view. Provide code and list caveats such as writeability and alignment.
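A minimal sketch of one approach follows, assuming the signal is stored as raw, headerless samples of a known dtype. The function name `windowed_features`, the `block` parameter, and the choice of mean and standard deviation as features are illustrative assumptions, not part of the question.

```python
import os
import tempfile

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def windowed_features(path, dtype, width=100, block=50_000):
    """Yield (start, means, stds) for consecutive blocks of sliding windows.

    The file is memory-mapped read-only, so only the pages touched by the
    current block are read from disk.
    """
    signal = np.memmap(path, dtype=dtype, mode="r")
    n_windows = signal.shape[0] - width + 1
    if n_windows <= 0:
        raise ValueError("signal is shorter than the window width")

    for start in range(0, n_windows, block):
        stop = min(start + block, n_windows)
        # Windows start..stop-1 need samples start..stop+width-2.
        segment = signal[start : stop + width - 1]
        # Zero-copy, read-only view of shape (stop - start, width).
        windows = sliding_window_view(segment, width)
        # Reductions materialize temporaries of up to block * width elements,
        # which is why the work is split into blocks.
        yield start, windows.mean(axis=1), windows.std(axis=1)


# Small demo on a temporary file standing in for the large on-disk signal.
rng = np.random.default_rng(0)
demo_signal = rng.standard_normal(1000)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "signal.f64")
    demo_signal.tofile(demo_path)
    demo_means = np.concatenate(
        [m for _, m, _ in windowed_features(demo_path, np.float64, block=128)]
    )
```

Caveats to mention: the view returned by `sliding_window_view` is read-only by default, and writing through overlapping windows (via `writeable=True`) would alias many windows onto the same samples. The `dtype`, byte order, and any header `offset` passed to `np.memmap` must match the file layout exactly, or the values are silently misread. Any operation that copies the full window view (for example `np.array(windows)`) multiplies memory use by the window width.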
Easy | Technical
You are given a single CSV file of ~10GB with 50 columns. Describe pandas.read_csv strategies and options to load it with limited memory: specify dtypes, usecols, parse_dates, iterator/chunksize, low_memory, and compression handling. Provide short code snippets showing best-practice options and mention trade-offs.
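The options named above can be combined roughly as follows. This is a sketch under assumptions: the column names (`user_id`, `country`, `amount`, `ts`) and the helper `sum_amount_by_country` are hypothetical, and the per-chunk aggregation is just one example of work that never needs the whole file in memory.

```python
import io

import pandas as pd


def sum_amount_by_country(source, chunksize=100_000):
    """Stream a large CSV and aggregate it chunk by chunk."""
    reader = pd.read_csv(
        source,
        usecols=["user_id", "country", "amount", "ts"],  # skip unused columns
        dtype={"user_id": "int32", "amount": "float32"},  # avoid 64-bit defaults
        parse_dates=["ts"],  # parse only the columns that really are dates
        chunksize=chunksize,  # returns an iterator of DataFrames
        # compression="infer" is the default, so a path ending in .gz, .bz2,
        # .zip, or .xz is decompressed on the fly.
    )
    totals = None
    with reader:
        for chunk in reader:
            part = chunk.groupby("country")["amount"].sum()
            totals = part if totals is None else totals.add(part, fill_value=0)
    return totals


# Tiny in-memory demo standing in for the 10 GB file.
demo_csv = io.StringIO(
    "user_id,country,amount,ts,extra\n"
    "1,DE,1.5,2024-01-01,x\n"
    "2,FR,2.0,2024-01-02,y\n"
    "3,DE,0.5,2024-01-03,z\n"
)
demo_totals = sum_amount_by_country(demo_csv, chunksize=2)
```

Trade-offs worth noting: explicit `dtype` values save memory and skip type inference, but a narrow integer type fails on missing or out-of-range values. `low_memory` only controls how the C parser infers types internally, so it matters little once dtypes are given. The `category` dtype helps for low-cardinality string columns, but categories are inferred per chunk and need to be unified (for example with `pandas.api.types.union_categoricals`) before concatenation. Compressed input reduces disk and I/O cost at the price of CPU time.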
Medium | Technical
Write a custom scikit-learn scorer that computes a weighted F1 where recall for the positive class is weighted more heavily than precision. Show how to wrap it with make_scorer and pass it to GridSearchCV so models are tuned with this custom objective. Explain how this impacts model selection versus optimizing default metrics.
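One way to read "recall weighted more heavily than precision" is the F-beta score with beta greater than 1; that interpretation is an assumption of the sketch below, as are the function name `recall_weighted_f`, the synthetic dataset, and the parameter grid. scikit-learn's built-in `fbeta_score` computes the same quantity and is used here only as a cross-check.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV


def recall_weighted_f(y_true, y_pred, beta=2.0):
    """F-beta for the positive class; beta > 1 weights recall above precision."""
    p = precision_score(y_true, y_pred, pos_label=1, zero_division=0)
    r = recall_score(y_true, y_pred, pos_label=1, zero_division=0)
    denom = beta**2 * p + r
    return 0.0 if denom == 0 else (1 + beta**2) * p * r / denom


# Extra keyword arguments to make_scorer are forwarded to the score function.
scorer = make_scorer(recall_weighted_f, beta=2.0)

# Synthetic, imbalanced data purely for illustration.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0], "class_weight": [None, "balanced"]},
    scoring=scorer,
    cv=3,
)
search.fit(X, y)
best_params = search.best_params_
best_score = search.best_score_
```

Compared with tuning on accuracy or plain F1, this objective tends to favor configurations that catch more positives at the cost of extra false positives, so settings such as `class_weight="balanced"` may be selected more often. Because the scorer is built on `predict`, it evaluates the default decision threshold; tuning the threshold itself is a separate step.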
Hard | Technical
You have TF-IDF sparse features with 1 million columns. Explain and implement a scikit-learn pipeline that reduces dimensionality while preserving sparsity where possible, for example using TruncatedSVD and a downstream classifier. Discuss memory trade-offs and why standard PCA is not appropriate for sparse input.
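A toy-sized sketch of such a pipeline is shown below. The corpus, labels, and `n_components=2` are placeholders chosen so the example runs instantly; a realistic setting for a 1-million-column matrix would use a few hundred components. The `Normalizer` step and the choice of `LogisticRegression` are assumptions, not requirements of the question.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Placeholder corpus and labels, for illustration only.
docs = [
    "numpy arrays and vectorized math",
    "pandas dataframes handle missing values",
    "numpy broadcasting with arrays",
    "pandas groupby on dataframes",
    "vectorized numpy operations on arrays",
    "merge and join pandas dataframes",
]
labels = [0, 1, 0, 1, 0, 1]

pipe = make_pipeline(
    TfidfVectorizer(),  # produces a scipy sparse matrix
    TruncatedSVD(n_components=2, random_state=0),  # accepts sparse input as-is
    Normalizer(copy=False),  # rescale rows after projection
    LogisticRegression(),
)
pipe.fit(docs, labels)

# Everything before the classifier: sparse TF-IDF in, dense low-rank out.
reduced = pipe[:-1].transform(docs)
preds = pipe.predict(docs)
```

Memory points for the discussion: the input stays sparse through `TruncatedSVD`, but its output is a dense array of shape `(n_samples, n_components)`, and the fitted `components_` matrix is dense with shape `(n_components, n_features)`. With 300 components and 1 million float64 features that is about 2.4 GB. PCA differs in that it mean-centers the data, and explicit centering would turn almost every zero into a nonzero value and densify the matrix; `TruncatedSVD` skips centering, which is why it suits sparse TF-IDF input.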