Pandas Data Manipulation and Analysis Questions
Data manipulation and analysis using the Pandas library: reading data from CSV or SQL sources, selecting and filtering rows and columns, boolean indexing, loc and iloc usage, groupby aggregations, merging and concatenating DataFrames, handling missing values with dropna and fillna, applying transformations via apply and vectorized operations, reshaping with pivot and melt, and performance considerations for large DataFrames. Also covers converting SQL-style logic into Pandas workflows for exploratory data analysis and feature engineering.
Easy | Technical
You find missing values across several columns in DataFrame df: ['user_id','age','income','signup_channel']. Describe, and demonstrate with pandas code, when to use dropna versus fillna. Cover these example strategies: global fill, column-specific fill, forward/backward fill, and imputation using a group-wise median. Discuss the pros and cons of each, and how to record the imputation performed for downstream reproducibility.
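One possible way to approach this question is sketched below. The sample values, the choice of "unknown" as a sentinel, and the names `imputation_log` and `income_was_missing` are illustrative assumptions, not part of the question; only the four column names come from it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "age": [25.0, np.nan, 40.0, 31.0, np.nan, 52.0],
    "income": [50000.0, np.nan, 30000.0, 48000.0, np.nan, 90000.0],
    "signup_channel": ["ads", "ads", "organic", None, "organic", "ads"],
})

# dropna: remove rows only when a column you cannot sensibly impute is missing.
required = df.dropna(subset=["signup_channel"])

# Global fill: one constant for every selected column.
# Simple, but it distorts distributions (an age of 0 is not a real age).
numeric_zero = df[["age", "income"]].fillna(0)

# Column-specific fill: a dict maps each column to its own fill value.
fill_values = {"age": df["age"].median(), "signup_channel": "unknown"}
col_filled = df.fillna(value=fill_values)

# Forward/backward fill: only meaningful when row order carries information
# (for example time series); shown here on rows ordered by user_id.
ordered = df.sort_values("user_id")
ffilled = ordered[["age", "income"]].ffill()
bfilled = ordered[["age", "income"]].bfill()

# Group-wise median imputation: fill income with the median of the
# row's signup_channel group, keeping a flag for what was imputed.
imputed = col_filled.copy()
imputed["income_was_missing"] = imputed["income"].isna()
group_median = imputed.groupby("signup_channel")["income"].transform("median")
imputed["income"] = imputed["income"].fillna(group_median)

# Record what was done so the same steps can be replayed downstream.
imputation_log = {
    "age": {"method": "median", "value": float(fill_values["age"])},
    "signup_channel": {"method": "constant", "value": "unknown"},
    "income": {"method": "group_median", "group_by": "signup_channel"},
}
```

As a rough rule, dropna suits cases where few rows are affected or the missing field is essential, while fillna keeps the rows at the cost of injecting assumed values. Keeping an indicator column plus a log of fill values is one way to make that cost visible later.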
Hard | Technical
You have a tight loop in pandas performing heavy numeric computations per row. Explain how to accelerate it using Numba or Cython and describe how to integrate with pandas (e.g., convert columns to numpy arrays, call numba.jit functions, then assign back). Provide a short example where a custom aggregation across rows is sped up using numba.
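A minimal sketch of the Numba route follows, assuming Numba is installed; if it is not, the fallback decorator lets the same code run as plain Python. The function `row_rms`, the column names, and the data are hypothetical. A row-wise root-mean-square is used only because it is easy to verify; it could also be vectorized in NumPy, so the pattern pays off mainly for per-row logic that is hard to vectorize.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional dependency
except ImportError:
    # Fallback so the sketch still runs (without speed-up) when Numba is absent.
    def njit(func):
        return func


@njit
def row_rms(values):
    """Root-mean-square of each row of a 2-D float64 array."""
    n_rows, n_cols = values.shape
    out = np.empty(n_rows, dtype=np.float64)
    for i in range(n_rows):
        total = 0.0
        for j in range(n_cols):
            total += values[i, j] * values[i, j]
        out[i] = np.sqrt(total / n_cols)
    return out


# Hypothetical numeric frame.
df = pd.DataFrame({"a": [3.0, 1.0], "b": [4.0, 1.0]})

# 1) Pull the columns out as a contiguous NumPy array (Numba does not accept DataFrames).
arr = np.ascontiguousarray(df[["a", "b"]].to_numpy(dtype=np.float64))

# 2) Call the compiled function, 3) assign the result back as a new column.
df["rms"] = row_rms(arr)
```

The first call includes compilation time, so any benchmark should time a second call. Cython follows the same extract, compute, assign-back shape but requires a separate build step.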
Medium | Technical
As part of feature engineering, demonstrate how to use pandas' assign, transform, and pipe methods to build a small, readable pipeline that: (1) drops unused columns, (2) creates a new column 'is_active' based on last_login date, (3) encodes a low-cardinality categorical column as codes, and (4) returns the final DataFrame. Explain benefits of using pipe for testability.
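A pipeline along these lines could look like the sketch below. The helper names, the `plan` and `debug_blob` columns, the 30-day activity window, and the reference date are all assumptions made for illustration. The `add_group_count` step is an optional extra, not one of the four requested steps, included only to show where a groupby `transform` fits.

```python
import pandas as pd


def drop_unused(df, cols):
    """Step 1: drop columns that are not needed downstream."""
    return df.drop(columns=cols)


def add_is_active(df, as_of, window_days=30):
    """Step 2: flag users whose last_login falls within the window before as_of."""
    cutoff = pd.Timestamp(as_of) - pd.Timedelta(days=window_days)
    return df.assign(is_active=df["last_login"] >= cutoff)


def encode_codes(df, col):
    """Step 3: encode a low-cardinality categorical column as integer codes."""
    return df.assign(**{f"{col}_code": df[col].astype("category").cat.codes})


def add_group_count(df, col):
    """Optional extra: broadcast the per-group row count back to every row."""
    return df.assign(**{f"{col}_users": df.groupby(col)[col].transform("count")})


# Hypothetical input data.
raw = pd.DataFrame({
    "user_id": [1, 2, 3],
    "last_login": pd.to_datetime(["2024-01-28", "2023-11-01", "2024-01-05"]),
    "plan": ["pro", "free", "pro"],
    "debug_blob": ["x", "y", "z"],
})

# Step 4: the chained expression returns the final DataFrame.
features = (
    raw
    .pipe(drop_unused, cols=["debug_blob"])
    .pipe(add_is_active, as_of="2024-01-31", window_days=30)
    .pipe(encode_codes, col="plan")
    .pipe(add_group_count, col="plan")
)
```

Because each step is a plain function that takes a DataFrame and returns a new one, it can be unit-tested on a tiny frame in isolation, and passing `as_of` explicitly keeps the result deterministic rather than dependent on the current date.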
Easy | Technical
You get a dataset in a long format: columns ['date','store','product','sales']. Write pandas code to (a) pivot to wide format with stores as rows and products as columns (sales as values), (b) melt it back to long format, and (c) perform a pivot_table with aggregation (sum) and margins. Explain index/columns/values and how to handle duplicates during pivot.
Medium | Technical
You must merge three DataFrames: users, sessions, and events. Sessions contains session_id and user_id; events contains event_id, session_id, event_type. Plan and write pandas code to join them into a single denormalized table containing user-level info, session metadata, and event counts per session. Discuss join order, memory considerations, and when to pre-aggregate events before joining.
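One plausible plan is to aggregate events to one row per session first, then left-join users and the counts onto sessions, as sketched below. The `country` and `device` columns, the sample rows, and the `n_events` name are assumptions; the question only specifies the key columns and `event_type`.

```python
import pandas as pd

# Hypothetical inputs; key columns follow the question.
users = pd.DataFrame({"user_id": [1, 2], "country": ["DE", "US"]})
sessions = pd.DataFrame({
    "session_id": [10, 11, 12],
    "user_id": [1, 1, 2],
    "device": ["web", "ios", "web"],
})
events = pd.DataFrame({
    "event_id": [100, 101, 102, 103],
    "session_id": [10, 10, 11, 10],
    "event_type": ["view", "click", "view", "view"],
})

# Pre-aggregate events to one row per session BEFORE joining.
# Joining raw events first would multiply session and user columns across
# every event row, which is the main memory risk.
event_counts = (
    events.groupby("session_id")
    .size()
    .rename("n_events")
    .reset_index()
)

# Sessions are the grain of the output, so start from them and left-join.
# validate= raises if a key relationship is not what we expect.
denorm = (
    sessions
    .merge(users, on="user_id", how="left", validate="many_to_one")
    .merge(event_counts, on="session_id", how="left", validate="one_to_one")
    .fillna({"n_events": 0})          # sessions with no events
    .astype({"n_events": "int64"})
)
```

If per-event detail is needed in the final table, pre-aggregation no longer applies; in that case it usually helps to select only the required columns and use compact dtypes before merging.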