InterviewStack.io

Python Data Manipulation with Pandas Questions

Skills and concepts for extracting, transforming, and preparing tabular and array data in Python using libraries such as pandas and NumPy. Candidates should be comfortable reading data from common formats, working with pandas DataFrame and Series objects, selecting and filtering rows and columns, using boolean indexing and query methods, performing groupby aggregations, sorting, merging and joining DataFrames, reshaping data with pivot and melt, handling missing values, and converting and validating data types. They should also understand NumPy arrays and vectorized operations for efficient numeric computation, know when to prefer vectorized approaches over Python loops, and be able to write readable, reusable data processing functions. At higher levels, expect questions on memory efficiency, profiling and optimizing slow pandas operations, processing data that does not fit in memory, and designing robust pipelines that handle edge cases and mixed data types.

Medium · Technical
76 practiced
Given a transaction DataFrame with columns user_id, amount, and date, write pandas code to compute per-user count, mean, median, and 95th percentile in a single groupby operation returning a flattened result with clear column names. Implement it in Python and explain your choice of aggregation functions.
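One possible shape for an answer, as a minimal sketch: pandas named aggregation produces flat, readable column names in a single `groupby` call. The sample data and the output column names (`txn_count`, `mean_amount`, `median_amount`, `p95_amount`) are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
transactions = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount": [10.0, 20.0, 30.0, 5.0, 15.0],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01", "2024-01-05",
    ]),
})

# Named aggregation: each keyword becomes an output column, so the result
# is already flat (no MultiIndex columns to clean up afterwards).
summary = (
    transactions.groupby("user_id")
    .agg(
        txn_count=("amount", "count"),      # counts non-null amounts
        mean_amount=("amount", "mean"),
        median_amount=("amount", "median"),
        # quantile has no string alias in agg, so a small callable is used;
        # the default interpolation is linear.
        p95_amount=("amount", lambda s: s.quantile(0.95)),
    )
    .reset_index()
)

print(summary)
```

The built-in string aggregations (`"count"`, `"mean"`, `"median"`) run on optimized code paths, while the lambda for the 95th percentile is called once per group in Python and is likely to be the slowest part on data with many users.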
Easy · Technical
77 practiced
You have a DataFrame of sensor readings with many missing values across columns. Describe strategies in Python pandas to handle missing data for exploratory analysis and for preparing features for a model. Include examples for dropna, fillna, interpolation, forward fill, and groupwise imputation in code.
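The strategies named in this question could be sketched as follows. The frame, its column names (`sensor_id`, `temperature`, `humidity`), and the choice of which strategy to apply to which column are all assumptions made for illustration; the right choice depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical readings, assumed to be ordered in time within each sensor.
readings = pd.DataFrame({
    "sensor_id": ["a", "a", "a", "b", "b", "b"],
    "temperature": [20.0, np.nan, 22.0, np.nan, 30.0, 32.0],
    "humidity": [0.5, 0.6, np.nan, np.nan, np.nan, 0.9],
})

# Exploration: share of missing values per column.
missing_share = readings.isna().mean()

# dropna: discard rows where every measurement is missing.
complete_rows = readings.dropna(subset=["temperature", "humidity"], how="all")

# fillna: replace gaps with a single statistic (here the column median).
humidity_filled = readings["humidity"].fillna(readings["humidity"].median())

# Interpolation within each sensor; leading gaps stay NaN by default.
interpolated = readings.groupby("sensor_id")["temperature"].transform(
    lambda s: s.interpolate(method="linear")
)

# Forward fill within each sensor, so values never leak across sensors.
forward_filled = readings.groupby("sensor_id")["temperature"].ffill()

# Groupwise imputation: fill each gap with that sensor's own mean.
group_mean = readings.groupby("sensor_id")["temperature"].transform("mean")
group_imputed = readings["temperature"].fillna(group_mean)
```

For model features, any fill statistic (median, group mean) would normally be computed on the training split only and then reused on validation and test data, to avoid leaking information.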
Medium · Technical
108 practiced
Given clickstream data per user, write efficient pandas code to compute for each user the rolling average of pageviews over their last 7 events. Explain why groupby.apply with a Python function may be slow and show how to use groupby.rolling or shift-based vectorized computations instead.
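A minimal sketch of the `groupby.rolling` approach mentioned in the question. `groupby.apply` with a Python function invokes that function once per user, so the per-call overhead grows with the number of groups, whereas `groupby(...).rolling(...)` keeps the windowed computation inside pandas. The column names `event_time` and `pageviews` and the use of `min_periods=1` (so users with fewer than 7 events still get a value) are assumptions.

```python
import pandas as pd

# Hypothetical clickstream; one row per event.
clicks = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2"],
    "event_time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01", "2024-01-02",
    ]),
    "pageviews": [1, 3, 5, 10, 20],
})

# Rolling windows are positional, so events must be ordered within each user.
clicks = clicks.sort_values(["user_id", "event_time"])

rolling_mean = (
    clicks.groupby("user_id")["pageviews"]
    .rolling(window=7, min_periods=1)   # last 7 events, fewer at the start
    .mean()
    # The result is indexed by (user_id, original index); dropping the first
    # level lets it align back onto the original rows.
    .reset_index(level=0, drop=True)
)
clicks["pageviews_avg_7"] = rolling_mean

print(clicks)
```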
Easy · Technical
68 practiced
Write pandas code to deduplicate a DataFrame of user events by keeping the most recent event per user based on event_timestamp. Provide two approaches: sorting plus drop_duplicates, and using groupby with idxmax. Explain pros and cons of each in terms of readability and performance.
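Both approaches named in the question could look roughly like this. The sample events and the `event_type` column are hypothetical, and the `idxmax` variant assumes a unique index and no missing timestamps.

```python
import pandas as pd

# Hypothetical events; only user_id and event_timestamp come from the question.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "event_type": ["login", "click", "login", "purchase", "login"],
    "event_timestamp": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-02 09:00", "2024-01-03 08:00",
        "2024-01-01 12:00", "2024-01-05 07:00",
    ]),
})

# Approach 1: sort by time, then keep the last row seen for each user.
latest_by_sort = (
    events.sort_values("event_timestamp")
    .drop_duplicates(subset="user_id", keep="last")
    .sort_values("user_id")
    .reset_index(drop=True)
)

# Approach 2: find the index label of the latest timestamp per user,
# then select those rows. No full sort of the frame is needed.
latest_idx = events.groupby("user_id")["event_timestamp"].idxmax()
latest_by_idxmax = (
    events.loc[latest_idx].sort_values("user_id").reset_index(drop=True)
)

print(latest_by_sort)
```

The sort-based version tends to read more naturally and extends easily to tie-breaking on extra columns; the `idxmax` version avoids sorting the whole frame but is more sensitive to duplicate index labels and missing timestamps. Which one is faster depends on data size and group count, so it is worth timing both on realistic data.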
Medium · Technical
84 practiced
Describe how to reshape a long DataFrame into a wide format using pivot_table and then revert it back to long using melt. Include an example where you aggregate duplicate index/value pairs and specify fill values for missing combinations.
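A small sketch of the round trip described in this question. The `sales` frame and its columns (`store`, `month`, `revenue`) are invented for illustration; store A has two January rows (a duplicate index/column pair) and store B has no February row (a missing combination).

```python
import pandas as pd

# Hypothetical long-format data.
sales = pd.DataFrame({
    "store": ["A", "A", "A", "B"],
    "month": ["Jan", "Jan", "Feb", "Jan"],
    "revenue": [100, 50, 200, 70],
})

# Long to wide: duplicates are summed, absent combinations become 0.
wide = sales.pivot_table(
    index="store",
    columns="month",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)

# Wide back to long: one row per (store, month) pair.
long_again = (
    wide.reset_index()
    .melt(id_vars="store", var_name="month", value_name="revenue")
    .sort_values(["store", "month"])
    .reset_index(drop=True)
)

print(wide)
print(long_again)
```

The round trip is not lossless: the two January rows for store A come back as a single aggregated row, and the filled-in zero for store B in February appears as if it were a real observation.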
