InterviewStack.io

Pandas Data Manipulation and Analysis Questions

Data manipulation and analysis using the pandas library: reading data from CSV or SQL sources, selecting and filtering rows and columns, boolean indexing, iloc and loc usage, groupby aggregations, merging and concatenating DataFrames, handling missing values with dropna and fillna, applying transformations via apply and vectorized operations, reshaping with pivot and melt, and performance considerations for large DataFrames. Also covers converting SQL-style logic into pandas workflows for exploratory data analysis and feature engineering.

Medium · Technical
68 practiced
When should you prefer pivot_table over groupby + unstack? Given df with duplicates for some (store,date,product) combinations, write pandas code to create a matrix of summed sales with pivot_table using aggfunc='sum' and fill_value=0. Explain how pivot_table handles duplicates and compare performance.
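
One possible answer, as a minimal sketch. The sample data and the column name `sales` are assumptions made for illustration. `pivot_table` aggregates duplicate (store, date, product) rows with `aggfunc`, whereas `DataFrame.pivot` raises a `ValueError` on duplicates. The `groupby` + `unstack` form is shown next to it for comparison.

```python
import pandas as pd

# Hypothetical sample data: (S1, 2023-01-01, A) appears twice.
df = pd.DataFrame({
    "store": ["S1", "S1", "S1", "S2"],
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-01"],
    "product": ["A", "A", "B", "A"],
    "sales": [10, 5, 7, 3],
})

# Duplicates are summed; (store, date) rows with no sales for a product get 0.
matrix = df.pivot_table(
    index=["store", "date"],
    columns="product",
    values="sales",
    aggfunc="sum",
    fill_value=0,
)

# The same reshape written as groupby + unstack.
alt = (
    df.groupby(["store", "date", "product"])["sales"]
    .sum()
    .unstack("product", fill_value=0)
)
```

`pivot_table` is built on top of groupby, so for a single aggregation the two forms tend to cost about the same. `pivot_table` is usually the more readable choice when you also want `margins`, several `aggfunc`s, or `fill_value` in one call. Any performance claim should be checked with a benchmark on your own data.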
Easy · Technical
63 practiced
Given a DataFrame df of user profiles with columns ['user_id','age','signup_date','country'], write pandas code to filter users that meet all of: age between 25 and 40 inclusive, signup_date within the year 2023, and country is not null. Provide two approaches: (A) using boolean masks and chained conditions, (B) using DataFrame.query. Discuss readability and performance trade-offs.
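
A minimal sketch of both approaches. The sample rows are made up, and `signup_date` is assumed to be a datetime column already. `engine="python"` is passed to `query` so that the `.dt` accessor and the `notna()` method call are evaluated by pandas itself and do not depend on numexpr.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "age": [25, 40, 41, 30, 33],
    "signup_date": pd.to_datetime(
        ["2023-01-01", "2023-12-31", "2023-06-15", "2022-12-31", "2023-03-10"]
    ),
    "country": ["US", "DE", "FR", "US", None],
})

# (A) Boolean masks: between() is inclusive on both ends by default.
mask = (
    df["age"].between(25, 40)
    & (df["signup_date"].dt.year == 2023)
    & df["country"].notna()
)
result_a = df[mask]

# (B) DataFrame.query with the same three conditions.
result_b = df.query(
    "25 <= age <= 40 and signup_date.dt.year == 2023 and country.notna()",
    engine="python",
)
```

The mask version is explicit and works with any Python expression, but each condition needs its own parentheses. The `query` version reads closer to SQL. Whether it is faster depends on the frame size and on whether numexpr is in use, so treat any speed difference as something to measure.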
Easy · Technical
62 practiced
Explain the differences between df.loc and df.iloc in pandas. Given a DataFrame df with index labels ['a','b','c'] and columns ['x','y','z'], show code to: (1) select rows 'a' through 'c' and columns 'x' and 'z' using label-based indexing; (2) select the first two rows and last column using integer position indexing. Describe inclusive/exclusive behavior of slice endpoints for loc vs iloc.
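
A short sketch of the two selections, using made-up cell values. `loc` selects by label and includes both slice endpoints. `iloc` selects by integer position and excludes the stop position, like ordinary Python slicing.

```python
import pandas as pd

# Hypothetical values; only the index and column labels come from the question.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

# (1) Label-based: rows 'a' through 'c' (the end label 'c' IS included).
by_label = df.loc["a":"c", ["x", "z"]]

# (2) Position-based: first two rows (stop position 2 is NOT included),
#     last column. A scalar column position returns a Series.
by_position = df.iloc[:2, -1]

# Passing a list of positions keeps the result a DataFrame.
by_position_df = df.iloc[:2, [-1]]
```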
Hard · Technical
57 practiced
High-cardinality string join keys can be slow and memory-hungry. Explain and implement (pandas) an approach to convert string keys to consistent integer codes on both DataFrames before joining. Show code to create and persist the mapping, how to handle unseen keys at join time, and discuss correctness/performance trade-offs.
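
One way this could be approached, sketched under several assumptions: keys are non-null strings, the mapping fits in memory as a pandas Series, and a CSV file in a temporary directory stands in for whatever persistent store you would really use. The helper `encode` and the names `key_code` and `key_codes.csv` are hypothetical. Unseen keys are handled here by appending fresh codes to the mapping, which is only one of several options.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data for illustration.
left = pd.DataFrame({"key": ["apple", "banana", "cherry"], "l_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["banana", "cherry", "durian"], "r_val": [20, 30, 40]})

# Build the mapping (key -> integer code) from the reference table.
uniq = pd.Index(left["key"].unique(), name="key")
mapping = pd.Series(range(len(uniq)), index=uniq, name="code")

# Persist and reload. keep_default_na=False stops literal keys such as "NA"
# from being parsed as missing values.
path = os.path.join(tempfile.mkdtemp(), "key_codes.csv")
mapping.to_csv(path)
mapping = pd.read_csv(path, index_col="key", keep_default_na=False)["code"]


def encode(keys, mapping):
    """Return (codes, mapping); keys not yet in the mapping get new codes."""
    unseen = pd.Index(keys.unique()).difference(mapping.index)
    if len(unseen):
        start = int(mapping.max()) + 1 if len(mapping) else 0
        new = pd.Series(range(start, start + len(unseen)), index=unseen, name="code")
        mapping = pd.concat([mapping, new])
    # Assumes non-null keys; astype raises if any key failed to map.
    return keys.map(mapping).astype("int64"), mapping


left["key_code"], mapping = encode(left["key"], mapping)
right["key_code"], mapping = encode(right["key"], mapping)  # 'durian' is new

# Join on the integer codes only.
joined = left.merge(right.drop(columns="key"), on="key_code", how="inner")
```

Correctness depends on both frames using the same mapping. Encoding each side with its own `pd.factorize` call would give inconsistent codes. An alternative for unseen keys is a sentinel such as -1, but then sentinel rows must be dropped from at least one side before merging, or unrelated unseen keys will match each other. The encoding pass has its own cost, so it tends to pay off mainly when the same keys are joined repeatedly.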
Easy · Technical
57 practiced
You have two DataFrames: users (user_id, name, email, created_at) and orders (order_id, user_id, amount, created_at). Demonstrate how to perform a left join to attach user info to orders, avoid column name collisions (e.g., both have 'created_at'), and add an indicator column showing whether a match was found during the merge. Provide pandas code examples.
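
A minimal sketch with made-up rows. `suffixes` renames the colliding `created_at` columns, `indicator=True` adds a `_merge` column, and `validate="many_to_one"` is an optional extra check (not asked for in the question) that raises if `user_id` is duplicated in `users`.

```python
import pandas as pd

# Hypothetical sample data; order 102 refers to a user that does not exist.
users = pd.DataFrame({
    "user_id": [1, 2],
    "name": ["Ann", "Bob"],
    "email": ["ann@example.com", "bob@example.com"],
    "created_at": pd.to_datetime(["2022-01-01", "2022-02-01"]),
})
orders = pd.DataFrame({
    "order_id": [100, 101, 102],
    "user_id": [1, 2, 3],
    "amount": [9.99, 20.0, 5.0],
    "created_at": pd.to_datetime(["2023-01-05", "2023-01-06", "2023-01-07"]),
})

# Left join keeps every order; unmatched orders get NaN/NaT in the user columns.
merged = orders.merge(
    users,
    on="user_id",
    how="left",
    suffixes=("_order", "_user"),
    indicator=True,
    validate="many_to_one",
)

# Orders with no matching user are flagged 'left_only' in the _merge column.
unmatched = merged[merged["_merge"] == "left_only"]
```

Renaming the columns before the merge (for example `users.rename(columns={"created_at": "user_created_at"})`) is an equally valid way to avoid the collision and can give clearer names than suffixes.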
