InterviewStack.io

Pandas Data Manipulation and Analysis Questions

Data manipulation and analysis using the Pandas library: reading data from CSV or SQL sources, selecting and filtering rows and columns, boolean indexing, iloc and loc usage, groupby aggregations, merging and concatenating DataFrames, handling missing values with dropna and fillna, applying transformations via apply and vectorized operations, reshaping with pivot and melt, and performance considerations for large DataFrames. Includes converting SQL-style logic into Pandas workflows for exploratory data analysis and feature engineering.

Medium · Technical
64 practiced
You need to join two large DataFrames whose join key has very high cardinality and a many-to-many relationship. Explain strategies to avoid an explosion in row count: pre-aggregating, using semi-joins or existence checks, or joining on hashed keys. Provide pandas examples of a semi-join (filtering the left frame to rows that have a match in the right frame) and discuss the trade-offs.
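One possible starting point for the semi-join and pre-aggregation strategies named above, as a minimal sketch. The frames `left` and `right` and their columns are made-up illustration data, not part of the question.

```python
import pandas as pd

# Hypothetical toy data: key 2 is duplicated on both sides (many-to-many).
left = pd.DataFrame({"key": [1, 2, 2, 3, 4], "val": ["a", "b", "c", "d", "e"]})
right = pd.DataFrame({"key": [2, 2, 2, 4, 5], "other": [10, 20, 30, 40, 50]})

# A plain inner merge multiplies rows for duplicated keys (2 x 3 rows for key 2).
exploded = left.merge(right, on="key", how="inner")

# Semi-join, single key: keep left rows whose key exists in right.
# The row count can never exceed len(left).
semi = left[left["key"].isin(right["key"].unique())]

# Semi-join, general form: merge against the de-duplicated key set.
# This also works when the key spans several columns.
keys = right[["key"]].drop_duplicates()
semi_merge = left.merge(keys, on="key", how="inner")

# Pre-aggregation: collapse right to one row per key before joining,
# so each left row matches at most once.
right_agg = right.groupby("key", as_index=False)["other"].sum()
pre_agg = left.merge(right_agg, on="key", how="inner")
```

The trade-off is that a semi-join discards the right-hand columns entirely, while pre-aggregation keeps them only in summarized form; which one fits depends on whether downstream steps need per-row detail from the right frame.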
Medium · Technical
56 practiced
Demonstrate how to concatenate a list of many small DataFrames into a single large DataFrame in pandas efficiently. Explain when pd.concat([...], ignore_index=True) is appropriate, how to avoid repeated concat calls in a loop, and how to use a list comprehension to collect the frames. Discuss memory spikes and how to use out-of-core techniques if needed.
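A minimal sketch of the collect-then-concatenate pattern the question describes. `make_frame` is a hypothetical stand-in for whatever produces each small frame (a file read, an API page, and so on).

```python
import pandas as pd

def make_frame(i):
    # Hypothetical producer of one small frame; replace with real loading logic.
    return pd.DataFrame({"batch": [i, i], "value": [i * 10, i * 10 + 1]})

# Anti-pattern (shown only as a comment): calling pd.concat inside the loop
# copies all accumulated rows on every iteration.
#   result = pd.concat([result, make_frame(i)])

# Preferred: collect the frames in a list, then concatenate once.
frames = [make_frame(i) for i in range(5)]

# ignore_index=True discards the per-frame indexes and builds a fresh 0..n-1
# index, which is appropriate when the original row labels carry no meaning.
combined = pd.concat(frames, ignore_index=True)
```

A single concat still needs the inputs and the output in memory at the same time, so peak usage can approach twice the final size. If that does not fit, one option is to process the data in pieces, for example with the `chunksize` parameter of `pd.read_csv`, and write intermediate results to disk.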
Medium · Technical
73 practiced
Show how to use numpy vectorization with pandas to compute a new column efficiently: given df with columns 'x' and 'y', compute 'z' = log(x) * sqrt(y) while avoiding invalid values and preserving NaNs. Provide a vectorized solution using numpy/pandas methods and explain why it's faster than row-wise apply.
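One way to approach this, sketched under the assumption that "invalid" means x <= 0 or y < 0 and that such rows should yield NaN in 'z'. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample covering valid rows, zero, negatives, and a missing value.
df = pd.DataFrame({
    "x": [1.0, np.e, 0.0, -1.0, np.nan, 4.0],
    "y": [4.0, 9.0, 1.0, 1.0, 1.0, -1.0],
})

# Rows where both log(x) and sqrt(y) are defined.
# Comparisons against NaN evaluate to False, so missing inputs are excluded too.
valid = (df["x"] > 0) & (df["y"] >= 0)

# Mask invalid inputs to NaN first, so the ufuncs never see them
# and no runtime warnings are raised.
x = df["x"].where(valid)
y = df["y"].where(valid)

# Both ufuncs run over whole arrays in compiled code and propagate NaN.
df["z"] = np.log(x) * np.sqrt(y)
```

A row-wise `apply` would call a Python function once per row and build a Series object for each call, whereas the version above makes a handful of array-level calls regardless of the number of rows.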
Medium · Technical
53 practiced
You have a free-text column 'comments' and need to create simple features in pandas: length of comment, number of hashtags, and whether certain keywords appear. Provide vectorized ways (no Python per-row loops) to compute these features using pandas/regex and explain performance considerations on large DataFrames.
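A minimal sketch using the `.str` accessor for the three features. The sample comments and the `keywords` list are hypothetical, and a hashtag is assumed to be `#` followed by one or more word characters.

```python
import re
import pandas as pd

# Hypothetical sample data, including a missing comment.
df = pd.DataFrame({
    "comments": [
        "Love this #great #product",
        "refund please",
        None,
        "Bug in checkout #bug",
    ]
})

s = df["comments"]

# Character length; missing comments stay missing.
df["comment_len"] = s.str.len()

# Number of hashtags, counted as non-overlapping regex matches.
df["n_hashtags"] = s.str.count(r"#\w+")

# Keyword flag: one combined, escaped pattern instead of one pass per keyword.
keywords = ["refund", "bug"]  # hypothetical keyword list
pattern = "|".join(re.escape(k) for k in keywords)
df["has_keyword"] = s.str.contains(pattern, case=False, regex=True, na=False)
```

On performance: these calls avoid explicit per-row Python code, but string operations are generally slower than numeric vectorization, so on large frames it tends to help to combine keywords into a single pattern (as above) and to avoid re-scanning the column more often than necessary.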
Medium · Technical
62 practiced
You have a DataFrame with nested JSON in a column 'payload' (stored as JSON strings). Using pandas, show how to expand the JSON into separate flat columns and, if needed, normalize lists inside the payload into separate rows. Provide examples using json_normalize and .explode, and discuss performance considerations.
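A possible sketch of the flatten-then-explode flow. The payload shape (a nested `user` object and a `tags` list) is an assumed example, not something the question specifies.

```python
import json
import pandas as pd

# Hypothetical data: each payload is a JSON string with a nested object and a list.
df = pd.DataFrame({
    "id": [1, 2],
    "payload": [
        '{"user": {"name": "ann", "age": 30}, "tags": ["a", "b"]}',
        '{"user": {"name": "bob", "age": 25}, "tags": ["c"]}',
    ],
})

# Parse each string once; this step is a Python-level loop and often dominates runtime.
parsed = df["payload"].map(json.loads)

# Flatten nested dicts into dotted columns such as 'user.name' and 'user.age'.
# Lists (here 'tags') are left as list-valued cells.
flat = pd.json_normalize(parsed.tolist())
flat.index = df.index  # realign with the source rows before joining

wide = df.drop(columns="payload").join(flat)

# One row per list element; the other columns are repeated for each element.
long = wide.explode("tags", ignore_index=True)
```

On performance: parsing is the expensive part, so it is usually worth parsing once and reusing the result, and `explode` multiplies the row count, so it may be cheaper to select only the needed columns before exploding.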
