InterviewStack.io

Python Data Manipulation with Pandas Questions

Skills and concepts for extracting, transforming, and preparing tabular and array data in Python using libraries such as pandas and NumPy. Candidates should be comfortable reading data from common formats, working with pandas DataFrame and Series objects, selecting and filtering rows and columns, boolean indexing and query methods, groupby aggregations, sorting, merging and joining dataframes, reshaping data with pivot and melt, handling missing values, and converting and validating data types. Understand NumPy arrays and vectorized operations for efficient numeric computation, when to prefer vectorized approaches over Python loops, and how to write readable, reusable data processing functions. At higher levels, expect questions on memory efficiency, profiling and optimizing slow pandas operations, processing data that does not fit in memory, and designing robust pipelines that handle edge cases and mixed data types.

Medium, Technical
Design a pandas-based approach to compute a customer lifetime value (CLTV) estimate using historical transactions. Describe inputs required, steps to group transactions into recency, frequency, monetary (RFM) features, and how to compute a simple CLTV proxy. Provide example code snippets for RFM computation.
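One possible starting point for the RFM step is sketched below. The column names (customer_id, order_date, amount), the sample rows, the 30-day month, and the 12-month horizon are all assumptions made for illustration, and the CLTV figure is only a rough proxy (average order value times purchase rate times horizon), not a fitted model.

```python
import pandas as pd

# Hypothetical transaction history; column names and values are assumptions.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-15",
        "2024-01-20", "2024-03-01", "2023-12-01",
    ]),
    "amount": [50.0, 70.0, 30.0, 200.0, 100.0, 20.0],
})

# Reference point for recency: one day after the last observed transaction.
snapshot = tx["order_date"].max() + pd.Timedelta(days=1)

# One row per customer with recency, frequency, and monetary features.
rfm = tx.groupby("customer_id").agg(
    recency_days=("order_date", lambda s: (snapshot - s.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
    first_order=("order_date", "min"),
)

# Simple CLTV proxy: average order value * orders per month * horizon.
HORIZON_MONTHS = 12  # assumed projection window
rfm["avg_order_value"] = rfm["monetary"] / rfm["frequency"]
tenure_months = ((snapshot - rfm["first_order"]).dt.days / 30.0).clip(lower=1.0)
rfm["orders_per_month"] = rfm["frequency"] / tenure_months
rfm["cltv_proxy"] = (
    rfm["avg_order_value"] * rfm["orders_per_month"] * HORIZON_MONTHS
)

print(rfm[["recency_days", "frequency", "monetary", "cltv_proxy"]])
```

Clipping tenure to at least one month is a design choice that keeps very new customers from receiving an inflated purchase rate. A fuller answer would likely also discuss margins, churn, and discounting.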
Easy, Technical
Explain how to perform a left join in pandas between a transactions DataFrame and a customers DataFrame. transactions has columns customer_id and amount; customers has columns customer_id, name, and signup_date. Show code that performs the join, keeps all transactions, and applies suffixes to overlapping column names. Also describe what happens if the dtypes of the join keys differ.
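A minimal sketch of such a join follows. The sample rows and the suffix strings are invented for illustration. The customers key is deliberately stored as strings to show the dtype issue: in recent pandas versions, merging an integer key against a string key typically raises a ValueError, while numeric mismatches such as int versus float tend to be upcast silently, so aligning the key dtypes first is the safer habit.

```python
import pandas as pd

# Hypothetical sample data; values are assumptions for illustration.
transactions = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "amount": [10.0, 25.5, 7.0, 99.0],
})
customers = pd.DataFrame({
    "customer_id": ["1", "2", "3"],  # deliberately stored as strings
    "name": ["Ana", "Ben", "Cy"],
    "signup_date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
})

# Align the key dtypes before merging.
customers["customer_id"] = customers["customer_id"].astype("int64")

merged = transactions.merge(
    customers,
    on="customer_id",
    how="left",                  # keep every transaction row
    suffixes=("_txn", "_cust"),  # applied only to overlapping non-key columns
    validate="many_to_one",      # fail fast if customers has duplicate keys
    indicator=True,              # adds a _merge column: both / left_only
)

print(merged)
```

With this schema the only shared column is the join key, so the suffixes have no visible effect; they would matter if both frames carried another column with the same name. Transactions without a matching customer keep their row and get missing values for name and signup_date.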
Medium, Technical
You must merge event logs with enrollment data where timestamps are at different granularities (enrollment_date has only date, events have datetime). Explain how to join such datasets to compute time-since-enrollment for each event. Provide example pandas code to align types and compute the delta in days, ensuring correct handling of timezone-naive vs timezone-aware datetimes.
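The alignment could be sketched as follows. The column names user_id and event_time, the sample rows, and the decision to treat enrollment dates as midnight UTC are assumptions; the right timezone depends on how the enrollment system records dates. Subtracting a timezone-naive column from a timezone-aware one raises a TypeError in pandas, which is why both sides are made UTC-aware before the subtraction.

```python
import pandas as pd

# Hypothetical inputs; column names and values are assumptions.
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(
        [
            "2024-03-01 23:30:00+00:00",
            "2024-03-05 08:00:00+00:00",
            "2024-03-02 12:00:00+00:00",
        ],
        utc=True,  # timezone-aware event timestamps
    ),
})
enrollments = pd.DataFrame({
    "user_id": [1, 2],
    "enrollment_date": ["2024-03-01", "2024-02-28"],  # date only, as strings
})

# Parse the date-only column (midnight) and, as an assumption, treat it as UTC.
enrollments["enrollment_date"] = pd.to_datetime(
    enrollments["enrollment_date"]
).dt.tz_localize("UTC")

# Each event picks up its user's enrollment date.
merged = events.merge(
    enrollments, on="user_id", how="left", validate="many_to_one"
)

# Both columns are now tz-aware, so the subtraction is well defined.
delta = merged["event_time"] - merged["enrollment_date"]
merged["days_since_enrollment"] = delta.dt.days  # whole elapsed days
merged["days_fractional"] = delta.dt.total_seconds() / 86400

print(merged)
```

If the events were timezone-naive instead, the mirror-image fix would be to localize them (or to strip the timezone from the other side) so that both columns agree before subtracting.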
Easy, Technical
Describe benefits of converting string-like low-cardinality columns to pandas categorical dtype. Provide code to convert a column 'country' to category, show how to check memory usage before and after, and mention pitfalls when categories vary across batches of data.
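A small illustration of the conversion and the batch pitfall is given below. The data is synthetic, and the batch names and the fixed category list are assumptions. The exact memory figures depend on the pandas version and its string storage, so none are quoted here.

```python
import pandas as pd

# Synthetic low-cardinality column: 100,000 rows, 3 distinct values.
df = pd.DataFrame({"country": ["US", "DE", "FR", "US", "DE"] * 20_000})

before = df["country"].memory_usage(deep=True)
df["country"] = df["country"].astype("category")
after = df["country"].memory_usage(deep=True)
print(f"bytes before: {before}, bytes after: {after}")

# Pitfall: batches whose category sets differ lose the categorical dtype
# when concatenated.
batch_a = pd.Series(["US", "DE"], dtype="category")
batch_b = pd.Series(["US", "JP"], dtype="category")
combined = pd.concat([batch_a, batch_b], ignore_index=True)
print(combined.dtype)  # no longer categorical

# One way around it: cast every batch to a shared, explicit dtype first.
# The category list here is an assumption; values outside it would become NaN.
shared = pd.CategoricalDtype(categories=["DE", "JP", "US"])
combined_cat = pd.concat(
    [batch_a.astype(shared), batch_b.astype(shared)], ignore_index=True
)
print(combined_cat.dtype)
```

Beyond memory, categoricals usually speed up groupby and comparisons on the column, because the work is done on small integer codes rather than on strings.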
Easy, Technical
Using Python and pandas, write code to read a CSV file named 'sales.csv' with columns: order_id (int), order_date (YYYY-MM-DD), customer_id (int), amount (float). Show how you would: a) parse order_date as datetime, b) enforce dtypes for ids, c) handle malformed lines and custom NA tokens, and d) load the file safely if it contains mixed encodings. Explain the key read_csv parameters you chose.
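One way to approach the load is sketched here. The helper name load_sales, the NA tokens, and the sample file contents are hypothetical, and the on_bad_lines and encoding_errors parameters assume pandas 1.3 or later. The sample file is written to a temporary directory only so that the example is self-contained.

```python
import os
import tempfile

import pandas as pd


def load_sales(path):
    """Read a sales CSV defensively (hypothetical helper)."""
    df = pd.read_csv(
        path,
        # b) nullable Int64 keeps ids as integers even when some are missing
        dtype={"order_id": "Int64", "customer_id": "Int64", "amount": "float64"},
        # c) extra tokens treated as missing, on top of the pandas defaults
        na_values=["N/A", "missing", "unknown"],
        # c) drop rows that have too many fields instead of raising
        on_bad_lines="skip",
        # d) decode as UTF-8 and substitute undecodable bytes rather than fail
        encoding="utf-8",
        encoding_errors="replace",
    )
    # a) parse dates explicitly; values that do not match become NaT
    df["order_date"] = pd.to_datetime(
        df["order_date"], format="%Y-%m-%d", errors="coerce"
    )
    return df


# Small sample file with NA tokens, one over-long row, and one bad date.
sample = (
    "order_id,order_date,customer_id,amount\n"
    "1,2024-01-05,10,19.99\n"
    "2,2024-01-06,N/A,5.00\n"
    "3,2024-01-07,12,missing\n"
    "4,2024-01-08,13,7.50,unexpected_extra_field\n"
    "5,not-a-date,14,3.25\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(sample)
    sales = load_sales(path)

print(sales)
print(sales.dtypes)
```

Passing parse_dates=["order_date"] to read_csv is the more common shortcut for step a); the explicit pd.to_datetime call is used here because it turns unparseable values into NaT instead of leaving the column as text. Skipping bad lines silently discards data, so in practice it may be worth logging or counting the rows that were dropped.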
