InterviewStack.io

Pandas Data Manipulation and Analysis Questions

Data manipulation and analysis using the Pandas library: reading data from CSV or SQL sources, selecting and filtering rows and columns, boolean indexing, .iloc and .loc usage, groupby aggregations, merging and concatenating DataFrames, handling missing values with dropna and fillna, applying transformations via apply and vectorized operations, reshaping with pivot and melt, and performance considerations for large DataFrames. Also covers converting SQL-style logic into Pandas workflows for exploratory data analysis and feature engineering.

Hard · Technical
A standard df.groupby(['user_id','event_type'])['value'].agg(['sum','count']) on a 200M-row DataFrame is slow and memory-heavy. Describe concrete pandas-based strategies to optimize this aggregation, including dtype tuning, categorical conversion, chunked aggregation, and the potential use of alternative backends. Provide code examples and discuss trade-offs.
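A sketch of how an answer might combine the three in-pandas techniques the question names, run here on a small synthetic stand-in for the 200M-row frame (column names are taken from the question; chunked_agg is an illustrative helper, not a pandas API):

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the 200M-row frame.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "user_id": rng.integers(0, 1000, n),
    "event_type": rng.choice(["click", "view", "purchase"], n),
    "value": rng.random(n),
})

# 1. Dtype tuning: shrink integer and float columns before grouping.
df["user_id"] = pd.to_numeric(df["user_id"], downcast="unsigned")
df["value"] = df["value"].astype("float32")

# 2. Categorical conversion: low-cardinality string keys group faster
#    and hold one code per row instead of one Python string per row.
df["event_type"] = df["event_type"].astype("category")

# 3. Chunked aggregation: compute partial sums/counts per chunk, then
#    combine. Sum and count are both additive, so re-summing is exact.
def chunked_agg(frame, chunk_size=25_000):
    partials = []
    for start in range(0, len(frame), chunk_size):
        chunk = frame.iloc[start:start + chunk_size]
        partials.append(
            chunk.groupby(["user_id", "event_type"], observed=True)["value"]
                 .agg(["sum", "count"])
        )
    combined = pd.concat(partials)
    return combined.groupby(level=["user_id", "event_type"], observed=True).sum()

result = chunked_agg(df)
```

Note observed=True: without it, a categorical groupby materializes every unobserved (user_id, event_type) combination, which on 200M rows can dominate memory. The same chunked pattern extends to reading the source with pd.read_csv(..., chunksize=...) so the full frame never needs to be resident at once.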
Hard · System Design
Your daily ETL currently processes many CSV files sequentially using pandas. Propose ways to parallelize processing across CPU cores on a single machine and implement an example using concurrent.futures to process files in parallel. Discuss GIL implications, pickling overhead, IO contention, memory limits, and alternatives like Dask (local scheduler).
Easy · Technical
Given a DataFrame df with columns ['user_id', 'event', 'timestamp', 'properties'], demonstrate how to select rows where event == 'purchase' and timestamp is between two given dates using boolean indexing and .loc. Explain the difference between .loc and .iloc, and show example code that slices rows and selects only columns user_id and timestamp.
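A compact sketch of one possible answer, using the column names given in the question on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "event": ["purchase", "view", "purchase", "purchase"],
    "timestamp": pd.to_datetime(
        ["2024-01-05", "2024-01-10", "2024-02-01", "2024-03-15"]),
    "properties": [{}, {}, {}, {}],
})

start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-02-15")

# Boolean mask: & combines element-wise, so each condition needs parentheses.
mask = (df["event"] == "purchase") & df["timestamp"].between(start, end)

# .loc is label/boolean-based: rows by mask, columns by name.
result = df.loc[mask, ["user_id", "timestamp"]]

# .iloc is purely positional: here, the first two rows and first two columns,
# regardless of index labels or column names.
positional = df.iloc[:2, :2]
```

The key distinction: .loc selects by index labels, column names, or boolean masks (and its slices are end-inclusive), while .iloc selects by integer position with Python's usual end-exclusive slicing.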
Hard · System Design
You maintain a pandas ETL that reads many CSVs, performs merges and groupbys, and writes Parquet. The team wants to migrate to a distributed system. For each ETL step (ingest, join, groupby, write), describe how you would map pandas code to Dask and Spark equivalents, note API differences, and highlight cluster considerations (shuffle, partitioning, memory). Give brief code examples illustrating key differences.
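One way to frame an answer: walk the four steps in runnable pandas, with the Dask counterpart for each step shown as a comment so no cluster is needed (frames and column names are illustrative; Spark's DataFrame API differs more, e.g. spark.read.csv, .join, .groupBy().agg()):

```python
import pandas as pd
# import dask.dataframe as dd   # Dask counterparts shown in comments below

# Ingest: pandas reads one file eagerly into memory;
# Dask reads a glob lazily into one partition per file:
#   ddf = dd.read_csv("events_*.csv")
events = pd.DataFrame({"user_id": [1, 2, 1], "value": [10.0, 5.0, 7.0]})
users = pd.DataFrame({"user_id": [1, 2], "country": ["US", "DE"]})

# Join: same .merge API, but in Dask a merge on a non-index key is a
# shuffle across partitions; pre-setting the index on the join key on
# both sides turns it into a cheap partition-aligned join:
#   joined = ddf.merge(dusers, on="user_id", how="left")
joined = events.merge(users, on="user_id", how="left")

# Groupby: same API; Dask runs a tree reduction across partitions and
# nothing executes until .compute():
#   totals = joined.groupby("country")["value"].sum().compute()
totals = joined.groupby("country")["value"].sum()

# Write: pandas writes a single Parquet file; Dask writes a directory
# of part files, one per partition:
#   joined.to_parquet("out.parquet")   # pandas
#   djoined.to_parquet("out/")         # dask: out/part.0.parquet, ...
```

The cluster-side points to raise alongside: shuffles dominate join/groupby cost, partition sizing (roughly 100MB–1GB per partition is a common rule of thumb) drives both parallelism and per-worker memory, and wide aggregations that fit in no single worker's RAM need spilling or a different key design.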
Hard · Technical
How would you unit-test pandas transformation functions used in your data pipeline? Provide pytest examples that validate schema (expected columns and dtypes), value ranges, null handling, and invariants. Show how to create small synthetic DataFrames as fixtures and how to assert DataFrame equality robustly.
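A sketch of the test shapes the question asks for, written around a hypothetical transformation add_revenue (make_orders stands in for a synthetic-data fixture; in a real pytest suite it would carry the @pytest.fixture decorator and be injected as an argument):

```python
import pandas as pd
import pandas.testing as pdt

# Transformation under test (hypothetical pipeline function).
def add_revenue(df):
    out = df.copy()
    out["revenue"] = out["price"] * out["quantity"]
    return out

# Small synthetic frame; decorate with @pytest.fixture in a real suite.
def make_orders():
    return pd.DataFrame({
        "order_id": [1, 2, 3],
        "price": [10.0, 0.0, 2.5],
        "quantity": [1, 4, 2],
    })

def test_schema():
    # Schema contract: expected columns in order, expected dtype.
    out = add_revenue(make_orders())
    assert list(out.columns) == ["order_id", "price", "quantity", "revenue"]
    assert out["revenue"].dtype == "float64"

def test_values_and_nulls():
    # Null handling and value-range invariants.
    out = add_revenue(make_orders())
    assert out["revenue"].notna().all()
    assert (out["revenue"] >= 0).all()

def test_frame_equality():
    # Robust whole-frame comparison: checks values, dtypes, and index,
    # with tolerant float comparison, and prints a precise diff on failure.
    expected = make_orders().assign(revenue=[10.0, 0.0, 5.0])
    pdt.assert_frame_equal(add_revenue(make_orders()), expected)
```

pandas.testing.assert_frame_equal is preferable to (df1 == df2).all().all() because it handles NaN != NaN, reports exactly which column and dtype diverged, and exposes check_dtype/check_like flags to relax the comparison deliberately.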
