Python Data Manipulation with Pandas Questions

Skills and concepts for extracting, transforming, and preparing tabular and array data in Python using libraries such as pandas and NumPy. Candidates should be comfortable with the following:

- reading data from common formats
- working with pandas DataFrame and Series objects
- selecting and filtering rows and columns with boolean indexing and the query method
- groupby aggregations and sorting
- merging and joining DataFrames
- reshaping data with pivot and melt
- handling missing values
- converting and validating data types

They should also understand NumPy arrays and vectorized operations for efficient numeric computation, know when to prefer vectorized approaches over Python loops, and write readable, reusable data-processing functions. At higher levels, expect questions on memory efficiency, profiling and optimizing slow pandas operations, processing data that does not fit in memory, and designing robust pipelines that handle edge cases and mixed data types.

Medium · Technical

You need to compute month-over-month growth for many metrics in a wide DataFrame whose columns hold one metric per month (e.g., revenue_2024_01, revenue_2024_02, ...). Propose a pandas approach that computes the percentage growth between consecutive months for each metric and reshapes the result into a tidy long format for reporting.
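
The following is a minimal sketch of one approach. It assumes that every non-ID column follows the `<metric>_<YYYY>_<MM>` pattern and that rows are identified by a column named `entity`. That name, the function name `mom_growth_long`, and the sample column names are hypothetical.

The sketch works in three steps:

- It melts the frame to long form first.
- It parses the metric and month out of each column name.
- It computes growth within each (entity, metric) series using `groupby(...).shift(1)`.

Using `shift(1)` instead of `pct_change` avoids the latter's implicit forward-fill of missing values, which is deprecated in recent pandas versions. If a month is missing for a series, growth is measured against the previous available month. Reindex to a complete month grid first if gaps matter.

```python
import pandas as pd

# Hypothetical naming convention: '<metric>_<YYYY>_<MM>', e.g. 'revenue_2024_01'.
COLUMN_PATTERN = r"^(?P<metric>.+)_(?P<year>\d{4})_(?P<month>\d{2})$"


def mom_growth_long(wide: pd.DataFrame, id_col: str = "entity") -> pd.DataFrame:
    """Melt metric-per-month columns to long form and add month-over-month growth (%)."""
    long = wide.melt(id_vars=id_col, var_name="column", value_name="value")

    # Split 'revenue_2024_01' into metric='revenue', year='2024', month='01'.
    # The greedy '.+' lets metric names contain underscores ('active_users').
    parts = long["column"].str.extract(COLUMN_PATTERN)
    if parts["metric"].isna().any():
        bad = sorted(long.loc[parts["metric"].isna(), "column"].unique())
        raise ValueError(f"Columns not matching <metric>_<YYYY>_<MM>: {bad}")

    long["metric"] = parts["metric"]
    long["month"] = pd.to_datetime(
        parts["year"] + "-" + parts["month"] + "-01", format="%Y-%m-%d"
    )
    long = long.sort_values([id_col, "metric", "month"])

    # Previous month's value within each (id, metric) series. Rows are sorted,
    # so shift(1) looks one month back; the first month of each series is NaN.
    prev = long.groupby([id_col, "metric"])["value"].shift(1)
    # A previous value of 0 yields inf here; replace it with NaN if reports need that.
    long["mom_growth_pct"] = (long["value"] / prev - 1) * 100

    return long[[id_col, "metric", "month", "value", "mom_growth_pct"]].reset_index(drop=True)
```

The long result is already tidy, with one row per entity, metric, and month. It can be filtered or pivoted back to wide form for a particular report.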

Easy · Technical

Provide a short explanation and code that demonstrate how to use pandas.concat to combine a list of DataFrames with identical schemas. Explain how to handle the index so that the result has no duplicate index labels, and how to add a column that records each row's source file name during concatenation.
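
The following is a sketch that assumes the inputs are CSV files. The function `concat_files` and its `read` parameter are hypothetical names. `read` exists only so that `pd.read_parquet` or a test stub can be swapped in.

- `assign` adds the source-file column to each frame before stacking.
- `ignore_index=True` replaces each file's own 0..n-1 index with a fresh RangeIndex, so labels cannot repeat.

Two alternatives are worth mentioning:

- Passing `keys=` to `pd.concat` records the source as an outer index level instead of a column.
- `verify_integrity=True` makes `concat` raise if the resulting index contains duplicates.

```python
from pathlib import Path

import pandas as pd


def concat_files(paths, read=pd.read_csv) -> pd.DataFrame:
    """Stack files that share one schema and record where each row came from."""
    paths = list(paths)  # allow generators; we iterate twice below
    # Tag every frame with its file name (not the full path) before stacking.
    frames = [read(p).assign(source_file=Path(p).name) for p in paths]
    if not frames:
        raise ValueError("no input files")

    # Guard the 'identical schema' assumption: same column names in the same order.
    expected = list(frames[0].columns)
    for p, frame in zip(paths, frames):
        if list(frame.columns) != expected:
            raise ValueError(f"{p}: columns {list(frame.columns)} != {expected}")

    # ignore_index=True discards each input's own 0..n-1 index and builds a new
    # RangeIndex, so the combined frame cannot contain duplicate index labels.
    return pd.concat(frames, ignore_index=True)
```

Building the full list of frames and calling `pd.concat` once is preferable to concatenating inside a loop. Repeated concatenation copies the growing result on every iteration.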

Easy · Technical

You have daily sales data and need to build a simple KPI table, grouped by product category, showing total_sales, average_order_value, number_of_orders, and the week-over-week percentage change. Describe the pandas steps, with code snippets, to compute these KPIs and to align weeks for the percentage-change calculation.
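
The following is a minimal sketch built on these assumptions:

- The input has order-level rows with hypothetical columns `order_date` (datetime64), `category`, `order_id`, and `sales`.
- Weeks run Monday to Sunday.
- The week-over-week change is measured on total_sales. The question does not say which KPI, so this is an assumption.
- The function name `weekly_kpis` is hypothetical.

Weeks are aligned by reindexing every category onto the same complete weekly grid. A week with no orders then appears as a zero row rather than disappearing. Without that step, `shift(1)` would silently compare non-adjacent weeks.

```python
import numpy as np
import pandas as pd


def weekly_kpis(orders: pd.DataFrame) -> pd.DataFrame:
    """Weekly KPIs per category from order-level rows (hypothetical schema)."""
    df = orders.copy()
    # Label each row with the Monday that starts its week (W-SUN = weeks ending Sunday).
    df["week"] = df["order_date"].dt.to_period("W-SUN").dt.start_time

    weekly = df.groupby(["category", "week"]).agg(
        total_sales=("sales", "sum"),
        number_of_orders=("order_id", "nunique"),
    )

    # Align weeks: every category gets a row for every week in the overall range.
    all_weeks = pd.date_range(df["week"].min(), df["week"].max(), freq="7D")
    full_index = pd.MultiIndex.from_product(
        [weekly.index.get_level_values("category").unique(), all_weeks],
        names=["category", "week"],
    )
    weekly = weekly.reindex(full_index, fill_value=0)

    # AOV is undefined (NaN) for weeks with no orders rather than a division by zero.
    weekly["average_order_value"] = (
        weekly["total_sales"] / weekly["number_of_orders"].replace(0, np.nan)
    )

    # Rows are ordered by week within each category, so shift(1) is "last week".
    prev = weekly.groupby(level="category")["total_sales"].shift(1)
    weekly["total_sales_wow_pct"] = (
        weekly["total_sales"] / prev.replace(0, np.nan) - 1
    ) * 100

    return weekly.reset_index()
```

If the source data are already daily aggregates rather than single orders, replace the `nunique` on `order_id` with a `sum` over a daily order-count column.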

Easy · Technical

Write a short pandas snippet that renames all columns of a DataFrame to a consistent snake_case convention (e.g., 'Order ID' -> 'order_id', 'OrderDate' -> 'order_date'). Mention the pitfall of several columns mapping to the same normalized name, and how to detect it.
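
`df.rename(columns=func)` accepts any callable. However, it silently produces duplicate column names when two originals normalize to the same string, for example 'Order ID' and 'order-id'.

The sketch below normalizes names with a few regular expressions and raises if more than one original column claims the same normalized name. The helper names `to_snake_case` and `snake_case_columns` are hypothetical. The regexes are an assumption: they cover spaces, punctuation, camelCase, and leading acronyms, and they treat non-ASCII letters as separators.

```python
import re

import pandas as pd


def to_snake_case(name) -> str:
    s = str(name).strip()
    # Lower/digit followed by upper: 'OrderDate' -> 'Order_Date', 'customerID' -> 'customer_ID'
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    # Acronym followed by a capitalized word: 'HTTPStatus' -> 'HTTP_Status'
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    # Any run of other characters (spaces, hyphens, dots, non-ASCII) -> a single '_'
    s = re.sub(r"[^0-9a-zA-Z]+", "_", s)
    return s.strip("_").lower()


def snake_case_columns(df: pd.DataFrame) -> pd.DataFrame:
    new_names = [to_snake_case(c) for c in df.columns]

    # Record which original columns map to each normalized name; this also
    # catches columns that were already duplicated before normalization.
    sources = {}
    for old, new in zip(df.columns, new_names):
        sources.setdefault(new, []).append(old)
    collisions = {new: olds for new, olds in sources.items() if len(olds) > 1}
    if collisions:
        raise ValueError(f"columns collide after normalization: {collisions}")

    out = df.copy()
    out.columns = new_names
    return out
```

Raising on collision is a deliberate choice. Alternatives such as suffixing `_2` or keeping the first column hide a data problem that usually needs a human decision.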

Easy · Technical

Describe the benefits of converting low-cardinality string columns to the pandas categorical dtype. Provide code to convert a column 'country' to category, show how to check memory usage before and after, and mention pitfalls when categories vary across batches of data.
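
The following is a sketch that assumes the valid country codes are known in advance. The code list and the helper names `column_bytes` and `convert_country` are illustrative.

- **Memory.** Categorical storage keeps each distinct string once and stores a small integer code per row, so memory drops sharply when cardinality is low.
- **Measuring it.** `memory_usage(deep=True)` is needed to count the string objects behind an object column.
- **Batch pitfall.** If each batch infers its own categories, `pd.concat` falls back to object dtype. Sharing one fixed `CategoricalDtype` across batches avoids this.
- **Unknown values.** Values outside a fixed category list silently become NaN, so the function checks for them.
- **Unknown category sets.** When categories genuinely cannot be known up front, `pandas.api.types.union_categoricals` is another option.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumption: the full set of valid codes is known up front (illustrative list).
COUNTRY_DTYPE = CategoricalDtype(categories=["DE", "FR", "GB", "US"])


def column_bytes(s: pd.Series) -> int:
    # deep=True counts the Python string objects behind an object column,
    # not just the 8-byte pointers to them.
    return int(s.memory_usage(index=False, deep=True))


def convert_country(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Every batch gets the same dtype, so categories and codes line up across batches.
    out["country"] = out["country"].astype(COUNTRY_DTYPE)
    # Values outside the category list silently become NaN; surface them instead.
    unknown = out["country"].isna() & df["country"].notna()
    if unknown.any():
        raise ValueError(f"unknown countries: {sorted(df.loc[unknown, 'country'].unique())}")
    return out


raw = pd.DataFrame({"country": ["US", "DE", "FR", "GB"] * 2500})
print("before:", column_bytes(raw["country"]), "bytes")
print("after: ", column_bytes(convert_country(raw)["country"]), "bytes")

# Pitfall: categories inferred per batch differ, so concat falls back to object.
a = pd.Series(["US"], dtype="category")
b = pd.Series(["FR"], dtype="category")
print(pd.concat([a, b]).dtype)  # object
```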
