InterviewStack.io

Python Data Manipulation with Pandas Questions

Skills and concepts for extracting, transforming, and preparing tabular and array data in Python using libraries such as pandas and NumPy. Candidates should be comfortable reading data from common formats, working with pandas DataFrame and Series objects, selecting and filtering rows and columns, boolean indexing and query methods, groupby aggregations, sorting, merging and joining dataframes, reshaping data with pivot and melt, handling missing values, and converting and validating data types. Understand NumPy arrays and vectorized operations for efficient numeric computation, when to prefer vectorized approaches over Python loops, and how to write readable, reusable data processing functions. At higher levels, expect questions on memory efficiency, profiling and optimizing slow pandas operations, processing data that does not fit in memory, and designing robust pipelines that handle edge cases and mixed data types.

Easy · Technical
108 practiced
Describe common techniques to detect and handle missing values in pandas. Give examples using dropna with thresholds, fillna with column-specific strategies, forward-fill/backward-fill within groups, and interpolation for time series. Include sample code for group-wise forward-fill limited to 1 consecutive NaN.
Hard · System Design
64 practiced
Describe how you would implement observability for production pandas ETL jobs: list key metrics to emit (e.g., input rows, processed rows, null-rate per column, processing latency, memory usage), logging best practices, alerting thresholds, and how to surface validation failures to engineers. Give examples of tools and integration points.
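As a starting point for the metrics part of this question, here is a minimal sketch that wraps a transform step and collects the metrics listed above using only the standard library and pandas. The function name `run_with_metrics`, the logger name, and the 0.5 null-rate threshold are all hypothetical; a real job would likely forward these values to whatever metrics backend the team already uses.

```python
import logging
import time

import pandas as pd

logger = logging.getLogger("etl.metrics")  # hypothetical logger name


def run_with_metrics(df, transform, null_rate_threshold=0.5):
    """Run `transform` on `df` and return (output, metrics, breached_columns).

    `null_rate_threshold` is an illustrative alerting threshold, not a
    recommended value.
    """
    start = time.perf_counter()
    out = transform(df)
    metrics = {
        "input_rows": len(df),
        "processed_rows": len(out),
        # Fraction of missing values per output column.
        "null_rate": out.isna().mean().to_dict(),
        "latency_s": time.perf_counter() - start,
        # deep=True also counts the contents of object (string) columns.
        "memory_bytes": int(out.memory_usage(deep=True).sum()),
    }
    breaches = [
        col for col, rate in metrics["null_rate"].items()
        if rate > null_rate_threshold
    ]
    logger.info("etl_metrics %s", metrics)
    if breaches:
        logger.warning("null-rate threshold exceeded for columns: %s", breaches)
    return out, metrics, breaches


# Usage with a hypothetical transform that filters rows.
raw = pd.DataFrame({"a": [1.0, None, None, 4.0], "b": [1, 2, 3, 4]})
result, metrics, breaches = run_with_metrics(raw, lambda d: d[d["b"] > 1])
```

Returning the metrics alongside the result keeps the function easy to unit test and leaves the choice of emitter (logs, StatsD, Prometheus, and so on) to the caller.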
Hard · Technical
83 practiced
You're ingesting CSVs that sometimes contain malformed rows: missing delimiters, stray quotes, and inconsistent headers. Design a robust pandas-based reader that can detect bad rows, log them to a quarantine file with line numbers, attempt best-effort parsing, and continue processing. Describe heuristics to detect schema drift and when to fail-fast.
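One way to approach the quarantine step is to pre-validate rows with the standard-library `csv` module, which exposes physical line numbers through `reader.line_num`, and only then build the DataFrame. This is a sketch under simplifying assumptions: the input is passed as a string, a row is "bad" when its field count differs from the expected header, and any header mismatch is treated as schema drift that should fail fast. The function name `read_csv_with_quarantine` is hypothetical.

```python
import csv
import io

import pandas as pd


def read_csv_with_quarantine(text, expected_columns):
    """Return (DataFrame of well-formed rows, list of (line_number, raw_fields)).

    Assumption: a row is malformed when its field count differs from the
    header's. Real pipelines would add type and range checks as well.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    if header != list(expected_columns):
        # Fail fast: a changed header means every row is suspect.
        raise ValueError(f"schema drift: expected {expected_columns}, got {header}")

    good, quarantined = [], []
    for row in reader:
        if len(row) == len(expected_columns):
            good.append(row)
        else:
            # line_num is the 1-based physical line just consumed by the reader.
            quarantined.append((reader.line_num, row))

    # Values are still strings here; dtype conversion is a separate step.
    return pd.DataFrame(good, columns=list(expected_columns)), quarantined


sample = "id,amount\n1,10\n2\n3,30,extra\n4,40\n"
clean, bad = read_csv_with_quarantine(sample, ["id", "amount"])
```

The quarantined list can be written to a side file for later inspection, and the ratio `len(bad) / (len(bad) + len(clean))` is a natural signal for deciding when to abort the whole load instead of continuing.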
Easy · Technical
79 practiced
Given a dataset in long form with columns ['date','product','region','sales'], write pandas code to create a wide table where each product becomes a column and each row is a date-region pair, filling missing cells with 0. Then demonstrate how to melt that wide table back to long form. Mention pivot vs pivot_table and aggregation behavior when duplicates exist.
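A short sketch of the reshape in both directions, using invented sample rows. The data deliberately contains a duplicate (date, region, product) combination: `pivot` raises a `ValueError` on such duplicates, while `pivot_table` aggregates them. `aggfunc="sum"` is chosen here as an assumption about the desired behavior.

```python
import pandas as pd

# Hypothetical long-form data; rows 0 and 2 share the same date/region/product.
long_df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02"],
    "product": ["A", "B", "A", "A"],
    "region": ["east", "east", "east", "west"],
    "sales": [10, 5, 7, 3],
})

# Long -> wide: one row per (date, region), one column per product.
# Duplicates are summed; combinations with no data are filled with 0.
wide = long_df.pivot_table(
    index=["date", "region"],
    columns="product",
    values="sales",
    aggfunc="sum",
    fill_value=0,
).reset_index()
wide.columns.name = None  # drop the leftover "product" axis label

# Wide -> long: product columns go back into a single "product" column.
back = wide.melt(
    id_vars=["date", "region"],
    var_name="product",
    value_name="sales",
)
```

The round trip is not lossless: `back` holds the aggregated totals instead of the original duplicate rows, and it includes the zero-filled cells that had no row in `long_df`.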
Medium · Technical
81 practiced
Write a reusable Python function (pandas) named 'apply_schema(df, schema)' that takes a DataFrame and a schema dict mapping column names to desired dtypes (e.g., {'user_id':'Int64','amount':'float32','signup_date':'datetime64[ns]'}). The function should attempt conversions, collect and return a report of columns that failed, and avoid raising on the first error. Describe how you would unit test this function.
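A minimal sketch of one possible `apply_schema`, assuming the report should be a dict mapping each failed column to an error message and that datetime targets are best routed through `pd.to_datetime`. Both choices are assumptions; the question leaves the report format open.

```python
import pandas as pd


def apply_schema(df, schema):
    """Cast columns per `schema`; return (converted copy, failure report).

    Columns that fail keep their original dtype. The report maps each failed
    column name to a short error description (an assumed format).
    """
    out = df.copy()
    report = {}
    for col, dtype in schema.items():
        if col not in out.columns:
            report[col] = "missing column"
            continue
        try:
            if str(dtype).startswith("datetime64"):
                # to_datetime parses strings more reliably than astype.
                out[col] = pd.to_datetime(out[col], errors="raise")
            else:
                out[col] = out[col].astype(dtype)
        except (ValueError, TypeError) as exc:
            # Record the failure and move on instead of raising.
            report[col] = f"{type(exc).__name__}: {exc}"
    return out, report


# Usage with hypothetical data: "amount" contains an unparseable value.
raw = pd.DataFrame({
    "user_id": [1.0, 2.0, None],
    "amount": ["1.5", "x", "3"],
    "signup_date": ["2024-01-01", "2024-02-01", "2024-03-01"],
})
schema = {
    "user_id": "Int64",
    "amount": "float32",
    "signup_date": "datetime64[ns]",
    "country": "string",
}
converted, report = apply_schema(raw, schema)
```

For unit tests, natural cases include a fully valid frame (empty report), one bad value per dtype, a column named in the schema but absent from the frame, and a check that the input DataFrame is not mutated.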
