InterviewStack.io

Python for Data Analysis Questions

Covers the practical use of Python and its data libraries to perform data ingestion, cleaning, transformation, analysis, and aggregation. Candidates should be able to manipulate data frames, perform complex grouping and aggregation operations, merge and join multiple data sources, and implement efficient vectorized operations using libraries such as pandas and NumPy. Expect to write clear, idiomatic Python with appropriate error handling, input validation, and small tests or assertions. At more senior levels, discuss performance trade-offs and scalability strategies, such as choosing between NumPy vectorization and pandas operations and deciding when to adopt alternative tools like Polars or Dask for very large datasets, as well as techniques for memory management, profiling, and incremental or streaming processing. Also cover reproducibility, serialization formats, and integrating analysis into pipelines.

Easy · Technical
Given a pandas DataFrame with both numeric and categorical columns, write Python code that fills missing numeric values with the column median and fills categorical columns with the mode. Include handling for columns that are all NaNs and avoid modifying the original DataFrame in-place.
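
One possible answer is sketched below. The function name `impute_missing` and the sample data are hypothetical, and the sketch assumes that an all-NaN column should simply be left unchanged, since it offers no median or mode to impute from.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric NaNs filled by the median and other NaNs by the mode."""
    out = df.copy()  # work on a copy so the caller's frame is never modified
    for col in out.columns:
        series = out[col]
        if series.isna().all():
            # Assumption: nothing to impute from, so leave the column as-is.
            continue
        if pd.api.types.is_numeric_dtype(series):
            out[col] = series.fillna(series.median())
        else:
            modes = series.mode(dropna=True)
            if not modes.empty:
                # mode() can return several values; take the first one.
                out[col] = series.fillna(modes.iloc[0])
    return out


# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "city": ["Oslo", None, "Oslo"],
    "empty": [np.nan, np.nan, np.nan],
})
clean = impute_missing(df)
```

Here `clean["age"]` has its gap filled with the median 30.0, `clean["city"]` is filled with "Oslo", the `empty` column stays all NaN, and `df` itself still contains its original missing values.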
Medium · Technical
Implement in Python an aggregation that, for each cohort (string column 'cohort_week'), computes the proportion of unique users who performed a target action at least once. Input is an events DataFrame with columns ['user_id', 'cohort_week', 'action']. Provide code using groupby and explain how to avoid double-counting duplicate events.
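
A minimal sketch of one way to approach this follows. The function name `cohort_action_rate` and the sample events are hypothetical, and the sketch assumes the denominator is every distinct user with at least one event in the cohort. Counting with `nunique()` rather than `count()` is what keeps repeated events by the same user from being double-counted.

```python
import pandas as pd


def cohort_action_rate(events: pd.DataFrame, target_action: str) -> pd.Series:
    """Share of distinct users per cohort who performed target_action at least once."""
    # Denominator: distinct users seen in each cohort.
    total_users = events.groupby("cohort_week")["user_id"].nunique()
    # Numerator: distinct users with at least one target event.
    acted = (
        events.loc[events["action"] == target_action]
        .groupby("cohort_week")["user_id"]
        .nunique()
    )
    # Cohorts with no target events would otherwise be missing; treat them as 0.
    rate = acted.reindex(total_users.index, fill_value=0) / total_users
    return rate.rename("action_rate")


# Hypothetical sample data: user 1 purchases twice in cohort W1.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3],
    "cohort_week": ["W1", "W1", "W1", "W1", "W2"],
    "action": ["view", "purchase", "purchase", "view", "view"],
})
rate = cohort_action_rate(events, "purchase")
```

In this sample, W1 has two users and one purchaser, so its rate is 0.5 despite the duplicate purchase event, and W2 has no purchasers, so its rate is 0.0.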
Hard · Technical
You are mentoring a junior analyst who frequently writes pandas scripts that fail due to loading entire large files into memory. Create a short teaching plan (several exercises) that demonstrates chunked reading, vectorization vs apply, and basic profiling with concrete coding tasks and expected solutions. Describe how you would measure learning outcomes.
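
As an illustration of the kind of exercise such a plan might include, the sketch below shows a chunked aggregation with `pd.read_csv(..., chunksize=...)`. The helper name `chunked_sum_by_key`, the column names, and the in-memory CSV are hypothetical stand-ins for a large file on disk.

```python
import io

import pandas as pd


def chunked_sum_by_key(source, key: str, value: str, chunksize: int = 2) -> pd.Series:
    """Sum `value` per `key` while holding only one chunk in memory at a time."""
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Aggregate each chunk first so only small partial results are kept.
        partials.append(chunk.groupby(key)[value].sum())
    # Combine the per-chunk partial sums into one total per key.
    return pd.concat(partials).groupby(level=0).sum()


# Hypothetical data; a real exercise would point at a file too large for memory.
csv_text = "region,sales\nnorth,10\nsouth,5\nnorth,7\nsouth,1\nnorth,3\n"
totals = chunked_sum_by_key(io.StringIO(csv_text), "region", "sales")
```

A learner could check the result against a plain `pd.read_csv` followed by `groupby` on a small file, then repeat the comparison while profiling memory on a larger one.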
Hard · Technical
Discuss strategies to encode very high-cardinality categorical features (millions of unique user IDs) for feature engineering or reporting in Python. Cover hashing-based approaches, frequency/binning, target encoding with regularization, and storage/access patterns for mapping tables. Discuss interpretability and memory trade-offs.
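
Two of the approaches named above, hashing and frequency encoding, can be sketched roughly as follows. The function names `hash_bucket` and `frequency_encode`, the bucket count, and the sample IDs are assumptions made for illustration. Target encoding with regularization is not shown.

```python
import numpy as np
import pandas as pd


def hash_bucket(ids: pd.Series, n_buckets: int = 1024) -> pd.Series:
    """Map each ID to one of n_buckets integers without storing a lookup table."""
    # hash_pandas_object is deterministic, so the same ID always lands in the same bucket.
    hashed = pd.util.hash_pandas_object(ids, index=False)  # uint64 values
    return (hashed % np.uint64(n_buckets)).astype("int64")


def frequency_encode(ids: pd.Series) -> pd.Series:
    """Replace each ID with the number of times it occurs in the column."""
    counts = ids.value_counts()
    return ids.map(counts)


# Hypothetical sample IDs.
ids = pd.Series(["u1", "u2", "u1", "u3"])
buckets = hash_bucket(ids, n_buckets=16)
freq = frequency_encode(ids)
```

The hashing version needs no mapping table, at the cost of collisions and lost interpretability. The frequency version keeps a table of counts that must be stored and reapplied to new data.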
Easy · Technical
Describe the difference between .loc and .iloc in pandas with code examples that demonstrate label-based vs integer-position based selection, including examples of slicing that show inclusive/exclusive behaviour. Mention common gotchas.
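
A short sketch of the contrast, using hypothetical sample data: `.loc` slices by label and includes the end label, while `.iloc` slices by integer position and excludes the end position, like ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

# Label-based: the slice "a":"c" includes the end label "c".
by_label = df.loc["a":"c", "score"]    # values 10, 20, 30

# Position-based: the slice 0:2 excludes position 2.
by_position = df.iloc[0:2, 0]          # values 10, 20

# Gotcha: with an integer index, labels and positions can disagree.
s = pd.Series([100, 200, 300], index=[2, 1, 0])
label_zero = s.loc[0]       # row whose label is 0 -> 300
position_zero = s.iloc[0]   # first row by position -> 100
```

The integer-index case is a common source of confusion, because `s.loc[0]` and `s.iloc[0]` look alike but select different rows whenever the index is not the default 0..n-1 range.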
