InterviewStack.io

Data Transformation and Preparation Questions

This topic covers the technical skills and judgment required to connect to data sources, clean and shape data, and prepare datasets for analysis and visualization. It includes identifying the necessary transformations (calculations, aggregations, filtering, joins, and type conversions); deciding whether to perform a transformation in the business intelligence tool or in the data warehouse or database layer; designing efficient data models and extract, transform, load (ETL) workflows; ensuring data quality, lineage, and freshness; applying performance optimization techniques such as incremental refresh and pushdown processing; and working with tools such as Power Query in Power BI, Tableau's data preparation capabilities, and SQL for database-level transformations. It also covers documentation, reproducibility, and testing of data preparation pipelines.

Medium, Technical
88 practiced
An upstream system changed a column type (int -> varchar) and multiple production dashboards started failing. Describe immediate triage steps to restore reporting (rollback, casting, quick fixes), long-term preventative measures (schema contracts, integration tests, CI checks), and the communication plan you'd use to inform stakeholders during the incident.
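One of the preventative measures named above, a schema contract enforced in CI, can be sketched as a small drift check. This is a minimal illustration and not a prescribed answer: it uses Python's built-in sqlite3 module as a stand-in for the real warehouse, and `EXPECTED_SCHEMA`, `schema_drift`, and the `orders` columns are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical contract: the column types downstream dashboards rely on.
EXPECTED_SCHEMA = {"order_id": "INTEGER", "amount": "NUMERIC"}


def schema_drift(conn, table, expected):
    """Return human-readable differences between the contract and the live table.

    The table name is interpolated because PRAGMA statements do not accept
    bound parameters, so it is assumed to come from trusted configuration.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    actual = {
        row[1]: row[2].upper()
        for row in conn.execute(f"PRAGMA table_info({table})")
    }
    problems = []
    for column, declared_type in expected.items():
        if column not in actual:
            problems.append(f"{column}: missing")
        elif actual[column] != declared_type:
            problems.append(
                f"{column}: expected {declared_type}, found {actual[column]}"
            )
    return problems


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Simulate the upstream change: order_id is now declared as VARCHAR.
    conn.execute("CREATE TABLE orders (order_id VARCHAR, amount NUMERIC)")
    for problem in schema_drift(conn, "orders", EXPECTED_SCHEMA):
        print(problem)
```

A check like this could run before each refresh or as a CI step, failing the build when the returned list is non-empty. On another database the same comparison would read from that system's catalog rather than a PRAGMA.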
Easy, Technical
65 practiced
You have a table orders(order_id int, order_date_str varchar, amount_str varchar). Sample rows:
order_id | order_date_str | amount_str
1 | '2024-01-05' | '$1,234.56'
2 | '2024/02/10' | '1234.00'
Write a PostgreSQL (or ANSI SQL) snippet that: 1) parses order_date_str into a DATE handling 'YYYY-MM-DD' and 'YYYY/MM/DD'; 2) converts amount_str to NUMERIC by removing currency symbols and commas; 3) outputs order_id, order_date, amount_numeric, and month. Explain how you handle invalid formats.
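The parsing logic this question asks for can be sketched as follows. The example runs the SQL through Python's built-in sqlite3 module so it is self-contained, which means it uses SQLite functions (`REPLACE`, `date`, `GLOB`, `strftime`) and not the PostgreSQL ones an actual answer would use. The strategy is the part that carries over: normalize the separators, validate, and return NULL for anything that does not parse. The name `clean_orders` and the third sample row are assumptions added for illustration.

```python
import sqlite3

CLEAN_ORDERS_SQL = """
WITH normalized AS (
    SELECT
        order_id,
        -- Turn 'YYYY/MM/DD' into 'YYYY-MM-DD' so one format check covers both.
        REPLACE(order_date_str, '/', '-') AS date_text,
        -- Strip the currency symbol and thousands separators.
        REPLACE(REPLACE(amount_str, '$', ''), ',', '') AS amount_text
    FROM orders
),
validated AS (
    SELECT
        order_id,
        -- date() returns NULL or a different string for invalid input,
        -- so the CASE falls through to NULL instead of raising an error.
        CASE WHEN date(date_text) = date_text THEN date_text END AS order_date,
        CASE
            WHEN amount_text GLOB '*[0-9]*'          -- at least one digit
             AND amount_text NOT GLOB '*[^0-9.]*'    -- only digits and dots
             AND amount_text NOT GLOB '*.*.*'        -- at most one dot
            THEN CAST(amount_text AS REAL)
        END AS amount_numeric
    FROM normalized
)
SELECT
    order_id,
    order_date,
    amount_numeric,
    CAST(strftime('%m', order_date) AS INTEGER) AS month
FROM validated
ORDER BY order_id
"""


def clean_orders(rows):
    """Load raw (order_id, order_date_str, amount_str) rows and return cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, order_date_str TEXT, amount_str TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute(CLEAN_ORDERS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "2024-01-05", "$1,234.56"),
        (2, "2024/02/10", "1234.00"),
        (3, "not-a-date", "abc"),  # hypothetical invalid row
    ]
    for row in clean_orders(sample):
        print(row)
```

Invalid dates and amounts surface as NULL and are not silently coerced, so they can be counted or routed to a review table. This sketch does not handle negative amounts or other currency symbols; those would need extra rules.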
Easy, Technical
70 practiced
Explain how NULL behaves in SQL comparisons and aggregations. Contrast NULL with empty string ('') and zero (0) in string, numeric, and date contexts. Describe three common pitfalls when joining or aggregating data that involves NULLs and give recommended handling strategies (COALESCE, filtering, explicit NULL checks).
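A few of the behaviors this question targets can be observed directly. The sketch below uses Python's built-in sqlite3 module as a convenient test bed. The three-row table `t` and the function name `null_demo` are made up for the example, and some engines differ on details such as how they treat the empty string.

```python
import sqlite3


def null_demo():
    """Run small queries that show how NULL differs from '' and 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, score INTEGER, note TEXT)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, 10, "ok"), (2, None, ""), (3, 0, None)],
    )

    def one(sql):
        return conn.execute(sql).fetchone()

    results = {
        # Comparing with NULL yields NULL (unknown), which Python sees as None.
        "null_equals_null": one("SELECT NULL = NULL")[0],
        "null_is_null": one("SELECT NULL IS NULL")[0],
        # Empty string and zero are ordinary values, not NULL.
        "empty_is_null": one("SELECT '' IS NULL")[0],
        "zero_is_null": one("SELECT 0 IS NULL")[0],
        # COUNT(*) counts rows; COUNT(column) skips NULLs.
        "counts": one("SELECT COUNT(*), COUNT(score), COUNT(note) FROM t"),
        # AVG ignores NULLs unless they are replaced explicitly.
        "avg_skipping_null": one("SELECT AVG(score) FROM t")[0],
        "avg_null_as_zero": one("SELECT AVG(COALESCE(score, 0)) FROM t")[0],
        # Pitfall: '<>' silently drops the row whose score is NULL.
        "ids_score_not_10": [
            r[0] for r in conn.execute("SELECT id FROM t WHERE score <> 10 ORDER BY id")
        ],
        # Pitfall: NOT IN with a NULL in the list matches nothing.
        "not_in_with_null": one("SELECT COUNT(*) FROM t WHERE id NOT IN (1, NULL)")[0],
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, value in null_demo().items():
        print(f"{name}: {value}")
```

The last two entries correspond to common pitfalls: a filter that was meant to keep "everything except 10" loses the NULL row, and a `NOT IN` list containing NULL returns no rows at all. An explicit `IS NULL` check or `COALESCE` avoids both.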
Easy, Technical
89 practiced
A stakeholder reports that row counts differ after joining orders and refunds: the orders table has 10,000 rows and the refunds table has 200, but the join produces 10,500 rows. Describe step by step how you'd debug this in SQL: which checks and sample queries you would run, which assumptions you would validate, and how you would explain the root cause to the stakeholder in plain language.
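A useful first observation, assuming a LEFT JOIN from orders on a unique order_id: each refund can add at most one row beyond the 10,000 orders, so the result could not exceed 10,199 rows. Getting 10,500 therefore suggests the join key is not unique on the orders side, or that the join uses a column other than the order key. The sketch below reproduces that kind of fan-out on a tiny dataset using Python's built-in sqlite3 module. The column names, the sample rows, and `diagnose_join` are assumptions made for the example.

```python
import sqlite3


def diagnose_join(orders, refunds):
    """Return row counts and duplicate-key checks for an orders/refunds join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.execute(
        "CREATE TABLE refunds (refund_id INTEGER, order_id INTEGER, refund_amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO refunds VALUES (?, ?, ?)", refunds)

    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    report = {
        # Step 1: compare row counts with distinct key counts on each side.
        "orders_rows": scalar("SELECT COUNT(*) FROM orders"),
        "orders_distinct_keys": scalar("SELECT COUNT(DISTINCT order_id) FROM orders"),
        "refunds_rows": scalar("SELECT COUNT(*) FROM refunds"),
        # Step 2: list the keys that appear more than once.
        "duplicate_order_keys": conn.execute(
            "SELECT order_id, COUNT(*) FROM orders "
            "GROUP BY order_id HAVING COUNT(*) > 1"
        ).fetchall(),
        "multi_refund_orders": conn.execute(
            "SELECT order_id, COUNT(*) FROM refunds "
            "GROUP BY order_id HAVING COUNT(*) > 1"
        ).fetchall(),
        # Step 3: the naive join multiplies rows wherever keys repeat.
        "naive_join_rows": scalar(
            "SELECT COUNT(*) FROM orders o "
            "LEFT JOIN refunds r ON r.order_id = o.order_id"
        ),
        # Step 4: de-duplicate orders and pre-aggregate refunds to one row per order.
        "safe_join_rows": scalar(
            "SELECT COUNT(*) FROM (SELECT DISTINCT order_id, amount FROM orders) o "
            "LEFT JOIN (SELECT order_id, SUM(refund_amount) AS refunded "
            "           FROM refunds GROUP BY order_id) r "
            "ON r.order_id = o.order_id"
        ),
    }
    conn.close()
    return report


if __name__ == "__main__":
    orders = [(1, 100.0), (2, 50.0), (2, 50.0), (3, 75.0)]  # order 2 is duplicated
    refunds = [(10, 2, 5.0), (11, 2, 7.0), (12, 3, 1.0)]    # order 2 has two refunds
    for name, value in diagnose_join(orders, refunds).items():
        print(f"{name}: {value}")
```

In plain language for the stakeholder: some orders appear more than once, and some have several refunds, so the join repeats those orders once per matching pair. Collapsing each side to one row per order before joining brings the count back to the number of distinct orders.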
Easy, Technical
93 practiced
Given table user_events(user_id int, event_type varchar, event_ts timestamp), write a SQL query (PostgreSQL/ANSI) that removes duplicates defined as same user_id and event_type occurring within a 1-minute window, keeping the earliest event_ts for each group. Explain how your query handles boundary conditions.
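The question leaves the grouping rule open, so the sketch below picks one interpretation and states it: an event is treated as a duplicate when an earlier event with the same user_id and event_type occurred less than 60 seconds before it. Under this rule a chain of events spaced 30 seconds apart collapses to its first event, and an event exactly 60 seconds after the previous one is kept. It uses `NOT EXISTS` in place of window functions and runs through Python's built-in sqlite3 module; `dedupe_events` and the sample rows are hypothetical. Timestamps are assumed to be 'YYYY-MM-DD HH:MM:SS' text, which sorts correctly as a string.

```python
import sqlite3

DEDUPE_SQL = """
SELECT DISTINCT e.user_id, e.event_type, e.event_ts   -- DISTINCT removes exact ties
FROM user_events AS e
WHERE NOT EXISTS (
    SELECT 1
    FROM user_events AS p
    WHERE p.user_id = e.user_id
      AND p.event_type = e.event_type
      AND p.event_ts < e.event_ts                      -- strictly earlier event
      AND CAST(strftime('%s', e.event_ts) AS INTEGER)
          - CAST(strftime('%s', p.event_ts) AS INTEGER) < ?
)
ORDER BY e.user_id, e.event_type, e.event_ts
"""


def dedupe_events(rows, window_seconds=60):
    """Keep events that have no earlier same-user, same-type event within the window."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_events (user_id INTEGER, event_type TEXT, event_ts TEXT)"
    )
    conn.executemany("INSERT INTO user_events VALUES (?, ?, ?)", rows)
    result = conn.execute(DEDUPE_SQL, (window_seconds,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    events = [
        (1, "click", "2024-01-01 10:00:00"),  # kept: first in its chain
        (1, "click", "2024-01-01 10:00:30"),  # dropped: 30 s after the previous
        (1, "click", "2024-01-01 10:01:00"),  # dropped: 30 s after the previous
        (1, "click", "2024-01-01 10:02:00"),  # kept: exactly 60 s after the previous
        (1, "view", "2024-01-01 10:00:10"),   # kept: different event_type
        (2, "click", "2024-01-01 10:00:00"),  # exact duplicates collapse to one row
        (2, "click", "2024-01-01 10:00:00"),
    ]
    for row in dedupe_events(events):
        print(row)
```

A different reading, where each window is anchored on the last kept event and does not chain through dropped ones, would give different results on dense bursts and typically needs a recursive query or procedural logic. Stating which reading was chosen is part of answering the boundary-condition prompt.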
