InterviewStack.io

Data Cleaning and Quality Validation in SQL Questions

Handle NULL values, duplicates, and data type issues within queries. Implement data validation checks (row counts, value distributions, date ranges). Practice identifying and documenting data quality issues that impact analysis reliability.

Medium · System Design
Design a SQL-first data validation layer for an ETL pipeline that runs daily. Requirements: (1) validate schema and null thresholds, (2) check business invariants (e.g., sums by partition match source), (3) fail-fast for critical checks, (4) support historical audit logs. Describe components, where validations run (source vs. warehouse), tooling choices (dbt, Great Expectations, custom SQL), and retry/backfill behavior.
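One way to picture requirements (1) through (4) is a small check runner in which every check is a SQL query returning a violation count, every result is written to an audit table, and a failed critical check stops the run. The sketch below is illustrative only: SQLite stands in for the warehouse, and the table names (`source_orders`, `warehouse_orders`, `validation_audit`), the check names, and the 20% null threshold are all assumptions, not part of the question.

```python
import sqlite3
from datetime import datetime, timezone


class CriticalCheckFailed(Exception):
    """Raised to stop the pipeline when a critical check has violations."""


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders    (order_id INTEGER, region TEXT, amount REAL);
CREATE TABLE warehouse_orders (order_id INTEGER, region TEXT, amount REAL);
-- Hypothetical audit log: one row per check per run.
CREATE TABLE validation_audit (
    run_at TEXT, check_name TEXT, severity TEXT,
    violations INTEGER, passed INTEGER
);
INSERT INTO source_orders VALUES (1,'EU',10.0),(2,'EU',5.0),(3,'US',7.5);
-- Order 3 was loaded twice, so the US total no longer matches the source.
INSERT INTO warehouse_orders VALUES
    (1,'EU',10.0),(2,'EU',5.0),(3,'US',7.5),(3,'US',7.5);
""")

# Each check is (name, severity, SQL returning a single violation count).
CHECKS = [
    ("amount_null_rate_under_20pct", "critical", """
        SELECT CASE WHEN AVG(amount IS NULL) > 0.20 THEN 1 ELSE 0 END
        FROM warehouse_orders"""),
    ("no_duplicate_order_ids", "warning", """
        SELECT COUNT(*) FROM (
            SELECT order_id FROM warehouse_orders
            GROUP BY order_id HAVING COUNT(*) > 1)"""),
    ("region_sums_match_source", "critical", """
        SELECT COUNT(*)
        FROM (SELECT region, SUM(amount) AS total
              FROM source_orders GROUP BY region) AS s
        LEFT JOIN (SELECT region, SUM(amount) AS total
                   FROM warehouse_orders GROUP BY region) AS w
               ON w.region = s.region
        WHERE w.total IS NULL OR ABS(s.total - w.total) > 0.005"""),
]


def run_checks(conn, checks):
    """Run checks in order, log every result, fail fast on critical ones."""
    run_at = datetime.now(timezone.utc).isoformat()
    for name, severity, sql in checks:
        violations = conn.execute(sql).fetchone()[0]
        passed = violations == 0
        conn.execute(
            "INSERT INTO validation_audit VALUES (?, ?, ?, ?, ?)",
            (run_at, name, severity, violations, int(passed)),
        )
        conn.commit()  # persist the audit row before possibly raising
        if not passed and severity == "critical":
            raise CriticalCheckFailed(name)


try:
    run_checks(conn, CHECKS)
    pipeline_halted = False
except CriticalCheckFailed:
    pipeline_halted = True

audit_rows = conn.execute(
    "SELECT check_name, severity, violations, passed "
    "FROM validation_audit ORDER BY rowid"
).fetchall()
```

With this sample data the warning-level duplicate check is logged and the run continues, while the critical sum mismatch is logged and then halts the run. In practice the same pattern is what tools such as dbt tests or Great Expectations provide in a managed form; the hand-rolled runner is only meant to show the moving parts.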
Easy · Technical
Given an 'orders' table:
| order_id INT | user_id INT | order_amount DECIMAL | created_at TIMESTAMP |
Write an SQL query that identifies duplicate orders based on the business key (user_id, order_amount, DATE(created_at)). Return groups with count > 1 and include sample order_ids for each duplicate group. Assume Postgres-compatible SQL.
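A possible shape for the answer is a `GROUP BY` on the business key with a `HAVING COUNT(*) > 1` filter. The sketch below runs the query against an in-memory SQLite database so it is self-contained; the sample rows are invented, and `GROUP_CONCAT` is used as a SQLite stand-in for what would likely be `ARRAY_AGG(order_id ORDER BY order_id)` and `created_at::date` in Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        user_id      INTEGER,
        order_amount NUMERIC,
        created_at   TEXT      -- ISO-8601 timestamp text in SQLite
    )
""")
# Invented sample data: orders 1-2 and 5-7 share a business key.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 25.00, "2024-03-01 09:15:00"),
        (2, 10, 25.00, "2024-03-01 17:40:00"),
        (3, 10, 25.00, "2024-03-02 08:00:00"),  # different day, not a dup
        (4, 11, 40.50, "2024-03-01 10:00:00"),
        (5, 12, 99.99, "2024-03-05 12:00:00"),
        (6, 12, 99.99, "2024-03-05 12:00:01"),
        (7, 12, 99.99, "2024-03-05 23:59:59"),
    ],
)

DUPLICATE_SQL = """
SELECT user_id,
       order_amount,
       DATE(created_at)       AS order_date,
       COUNT(*)               AS dup_count,
       GROUP_CONCAT(order_id) AS sample_order_ids
FROM orders
GROUP BY user_id, order_amount, DATE(created_at)
HAVING COUNT(*) > 1
ORDER BY dup_count DESC, user_id
"""

duplicate_groups = conn.execute(DUPLICATE_SQL).fetchall()
for group in duplicate_groups:
    print(group)
```

Grouping on `DATE(created_at)` rather than the full timestamp is what lets near-simultaneous resubmissions on the same day collapse into one group.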
Hard · Technical
Case: A production model's accuracy decreased by 12% coincident with a data schema change. You have SQL access to historical training and scoring tables. Outline a step-by-step SQL-based investigation to quantify which feature distributions changed, identify rows that no longer match training distributions, and propose mitigation steps to restore model performance.
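A first investigative step could be to profile the same feature in both tables side by side and then list scoring rows that fall outside the range seen in training. The sketch below is a minimal illustration under assumptions: the tables `training_rows` and `scoring_rows`, the feature `session_length`, and the seconds-to-milliseconds unit change are all hypothetical, and SQLite stands in for the real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE training_rows (row_id INTEGER PRIMARY KEY, session_length REAL);
CREATE TABLE scoring_rows  (row_id INTEGER PRIMARY KEY, session_length REAL);
-- Hypothetical scenario: training data is in seconds; after the schema
-- change some scoring rows arrive in milliseconds and one is NULL.
INSERT INTO training_rows VALUES (1, 30), (2, 45), (3, 60), (4, 90);
INSERT INTO scoring_rows  VALUES (1, 40), (2, 52000), (3, 61000), (4, NULL);
""")

# Step 1: compare row counts, null rates, and summary statistics.
PROFILE_SQL = """
SELECT src,
       COUNT(*)                    AS n_rows,
       AVG(session_length IS NULL) AS null_rate,
       AVG(session_length)         AS mean_value,
       MIN(session_length)         AS min_value,
       MAX(session_length)         AS max_value
FROM (
    SELECT 'training' AS src, session_length FROM training_rows
    UNION ALL
    SELECT 'scoring',         session_length FROM scoring_rows
)
GROUP BY src
ORDER BY src
"""

# Step 2: find scoring rows outside the range observed in training.
OUT_OF_RANGE_SQL = """
SELECT s.row_id, s.session_length
FROM scoring_rows AS s
CROSS JOIN (
    SELECT MIN(session_length) AS lo, MAX(session_length) AS hi
    FROM training_rows
) AS t
WHERE s.session_length IS NULL
   OR s.session_length < t.lo
   OR s.session_length > t.hi
ORDER BY s.row_id
"""

profile = conn.execute(PROFILE_SQL).fetchall()
out_of_range = conn.execute(OUT_OF_RANGE_SQL).fetchall()
```

A min/max range check is deliberately crude; a fuller investigation would likely repeat the profile per feature and per day around the schema change, and compare binned distributions rather than only extremes.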
Medium · Technical
You ingest a 'status' column from multiple sources and find inconsistent enum values like 'active', 'Active', 'ACT', '1'. Write SQL to join the dataset to a canonical_lookup table to normalize status codes, and produce a report of unmapped values and their counts to help expand the lookup table.
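The normalization and the unmapped-value report can both be built on one `LEFT JOIN` against the lookup, matching on a trimmed, lower-cased key. This is a sketch under assumptions: the question names only `canonical_lookup`, so the `raw_records` table and the column names `raw_value` and `canonical_status` are hypothetical, and the sample rows are invented. SQLite is used so the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_records (record_id INTEGER PRIMARY KEY, status TEXT);
-- Lookup keys are stored lower-cased and trimmed (an assumed convention).
CREATE TABLE canonical_lookup (
    raw_value        TEXT PRIMARY KEY,
    canonical_status TEXT NOT NULL
);
INSERT INTO raw_records (record_id, status) VALUES
    (1, 'active'), (2, 'Active'), (3, 'ACT'), (4, '1'),
    (5, 'inactive'), (6, 'archived'), (7, 'archived'), (8, '??');
INSERT INTO canonical_lookup VALUES
    ('active', 'active'), ('act', 'active'), ('1', 'active'),
    ('inactive', 'inactive');
""")

# Normalized view: canonical_status is NULL where no mapping exists.
NORMALIZE_SQL = """
SELECT r.record_id, r.status AS raw_status, l.canonical_status
FROM raw_records AS r
LEFT JOIN canonical_lookup AS l
       ON l.raw_value = LOWER(TRIM(r.status))
ORDER BY r.record_id
"""

# Report of raw values with no mapping, most frequent first.
UNMAPPED_SQL = """
SELECT r.status AS raw_status, COUNT(*) AS n_rows
FROM raw_records AS r
LEFT JOIN canonical_lookup AS l
       ON l.raw_value = LOWER(TRIM(r.status))
WHERE l.raw_value IS NULL
GROUP BY r.status
ORDER BY n_rows DESC, raw_status
"""

normalized = conn.execute(NORMALIZE_SQL).fetchall()
unmapped = conn.execute(UNMAPPED_SQL).fetchall()
```

Reporting the original `raw_status` rather than the lower-cased key keeps the exact spellings visible, which helps when deciding what to add to the lookup table.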
Easy · Technical
Given an 'events' table with 'event_id', 'user_id', and 'event_date' stored as STRING, write SQL to: (1) find rows with malformed dates or impossible dates (e.g., 2099-01-01), (2) convert valid string dates to DATE safely, and (3) count rows per error type. Assume a SQL dialect with TRY_CAST or SAFE_CAST available; explain alternatives if not available.
