Data Cleaning and Quality Validation in SQL Questions
Handle NULL values, duplicates, and data type issues within queries. Implement data validation checks (row counts, value distributions, date ranges). Practice identifying and documenting data quality issues that impact analysis reliability.
Design SQL-based alert rules for critical data quality checks and assign each rule a severity. For example: null_rate(order_date) > 5% = HIGH, row_count drift > 2% = MEDIUM, ingestion lag > 60 minutes = HIGH. Provide sample SQL that computes the current status of these three rules against an orders ingestion metadata table, and describe when an alert should escalate from email to on-call paging.
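A minimal sketch of the three rules follows. It assumes a hypothetical `ingestion_metadata` table with one row per load of the orders feed (`batch_id`, `loaded_at`, `row_count`, `null_order_date_count`). It also assumes "drift" means the change relative to the previous batch. The SQL runs through Python's built-in `sqlite3` so it can be executed locally. In PostgreSQL, the lag expression would more naturally be `EXTRACT(EPOCH FROM now() - loaded_at) / 60`. The evaluation time is passed in as `:now` so the results are reproducible.

```python
import sqlite3

# Hypothetical metadata table: one row per ingestion batch of the orders feed.
SCHEMA = """
CREATE TABLE ingestion_metadata (
    batch_id INTEGER PRIMARY KEY,
    loaded_at TEXT NOT NULL,               -- ISO-8601 UTC completion time
    row_count INTEGER NOT NULL,
    null_order_date_count INTEGER NOT NULL
);
"""

# One output row per rule for the most recent batch: observed metric and breach flag.
ALERT_SQL = """
WITH ranked AS (
    SELECT loaded_at,
           row_count,
           null_order_date_count,
           LAG(row_count) OVER (ORDER BY loaded_at)    AS prev_row_count,
           ROW_NUMBER() OVER (ORDER BY loaded_at DESC) AS rn
    FROM ingestion_metadata
),
latest AS (SELECT * FROM ranked WHERE rn = 1)
SELECT 'null_rate_order_date' AS rule, 'HIGH' AS severity,
       1.0 * null_order_date_count / row_count AS observed,
       1.0 * null_order_date_count / row_count > 0.05 AS breached
FROM latest
UNION ALL
SELECT 'row_count_drift', 'MEDIUM',
       1.0 * ABS(row_count - prev_row_count) / prev_row_count,
       1.0 * ABS(row_count - prev_row_count) / prev_row_count > 0.02
FROM latest
UNION ALL
SELECT 'ingestion_lag_minutes', 'HIGH',
       (julianday(:now) - julianday(loaded_at)) * 24 * 60,
       (julianday(:now) - julianday(loaded_at)) * 24 * 60 > 60
FROM latest
"""


def evaluate_alerts(conn, now_utc):
    """Return {rule: (severity, observed, breached)}; breached is None if not computable."""
    status = {}
    for rule, severity, observed, breached in conn.execute(ALERT_SQL, {"now": now_utc}):
        status[rule] = (severity, observed, None if breached is None else bool(breached))
    return status


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ingestion_metadata VALUES (?, ?, ?, ?)",
        [(1, "2024-10-05T09:00:00", 1000, 10), (2, "2024-10-05T10:30:00", 1010, 80)],
    )
    for rule, (severity, observed, breached) in evaluate_alerts(conn, "2024-10-05T12:00:00").items():
        print(rule, severity, observed, breached)
```

One possible escalation policy, which is an assumption and not part of the question: send MEDIUM breaches to email or a ticket queue, and page on-call for HIGH breaches or for a MEDIUM breach that persists across several consecutive runs.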
Create a query that computes the daily null rate of the column event_value in an events table, then computes a 7-day rolling average of that null rate for each day using window functions. Return columns: event_date, null_rate, rolling_null_rate_7d. Use standard SQL (PostgreSQL syntax is fine).
Table: events(event_date DATE, event_value NUMERIC)
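A minimal sketch of the two-step approach: first aggregate to one row per day, then apply a window over the daily rows. It runs through Python's `sqlite3` so it can be executed locally. It assumes every `event_date` in the range has at least one row, because the `ROWS BETWEEN 6 PRECEDING` frame counts rows, not calendar days. If days can be missing, PostgreSQL 11+ can use a `RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW` frame to make the window calendar-based.

```python
import sqlite3

ROLLING_NULL_RATE_SQL = """
WITH daily AS (
    -- Step 1: one row per day; AVG of a 0/1 flag is the share of NULL values.
    SELECT event_date,
           AVG(CASE WHEN event_value IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate
    FROM events
    GROUP BY event_date
)
-- Step 2: unweighted mean of the current day and up to 6 preceding daily rows.
SELECT event_date,
       null_rate,
       AVG(null_rate) OVER (
           ORDER BY event_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_null_rate_7d
FROM daily
ORDER BY event_date
"""


def rolling_null_rates(conn):
    """Return (event_date, null_rate, rolling_null_rate_7d) tuples in date order."""
    return conn.execute(ROLLING_NULL_RATE_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date DATE, event_value NUMERIC)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("2024-10-01", 1), ("2024-10-01", None),
         ("2024-10-02", None), ("2024-10-02", None),
         ("2024-10-03", 1), ("2024-10-03", 2), ("2024-10-03", 3), ("2024-10-03", 4)],
    )
    for row in rolling_null_rates(conn):
        print(row)
```

Averaging the daily rates weights every day equally. If high-volume days should count more, one alternative is to divide windowed sums of NULL counts by windowed sums of row counts.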
Event timestamps from multiple producers arrive as strings with timezone information in different forms, e.g., '2024-10-05T13:45:00-07:00' or '2024-10-06 21:00:00 UTC'. Write SQL (Postgres or BigQuery) that parses these strings and normalizes them to TIMESTAMP WITH TIME ZONE (UTC). Also write a query that finds rows where parsing fails, for manual inspection.
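A minimal sketch follows. It swaps in SQLite through Python's `sqlite3` for Postgres/BigQuery so it can run locally. SQLite's `datetime()` accepts ISO-8601 strings with a `[+-]HH:MM` or `Z` suffix, converts them to UTC, and returns NULL for input it cannot parse. The code first rewrites the literal ` UTC` suffix to `Z`. The `raw_events` table and `ts_raw` column are hypothetical, and SQLite has no TIMESTAMPTZ type, so the result is UTC text. Strings with no offset at all are treated as UTC, an assumption to confirm with each producer. Other engines differ on failures:

- In BigQuery, the `SAFE.` prefix (e.g. `SAFE.PARSE_TIMESTAMP`) likewise returns NULL instead of raising an error.
- In PostgreSQL, a failed `::timestamptz` cast raises an error. A failure query there typically needs a regex pre-check or a small PL/pgSQL wrapper.

```python
import sqlite3

# Normalize each raw string to UTC 'YYYY-MM-DD HH:MM:SS' text.
# REPLACE turns the ' UTC' suffix into 'Z', which SQLite's date functions understand;
# datetime() applies any [+-]HH:MM offset and yields NULL on unparseable input.
NORMALIZE_SQL = """
SELECT event_id,
       ts_raw,
       datetime(REPLACE(TRIM(ts_raw), ' UTC', 'Z')) AS ts_utc
FROM raw_events
"""

# Rows whose raw value is present but could not be parsed, for manual inspection.
PARSE_FAILURES_SQL = f"""
SELECT event_id, ts_raw
FROM ({NORMALIZE_SQL}) AS parsed
WHERE ts_raw IS NOT NULL AND ts_utc IS NULL
ORDER BY event_id
"""


def normalize(conn):
    """Return (event_id, ts_raw, ts_utc) tuples sorted by event_id."""
    return sorted(conn.execute(NORMALIZE_SQL).fetchall())


def parse_failures(conn):
    """Return (event_id, ts_raw) for rows that failed to parse."""
    return conn.execute(PARSE_FAILURES_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (event_id INTEGER PRIMARY KEY, ts_raw TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [(1, "2024-10-05T13:45:00-07:00"), (2, "2024-10-06 21:00:00 UTC"), (3, "N/A")],
    )
    print(normalize(conn))
    print(parse_failures(conn))
```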
Design a fault-tolerant, scalable SQL-first data quality framework for a cloud data warehouse (e.g., BigQuery or Snowflake) that must run checks across 1000 tables nightly within 2 hours. Describe the architecture (orchestration, storage for results, templates for checks), how checks are defined and parametrized in SQL, how to optimize compute cost, and how to store historical DQ metrics for trending.
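A minimal sketch of one piece of such a framework: checks are declared as data, rendered from SQL templates, and appended to a results history table for trending. Everything here is hypothetical (`TEMPLATES`, `CheckSpec`, `dq_results`), and SQLite through Python's `sqlite3` stands in for BigQuery or Snowflake. Orchestration, parallelism across 1,000 tables, and cost controls are not shown. Table and column names cannot be bound as query parameters, so they are validated against a strict identifier pattern before being formatted into the SQL.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical check templates: each renders to a query returning exactly one metric value.
TEMPLATES = {
    "null_rate": "SELECT AVG(CASE WHEN {column} IS NULL THEN 1.0 ELSE 0.0 END) FROM {table}",
    "row_count": "SELECT COUNT(*) FROM {table}",
}

# Identifiers are formatted into SQL text, so only plain names are accepted.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class CheckSpec:
    table: str
    check: str
    column: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def render(spec: CheckSpec) -> str:
    """Turn a spec into SQL, rejecting unsafe identifiers and missing columns."""
    template = TEMPLATES[spec.check]
    if "{column}" in template and spec.column is None:
        raise ValueError(f"check {spec.check!r} needs a column")
    for name in (spec.table, spec.column):
        if name is not None and not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return template.format(table=spec.table, column=spec.column)


def run_checks(conn: sqlite3.Connection, run_id: str, specs: list) -> list:
    """Evaluate each spec and append one row per check to the dq_results history table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dq_results (
               run_id TEXT, table_name TEXT, check_name TEXT, column_name TEXT,
               metric_value REAL, passed INTEGER)"""
    )
    rows = []
    for spec in specs:
        (value,) = conn.execute(render(spec)).fetchone()
        passed = (
            value is not None
            and (spec.min_value is None or value >= spec.min_value)
            and (spec.max_value is None or value <= spec.max_value)
        )
        rows.append((run_id, spec.table, spec.check, spec.column, value, int(passed)))
    conn.executemany("INSERT INTO dq_results VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return rows
```

For example, `run_checks(conn, "run-1", [CheckSpec("orders", "null_rate", "order_date", max_value=0.05)])` records one metric row per check. This design keeps every run's metrics in one narrow table (run, table, check, value), so trending becomes a simple query over history. In a warehouse, that table could be partitioned by run date to keep trend queries cheap.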
Given a log table where duplicates are defined by (user_id, event_type, event_date), write a SQL statement that deduplicates the table, keeping only the row with the most recent updated_at in each group. Provide a DELETE using a window function (PostgreSQL syntax) that removes the duplicates while keeping the canonical row.
Table: user_events(id BIGINT, user_id INT, event_type TEXT, event_date DATE, updated_at TIMESTAMP)
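A minimal sketch of the DELETE, executed here through Python's `sqlite3` so it can be run locally. The statement itself is also valid PostgreSQL. Two choices in it are assumptions rather than part of the question:

- `NULLS LAST` keeps a row with a NULL `updated_at` from being chosen as canonical. PostgreSQL sorts NULLs first under `DESC` by default.
- `id DESC` breaks ties between rows that share the same `updated_at`.

`NULLS LAST` requires SQLite 3.30 or later.

```python
import sqlite3

# Delete every row that is not ranked first within its (user_id, event_type, event_date) group.
DEDUP_SQL = """
DELETE FROM user_events
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, event_type, event_date
                   ORDER BY updated_at DESC NULLS LAST, id DESC
               ) AS rn
        FROM user_events
    ) AS ranked
    WHERE rn > 1
)
"""


def deduplicate(conn):
    """Run the DELETE and return the number of rows removed."""
    cur = conn.execute(DEDUP_SQL)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_events (id BIGINT, user_id INT, event_type TEXT, "
        "event_date DATE, updated_at TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO user_events VALUES (?, ?, ?, ?, ?)",
        [(1, 10, "click", "2024-10-05", "2024-10-05 08:00:00"),
         (2, 10, "click", "2024-10-05", "2024-10-05 09:00:00"),
         (3, 10, "click", "2024-10-05", None),
         (4, 11, "view", "2024-10-05", "2024-10-05 07:00:00")],
    )
    print(deduplicate(conn), conn.execute("SELECT id FROM user_events ORDER BY id").fetchall())
```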