InterviewStack.io

Data Cleaning and Quality Validation in SQL Questions

Handle NULL values, duplicates, and data type issues within queries. Implement data validation checks (row counts, value distributions, date ranges). Practice identifying and documenting data quality issues that impact analysis reliability.
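A minimal sketch of these checks, assuming a hypothetical `orders` table (all columns and rows invented) and using Python's built-in `sqlite3` as a stand-in for a warehouse. One profiling query counts rows, NULLs, duplicate keys, non-numeric values in a text column, and dates outside an expected window. The numeric test uses a simplistic SQLite `GLOB` pattern; other dialects would use a safe cast such as `TRY_CAST` where available.

```python
import sqlite3

# Hypothetical `orders` table; columns and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "19.99", "2024-01-05"),
        (2, None, "5.00", "2024-01-06"),   # NULL customer_id
        (2, None, "5.00", "2024-01-06"),   # exact duplicate of the previous row
        (3, 11, "abc", "2024-01-07"),      # amount stored as text but not numeric
        (4, 12, "7.50", "2031-01-01"),     # date outside the expected window
    ],
)

# One pass over the table computing several validation metrics at once.
# The GLOB test is simplistic: it flags signs/exponents and accepts "1.2.3".
PROFILE_SQL = """
SELECT
  COUNT(*)                                              AS row_count,
  SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END)  AS null_customer_ids,
  COUNT(*) - COUNT(DISTINCT order_id)                   AS duplicate_order_ids,
  SUM(CASE WHEN amount = '' OR amount GLOB '*[^0-9.]*'
           THEN 1 ELSE 0 END)                           AS non_numeric_amounts,
  SUM(CASE WHEN order_date NOT BETWEEN :start_date AND :end_date
           THEN 1 ELSE 0 END)                           AS out_of_range_dates,
  MIN(order_date)                                       AS min_date,
  MAX(order_date)                                       AS max_date
FROM orders
"""


def profile(conn, start_date, end_date):
    """Run the profiling query and return {metric_name: value}."""
    cur = conn.execute(PROFILE_SQL, {"start_date": start_date, "end_date": end_date})
    columns = [d[0] for d in cur.description]
    return dict(zip(columns, cur.fetchone()))


result = profile(conn, "2024-01-01", "2024-12-31")
print(result)
```

ISO-8601 date strings compare correctly as text, so `BETWEEN` works here without a date type. In a typed warehouse you would compare real `DATE` columns instead.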

Hard · Technical
Design an SLA-based alerting and runbook system for data quality failures. Include (1) how to pick thresholds to trigger alerts, (2) how to avoid alert fatigue, (3) sample SQL checks for critical metrics, and (4) automated remediation patterns for trivial failures (e.g., transient downstream ingestion lag).
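A sketch of three ingredients such a design might combine. The table, column names, `k`, SLA, and grace values are all hypothetical.

- Statistical bounds derived from recent history pick the row-count threshold.
- A consecutive-breach rule reduces alert fatigue.
- A freshness check with a grace band treats a small overrun as transient lag to re-check rather than page.

The date math uses SQLite's `julianday`; other dialects have their own timestamp-difference functions.

```python
import sqlite3
import statistics


def count_bounds(history, k=3.0):
    """Alert bounds: mean +/- k population standard deviations of past daily counts."""
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    return mean - k * sd, mean + k * sd


def should_page(breach_history, min_consecutive=3):
    """Page only after `min_consecutive` consecutive failed checks (suppresses one-off blips)."""
    recent = breach_history[-min_consecutive:]
    return len(recent) == min_consecutive and all(recent)


# Freshness: hours between "now" and the newest loaded row (hypothetical `events.loaded_at`).
FRESHNESS_SQL = """
SELECT (julianday(:now) - julianday(MAX(loaded_at))) * 24.0 AS lag_hours
FROM events
"""


def classify_lag(lag_hours, sla_hours, grace_hours):
    """'ok' within SLA; 'retry' for a small overrun treated as transient (re-check later,
    no page); 'page' once the overrun exceeds the grace period."""
    if lag_hours <= sla_hours:
        return "ok"
    if lag_hours <= sla_hours + grace_hours:
        return "retry"
    return "page"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (loaded_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2024-01-01 00:00:00",), ("2024-01-01 06:00:00",)],
)
lag = conn.execute(FRESHNESS_SQL, {"now": "2024-01-01 09:00:00"}).fetchone()[0]

low, high = count_bounds([100, 102, 98, 101, 99])
print(round(lag, 2), classify_lag(lag, sla_hours=4, grace_hours=2), round(low, 1), round(high, 1))
```

The `retry` state stands in for automated remediation. A runbook might map it to re-running the check after a delay, or triggering a re-ingest, before any human is paged.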
Medium · Technical
Write SQL to detect malformed email addresses in a 'users' table and return malformed-address counts for the 10 most frequent domains among those malformed addresses. Assume the table has columns: user_id, email. Use a regex compatible with your SQL dialect and explain the trade-off between strictness and false positives.
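A sketch using Python's `sqlite3`. SQLite has no built-in `REGEXP`, so one is registered: `X REGEXP Y` calls a user function `regexp(Y, X)`. In PostgreSQL the same filter could be written as `email !~ '<pattern>'`. The pattern is a pragmatic choice, not RFC 5322. It rejects some technically valid addresses (quoted local parts, IP-literal domains), which count as false positives, but it catches the common garbage such as a missing `@`, a doubled `@`, or a missing TLD. The sample rows are invented.

```python
import re
import sqlite3

# Pragmatic pattern: local part, one '@', dotted domain, alphabetic TLD of 2+ chars.
EMAIL_PATTERN = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"

conn = sqlite3.connect(":memory:")
# `email REGEXP :pattern` is evaluated as regexp(pattern, email).
conn.create_function(
    "REGEXP", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, "alice@example.com"),   # valid
        (2, "bob@example"),         # no TLD
        (3, "carol@@example.com"),  # doubled '@'
        (4, "dave.example.com"),    # no '@' at all
        (5, "eve@example"),         # no TLD
        (6, "frank@Example"),       # no TLD, mixed case
        (7, None),                  # NULL: a completeness issue, reported separately
        (8, "grace@test.org"),      # valid
    ],
)

# Domain = text after the first '@', lower-cased; addresses without '@' get a bucket.
MALFORMED_BY_DOMAIN_SQL = """
SELECT
  CASE WHEN instr(email, '@') = 0 THEN '(no @)'
       ELSE lower(substr(email, instr(email, '@') + 1)) END AS domain,
  COUNT(*) AS malformed_count
FROM users
WHERE email IS NOT NULL
  AND NOT (email REGEXP :pattern)
GROUP BY domain
ORDER BY malformed_count DESC, domain
LIMIT 10
"""

rows = conn.execute(MALFORMED_BY_DOMAIN_SQL, {"pattern": EMAIL_PATTERN}).fetchall()
for domain, n in rows:
    print(domain, n)
```

Loosening the pattern (e.g. only requiring one `@` and a dot after it) lowers false positives but lets more bad addresses through. The choice depends on whether the column feeds deliverability or just deduplication.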
Hard · System Design
Design a validation architecture for streaming data that flows from Kafka to a data warehouse. Requirements: near-real-time deduplication, detection of schema drift, handling late-arriving events, and alerting. Describe components (stream processors, schema registry, validation service), what checks run in-stream vs in-batch, and trade-offs between latency and validation depth.
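A pure-Python sketch of the cheap checks that could run in-stream, independent of any Kafka client API. Each event is classified in a fixed order:

1. Schema check against a hypothetical expected schema; a real pipeline would fetch it from a schema registry.
2. Watermark-based lateness check.
3. Bounded-memory deduplication on `event_id`.
4. Flagging of unknown fields as additive schema drift.

Field names, statuses, and window sizes are assumptions. Duplicates that arrive after the dedup TTL, and events marked late, would be left to batch reconciliation in the warehouse.

```python
from dataclasses import dataclass, field

# Hypothetical expected schema; in practice this would come from a schema registry.
EXPECTED_SCHEMA = {"event_id": str, "event_type": str, "event_time": int}


@dataclass
class StreamValidator:
    allowed_lateness: int                     # seconds an event may trail the watermark
    dedup_ttl: int                            # seconds an event_id is remembered
    watermark: int = 0                        # max event_time accepted so far
    seen: dict = field(default_factory=dict)  # event_id -> event_time

    def check(self, event: dict) -> str:
        # 1. Schema: required fields present with the expected types.
        missing = EXPECTED_SCHEMA.keys() - event.keys()
        if missing or any(not isinstance(event[k], t) for k, t in EXPECTED_SCHEMA.items()):
            return "schema_error"   # e.g. route to a dead-letter topic
        # 2. Lateness: too far behind the watermark to handle in-stream.
        if event["event_time"] < self.watermark - self.allowed_lateness:
            return "late"           # e.g. hand off to a batch backfill job
        # 3. Dedup on event_id within the TTL window.
        if event["event_id"] in self.seen:
            return "duplicate"
        self.seen[event["event_id"]] = event["event_time"]
        self.watermark = max(self.watermark, event["event_time"])
        # Evict ids older than the TTL (linear scan; acceptable for a sketch).
        cutoff = self.watermark - self.dedup_ttl
        self.seen = {k: t for k, t in self.seen.items() if t >= cutoff}
        # 4. Additive drift: unknown fields pass through but are flagged for alerting.
        return "drift" if event.keys() - EXPECTED_SCHEMA.keys() else "ok"


v = StreamValidator(allowed_lateness=60, dedup_ttl=300)
events = [
    {"event_id": "a", "event_type": "click", "event_time": 1000},
    {"event_id": "a", "event_type": "click", "event_time": 1000},
    {"event_id": "b", "event_type": "click", "event_time": 900},
    {"event_id": "c", "event_type": "click"},
    {"event_id": "d", "event_type": "click", "event_time": 1010, "country": "US"},
]
statuses = [v.check(e) for e in events]
print(statuses)
```

The trade-off shows up in the two window parameters. Larger `allowed_lateness` and `dedup_ttl` catch more issues in-stream but hold more state and delay finality.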
Hard · Technical
As a senior data scientist, you need to convince data engineering to add source-side constraints (e.g., NOT NULL on critical fields). Draft a concise SQL-backed argument with sample queries that quantify current errors, estimate business impact, and propose an incremental rollout plan to minimize producer disruption.
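One way to back such an argument with numbers, sketched with Python's `sqlite3`. The `orders` table, its `producer` column identifying the upstream system, and the use of `amount` as the business-impact measure are all hypothetical. The query shows, per producer, how many rows a `NOT NULL` constraint on `customer_id` would reject today and how much amount cannot be attributed to a customer. Producers with zero violations are natural first candidates in an incremental rollout.

```python
import sqlite3

# Hypothetical orders table; `producer` identifies the upstream system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (producer TEXT, order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("web", 1, 10, 50.0),
        ("web", 2, 11, 30.0),
        ("mobile", 3, None, 20.0),
        ("mobile", 4, None, 25.0),
        ("mobile", 5, 12, 10.0),
        ("partner_api", 6, 13, 5.0),
    ],
)

# Rows a NOT NULL constraint would reject, per producer, plus unattributed amount.
NULL_IMPACT_SQL = """
SELECT
  producer,
  COUNT(*)                                                  AS total_rows,
  SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END)      AS null_customer_id,
  ROUND(100.0 * SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2)
                                                            AS pct_null,
  SUM(CASE WHEN customer_id IS NULL THEN amount ELSE 0 END) AS unattributed_amount
FROM orders
GROUP BY producer
ORDER BY null_customer_id DESC, producer
"""

rows = conn.execute(NULL_IMPACT_SQL).fetchall()
for row in rows:
    print(row)

# Producers already at zero violations can get the constraint first.
ready_for_constraint = [r[0] for r in rows if r[2] == 0]
print(ready_for_constraint)
```

Running the same query grouped by load date instead of producer would show whether the error rate is growing, which strengthens the case for acting at the source.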
Medium · Technical
You have 'source_events' and 'warehouse_events' tables. Build a SQL query that performs a reconciliation check by date and event_type: compute source_count, warehouse_count, difference, and percent_difference, and flag rows where absolute percent_difference > 1%. Provide the query and explain how you would schedule and alert on such mismatches in a production pipeline.
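A sketch of the reconciliation query, run through Python's `sqlite3` with invented data. It assumes both tables have `event_date` and `event_type` columns. Here `difference` is defined as `source_count - warehouse_count`, so a positive value means rows missing from the warehouse, and `percent_difference` is relative to the source count. `UNION ALL` followed by `GROUP BY` plays the role of a full outer join, so keys present on only one side still appear. Keys with no source rows are always flagged. `notify` is a placeholder for a real alert channel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("source_events", "warehouse_events"):
    conn.execute(f"CREATE TABLE {table} (event_date TEXT, event_type TEXT)")

# Invented volumes: the warehouse is short 1 click (0.5%), 3 views (3%), all of
# 2024-01-02's clicks, and has 5 purchases the source does not.
conn.executemany(
    "INSERT INTO source_events VALUES (?, ?)",
    [("2024-01-01", "click")] * 200 + [("2024-01-01", "view")] * 100
    + [("2024-01-02", "click")] * 50,
)
conn.executemany(
    "INSERT INTO warehouse_events VALUES (?, ?)",
    [("2024-01-01", "click")] * 199 + [("2024-01-01", "view")] * 97
    + [("2024-01-02", "purchase")] * 5,
)

RECON_SQL = """
WITH counts AS (
  SELECT event_date, event_type, COUNT(*) AS source_count, 0 AS warehouse_count
  FROM source_events GROUP BY event_date, event_type
  UNION ALL
  SELECT event_date, event_type, 0, COUNT(*)
  FROM warehouse_events GROUP BY event_date, event_type
),
totals AS (
  SELECT event_date, event_type,
         SUM(source_count) AS source_count, SUM(warehouse_count) AS warehouse_count
  FROM counts GROUP BY event_date, event_type
)
SELECT event_date, event_type, source_count, warehouse_count,
       source_count - warehouse_count AS difference,
       CASE WHEN source_count = 0 THEN NULL
            ELSE 100.0 * (source_count - warehouse_count) / source_count
       END AS percent_difference,
       CASE WHEN source_count = 0 THEN 1
            WHEN ABS(100.0 * (source_count - warehouse_count) / source_count) > 1.0 THEN 1
            ELSE 0 END AS is_flagged
FROM totals
ORDER BY event_date, event_type
"""


def run_reconciliation(conn, notify=print):
    """Intended to run on a schedule after each warehouse load; alerts on flagged rows."""
    rows = conn.execute(RECON_SQL).fetchall()
    flagged = [r for r in rows if r[6] == 1]
    if flagged:
        notify(f"{len(flagged)} reconciliation mismatch(es): {flagged}")
    return rows, flagged


rows, flagged = run_reconciliation(conn)
```

In production the job would typically be triggered after the warehouse load completes and restricted to dates older than the expected late-arrival delay. That restriction keeps in-flight data from producing false alerts.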
