InterviewStack.io LogoInterviewStack.io

Data Quality and Edge Case Handling Questions

Practical skills and best practices for recognizing, preventing, and resolving real world data quality problems and edge cases in queries, analyses, and production data pipelines. Core areas include handling missing and null values, empty and single row result sets, duplicate records and deduplication strategies, outliers and distributional assumptions, data type mismatches and inconsistent formatting, canonicalization and normalization of identifiers and addresses, time zone and daylight saving time handling, null propagation in joins, and guarding against division by zero and other runtime anomalies. It also covers merging partial or inconsistent records from multiple sources, attribution and aggregation edge cases, group by and window function corner cases, performance and correctness trade offs at scale, designing robust queries and pipeline validations, implementing sanity checks and test datasets, and documenting data limitations and assumptions. At senior levels this expands to proactively designing automated data quality checks, monitoring and alerting for anomalies, defining remediation workflows, communicating trade offs to stakeholders, and balancing engineering effort against business risk.

HardTechnical
0 practiced
You detect a long-running float rounding bug that caused cumulative revenue to drift over months and impacted billing. Create a remediation plan: quantify total customer impact, design deterministic fixes (store amounts as integers or decimals), reprocess historical data safely, notify stakeholders, and propose monitoring to prevent recurrence.
MediumTechnical
0 practiced
You find customer IDs are inconsistent: some sources include leading zeros, some strip them, and some use numeric vs string types. Propose a normalization and migration strategy that creates canonical IDs without breaking downstream consumers, including detection methods for affected datasets and a rollback plan.
MediumTechnical
0 practiced
Write a SQL monitoring query that flags daily sign-up surges: identify days where new signups exceed the rolling 7-day mean by more than 4 standard deviations. Table: users(signup_date). Explain how you would choose the threshold and adjust for seasonality or known campaigns.
HardTechnical
0 practiced
Multiple sources report the same monetary metric with small discrepancies due to rounding and window boundaries. Design a reconciler that attributes differences by cause (rounding, late arrivals, attribution lag), reports probable causes with confidence scores, and decides when automated reconciliation is safe vs when to escalate to manual review.
HardSystem Design
0 practiced
Design a versioned canonicalization microservice responsible for normalizing emails, phone numbers, and addresses used across many products. Describe API design, versioning strategy, backward compatibility guarantees, caching, latency SLA targets, rate limiting, and a migration plan for clients.

Unlock Full Question Bank

Get access to hundreds of Data Quality and Edge Case Handling interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.