Validation and Edge Case Handling Questions
Focuses on validating data correctness and robustness across the application and data layers, and on identifying and handling boundary conditions. Topics include input validation and sanitization, server-side validation and schema checks, null and missing-value behavior, duplicate rows and accidental Cartesian joins, off-by-one and boundary testing, date-range and type-mismatch handling, and test strategies for edge cases. Emphasizes designing systems and queries that fail safely, produce meaningful errors, and include checks that protect aggregations and joins from corrupt or unexpected data.
Easy | Behavioral
Tell me about a time you discovered incorrect data that impacted a stakeholder's decision. Use the STAR format (Situation, Task, Action, Result). Explain how you communicated the problem, what remediation you applied, and what process changes you implemented to prevent recurrence.
Hard | Technical
Using Great Expectations (or an equivalent tool that you describe), outline a suite of tests for a daily time-series table `daily_metrics(date, revenue, users, ctr)`. Include tests for freshness, monotonicity where applicable, null tolerances, expected ranges, and unexpected gaps. Describe how you'd surface expectation failures to owners and which automatic remediations (if any) you'd attempt.
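As a starting point for an answer, the checks can be sketched in plain Python before mapping them onto a specific tool. This sketch does not use the Great Expectations API. The function name `check_daily_metrics`, the staleness and null thresholds, and the assumed valid ranges (non-negative `revenue` and `users`, `ctr` between 0 and 1) are all illustrative assumptions, not requirements stated in the question.

```python
from datetime import date, timedelta

def check_daily_metrics(rows, today, max_staleness_days=1, max_null_fraction=0.01):
    """Return a list of failure messages for rows shaped like daily_metrics.

    rows: list of dicts with keys date, revenue, users, ctr (hypothetical layout).
    Thresholds are assumptions chosen for illustration.
    """
    if not rows:
        return ["table is empty"]
    failures = []
    dates = sorted(r["date"] for r in rows)

    # Freshness: the newest date must be recent enough.
    if (today - dates[-1]).days > max_staleness_days:
        failures.append(f"stale: latest date is {dates[-1]}")

    # Strict monotonicity of the date key: no date may appear twice.
    if len(set(dates)) != len(dates):
        failures.append("duplicate dates")

    # Gaps: every calendar day between the first and last date must be present.
    span = (dates[-1] - dates[0]).days + 1
    expected = {dates[0] + timedelta(days=i) for i in range(span)}
    missing = sorted(expected - set(dates))
    if missing:
        failures.append("missing dates: " + ", ".join(d.isoformat() for d in missing))

    # Null tolerance and expected ranges per column (assumed bounds).
    for column, low, high in (("revenue", 0, None), ("users", 0, None), ("ctr", 0.0, 1.0)):
        values = [r[column] for r in rows]
        nulls = sum(v is None for v in values)
        if nulls / len(values) > max_null_fraction:
            failures.append(f"{column}: {nulls} nulls exceed tolerance")
        bad = [v for v in values
               if v is not None and (v < low or (high is not None and v > high))]
        if bad:
            failures.append(f"{column}: {len(bad)} values out of range")
    return failures

demo_rows = [
    {"date": date(2024, 1, 1), "revenue": 100.0, "users": 10, "ctr": 0.1},
    {"date": date(2024, 1, 2), "revenue": 120.0, "users": 12, "ctr": 1.4},  # ctr > 1
    {"date": date(2024, 1, 4), "revenue": None, "users": 9, "ctr": 0.2},    # gap, null
]
failures = check_daily_metrics(demo_rows, today=date(2024, 1, 8))
for message in failures:
    print(message)
```

Returning messages instead of raising on the first problem lets one run report every failing check, which is usually what an owner wants to see in an alert.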
Easy | Technical
You have `users(user_id)` and `orders(order_id, user_id)` in a system without enforced foreign keys (FKs). Write a SQL query to find order rows whose user_id is missing from the users table (i.e., referential integrity violations). Show a solution that works in standard SQL and explain how you would run this as a scheduled check.
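One possible answer is an anti-join written with `NOT EXISTS`, which stays correct when `user_id` contains NULLs (unlike `NOT IN`). The sketch below runs the query against an in-memory SQLite database purely so it can be executed. The sample rows and the helper names `build_demo_db` and `find_orphan_orders` are made up for illustration, and excluding orders with a NULL `user_id` is an assumption about what counts as a violation.

```python
import sqlite3

# Standard SQL anti-join. Orders with a NULL user_id are filtered out here on the
# assumption that they are tracked by a separate null check.
ORPHAN_ORDERS_SQL = """
SELECT o.order_id, o.user_id
FROM orders AS o
WHERE o.user_id IS NOT NULL
  AND NOT EXISTS (
        SELECT 1 FROM users AS u WHERE u.user_id = o.user_id
      )
ORDER BY o.order_id
"""

def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER);
        CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
        INSERT INTO users VALUES (1), (2);
        INSERT INTO orders VALUES (10, 1), (11, 3), (12, NULL), (13, 2), (14, 99);
    """)
    return conn

def find_orphan_orders(conn):
    """Return (order_id, user_id) pairs whose user_id has no match in users."""
    return conn.execute(ORPHAN_ORDERS_SQL).fetchall()

orphans = find_orphan_orders(build_demo_db())
print(orphans)  # [(11, 3), (14, 99)]
```

As a scheduled check, the same query could run after each load from whatever scheduler the team already uses, with the job failing or alerting when the result is non-empty.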
Hard | System Design
Design a scalable data validation and observability pipeline for a modern data warehouse. Requirements: run schema and business tests per pipeline, support historical baselining and anomaly detection, deliver alerts with lineage context, and handle schema evolution. Sketch the architecture, components, data flows, and tools you'd use (open-source or cloud).
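The historical baselining requirement can be illustrated with a very small example: compare today's value of a metric (row count here) with the mean and standard deviation of recent runs. This is a minimal sketch of one simple technique, a z-score test. The function name `is_anomalous`, the threshold of 3.0, and the sample history are assumptions, and a production design would likely also account for seasonality and trend.

```python
from statistics import mean, stdev

def is_anomalous(history, current, z_threshold=3.0):
    """Flag `current` when it is more than z_threshold sample standard
    deviations away from the mean of `history` (prior observations)."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat baseline: any change is a deviation
    return abs(current - mu) / sigma > z_threshold

# Hypothetical daily row counts from previous pipeline runs.
row_count_history = [1000, 1020, 980, 1010, 990]

print(is_anomalous(row_count_history, 400))   # True: far below the baseline
print(is_anomalous(row_count_history, 1005))  # False: within normal variation
```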
Hard | Technical
An ETL job detects major schema drift mid-run (a new column appears and a column's type changes). Propose a rollback and fail-safe strategy: include canary runs, versioned schemas, automated backfills, clear runbook steps, and a stakeholder notification strategy. Explain how you'd ensure no silent data loss while minimizing business disruption.
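A fail-safe strategy needs a detection step that classifies the drift before anything is written. The sketch below compares an observed schema with a stored, versioned one and picks an action. The column names, type strings, action labels, and the policy (additive changes proceed with a warning, removed or retyped columns halt the run) are assumptions for illustration, not a prescribed answer.

```python
def diff_schema(expected, observed):
    """Compare two {column_name: type_string} mappings and classify differences."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }

def decide_action(diff):
    """Assumed policy: breaking changes halt the run, additive ones only warn."""
    if diff["removed"] or diff["retyped"]:
        return "halt"                  # stop before loading; keep the raw batch
    if diff["added"]:
        return "proceed_with_warning"  # load known columns, notify owners
    return "proceed"

# Hypothetical registered schema (version 1) and the schema seen mid-run.
expected_v1 = {"order_id": "BIGINT", "amount": "DECIMAL(10,2)"}
observed = {"order_id": "BIGINT", "amount": "VARCHAR", "coupon_code": "VARCHAR"}

drift = diff_schema(expected_v1, observed)
print(drift)
print(decide_action(drift))  # halt
```

Halting on a type change rather than coercing values is what guards against silent data loss here: the raw batch can be retained and backfilled once a new schema version is registered.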