InterviewStack.io LogoInterviewStack.io

Validation and Edge Case Handling Questions

Focuses on validating data correctness and robustness across application and data layers, and on identifying and handling boundary conditions. Topics include input validation and sanitization, server side validation and schema checks, null and missing value behavior, duplicate and cartesian join issues, off by one and boundary testing, date range and type mismatch handling, and test strategies for edge cases. Emphasizes designing systems and queries that fail safely, produce meaningful errors, and include checks that protect aggregations and joins from corrupt or unexpected data.

HardTechnical
124 practiced
Using Great Expectations (or describe equivalent), outline a suite of tests for a daily time-series table `daily_metrics(date, revenue, users, ctr)`. Include tests for freshness, monotonicity where applicable, null tolerances, expected ranges, and unexpected gaps. Describe how you'd surface expectation failures to owners and what auto-remediations (if any) you'd attempt.
HardTechnical
90 practiced
You must normalize and deduplicate addresses prior to customer matching. Describe a repeatable normalization pipeline (e.g., expand abbreviations, remove punctuation, canonicalize casing), what external resources (postal libraries, regex rules) you'd use, and a SQL/Python example of normalizing a single address field.
EasyTechnical
66 practiced
You have users(user_id) and orders(order_id, user_id) in a system without enforced FKs. Write a SQL query to find order rows whose user_id is missing from the users table (i.e., referential integrity violations). Show a solution that works in standard SQL and explain how you would run this as a scheduled check.
HardSystem Design
95 practiced
Architect a fail-safe strategy for near-real-time analytics where events may arrive late or out of order. Requirements: minimize double counting, allow incremental updates, enable reprocessing windows, and give stakeholders confidence in near-real-time KPIs. Sketch components, policies for watermarking, and mechanisms to correct aggregates when late data arrives.
HardSystem Design
83 practiced
Design an alerting strategy that uses data lineage to prioritize failures. For example, an assertion failed on an upstream table used by 10 downstream dashboards. How would you surface priority, route alerts to the right teams, and avoid alert fatigue? Describe the components and rules you would implement.

Unlock Full Question Bank

Get access to hundreds of Validation and Edge Case Handling interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.