InterviewStack.io

Data Cleaning and Business Logic Edge Cases Questions

Covers handling data-centric edge cases and complex business rule interactions in queries and data pipelines. Topics include cleaning and normalizing data, handling nulls and type mismatches, deduplication strategies, treating inconsistent or malformed records, validating results and detecting anomalies, using conditional logic for data transformation, understanding null semantics in SQL, and designing queries that correctly implement date boundaries and domain-specific business rules. The emphasis is on producing robust results in the presence of imperfect data and complex requirements.

Hard · System Design
Design an idempotent, low-downtime backfill strategy for a partitioned data warehouse table that contains billions of rows and feeds many downstream dashboards. Include partition-level approaches, staging tables, validation checksums, and a rollback plan that minimizes consumer impact while ensuring correctness.
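As a starting point for discussion, the partition-level idea can be sketched in miniature. This is only an illustration using SQLite from the Python standard library, not a warehouse implementation: the table names `facts` and `facts_staging`, the `day` partition column, and the (row count, sum) checksum are all assumptions made for the example. The recomputed rows are loaded into staging, validated against a checksum supplied by the caller, and swapped into the live table inside one transaction, so a failed validation rolls everything back and a rerun produces the same result.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory stand-in for a partitioned table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE facts (day TEXT, id INTEGER, amount INTEGER)")
        conn.execute("CREATE TABLE facts_staging (day TEXT, id INTEGER, amount INTEGER)")
        conn.executemany(
            "INSERT INTO facts VALUES (?, ?, ?)",
            [("2024-01-01", 1, 10), ("2024-01-02", 2, 20)],
        )
    return conn


def partition_checksum(conn, table, day):
    """Return (row count, sum of amount) for one partition.

    The table name is interpolated, so only trusted, hard-coded names are passed in.
    """
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table} WHERE day = ?",
        (day,),
    ).fetchone()


def backfill_partition(conn, day, rows, expected_checksum):
    """Replace one partition with recomputed rows; safe to rerun.

    rows: list of (id, amount) pairs recomputed for this partition.
    expected_checksum: (count, sum) computed independently, e.g. from the source system.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        # Stage the recomputed rows; clearing first keeps reruns idempotent.
        conn.execute("DELETE FROM facts_staging WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO facts_staging VALUES (?, ?, ?)",
            [(day, row_id, amount) for row_id, amount in rows],
        )
        # Validate staging before consumers can see anything.
        actual = partition_checksum(conn, "facts_staging", day)
        if actual != tuple(expected_checksum):
            raise ValueError(f"checksum mismatch for {day}: {actual}")
        # Swap: readers see either the old partition or the new one, never a mix.
        conn.execute("DELETE FROM facts WHERE day = ?", (day,))
        conn.execute(
            "INSERT INTO facts SELECT day, id, amount FROM facts_staging WHERE day = ?",
            (day,),
        )
```

In a real warehouse the swap step would more likely be a partition exchange or an atomic overwrite of a single partition, and the checksum would cover more columns; the shape of the flow (stage, validate, swap, roll back on failure) is the part this sketch is meant to show.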
Easy · Technical
Write a SQL query to locate rows in a transactions table (transaction_id, user_id, amount, created_at) where numeric columns are outside expected ranges (e.g., negative amounts, amounts > 1,000,000) or are unexpectedly NULL. Describe a simple nightly automation that flags such anomalies and notifies the data team.
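One possible shape for the check is sketched below, run against an in-memory SQLite table so it is self-contained. The threshold constant, the reason labels, and the helper names `find_anomalies` and `demo` are assumptions for illustration. Note that the NULL checks are explicit: a comparison such as `amount < 0` evaluates to NULL (not true) when `amount` is NULL, so range predicates alone would silently skip those rows.

```python
import sqlite3

# Hypothetical upper bound, taken from the example in the question.
MAX_AMOUNT = 1_000_000

# Each flagged row gets the first matching reason; a row with several
# problems is reported once.
ANOMALY_SQL = """
SELECT transaction_id,
       CASE
           WHEN user_id IS NULL      THEN 'null_user_id'
           WHEN created_at IS NULL   THEN 'null_created_at'
           WHEN amount IS NULL       THEN 'null_amount'
           WHEN amount < 0           THEN 'negative_amount'
           WHEN amount > :max_amount THEN 'amount_too_large'
       END AS reason
FROM transactions
WHERE user_id IS NULL
   OR created_at IS NULL
   OR amount IS NULL
   OR amount < 0
   OR amount > :max_amount
ORDER BY transaction_id
"""


def find_anomalies(conn, max_amount=MAX_AMOUNT):
    """Return (transaction_id, reason) for every row that fails a check."""
    return conn.execute(ANOMALY_SQL, {"max_amount": max_amount}).fetchall()


def demo():
    """Build a small sample table and run the check against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "transaction_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "amount REAL, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?)",
        [
            (1, 10, 25.0, "2024-01-01T00:00:00Z"),         # valid
            (2, 11, -5.0, "2024-01-01T01:00:00Z"),         # negative
            (3, 12, 2_000_000.0, "2024-01-01T02:00:00Z"),  # too large
            (4, None, 10.0, "2024-01-01T03:00:00Z"),       # missing user
            (5, 13, None, "2024-01-01T04:00:00Z"),         # missing amount
        ],
    )
    return find_anomalies(conn)
```

For the nightly automation, a scheduled job (cron or an orchestrator) could call `find_anomalies` and send a notification only when the result is non-empty; the notification channel is deliberately left out of this sketch.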
Hard · Technical
Implement SQL or pseudocode to compute a retention metric where a user is considered 'retained' in a cohort if they performed event A at least once OR they performed at least two events from set {B, C} within 7 days of cohort creation. Account for users across timezones given event timestamps in UTC and for late-arriving events within a 48-hour allowance.
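The rule can be sketched in Python under a few stated assumptions, none of which come from the question itself: the 7-day window is measured as elapsed time from the cohort creation instant (so it is the same for every user regardless of local timezone), "two events from {B, C}" counts events rather than distinct types, each event carries a hypothetical `ingested_at` timestamp, and the metric is only finalized once the 48-hour allowance after the window has passed.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=7)            # retention window after cohort creation
LATE_ALLOWANCE = timedelta(hours=48)  # grace period for late-arriving events


def is_retained(cohort_start, events, as_of):
    """Decide retention for one user.

    cohort_start: timezone-aware UTC datetime of cohort creation.
    events: iterable of (event_type, event_time, ingested_at), all aware UTC.
    as_of: the time the metric is being computed.

    Returns True/False, or None when the late-arrival allowance has not
    elapsed yet and the result could still change.
    """
    window_end = cohort_start + WINDOW
    cutoff = window_end + LATE_ALLOWANCE
    if as_of < cutoff:
        return None

    a_count = 0
    bc_count = 0
    for event_type, event_time, ingested_at in events:
        # Half-open window [cohort_start, window_end) avoids double counting
        # an event that lands exactly on the boundary.
        if not (cohort_start <= event_time < window_end):
            continue
        # Events that arrived after the allowance are ignored so the metric
        # is stable once published.
        if ingested_at > cutoff:
            continue
        if event_type == "A":
            a_count += 1
        elif event_type in ("B", "C"):
            bc_count += 1

    return a_count >= 1 or bc_count >= 2
```

If the business rule instead defines the window in each user's local calendar days, the event times would need to be converted to the user's timezone before the window comparison; that variant is not shown here.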
Medium · Technical
You need to match product SKUs between two systems where SKUs have inconsistent formatting: dashes, leading zeros, and small typos. Explain a practical approach for matching including normalization, tokenization, fuzzy matching algorithms (Levenshtein, Jaro-Winkler), blocking strategies to reduce comparisons, and how you'd evaluate precision and recall for the matching.
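A minimal sketch of such a pipeline follows, using only the standard library. It implements Levenshtein distance by hand and does not cover Jaro-Winkler. The normalization rules (uppercase, drop non-alphanumerics, strip leading zeros of the whole string), the two-character prefix used as a blocking key, and the distance threshold are all assumptions chosen for illustration; a prefix block will miss pairs whose typo falls in the first two characters, which is the usual trade-off of blocking.

```python
import re
from collections import defaultdict


def normalize(sku):
    """Uppercase, drop non-alphanumerics, strip leading zeros."""
    return re.sub(r"[^A-Za-z0-9]", "", sku).upper().lstrip("0")


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def match_skus(left, right, max_distance=1):
    """Return (left_sku, right_sku) pairs, comparing only within blocks."""
    blocks = defaultdict(list)
    for r in right:
        nr = normalize(r)
        blocks[nr[:2]].append((r, nr))  # blocking key: first two characters

    matches = []
    for l in left:
        nl = normalize(l)
        best = None
        for r, nr in blocks.get(nl[:2], []):
            dist = levenshtein(nl, nr)
            if dist <= max_distance and (best is None or dist < best[0]):
                best = (dist, r)
        if best is not None:
            matches.append((l, best[1]))
    return matches


def precision_recall(predicted, truth):
    """Compare predicted pairs against a hand-labelled set of true pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Evaluation then amounts to labelling a sample of pairs by hand and calling `precision_recall(match_skus(left, right), labelled_pairs)` at several values of `max_distance` to see how the two measures trade off.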
Easy · Technical
A product manager requests weekly active users (WAU) using Monday-to-Sunday weeks. Write a SQL snippet (ANSI SQL) that maps event timestamps to their corresponding ISO week start date or to a Monday-based week bucket for analytics, and explain how you would handle events at midnight and time zone differences when the underlying event_time is stored in UTC.
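The bucketing logic itself is small, and is sketched here in Python rather than SQL so it can be run directly. The function name `week_start` and the `reporting_tz` parameter are assumptions for the example. The timestamp is first converted from UTC to the reporting timezone and only then truncated to the Monday of that local week, so an event at exactly Monday 00:00 local time opens a new week, and an event late on Sunday UTC can belong to a different week than the same instant viewed in another timezone.

```python
from datetime import date, datetime, timedelta, timezone


def week_start(event_time_utc, reporting_tz=timezone.utc):
    """Return the Monday (as a date) of the week containing the event.

    event_time_utc: timezone-aware datetime stored in UTC.
    reporting_tz: any tzinfo; a fixed offset is used in the examples here,
        while a zoneinfo.ZoneInfo would be needed to follow DST changes.
    """
    if event_time_utc.tzinfo is None:
        raise ValueError("event_time_utc must be timezone-aware")
    local = event_time_utc.astimezone(reporting_tz)
    # datetime.weekday() is 0 for Monday, so subtracting it lands on Monday.
    return local.date() - timedelta(days=local.weekday())


# 2024-01-01 is a Monday.
monday_midnight = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
utc_minus_5 = timezone(timedelta(hours=-5))

in_utc = week_start(monday_midnight)                 # week of 2024-01-01
in_local = week_start(monday_midnight, utc_minus_5)  # still Sunday locally
```

Weekly active users would then be a distinct count of users grouped by this week-start date; treating each week as the half-open interval from Monday 00:00 up to but not including the next Monday 00:00 keeps midnight events from being counted twice.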
