Data Validation and Anomaly Detection Questions

Techniques for validating data quality and detecting anomalies using SQL: identifying nulls and missing values, finding duplicates and orphan records, range checks, sanity checks across aggregates, distribution checks, outlier detection heuristics, reconciliation queries across systems, and building SQL-based alerts and integrity checks. Includes strategies for writing repeatable validation queries, comparing row counts and sums across pipelines, and documenting assumptions for investigative analysis.

Easy · Technical

Describe how z-score based outlier detection works for a numeric column and outline a simple SQL implementation to compute z-scores and flag outliers. State key assumptions behind z-score detection, when it is appropriate, and at least two reasons you might prefer an IQR-based method instead.
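
One way to sketch the SQL side of this question is shown below. It runs the query against an in-memory SQLite database from Python so the example is self-contained. The table `measurements`, its columns `id` and `amount`, and the helper `flag_zscore_outliers` are hypothetical names chosen for illustration. SQLite has no standard deviation aggregate, so the sketch derives the population variance as E[x^2] - E[x]^2 and registers a small `sqrt` helper; on PostgreSQL an aggregate such as `stddev_pop` would normally replace both.

```python
import math
import sqlite3

# Hypothetical table: measurements(id, amount).
Z_SCORE_SQL = """
WITH stats AS (
    SELECT AVG(amount) AS mean,
           -- Population variance as E[x^2] - E[x]^2. This form can lose
           -- precision on very large values; it is used here for brevity.
           AVG(amount * amount) - AVG(amount) * AVG(amount) AS variance
    FROM measurements
)
SELECT m.id,
       m.amount,
       (m.amount - s.mean) / sqrt(s.variance) AS z_score
FROM measurements AS m
CROSS JOIN stats AS s
WHERE s.variance > 0
  AND abs(m.amount - s.mean) / sqrt(s.variance) > :threshold
ORDER BY m.id
"""


def _safe_sqrt(value):
    """Return sqrt(value), or None (SQL NULL) when it is undefined or zero."""
    if value is None or value <= 0:
        return None
    return math.sqrt(value)


def flag_zscore_outliers(rows, threshold=3.0):
    """Load (id, amount) rows and return those with |z| above the threshold."""
    conn = sqlite3.connect(":memory:")
    # Register sqrt explicitly: not every SQLite build ships math functions.
    conn.create_function("sqrt", 1, _safe_sqrt)
    conn.execute(
        "CREATE TABLE measurements (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    flagged = conn.execute(Z_SCORE_SQL, {"threshold": threshold}).fetchall()
    conn.close()
    return flagged
```

The threshold of 3 is a common convention rather than a rule. Because the mean and standard deviation are themselves pulled toward extreme values, a single large outlier can mask smaller ones, which is one of the usual arguments for the IQR-based alternative.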

Medium · Behavioral

Tell me about a time you investigated an unexpected anomaly in a production dashboard. Structure your answer using the STAR method: explain the situation, your task, the actions you took to validate and investigate (including SQL checks), and the result. Emphasize how you documented assumptions and communicated with stakeholders.

Medium · Technical

Write a PostgreSQL query to flag outliers in sales.sale_amount using the IQR method. Compute Q1 and Q3, derive IQR, and then select rows where sale_amount < Q1 - 1.5 * IQR or sale_amount > Q3 + 1.5 * IQR. Also show how you would modify the query to compute IQR per product category and how to handle small partitions with fewer than 30 rows.
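
The fence arithmetic in this question can be cross-checked outside the database. The sketch below is a Python reference, not the PostgreSQL query itself: it computes Q1 and Q3 with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between adjacent sorted values in the same way `percentile_cont` does. The function names, the `(sale_id, category, sale_amount)` row layout, and the fallback for small categories (reusing fences computed over all rows) are assumptions made for illustration; other policies, such as skipping small categories entirely, are equally defensible.

```python
from collections import defaultdict
from statistics import quantiles

MIN_ROWS = 30  # partition-size cutoff taken from the question


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    Needs at least two values; "inclusive" uses linear interpolation.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_iqr_outliers(rows, min_rows=MIN_ROWS):
    """Flag rows outside their category's fences.

    rows: iterable of (sale_id, category, sale_amount) tuples (assumed layout).
    Categories with fewer than min_rows rows fall back to fences computed
    over all rows, which is one possible policy among several.
    """
    rows = list(rows)
    by_category = defaultdict(list)
    for _sale_id, category, amount in rows:
        by_category[category].append(amount)

    global_fences = iqr_fences([amount for _, _, amount in rows])
    fences = {
        category: iqr_fences(amounts) if len(amounts) >= min_rows else global_fences
        for category, amounts in by_category.items()
    }

    return [
        (sale_id, category, amount)
        for sale_id, category, amount in rows
        if not fences[category][0] <= amount <= fences[category][1]
    ]
```

Running the same sample rows through the SQL query and through this reference gives a quick way to confirm that the quartiles and fences agree before the query is trusted on production data.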

Medium · Technical

A nightly validation job joins a 200M-row staging table to a 50M-row dimension and times out. Describe systematic steps to diagnose and optimize this validation query: explain how you would profile the query, check statistics and indexes, rewrite joins or predicates, consider partitioning or clustering, and when to use materialized views or summary tables.

Hard · System Design

Compare threshold-based (rule-based) anomaly detection and ML-based anomaly detection in production analytics. Discuss operational overhead, interpretability, maintenance, drift handling, costs, and where each approach is most appropriate. Propose a hybrid strategy that smoothly transitions from rule-based to ML-based detection when justified.
