InterviewStack.io

Data Validation and Anomaly Detection Questions

Techniques for validating data quality and detecting anomalies using SQL: identifying nulls and missing values, finding duplicates and orphan records, range checks, sanity checks across aggregates, distribution checks, outlier detection heuristics, reconciliation queries across systems, and building SQL-based alerts and integrity checks. Includes strategies for writing repeatable validation queries, comparing row counts and sums across pipelines, and documenting assumptions for investigative analysis.

Easy · Technical
Write a PostgreSQL query to identify columns in the events table that have more than 5% NULL or missing values for the last 30 days. Table schema:
events(
  event_id bigint PRIMARY KEY,
  user_id bigint,
  event_type text,
  amount numeric,
  occurred_at timestamp
)
Return: column_name, null_count, total_count, null_percentage for events where occurred_at >= current_date - 30.
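One possible shape for an answer is sketched below. It is driven from Python's standard `sqlite3` module so it can be run locally; the SQL itself uses only constructs that PostgreSQL also accepts (`SUM(CASE ...)`, `NULLIF`, `UNION ALL`). Assumptions not stated in the question: "missing" is read as `NULL` only (empty strings are not counted), the cutoff is passed in as a parameter instead of `current_date - 30`, and the helper name `null_report` is hypothetical.

```python
import sqlite3

# Columns taken from the events schema in the question.
COLUMNS = ["event_id", "user_id", "event_type", "amount", "occurred_at"]


def null_report(conn, cutoff, threshold_pct=5.0):
    """Return (column_name, null_count, total_count, null_percentage)
    for every column whose NULL share exceeds threshold_pct."""
    # One SELECT per column, stitched together with UNION ALL. Column names
    # come from the fixed list above, never from user input.
    per_column = " UNION ALL ".join(
        f"SELECT '{c}' AS column_name, "
        f"SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END) AS null_count, "
        f"COUNT(*) AS total_count "
        f"FROM events WHERE occurred_at >= :cutoff"
        for c in COLUMNS
    )
    sql = f"""
        SELECT column_name,
               null_count,
               total_count,
               ROUND(100.0 * null_count / NULLIF(total_count, 0), 2)
                   AS null_percentage
        FROM ({per_column}) AS t
        WHERE 100.0 * null_count / NULLIF(total_count, 0) > :threshold
        ORDER BY null_percentage DESC
    """
    params = {"cutoff": cutoff, "threshold": threshold_pct}
    return conn.execute(sql, params).fetchall()
```

Note that rows with a NULL `occurred_at` never pass the date filter, so that column always reports zero NULLs inside the window; a separate unfiltered check would be needed to catch them. With a PostgreSQL driver the placeholder syntax would differ (for example `%(cutoff)s`), and the cutoff could be replaced by `current_date - 30` as in the question.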
Medium · Technical
You observe a sudden 20% drop in conversions in the analytics dashboards for yesterday. Provide a structured investigation plan focused on data validation and anomaly detection: initial health checks, SQL queries to isolate dimensions, reconciliation with upstream systems, sampling of raw events, and how to document findings for stakeholders.
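The "isolate dimensions" step can be illustrated with a small sketch. The question gives no schema, so everything here is an assumption: a hypothetical table `conversions(day, channel)` with one row per conversion and ISO-formatted dates, a hypothetical helper `drop_by_dimension`, and a 7-day baseline. The idea is to compare the target day against the average of the preceding days for each value of one dimension, then rank the largest drops first.

```python
import sqlite3
from datetime import date, timedelta


def drop_by_dimension(conn, target_day, baseline_days=7):
    """Compare target_day with the mean of the preceding baseline_days,
    per channel. Returns (channel, target_count, baseline_avg, pct_change)."""
    start = target_day - timedelta(days=baseline_days)
    rows = conn.execute(
        """
        SELECT channel,
               SUM(CASE WHEN day = :target THEN 1 ELSE 0 END) AS target_count,
               SUM(CASE WHEN day < :target THEN 1 ELSE 0 END) AS baseline_total
        FROM conversions
        WHERE day >= :start AND day <= :target
        GROUP BY channel
        """,
        {"target": target_day.isoformat(), "start": start.isoformat()},
    ).fetchall()

    report = []
    for channel, target_count, baseline_total in rows:
        baseline_avg = baseline_total / baseline_days
        # pct_change is undefined when the channel had no baseline traffic.
        pct = (
            None
            if baseline_avg == 0
            else 100.0 * (target_count - baseline_avg) / baseline_avg
        )
        report.append((channel, target_count, baseline_avg, pct))

    # Largest drops first; channels without a baseline sort last.
    report.sort(key=lambda r: (r[3] is None, r[3] if r[3] is not None else 0.0))
    return report
```

If one value of the dimension accounts for most of the drop, that points toward a tracking or pipeline problem in that segment; a uniform drop across all values suggests checking upstream row counts and load completeness first.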
Medium · Technical
Write a SQL query that flags days with unusually high transaction counts using a rolling 14-day mean and standard deviation. Table: transactions(transaction_id, occurred_at date). Detect days where count > mean + 3 * stddev computed over the prior 14 days (exclude current day from baseline). Return date, count, rolling_mean, rolling_stddev, z_score, is_spike.
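The logic the question asks for can be sketched as follows. SQLite has no built-in standard deviation, so this sketch lets SQL produce the daily counts and computes the rolling statistics in Python; in PostgreSQL the same baseline is commonly expressed with `AVG` and `STDDEV_SAMP` over a window frame of `ROWS BETWEEN 14 PRECEDING AND 1 PRECEDING`. Assumptions: dates are stored as ISO strings, days with no transactions count as zero, days with fewer than 14 prior days are skipped, and the helper names `daily_counts` and `flag_spikes` are hypothetical.

```python
import sqlite3
from datetime import date, timedelta
from statistics import mean, stdev


def daily_counts(conn):
    """Return {date: transaction count} from the transactions table."""
    rows = conn.execute(
        "SELECT occurred_at, COUNT(*) FROM transactions "
        "GROUP BY occurred_at ORDER BY occurred_at"
    ).fetchall()
    return {date.fromisoformat(d): n for d, n in rows}


def flag_spikes(counts, window=14, k=3.0):
    """Return (day, count, rolling_mean, rolling_stddev, z_score, is_spike)
    for each day that has a full `window` of prior days as baseline."""
    if not counts:
        return []
    # Fill calendar gaps so a day without transactions counts as 0 rather
    # than silently shrinking the baseline.
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts.get(day, 0)))
        day += timedelta(days=1)

    out = []
    for i, (day, n) in enumerate(series):
        if i < window:
            continue  # not enough history yet
        # Baseline is the prior `window` days; the current day is excluded.
        baseline = [c for _, c in series[i - window:i]]
        mu = mean(baseline)
        sigma = stdev(baseline)  # sample standard deviation
        z = (n - mu) / sigma if sigma > 0 else None
        # Literal rule from the question; with sigma == 0 any increase flags.
        is_spike = n > mu + k * sigma
        out.append((day, n, mu, sigma, z, is_spike))
    return out
```

Excluding the current day from the baseline matters: a large spike would otherwise inflate its own mean and standard deviation and can mask itself.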
Medium · Technical
Write a SQL/pseudocode approach to detect schema changes between nightly snapshots of table metadata. The check should detect added/removed columns, data type changes, and nullability changes. Describe how you would store schema history and provide example output showing schema diffs for a given table.
Hard · Technical
Propose a governance model and a data contract specification that enforces validation checkpoints between producer and consumer teams. Include contract fields (name, type, constraints), versioning policy, compatibility rules, validation tooling, and how to handle contract violations in CI and production.
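As a starting point for the contract-specification part, the sketch below shows one way a compatibility rule could be enforced in CI. It is illustrative only: the `Field` and `Contract` classes, the `(major, minor)` version tuple, and the specific rules (removing a field, changing a type, or loosening a non-nullable field to nullable all count as breaking) are assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    nullable: bool = True


@dataclass(frozen=True)
class Contract:
    name: str
    version: tuple  # (major, minor), an assumed versioning scheme
    fields: tuple   # tuple of Field


def breaking_changes(old, new):
    """List changes in `new` that could break consumers of `old`."""
    new_by_name = {f.name: f for f in new.fields}
    problems = []
    for f in old.fields:
        g = new_by_name.get(f.name)
        if g is None:
            problems.append(f"removed field: {f.name}")
        elif g.type != f.type:
            problems.append(f"type change on {f.name}: {f.type} -> {g.type}")
        elif g.nullable and not f.nullable:
            problems.append(f"{f.name} became nullable")
    return problems


def ci_violations(old, new):
    """CI gate: breaking changes are allowed only with a major version bump.
    Returns the list of violations; an empty list means the check passes."""
    problems = breaking_changes(old, new)
    if problems and new.version[0] <= old.version[0]:
        return problems
    return []
```

Adding a new field is treated as compatible here, so it passes without a major bump. A fuller answer would also cover value constraints, ownership, and what happens when a violation is detected in production rather than in CI.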
