InterviewStack.io

Data Quality and Governance Questions

Covers the principles, frameworks, practices, and tooling used to ensure data is accurate, complete, timely, and trustworthy across systems and pipelines. Key areas include data quality checks and monitoring, such as nullness and type checks, freshness and timeliness validation, referential integrity, deduplication, outlier detection, reconciliation, and automated alerting. Includes the design of service-level agreements for data freshness and accuracy, data lineage and impact analysis, metadata and catalog management, data classification, access controls, and compliance policies. Encompasses the operational reliability of data systems, including failure handling, recovery time objectives, backup and disaster recovery strategies, and observability and incident response for data anomalies. Also covers domain- and system-specific considerations, such as customer relationship management (CRM) and sales systems: common causes of data problems, prevention strategies such as input validation rules, canonicalization, deduplication, and training, and the business impact on forecasting and operations. Candidates may be evaluated on designing end-to-end data quality programs, selecting metrics and tooling, defining roles and stewardship, and implementing automated pipelines and governance controls.
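
To make the monitoring side concrete, below is a minimal sketch of two of the checks named above, null rate and freshness, written as plain SQL driven from Python. The table and column names (orders, customer_id, updated_at) and the run_query helper are illustrative assumptions standing in for your warehouse client, not part of any particular tool.

```python
# Minimal sketch of two automated checks: null rate and freshness.
# Table/column names and run_query are illustrative assumptions.
from datetime import datetime, timedelta, timezone

NULL_RATE_SQL = """
    SELECT COUNT(*) AS total_rows,
           SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_rows
    FROM orders
"""

FRESHNESS_SQL = "SELECT MAX(updated_at) AS last_update FROM orders"

def check_null_rate(run_query, max_null_rate=0.01):
    row = run_query(NULL_RATE_SQL)  # assumed to return a dict-like row
    rate = row["null_rows"] / max(row["total_rows"], 1)
    if rate > max_null_rate:
        raise ValueError(f"customer_id null rate {rate:.2%} exceeds {max_null_rate:.2%}")

def check_freshness(run_query, max_lag=timedelta(hours=24)):
    row = run_query(FRESHNESS_SQL)  # assumes a timezone-aware timestamp
    lag = datetime.now(timezone.utc) - row["last_update"]
    if lag > max_lag:
        raise ValueError(f"orders is stale: last updated {lag} ago")
```
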

Medium · Technical
A source system's schema changed, and an ETL job has been failing silently for the last three days, producing NULLs in a key column. Outline your debugging steps as an analyst: how would you detect the problem, quantify its impact, notify stakeholders, and recommend short-term and long-term fixes?
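
For the detect-and-quantify steps, one common move is a per-day breakdown of NULLs in the affected column, which tells stakeholders exactly when breakage began and how many rows are affected. A sketch, where target_table, key_column, and loaded_at are hypothetical names:

```python
# Hypothetical sketch for quantifying the blast radius: count NULLs in
# the affected column per load day. target_table, key_column, loaded_at,
# and the run_query helper are placeholder names.
IMPACT_SQL = """
    SELECT DATE(loaded_at) AS load_date,
           COUNT(*) AS total_rows,
           SUM(CASE WHEN key_column IS NULL THEN 1 ELSE 0 END) AS null_rows
    FROM target_table
    WHERE loaded_at >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY load_date
    ORDER BY load_date
"""

def summarize_impact(run_query):
    # Returns (first bad date, total affected rows) for the incident notice,
    # or None if no NULLs were found. Assumes run_query yields dict-like rows.
    bad_days = [r for r in run_query(IMPACT_SQL) if r["null_rows"] > 0]
    if not bad_days:
        return None
    return bad_days[0]["load_date"], sum(r["null_rows"] for r in bad_days)
```
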
Hard · Technical
Describe a scalable approach to probabilistic duplicate detection across multiple customer datasets (CRM, orders, support tickets) that uses blocking and Locality-Sensitive Hashing (LSH). Explain how you would tune blocking keys, choose similarity thresholds, and handle false merges.
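
For the candidate-generation half of this design, one concrete prototype is MinHash LSH over record tokens, for example with the datasketch library. The tokenization, threshold, and toy records below are illustrative choices; any candidate pair returned by the index would still pass through a precise pairwise scorer (and ideally human review) before records are merged.

```python
# Candidate-pair generation with MinHash LSH via datasketch. Tokenization,
# the 0.5 threshold, and the toy records are illustrative assumptions;
# survivors of lsh.query still need precise pairwise comparison.
from datasketch import MinHash, MinHashLSH

def minhash(tokens, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for t in tokens:
        m.update(t.encode("utf-8"))
    return m

records = {  # toy records keyed by source system and ID
    "crm:1":    {"alice", "smith", "94105", "alice@example.com"},
    "orders:7": {"alice", "smyth", "94105", "alice@example.com"},
    "crm:2":    {"bob", "jones", "10001", "bob@example.com"},
}

# threshold approximates the Jaccard similarity above which pairs surface
lsh = MinHashLSH(threshold=0.5, num_perm=128)
for key, toks in records.items():
    lsh.insert(key, minhash(toks))

for key, toks in records.items():
    candidates = [c for c in lsh.query(minhash(toks)) if c != key]
    if candidates:
        print(key, "->", candidates)  # e.g. crm:1 -> ['orders:7']
```
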
Medium · System Design
Design a strategy to manage and version data quality rules and tests (e.g., in dbt or a tests-as-code framework) so analysts can propose and review rule changes. Include branching or approval workflows, testing environments, and how to handle rule deprecation.
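
As one possible starting point, rules can be modeled as reviewable code objects carrying owner, version, and deprecation metadata, so proposals arrive as pull requests and flow through ordinary branch review. This is an illustrative sketch, not a real dbt API:

```python
# Illustrative sketch (not a dbt API): quality rules as versioned code
# objects living in a repo, so changes arrive as pull requests, reviewers
# approve via the normal branch workflow, and deprecation is an explicit
# dated field rather than a silent deletion.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class QualityRule:
    name: str
    sql: str                 # rule passes when this query returns zero rows
    owner: str               # steward accountable for the rule
    version: str             # bumped on any semantic change, per review
    deprecated_after: Optional[date] = None  # warns until this date, then retired

RULES = [
    QualityRule(
        name="orders_customer_id_not_null",
        sql="SELECT 1 FROM orders WHERE customer_id IS NULL",
        owner="data-platform",
        version="1.2.0",
    ),
]

def active_rules(today: date):
    # Deprecated rules keep running (with warnings) until their cutoff date.
    return [r for r in RULES if r.deprecated_after is None or today <= r.deprecated_after]
```
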
Hard · System Design
Design an automated root cause analysis (RCA) system for data quality alerts that leverages logs, metric anomalies, and lineage graph signals to prioritize likely causes. Describe input signals, scoring heuristics, and how analysts would validate the suggested root causes.
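
A sketch of what the scoring heuristic might look like: each lineage node upstream of the alerting table accumulates a weighted score from its recent events and is discounted by its distance in the lineage graph. The signal names, weights, and discount function are assumptions an analyst would tune against validated incidents:

```python
# Hedged sketch of an RCA scoring heuristic: weight each candidate node's
# recent events (schema changes, log error spikes, metric anomalies) and
# discount by lineage distance from the alerting table. Signal names,
# weights, and the 1/(1+hops) discount are illustrative assumptions.
DEFAULT_WEIGHTS = {
    "schema_change": 3.0,
    "error_log_spike": 2.0,
    "metric_anomaly": 1.5,
    "late_arrival": 1.0,
}

def score(events, lineage_hops, weights=DEFAULT_WEIGHTS):
    base = sum(weights.get(e, 0.0) for e in events)
    return base / (1 + lineage_hops)

candidates = {  # toy upstream nodes of the alerting table
    "raw.orders": {"events": ["schema_change"], "lineage_hops": 2},
    "stg.orders": {"events": ["metric_anomaly"], "lineage_hops": 1},
    "raw.events": {"events": ["late_arrival"], "lineage_hops": 3},
}

ranked = sorted(candidates, key=lambda k: score(**candidates[k]), reverse=True)
print(ranked)  # analysts validate the top suggestions; outcomes retune weights
```
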
Medium · Technical
You're evaluating data quality tooling for automated checks: Great Expectations, Monte Carlo, and custom SQL jobs. As the data analyst lead, list five evaluation criteria you would use and rank them from most to least important, with a one-sentence justification for each.
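
One way to make such a ranking defensible is a weighted scorecard. The criteria, weights, and per-tool scores below are illustrative placeholders to show the mechanics, not an assessment of the actual tools:

```python
# Illustrative weighted scorecard for a tooling comparison. Criterion
# weights (5 = most important) and per-tool scores (1-5) are placeholder
# values the evaluating team would fill in themselves.
WEIGHTS = {
    "check_coverage": 5,        # breadth of checks supported out of the box
    "alert_signal_quality": 4,  # precision of alerts vs. noise
    "pipeline_integration": 3,  # fit with existing orchestration and CI
    "total_cost": 2,            # licensing plus compute
    "maintenance_burden": 1,    # ongoing effort to keep checks current
}

SCORES = {
    "Great Expectations": {"check_coverage": 4, "alert_signal_quality": 3,
                           "pipeline_integration": 4, "total_cost": 5,
                           "maintenance_burden": 2},
    "Monte Carlo":        {"check_coverage": 4, "alert_signal_quality": 4,
                           "pipeline_integration": 3, "total_cost": 2,
                           "maintenance_burden": 4},
    "Custom SQL jobs":    {"check_coverage": 2, "alert_signal_quality": 2,
                           "pipeline_integration": 5, "total_cost": 4,
                           "maintenance_burden": 1},
}

def total(tool):
    return sum(WEIGHTS[c] * SCORES[tool][c] for c in WEIGHTS)

for tool in sorted(SCORES, key=total, reverse=True):
    print(f"{tool}: {total(tool)}")
```
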
