InterviewStack.io

Data Quality and Governance Questions

Covers the principles, frameworks, practices, and tooling used to ensure data is accurate, complete, timely, and trustworthy across systems and pipelines.

Key areas include data quality checks and monitoring: nullness and type checks, freshness and timeliness validation, referential integrity, deduplication, outlier detection, reconciliation, and automated alerting. Also included are the design of service-level agreements for data freshness and accuracy, data lineage and impact analysis, metadata and catalog management, data classification, access controls, and compliance policies.

Encompasses the operational reliability of data systems: failure handling, recovery time objectives, backup and disaster-recovery strategies, and observability and incident response for data anomalies. Domain- and system-specific considerations are covered as well, such as customer relationship management and sales systems: common causes of data problems, prevention strategies (input validation rules, canonicalization, deduplication, and user training), and the business impact on forecasting and operations.

Candidates may be evaluated on designing end-to-end data quality programs, selecting metrics and tooling, defining roles and stewardship, and implementing automated pipelines and governance controls.

Hard · Technical
You are asked to design SQL-based unit tests for transformation logic that computes weekly retention cohorts. Describe 6 test cases (e.g., empty input, single user multiple weeks, duplicate events, timezone edge-case, late arrival, invalid signup timestamp) and the expected outcomes for each.
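A minimal sketch of one such test, covering the duplicate-events case: the same user firing two events in one week must count once in the cohort. The table and column names (`events`, `user_id`, `event_ts`) are assumptions for illustration, not part of the question.

```python
import sqlite3

# Build a tiny in-memory fixture and run the week-bucketing query against it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_ts TEXT)")

# Duplicate-events case: same user, two events in the same week.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "2024-01-01"), ("u1", "2024-01-02")],
)

rows = conn.execute(
    """
    SELECT strftime('%Y-%W', event_ts) AS week,
           COUNT(DISTINCT user_id)     AS active_users
    FROM events
    GROUP BY week
    """
).fetchall()

# Expected outcome: one week bucket, one distinct user; duplicates
# must not inflate the cohort.
assert len(rows) == 1 and rows[0][1] == 1, rows
```

The other cases follow the same fixture-then-assert pattern: an empty `events` table should yield zero rows, and a late-arriving event should land in its event-time week, not its load-time week.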
Hard · Technical
You observe subtle data drift in feature distributions used by multiple downstream dashboards (e.g., average session length slowly increases). Describe statistical techniques to detect and quantify drift (e.g., KL divergence, population stability index, two-sample tests) and how to implement them in production monitoring.
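One of the techniques the question names, the population stability index (PSI), can be sketched in a few lines of pure Python. The bin count and the commonly cited 0.2 alert threshold are conventions, not requirements from the question.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index: bin both samples on the baseline's range
    and sum (actual% - expected%) * ln(actual% / expected%) over bins."""
    lo, hi = min(expected), max(expected)

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1) if hi > lo else 0
            counts[max(i, 0)] += 1
        # Small floor keeps log() defined when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]         # roughly uniform on [0, 1)
shifted  = [0.2 + i / 125 for i in range(100)]   # distribution shifted upward

assert psi(baseline, baseline) < 0.01   # identical distributions: near zero
assert psi(baseline, shifted) > 0.2     # drifted: above the usual alert level
```

In production the baseline distribution would come from a fixed reference window (e.g., the training period), recomputed against each day's data by the monitoring job.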
Easy · Technical
A dashboard needs daily sales numbers available by 07:00 local time. As a data analyst, outline what a data freshness SLA would look like for this metric and list three concrete monitoring checks you would put in place to ensure the SLA is met.
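One of the monitoring checks behind such an SLA could look like the sketch below: compare the feed's load-completion timestamp against the 07:00 deadline. The `last_loaded` input is a hypothetical field, and timezone handling is elided for brevity.

```python
from datetime import datetime, time

def freshness_ok(last_loaded: datetime, deadline: time = time(7, 0)) -> bool:
    """The daily load must complete before the 07:00 deadline on its own day."""
    due = datetime.combine(last_loaded.date(), deadline)
    return last_loaded <= due

assert freshness_ok(datetime(2024, 5, 1, 6, 45))        # landed 06:45: OK
assert not freshness_ok(datetime(2024, 5, 1, 7, 30))    # landed 07:30: SLA breach
```

A complete answer would pair this with a check that the table actually contains yesterday's partition, and a row-count comparison against recent history.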
Medium · System Design
Design a simple monitoring dashboard (list the panels) for tracking data quality of a daily ETL feed into a sales reporting table. Include metrics for volume, freshness, schema drift, null rate, and error rate. For each panel state what alert threshold you would set and why.
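The five panels and their alert rules could be encoded declaratively, as in this illustrative sketch; the specific thresholds (±30% volume deviation, 2-hour lateness, 5% nulls, 1% errors) are assumptions a candidate would justify, not fixed answers.

```python
# Each panel maps to a rule over its observed metric; a False result alerts.
PANELS = {
    "row_volume":   lambda v: abs(v) <= 0.30,  # deviation vs. 7-day average
    "freshness_h":  lambda v: v <= 2.0,        # hours past scheduled load
    "schema_drift": lambda v: v == 0,          # count of unexpected columns
    "null_rate":    lambda v: v <= 0.05,       # null fraction in key fields
    "error_rate":   lambda v: v <= 0.01,       # fraction of rejected rows
}

def breached(observed: dict) -> list:
    """Return the names of panels whose observed value violates its rule."""
    return [name for name, v in observed.items()
            if name in PANELS and not PANELS[name](v)]

alerts = breached({"row_volume": -0.45, "freshness_h": 0.5, "null_rate": 0.02})
assert alerts == ["row_volume"]   # only volume fell outside its threshold
```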
Medium · Technical
You're evaluating schema evolution strategies for a data warehouse (additive columns, type changes, column renames). As a data analyst, propose recommended conventions and a change-management process that minimizes breakage for downstream analysts and dashboards.
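An additive-only convention can be enforced automatically. A hypothetical pre-deploy check, sketched below, diffs two `{column: type}` maps and flags drops, renames, and type changes as breaking while allowing new columns through.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare column->type maps; additive columns are allowed, everything
    else that touches an existing column is reported as breaking."""
    issues = []
    for col, typ in old.items():
        if col not in new:
            issues.append(f"dropped or renamed column: {col}")
        elif new[col] != typ:
            issues.append(f"type change on {col}: {typ} -> {new[col]}")
    return issues

old = {"order_id": "INT", "amount": "DECIMAL"}
new = {"order_id": "INT", "amount": "FLOAT", "channel": "TEXT"}  # channel is additive

assert breaking_changes(old, new) == ["type change on amount: DECIMAL -> FLOAT"]
```

A rename would then be handled as two additive steps (add the new column, backfill, deprecate the old one) rather than an in-place change that breaks dashboards.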
