InterviewStack.io

Data Quality and Validation Questions

Covers the core concepts and hands-on techniques for detecting, diagnosing, and preventing data quality problems. Topics include common data issues such as missing values, duplicates, outliers, incorrect labels, inconsistent formats, schema mismatches, referential integrity violations, and distribution or temporal drift. Candidates should be able to design and implement validation checks and data profiling queries, including schema validation, column-level constraints, aggregate checks, distinct counts, null and outlier detection, and business logic tests. This topic also covers the mindset of data validation and exploration: how to approach unfamiliar datasets, validate calculations against sources, document quality rules, decide on remediation strategies such as imputation, quarantine, or alerting, and communicate data limitations to stakeholders.
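A few of the profiling checks named above (null counts, distinct counts, duplicate detection, and outlier detection) can be sketched in plain Python. This is a minimal illustration only: the helper names `profile_column` and `iqr_outliers` are hypothetical, rows are assumed to be a list of dictionaries, and Tukey's IQR rule is just one possible outlier definition.

```python
from collections import Counter
from statistics import quantiles


def profile_column(rows, column):
    """Return basic quality metrics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # Values that appear more than once (candidate duplicates).
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample data for illustration.
sample_rows = [
    {"invoice_id": 1, "amount": 10.0},
    {"invoice_id": 2, "amount": None},
    {"invoice_id": 2, "amount": 12.0},
]
id_profile = profile_column(sample_rows, "invoice_id")
amount_profile = profile_column(sample_rows, "amount")
outliers = iqr_outliers([10, 11, 12, 13, 14, 15, 1000])
```

In practice the same metrics are often computed in SQL or a dataframe library; the point of the sketch is only the shape of each check.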

Hard · System Design
41 practiced
Design a scalable, unified data-quality framework to validate both batch and streaming financial inputs used in forecasting and near-real-time dashboards. Describe architecture components, where checks run, how to store results, alerting strategy, and how engineers and analysts consume the outcomes.
Medium · Technical
36 practiced
Write a SQL query to detect rows with inconsistent currency formatting in a payments table, payments(amount, currency_code, amount_str). The amount_str column sometimes contains symbols or different delimiters (e.g., '$1,234.56', '1.234,56 EUR'). Return the rows where amount_str cannot be reliably parsed into a numeric value that matches amount and a currency that matches currency_code. Use ANSI SQL and explain your assumptions.
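The parsing rules behind such a check can be prototyped before committing them to SQL. The sketch below is in Python rather than the ANSI SQL the question asks for, and it rests on stated assumptions: only two number styles are accepted (US-style `1,234.56` and European-style `1.234,56`), the symbol-to-code mapping `SYMBOLS` is hypothetical, and anything ambiguous is treated as unparseable. The function names are illustrative.

```python
import re
from decimal import Decimal

# Hypothetical symbol-to-ISO-code mapping; extend as needed.
SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}

# US style: optional comma thousands separators, optional 2-digit decimals.
US_STYLE = re.compile(r"^\d{1,3}(,\d{3})*(\.\d{2})?$|^\d+(\.\d{2})?$")
# European style: optional dot thousands separators, comma + 2-digit decimals.
EU_STYLE = re.compile(r"^\d{1,3}(\.\d{3})*,\d{2}$")


def parse_amount_str(text):
    """Return (Decimal amount, currency code or None), or None if unparseable."""
    s = text.strip()
    code = None
    if s and s[0] in SYMBOLS:  # leading currency symbol
        code, s = SYMBOLS[s[0]], s[1:].strip()
    m = re.search(r"\s*([A-Z]{3})$", s)  # trailing three-letter code
    if m:
        if code is not None and code != m.group(1):
            return None  # symbol and code disagree
        code, s = m.group(1), s[: m.start()]
    if US_STYLE.match(s):
        return Decimal(s.replace(",", "")), code
    if EU_STYLE.match(s):
        return Decimal(s.replace(".", "").replace(",", ".")), code
    return None


def is_inconsistent(amount, currency_code, amount_str):
    """True when amount_str cannot be parsed or disagrees with the row."""
    parsed = parse_amount_str(amount_str)
    if parsed is None:
        return True
    value, code = parsed
    if value != Decimal(str(amount)):
        return True
    return code is not None and code != currency_code
```

A string with no symbol or code, such as `'1,234.56'`, is accepted here as long as the number matches `amount`; a stricter rule could require an explicit currency marker.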
Easy · Technical
31 practiced
As a Financial Analyst, list and explain the most common data quality issues you encounter in financial datasets (for example: ledgers, transaction feeds, budgets, and forecasts). For each issue describe: 1) why it occurs, 2) its typical impact on financial reporting or forecasting, and 3) a short concrete example from a finance context (e.g., duplicate invoice, inconsistent currency formats, missing GL codes).
Hard · Technical
33 practiced
Case study: Your forecasting model has started to systematically overestimate revenue by ~3% after a recent ETL rewrite. Outline an investigative and remediation plan that includes validating input data, model assumptions, ETL transformation differences, test coverage gaps, and stakeholder communications for an upcoming board review.
Easy · Technical
43 practiced
Describe how you would validate referential integrity between a transaction table (transactions) and a chart-of-accounts table (accounts). Include the SQL checks you would run, the expected outputs, and what business actions you would recommend for transactions that reference missing accounts.
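One common way to run this kind of check is an anti-join that returns transactions with no matching account. The sketch below uses Python's built-in `sqlite3` module so the SQL can be executed end to end; the column names (`txn_id`, `account_id`, `amount`, `name`) and the sample rows are assumptions for illustration, not part of the question.

```python
import sqlite3

# Anti-join: keep transactions whose account_id has no match in accounts.
# Rows with a NULL account_id are also returned, since they cannot match.
ORPHAN_SQL = """
SELECT t.txn_id, t.account_id, t.amount
FROM transactions AS t
LEFT JOIN accounts AS a ON a.account_id = t.account_id
WHERE a.account_id IS NULL
ORDER BY t.txn_id
"""


def find_orphans(conn):
    """Return transactions that reference a missing (or NULL) account."""
    return conn.execute(ORPHAN_SQL).fetchall()


def demo():
    """Build a small in-memory dataset (hypothetical) and run the check."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (account_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE transactions (
            txn_id INTEGER PRIMARY KEY,
            account_id TEXT,
            amount REAL
        );
        INSERT INTO accounts VALUES ('4000', 'Revenue'), ('5000', 'COGS');
        INSERT INTO transactions VALUES
            (1, '4000', 100.0),
            (2, '9999', 25.0),
            (3, NULL, 10.0);
        """
    )
    try:
        return find_orphans(conn)
    finally:
        conn.close()


orphans = demo()
```

An empty result is the expected output for a clean dataset. Any rows returned are candidates for quarantine or for review with the owner of the chart of accounts, depending on the remediation policy.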
