InterviewStack.io

Data Problem Solving and Business Context Questions

Practical, data-oriented problem solving that connects business questions to correct, robust analyses. Includes translating business questions into queries and metric definitions, designing SQL or query logic for edge cases, handling data quality issues such as nulls, duplicates, and inconsistent dates, validating assumptions, and producing metrics like retention and churn. Emphasizes building queries and pipelines that are resilient to real-world data issues, thinking through measurement definitions, and linking data findings to business implications and possible next steps.

Medium · Technical
Given deployments(service_id, deploy_id, deploy_time, author) and incidents(incident_id, service_id, start_time, end_time, severity), write SQL to measure correlation between deploy frequency and incident count per service per month. Provide the query and discuss how to interpret correlation vs causation and possible confounders.
Hard · Technical
You receive logs with duplicate events and inconsistent client-side timestamps due to clock skew. Describe an algorithm to assign events to correct time buckets for reporting (event-time windows) that compensates for client clock skew. Include methods to estimate per-client skew, how to apply corrections, how to handle clients with missing skew estimates, and how to ensure deterministic bucketing for reprocessing.
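A minimal sketch of one such algorithm. The field names (client_id, event_id, client_ts, server_recv_ts) and the median-offset skew estimator are assumptions, not a fixed spec: skew per client is estimated as the median of (server receive time − client timestamp), corrections are applied before bucketing, clients with no estimate fall back to zero skew, duplicates are dropped by event id, and processing order is made deterministic by sorting.

```python
from statistics import median

def estimate_skew(events):
    """Per-client skew = median of (server receive time - client timestamp).
    The median resists outliers caused by delayed delivery."""
    offsets = {}
    for e in events:
        offsets.setdefault(e["client_id"], []).append(e["server_recv_ts"] - e["client_ts"])
    return {cid: median(v) for cid, v in offsets.items()}

def bucket_events(events, bucket_s=60):
    skew = estimate_skew(events)
    seen, out = set(), []
    # Sorting by event_id gives a deterministic order for reprocessing.
    for e in sorted(events, key=lambda e: e["event_id"]):
        if e["event_id"] in seen:   # drop duplicate deliveries
            continue
        seen.add(e["event_id"])
        # Fall back to zero skew when a client has no estimate.
        corrected = e["client_ts"] + skew.get(e["client_id"], 0)
        out.append((e["event_id"], (corrected // bucket_s) * bucket_s))
    return out

events = [
    {"event_id": "e1", "client_id": "c1", "client_ts": 100, "server_recv_ts": 130},
    {"event_id": "e2", "client_id": "c1", "client_ts": 160, "server_recv_ts": 190},
    {"event_id": "e2", "client_id": "c1", "client_ts": 160, "server_recv_ts": 191},  # duplicate
]
print(bucket_events(events))
```

In a real pipeline the skew estimates would be persisted per client, so a reprocessing run sees the same corrections and therefore the same buckets.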
Medium · Technical
A metric in production (request_count) shows 1,000,000 requests yesterday, but the corresponding database table that records user actions has only 900,000 new rows. As an SRE, describe a prioritized investigation plan: what SQL checks and production queries would you run, what instrumentation/tests you would add, and the likely root causes you would consider (at least five).
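A hedged sketch of two early diagnostic checks for this kind of discrepancy. The table and column names (requests_log, user_actions, request_id, ts) are invented: the first query localizes the gap by comparing hourly counts side by side, and the second checks whether the counter side is double-counting via duplicate ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requests_log(request_id TEXT, ts TEXT);
CREATE TABLE user_actions(request_id TEXT, ts TEXT);
INSERT INTO requests_log VALUES ('r1','2024-05-01T00'), ('r2','2024-05-01T00'),
                                ('r3','2024-05-01T01'), ('r3','2024-05-01T01');
INSERT INTO user_actions VALUES ('r1','2024-05-01T00'), ('r3','2024-05-01T01');
""")

# Check 1: localize the gap in time -- is the shortfall spread evenly or
# concentrated in a window (suggesting an outage or a failed batch job)?
gap = conn.execute("""
SELECT hour, l.n AS logged, COALESCE(a.n, 0) AS recorded
FROM (SELECT substr(ts,1,13) AS hour, COUNT(*) AS n FROM requests_log GROUP BY 1) l
LEFT JOIN (SELECT substr(ts,1,13) AS hour, COUNT(*) AS n FROM user_actions GROUP BY 1) a
  USING (hour)
""").fetchall()

# Check 2: is the metric double-counting? Duplicate ids on the logging side
# inflate request_count without adding table rows.
dups = conn.execute("""
SELECT request_id, COUNT(*) FROM requests_log GROUP BY 1 HAVING COUNT(*) > 1
""").fetchall()
print(gap, dups)
```

Root causes worth having on the list alongside these checks: retries counted as distinct requests, dropped or dead-lettered writes, requests that legitimately produce no user action, timezone/day-boundary mismatch between the two systems, and delayed (still in-flight) inserts.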
Hard · Technical
Write SQL to compute error budget consumption for a service with SLO target 99.9% success over 28 days. Use requests(service_id, is_success BOOL, timestamp). The query should produce current 28-day success percentage, error budget remaining (in minutes or percentage), and burn rate compared to baseline (last 28-day rolling window vs previous rolling window). Discuss handling of missing data.
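A minimal sketch of one answer, with the 'now' anchor pinned to a fixed date so the example is deterministic; the sample rows are invented and far too few for a real SLO. It computes the current 28-day success percentage, budget remaining as a percentage of the 0.1% error budget (negative means overspent), and a burn rate relative to the previous 28-day window. Missing data (no requests in a window) surfaces as NULL from AVG rather than a fabricated 100%.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests(service_id TEXT, is_success INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO requests VALUES (?,?,?)", [
    ("svc", 1, "2024-03-10"), ("svc", 1, "2024-03-15"), ("svc", 0, "2024-03-20"),  # current window
    ("svc", 1, "2024-02-05"), ("svc", 0, "2024-02-10"),                            # previous window
])

row = conn.execute("""
WITH win AS (
  -- CASE without ELSE yields NULL, which AVG ignores, so each AVG
  -- covers only its own window (and is NULL if the window is empty).
  SELECT
    AVG(CASE WHEN timestamp >= date('2024-04-01','-28 days') THEN is_success END) AS cur,
    AVG(CASE WHEN timestamp <  date('2024-04-01','-28 days')
              AND timestamp >= date('2024-04-01','-56 days') THEN is_success END) AS prev
  FROM requests WHERE service_id = 'svc'
)
SELECT
  cur * 100                          AS success_pct,
  (cur - 0.999) / (1 - 0.999) * 100  AS budget_remaining_pct,  -- negative = overspent
  (1 - cur) / NULLIF(1 - prev, 0)    AS burn_rate_vs_prev      -- NULL if prev was perfect
FROM win
""").fetchone()
print(row)
```

Budget remaining in minutes instead of a percentage would multiply the remaining fraction by 28 × 24 × 60 × 0.001 ≈ 40.3 minutes; the discussion part of the answer should also cover gaps from lost telemetry (treat as unknown, not as success).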
Medium · Technical
Describe strategies to handle schema evolution in event data pipelines (adding/removing fields, type changes) while maintaining correctness of historical metrics. Include technical approaches (Avro/Protobuf with schemas, default values, migration jobs), documentation, and how to communicate breaking changes to consumers.
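A hedged sketch of the default-values approach mentioned in the question, using plain dicts rather than Avro for brevity. The field names are invented: events written before a field existed are backfilled with a schema default at read time, so aggregations over historical data keep working; an actual type change, by contrast, would need a migration job and an announced breaking-change window.

```python
# Defaults for fields added after v1 of the event schema. Keeping a default
# for every added field makes old and new events uniformly readable.
SCHEMA_DEFAULTS = {
    "country": "unknown",   # added in v2
    "revenue_cents": 0,     # added in v3; must be int -- a type change needs a migration job
}

def read_event(raw: dict) -> dict:
    """Backfill missing fields with schema defaults at read time."""
    return {**SCHEMA_DEFAULTS, **raw}

events = [
    {"user": "u1"},                                          # written under schema v1
    {"user": "u2", "country": "DE"},                         # schema v2
    {"user": "u3", "country": "FR", "revenue_cents": 250},   # schema v3
]
normalized = [read_event(e) for e in events]
total = sum(e["revenue_cents"] for e in normalized)  # aggregates across all versions
print(normalized[0]["country"], total)
```

Avro and Protobuf formalize exactly this: the reader schema supplies defaults for fields absent from the writer schema, and a schema registry plus changelog is the usual channel for telling consumers about breaking changes.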
