Data Problem Solving and Business Context Questions

Practical, data-oriented problem solving that connects business questions to correct, robust analyses. Topics include translating business questions into queries and metric definitions, designing SQL or query logic for edge cases, handling data quality issues such as nulls, duplicates, and inconsistent dates, validating assumptions, and producing metrics like retention and churn. The emphasis is on building queries and pipelines that are resilient to real-world data issues, thinking through measurement definitions, and linking data findings to business implications and possible next steps.

Easy | Technical

Explain the business meaning of MAU (Monthly Active Users) versus DAU (Daily Active Users). For an SRE focused on capacity planning and cost forecasting, discuss when the DAU/MAU ratio (stickiness) matters and how it should influence capacity estimates and incident prioritization.
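
As a starting point for the discussion, stickiness can be computed directly from an activity log. The following is a minimal sketch, assuming events arrive as (user_id, date) pairs and that "average DAU" means the sum of daily distinct users divided by the number of calendar days in the month; the function name `stickiness` and the input shape are hypothetical.

```python
import calendar
from datetime import date


def stickiness(events, year, month):
    """Return (avg_dau, mau, ratio) for one calendar month.

    events: iterable of (user_id, date) pairs; repeated pairs are fine
    because users are collected into sets before counting.
    """
    daily = {}       # day -> set of users active on that day
    monthly = set()  # users active at least once in the month
    for user_id, day in events:
        if day.year == year and day.month == month:
            daily.setdefault(day, set()).add(user_id)
            monthly.add(user_id)
    if not monthly:
        return 0.0, 0, 0.0
    # Assumption: days with no activity count as DAU = 0 in the average.
    days_in_month = calendar.monthrange(year, month)[1]
    avg_dau = sum(len(users) for users in daily.values()) / days_in_month
    return avg_dau, len(monthly), avg_dau / len(monthly)


sample = [
    ("a", date(2024, 1, 1)),
    ("a", date(2024, 1, 1)),  # duplicate event, counted once
    ("b", date(2024, 1, 1)),
    ("a", date(2024, 1, 2)),
]
avg_dau, mau, ratio = stickiness(sample, 2024, 1)
```

A ratio near 1 means most monthly users show up every day, so daily load tracks MAU closely; a low ratio means MAU overstates the load a typical day places on the system.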

Hard | Technical

Design a deduplication strategy for a streaming system with at-least-once semantics (events have event_id, event_time, ingestion_time). Your design should guarantee correctness (no double-counting) for downstream aggregates while minimizing state and supporting TTL for dedupe keys. Provide pseudocode for dedupe logic (e.g., Flink or Beam), discuss watermarking and state eviction, and explain recovery after a crash.
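
One possible shape for the dedupe logic is sketched below in plain Python rather than in the Flink or Beam API. It assumes duplicates carry the same event_id and event_time, that the watermark is derived from the largest event_time seen minus an allowed lateness, and that a key can be dropped once the watermark passes event_time plus the TTL. The class name `Deduplicator` and its parameters are hypothetical.

```python
import heapq


class Deduplicator:
    """Event-time dedupe with TTL-based state eviction (illustrative only)."""

    def __init__(self, ttl_seconds, allowed_lateness=0.0):
        self.ttl = ttl_seconds
        self.allowed_lateness = allowed_lateness
        self.seen = {}          # event_id -> expiry (in event time)
        self.expiry_heap = []   # min-heap of (expiry, event_id)
        self.watermark = float("-inf")

    def process(self, event_id, event_time):
        """Return True if the event should be emitted downstream."""
        # Advance the watermark monotonically, then evict expired keys.
        self.watermark = max(self.watermark, event_time - self.allowed_lateness)
        self._evict()
        # An event older than the TTL horizon may have had its key evicted
        # already, so drop it rather than risk double-counting.
        if event_time + self.ttl <= self.watermark:
            return False
        if event_id in self.seen:
            return False
        expiry = event_time + self.ttl
        self.seen[event_id] = expiry
        heapq.heappush(self.expiry_heap, (expiry, event_id))
        return True

    def _evict(self):
        # Remove every key whose expiry is at or behind the watermark.
        while self.expiry_heap and self.expiry_heap[0][0] <= self.watermark:
            expiry, event_id = heapq.heappop(self.expiry_heap)
            if self.seen.get(event_id) == expiry:
                del self.seen[event_id]


dedupe = Deduplicator(ttl_seconds=10)
results = [
    dedupe.process("a", 0),   # first sighting
    dedupe.process("a", 0),   # duplicate inside the TTL
    dedupe.process("b", 5),
    dedupe.process("c", 20),  # advances the watermark, evicting "a" and "b"
    dedupe.process("a", 0),   # late duplicate after eviction, dropped
]
```

The trade-off in this sketch is that events arriving later than the TTL are discarded even if they are not duplicates, which bounds state at the cost of completeness. In an engine such as Flink or Beam, the equivalent map would live in checkpointed keyed state so that it is restored together with the source offsets after a crash.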

Medium | Technical

Given deployments(service_id, deploy_id, deploy_time, author) and incidents(incident_id, service_id, start_time, end_time, severity), write SQL to measure correlation between deploy frequency and incident count per service per month. Provide the query and discuss how to interpret correlation vs causation and possible confounders.
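
The aggregation and correlation steps can also be prototyped outside the database to check a SQL answer. This is a minimal sketch, assuming rows have been loaded as (service_id, timestamp) pairs, that incidents are attributed to the month of start_time, and that a service-month with deploys but no incidents counts as zero incidents (the equivalent of an outer join with COALESCE). The helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from math import sqrt


def monthly_counts(deploys, incidents):
    """Return [((service_id, (year, month)), n_deploys, n_incidents), ...]."""
    d = Counter((svc, (ts.year, ts.month)) for svc, ts in deploys)
    i = Counter((svc, (ts.year, ts.month)) for svc, ts in incidents)
    # Union of keys keeps service-months that appear on only one side.
    return [(key, d[key], i[key]) for key in sorted(set(d) | set(i))]


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


deploys = (
    [("a", datetime(2024, 1, 5))]
    + [("a", datetime(2024, 2, d)) for d in (1, 2)]
    + [("a", datetime(2024, 3, d)) for d in (1, 2, 3)]
)
incidents = (
    [("a", datetime(2024, 1, 9))]
    + [("a", datetime(2024, 2, d)) for d in (3, 4)]
    + [("a", datetime(2024, 3, d)) for d in (4, 5, 6)]
)
rows = monthly_counts(deploys, incidents)
r = pearson([row[1] for row in rows], [row[2] for row in rows])
```

A high coefficient here only says the two counts move together; services with more traffic or larger teams may both deploy more and page more, which is one of the confounders the question asks about.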

Medium | Technical

Write a PromQL alert expression that fires when a service's 5xx error rate increases by at least 5x compared to the 1-hour baseline and only when the request rate is above 100 requests per minute (to avoid noise on low traffic services). Assume metrics: http_requests_total{service,code}, scrape interval 15s. Explain the reasoning behind each clause.
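
Before writing the expression, it can help to pin down the decision logic it has to encode. The sketch below models that logic in Python, not PromQL. It assumes "5xx error rate" means the fraction of requests returning 5xx, that both the current window and the 1-hour baseline are supplied as per-minute rates, and that a zero baseline with any current errors should fire; the function `should_alert` and its parameters are hypothetical.

```python
def should_alert(cur_5xx_per_min, cur_total_per_min,
                 base_5xx_per_min, base_total_per_min,
                 factor=5.0, min_rpm=100.0):
    """Return True when the alert condition described above holds."""
    # Traffic guard: ignore services at or below the request-rate floor.
    if cur_total_per_min <= min_rpm:
        return False
    # Without baseline traffic there is nothing to compare against.
    if base_total_per_min <= 0:
        return False
    cur_ratio = cur_5xx_per_min / cur_total_per_min
    base_ratio = base_5xx_per_min / base_total_per_min
    # Multiply instead of dividing so a zero baseline cannot raise an error.
    return cur_ratio > 0 and cur_ratio >= factor * base_ratio


fires = should_alert(60, 1000, 10, 1000)         # 6% now vs 1% baseline
low_traffic = should_alert(6, 100, 10, 1000)     # exactly 100 rpm, not above
small_rise = should_alert(20, 1000, 10, 1000)    # only a 2x increase
```

Each branch corresponds to one clause the question asks to be explained: the traffic floor, the baseline comparison, and the handling of a baseline with no errors.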

Hard | Technical

Design a cross-team dashboard and a data contract to track SLOs and error budgets across multiple services with self-service access for teams. Include: required fields in the contract, versioning strategy, data ownership and ownership enforcement, validation checks, and UI considerations so teams can understand their budgets and historical burn. Explain how to handle inconsistent SLO definitions between teams.
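
Whatever the contract looks like, the dashboard ultimately needs one shared definition of budget and burn. The following is a minimal sketch of that arithmetic, assuming an event-based SLO where the target is a fraction of good events over a window; the function `error_budget` and the returned field names are hypothetical, not part of any standard contract.

```python
def error_budget(slo_target, total_events, bad_events):
    """Summarize error-budget usage for one service over one window.

    slo_target: required fraction of good events, e.g. 0.99.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if bad_events < 0 or bad_events > total_events:
        raise ValueError("bad_events must be between 0 and total_events")
    # The budget is the number of bad events the SLO tolerates.
    allowed_bad = (1 - slo_target) * total_events
    consumed = bad_events / allowed_bad if allowed_bad > 0 else 0.0
    return {
        "allowed_bad_events": allowed_bad,
        "budget_consumed": consumed,        # 1.0 means fully spent
        "budget_remaining": 1.0 - consumed, # negative means overspent
    }


report = error_budget(slo_target=0.99, total_events=10_000, bad_events=50)
```

Rejecting inputs that fall outside the expected ranges is the kind of validation check a contract could enforce at ingestion, before inconsistent definitions reach the dashboard.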
