Online Experimentation and Model Validation Questions

This topic covers running experiments in production to validate model changes and measure business impact. It includes splitting traffic across model variants, canary deployments, and champion-challenger testing; selecting metrics that capture both model performance and business outcomes; performing sample-size and test-duration calculations that account for statistical power and multiple-testing adjustments; and handling instrumentation issues and novelty bias. Candidates should be able to analyze heterogeneous treatment effects, monitor experiments in real time, and design ramping plans and rollback guardrails that protect user experience and business metrics. The topic also covers decision rules for when to rely on offline evaluation versus online experiments, and how to interpret differences between offline model metrics and live user outcomes as part of model validation and deployment strategy.

Hard · System Design

Multiple product teams run overlapping experiments that could interact. Propose a design to run multiple concurrent experiments safely: options include full factorial design, orthogonalization, or reserving holdout groups. Explain pros/cons, sample-size implications, and how to detect interaction effects post-hoc.
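
One common way to implement the orthogonalization option is layered hashing: each team's experiment lives in its own layer, and a per-layer salt makes assignments in different layers approximately independent, so every pair of concurrent experiments forms an implicit 2x2 factorial. The Python sketch below assumes hypothetical layer names (`layer_ranking`, `layer_ui`) and a 50/50 split per layer. It estimates the A x B interaction post hoc as (TT - TC) - (CT - CC) with a normal-approximation z-test. With equal cell sizes and equal variances, this interaction estimate has twice the standard error of a main effect at the same total traffic, so detecting an interaction of a given size needs roughly four times the sample.

```python
import hashlib
import math


def bucket(user_id: str, layer_salt: str, n_buckets: int = 100) -> int:
    """Deterministic bucket in [0, n_buckets) for a user within one layer.

    A different salt per layer makes assignments across layers approximately
    independent, so concurrent experiments are orthogonal to each other.
    """
    digest = hashlib.sha256(f"{layer_salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets


def assign(user_id: str) -> tuple:
    """Arms for two hypothetical layers, each split 50/50."""
    arm_a = "treat" if bucket(user_id, "layer_ranking") < 50 else "control"
    arm_b = "treat" if bucket(user_id, "layer_ui") < 50 else "control"
    return arm_a, arm_b


def interaction_test(outcomes: dict) -> tuple:
    """Post-hoc A x B interaction from the implicit 2x2 factorial.

    outcomes maps (arm_a, arm_b) -> list of per-user metric values (>= 2 each).
    Estimate = (TT - TC) - (CT - CC): how much A's effect changes when B is on.
    Returns (estimate, z, two-sided p) using a normal approximation.
    """
    cells = [("treat", "treat"), ("treat", "control"),
             ("control", "treat"), ("control", "control")]
    signs = [1, -1, -1, 1]
    estimate, variance = 0.0, 0.0
    for sign, cell in zip(signs, cells):
        values = outcomes[cell]
        n = len(values)
        mean = sum(values) / n
        sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
        estimate += sign * mean
        variance += sample_var / n  # cells are independent, so variances add
    z = estimate / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return estimate, z, p
```

A small interaction p-value suggests the two experiments should not be analyzed as if they were independent; a large one is weak evidence of no interaction, given the power penalty above.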

Easy · Technical

You're launching a new personalization ranking model for a homepage. Propose a hierarchy of metrics to capture model quality and business impact: label at least one primary metric, two guardrail metrics, and three diagnostic metrics. For each metric, explain the rationale and how you'd compute it from exposure/event logs (including attribution window and deduplication rules).
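
As an illustration of computing metrics from exposure/event logs, the sketch below assumes a hypothetical schema: an exposure log keyed by `user_id` and timestamp, and an event log whose client-generated `event_id` is used for deduplication. It keeps the earliest copy of each `event_id`, anchors each user's attribution window at their first exposure, and ignores unexposed users and events outside the window. The same function could serve a primary metric (for example, purchases attributed within 24 hours) or a diagnostic such as clicks, by changing `event_type` and `window`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Exposure:
    user_id: str
    ts: datetime  # when the new ranking was rendered for this user


@dataclass(frozen=True)
class Event:
    event_id: str    # assumed client-generated id; duplicates share it
    user_id: str
    ts: datetime
    event_type: str  # e.g. "click" or "purchase"


def dedupe(events):
    """Keep the earliest copy of each event_id (client retries, double-fires)."""
    seen, kept = set(), []
    for e in sorted(events, key=lambda e: e.ts):
        if e.event_id not in seen:
            seen.add(e.event_id)
            kept.append(e)
    return kept


def attributed_metric(exposures, events, event_type, window):
    """Attribute events to a user's FIRST exposure if they occur within
    [first_exposure, first_exposure + window]. Unexposed users and events
    outside the window are ignored. Returns the share of exposed users with
    at least one attributed event and attributed events per exposed user."""
    first_seen = {}
    for x in exposures:
        if x.user_id not in first_seen or x.ts < first_seen[x.user_id]:
            first_seen[x.user_id] = x.ts
    users, count = set(), 0
    for e in dedupe(events):
        start = first_seen.get(e.user_id)
        if e.event_type == event_type and start is not None and start <= e.ts <= start + window:
            users.add(e.user_id)
            count += 1
    n = len(first_seen)
    return {
        "user_rate": len(users) / n if n else 0.0,
        "events_per_exposed_user": count / n if n else 0.0,
    }
```

Deduplication matters most for count-based metrics such as `events_per_exposed_user`; user-level "any event" rates are unaffected by duplicate rows.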

Medium · Technical

Offline evaluation shows a candidate model increases AUC by 6% relative to baseline, yet a live A/B test shows no change in conversion and a drop in click-through rate. Walk through a systematic investigation plan that covers data, instrumentation, segment-level effects, exposure differences, model inference differences, and upstream/downstream product interactions.
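
Two checks from such an investigation are easy to automate. The sketch below assumes you already have per-arm user counts and per-segment click and impression counts (segment names and numbers in the usage are hypothetical). `srm_pvalue` is a chi-square sample ratio mismatch test against the planned split; a very small p-value points to an assignment or logging problem rather than a model effect. `segment_ctr_deltas` computes the CTR difference and a pooled two-proportion z-score per segment, which can reveal segments moving in opposite directions that cancel out in the aggregate. It treats impressions as independent, which understates variance when users contribute many impressions, so a user-level or delta-method analysis is safer in practice.

```python
import math


def srm_pvalue(n_control: int, n_treatment: int, treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) of observed arm sizes vs. the planned split."""
    total = n_control + n_treatment
    exp_t = total * treatment_share
    exp_c = total - exp_t
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def segment_ctr_deltas(segments: dict) -> dict:
    """segments: name -> (clicks_c, imps_c, clicks_t, imps_t).

    Returns name -> (ctr_t - ctr_c, two-proportion z-score with pooled variance).
    """
    out = {}
    for name, (clicks_c, imps_c, clicks_t, imps_t) in segments.items():
        ctr_c, ctr_t = clicks_c / imps_c, clicks_t / imps_t
        pooled = (clicks_c + clicks_t) / (imps_c + imps_t)
        se = math.sqrt(pooled * (1 - pooled) * (1 / imps_c + 1 / imps_t))
        delta = ctr_t - ctr_c
        out[name] = (delta, delta / se if se > 0 else 0.0)
    return out
```

For example, `segment_ctr_deltas({"new_users": (100, 1000, 80, 1000), "returning": (50, 1000, 60, 1000)})` gives a negative z-score for the first segment and a positive one for the second.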

Medium · System Design

Describe how to implement a champion-challenger testing framework for production models so challengers can be evaluated live without full rollout. Include traffic splitting strategy (shadow vs live), synchronous vs asynchronous evaluation, latency and cold-start considerations, and how to decide when a challenger replaces the champion.
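
A minimal sketch of the routing layer, assuming models are plain callables that return a score and that `live_share`, the salt, and the promotion thresholds are hypothetical knobs. With `live_share = 0` the challenger runs in shadow mode (scored and logged, never served); raising `live_share` sends a deterministic, hash-based slice of users to the challenger live. The shadow call is made inline here for simplicity; in production it would typically run asynchronously so it adds no latency to the champion's response. The promotion rule is one simple example, not a standard.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Model = Callable[[dict], float]


def in_live_slice(user_id: str, salt: str, share: float) -> bool:
    """Deterministically place `share` of users in the live challenger slice."""
    h = int(hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()[:8], 16)
    return h / 2**32 < share


@dataclass
class ChampionChallengerRouter:
    champion: Model
    challenger: Model
    live_share: float = 0.0   # 0.0 means pure shadow mode
    salt: str = "cc-exp-1"    # hypothetical per-experiment salt
    shadow_log: List[Tuple[str, float, float]] = field(default_factory=list)

    def serve(self, user_id: str, features: dict) -> float:
        if in_live_slice(user_id, self.salt, self.live_share):
            return self.challenger(features)  # live challenger traffic
        served = self.champion(features)
        # Shadow scoring: computed and logged but never shown to the user.
        # Done inline for simplicity; production systems would usually
        # enqueue this so it runs off the request path.
        self.shadow_log.append((user_id, served, self.challenger(features)))
        return served


def should_promote(lift_ci_lower: float, guardrails_ok: bool,
                   days_run: int, min_days: int = 14) -> bool:
    """Example rule: lift's lower confidence bound above zero, every
    guardrail passing, and a minimum run length (hypothetical 14 days)
    to cover weekly cycles and novelty effects."""
    return lift_ci_lower > 0 and guardrails_ok and days_run >= min_days
```

Hashing on `user_id` keeps each user on the same model across requests, which avoids mixing exposures within a user; cold-start users with no history still hash deterministically but may need separate analysis.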

Easy · Technical

Provide a short SQL query (assume PostgreSQL) to compute per-user conversion rate given a table events(user_id, event_type, event_time) where event_type can be 'view' or 'purchase'. Show how you'd compute conversion rate as purchases divided by unique users with at least one view in a 7-day window ending on a specified date.
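
One possible answer, sketched in Python with the standard-library `sqlite3` module so it can be run without a PostgreSQL server. It assumes the window is the 7 calendar days ending on the specified date (inclusive) and counts only purchases made in that window by users who also viewed in it. The window bounds are computed in Python and passed as parameters, so the SQL text itself avoids dialect-specific date arithmetic; with a PostgreSQL driver such as psycopg2, the `?` placeholders would become `%s`. `make_demo_db` and its rows are hypothetical sample data.

```python
import sqlite3
from datetime import date, timedelta
from typing import Optional

# Window bounds arrive as parameters, so no dialect-specific date arithmetic
# is needed. Purchases are counted only for users who viewed in the same
# window (an assumption; drop the JOIN to count all purchases).
CONVERSION_SQL = """
WITH windowed AS (
    SELECT user_id, event_type
    FROM events
    WHERE event_time >= ? AND event_time < ?
),
viewers AS (
    SELECT DISTINCT user_id FROM windowed WHERE event_type = 'view'
)
SELECT CAST((SELECT COUNT(*)
             FROM windowed w
             JOIN viewers v ON v.user_id = w.user_id
             WHERE w.event_type = 'purchase') AS REAL)
       / NULLIF((SELECT COUNT(*) FROM viewers), 0) AS conversion_rate
"""


def conversion_rate(conn: sqlite3.Connection, end_date: date) -> Optional[float]:
    """Purchases / unique viewers over the 7 calendar days ending on end_date
    (inclusive). Returns None when there are no viewers."""
    start = end_date - timedelta(days=6)
    end_exclusive = end_date + timedelta(days=1)
    row = conn.execute(CONVERSION_SQL, (start.isoformat(), end_exclusive.isoformat())).fetchone()
    return row[0]


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical sample rows; timestamps stored as ISO-8601 text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT, event_time TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        ("u1", "view", "2024-01-02 10:00:00"),
        ("u1", "purchase", "2024-01-03 11:00:00"),
        ("u2", "view", "2024-01-05 09:00:00"),
        ("u2", "purchase", "2024-01-08 00:30:00"),  # after the window
        ("u3", "view", "2023-12-30 09:00:00"),      # before the window
        ("u3", "purchase", "2024-01-04 12:00:00"),  # no view in the window
        ("u4", "view", "2024-01-06 08:00:00"),
        ("u4", "view", "2024-01-07 23:00:00"),      # same user, counted once
    ])
    return conn
```

Using a half-open interval `[start, end_date + 1 day)` includes every timestamp on the end date without relying on `23:59:59` boundaries; `NULLIF` turns an empty window into `NULL` instead of a division-by-zero error.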
