InterviewStack.io

Online Experimentation and Model Validation Questions

Running experiments in production to validate model changes and measure business impact. Topics include: splitting traffic across model variants, canary deployments, and champion/challenger testing; selecting metrics that capture both model performance and business outcomes; performing sample-size and test-duration calculations that account for statistical power and multiple-testing adjustments; and handling instrumentation issues and novelty bias. Candidates should be able to analyze heterogeneous treatment effects, monitor experiments in real time, and design ramping plans and rollback guardrails that protect user experience and business metrics. The topic also covers decision rules for when to rely on offline evaluation versus online experiments, and how to interpret differences between offline model metrics and live user outcomes as part of model validation and deployment strategy.

Hard · Technical
Explain sequential testing with alpha spending and implement the core idea: given total alpha and K interim looks, describe how to allocate alpha across looks using O'Brien-Fleming and Pocock spending functions. Explain how these affect early stopping sensitivity and overall Type I error.
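The allocation this question asks about can be sketched in plain Python. The function names and the equally-spaced-looks assumption are mine; the formulas are the standard Lan-DeMets spending-function approximations for O'Brien-Fleming and Pocock boundaries:

```python
import math

def _phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _phi_inv(p):
    """Inverse normal CDF by bisection (precise enough for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if _phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def alpha_spending(total_alpha=0.05, K=5, method="obf"):
    """Incremental alpha spent at each of K equally spaced interim looks."""
    z = _phi_inv(1.0 - total_alpha / 2.0)
    cumulative = []
    for k in range(1, K + 1):
        t = k / K  # information fraction at look k
        if method == "obf":
            # O'Brien-Fleming-type: very conservative at early looks,
            # so early stopping requires an extreme observed effect
            cum = 2.0 * (1.0 - _phi(z / math.sqrt(t)))
        else:
            # Pocock-type: spends alpha roughly evenly across looks,
            # making early stopping easier at the cost of a tougher final look
            cum = total_alpha * math.log(1.0 + (math.e - 1.0) * t)
        cumulative.append(cum)
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, K)]
```

Both schedules spend exactly `total_alpha` by the final look, preserving the overall Type I error; the difference in early-stopping sensitivity shows up in how little the O'Brien-Fleming schedule allocates to the first look compared with Pocock.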
Hard · Technical
You must optimize for multiple metrics (e.g., revenue and engagement) simultaneously when selecting model variants. Propose a practical multi-objective evaluation and selection strategy, e.g., constrained optimization, Pareto frontier exploration, or scalarization. Explain how to present trade-offs to stakeholders and choose a single winner for rollout.
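A minimal sketch of two of the strategies the question names, Pareto-frontier filtering and scalarization. The variant names, the two-metric tuple format, and the weight defaults are illustrative assumptions; in practice the weights are a stakeholder decision, not an analyst one:

```python
def pareto_frontier(variants):
    """Return variant names not dominated on (revenue, engagement).

    variants: {name: (revenue_lift, engagement_lift)}; higher is better.
    A variant is dominated if another is at least as good on both
    metrics and strictly better on at least one."""
    frontier = []
    for name, (r, e) in variants.items():
        dominated = any(
            (r2 >= r and e2 >= e) and (r2 > r or e2 > e)
            for n2, (r2, e2) in variants.items() if n2 != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)

def scalarize(variants, w_revenue=0.7, w_engagement=0.3):
    """Collapse the trade-off to a single winner via a weighted sum."""
    return max(variants, key=lambda n: w_revenue * variants[n][0]
                                     + w_engagement * variants[n][1])
```

A common presentation to stakeholders is the frontier first (which variants are even worth discussing), then the scalarized ranking under a few candidate weightings to show how sensitive the winner is to the weights.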
Hard · Technical
You are the technical lead, and the CEO requests an immediate global rollout of a model because an experiment shows a small but statistically significant lift on a proxy metric, while your team sees risks to customer experience. How do you handle this conflict? Describe your approach to stakeholder communication, escalation, the data you would present, and your final decision criteria.
Hard · System Design
Design a real-time anomaly detection system that monitors experiments and raises alerts when statistically significant regressions occur. Describe detection algorithms (CUSUM, EWMA, changepoint detection), how to set thresholds to balance false positives/negatives, and integration with escalation playbooks to pause or rollback traffic.
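One of the detectors this question names, a one-sided tabular CUSUM for downward metric shifts, can be sketched in a few lines. The parameter defaults and the focus on regressions (rather than improvements) are my assumptions; `k` and `h` are exactly the false-positive/false-negative threshold knobs the question asks about:

```python
def cusum_alarm(samples, target_mean, k=0.5, h=5.0, sigma=1.0):
    """One-sided (downward) tabular CUSUM on standardized residuals.

    k: slack in sigmas, typically half the shift size worth detecting.
    h: decision threshold; larger h means fewer false alarms but
       slower detection of real regressions.
    Returns the 1-based index of the first alarming sample, or None."""
    s_lo = 0.0
    for i, x in enumerate(samples, start=1):
        z = (x - target_mean) / sigma
        # Accumulate only downward deviations beyond the slack k;
        # the statistic resets toward zero while the metric is healthy.
        s_lo = max(0.0, s_lo - z - k)
        if s_lo > h:
            return i
    return None
```

In a monitoring pipeline an alarm from a detector like this would trigger the escalation playbook: page the owner, freeze the ramp, and optionally auto-rollback traffic to the control variant.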
Medium · Technical
You observe heterogeneous treatment effects across age groups in an A/B test. Explain how you'd detect and estimate subgroup effects, maintain statistical validity (avoid false discovery), and present results. Discuss pre-specification, interaction models, causal trees/CT-H, and uplift modeling trade-offs.
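One pre-specified approach to the validity problem this question raises is a per-subgroup two-proportion z-test with a Bonferroni correction, a deliberately simple stand-in for the interaction models and causal trees it mentions. The data format and group semantics below are hypothetical:

```python
import math

def subgroup_lift_test(groups, alpha=0.05):
    """Per-subgroup two-proportion z-tests with Bonferroni correction.

    groups: {name: (conversions_t, n_t, conversions_c, n_c)} for
    pre-specified subgroups only (post-hoc slicing inflates discovery).
    Returns {name: (absolute_lift, p_value, significant_after_correction)}.
    """
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    m = len(groups)  # number of pre-specified subgroup tests
    results = {}
    for name, (ct, nt, cc, nc) in groups.items():
        pt, pc = ct / nt, cc / nc
        pooled = (ct + cc) / (nt + nc)
        se = math.sqrt(pooled * (1 - pooled) * (1 / nt + 1 / nc))
        z = (pt - pc) / se if se > 0 else 0.0
        p = 2.0 * (1.0 - phi(abs(z)))
        # Bonferroni: each subgroup is tested against alpha / m
        results[name] = (pt - pc, p, p < alpha / m)
    return results
```

Bonferroni is conservative; a fuller answer would contrast it with false-discovery-rate control and with modeling the treatment-by-subgroup interaction directly in a single regression.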
