InterviewStack.io

Experiment Design and Execution Questions

Covers end-to-end design and execution of experiments and A/B tests, including identifying high-value hypotheses, defining treatment variants and control, ensuring valid randomization, defining primary and guardrail metrics, calculating sample size and statistical power, instrumenting events, running analyses and interpreting results, and deciding on rollout or rollback. Also includes building testing infrastructure, establishing organizational best practices for experimentation, communicating learnings, and discussing both successful and failed tests and their impact on product decisions.

Hard · Technical
60 practiced
You analyze experiments with many correlated metrics. Propose statistical strategies to deal with correlated testing, including multivariate testing, hierarchical modeling, and constructing composite business metrics. Explain the trade-offs, how to choose weights for composite metrics considering covariance, and how to preserve interpretability for product stakeholders.
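One concrete option among the strategies the question names: weight standardized metrics by the inverse of their covariance matrix, so that redundant (highly correlated) metrics are down-weighted in the composite. A minimal sketch with synthetic data — the metric matrix and the per-metric "business importance" scores are illustrative assumptions, not a prescribed scheme:

```python
import numpy as np

# Synthetic per-user metrics: rows = users, columns = 3 correlated metrics
# sharing a common latent driver (illustrative data only).
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 1))
metrics = np.hstack([base + rng.normal(scale=s, size=(1000, 1))
                     for s in (0.5, 1.0, 2.0)])

# Standardize each metric so the weights are comparable across units.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)

# Apply the inverse covariance matrix to assumed importance scores:
# correlated metrics share information, so each gets less weight.
importance = np.array([0.5, 0.3, 0.2])
cov = np.cov(z, rowvar=False)
raw_w = np.linalg.solve(cov, importance)
weights = raw_w / np.abs(raw_w).sum()  # normalize for interpretability

composite = z @ weights  # one composite score per user
```

Presenting the normalized weights alongside each metric's standalone result is one way to keep the composite interpretable for stakeholders.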
Easy · Technical
44 practiced
Describe the key factors that determine required sample size for an A/B test. Explain how baseline conversion rate, minimum detectable effect (MDE), desired power, significance level, and variance interact. Give a brief numerical example using baseline conversion 10%, relative MDE 10%, alpha 0.05, and power 80% — show how you would approximate sample size per arm (no need for exact code).
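The question's numerical example can be approximated with the standard two-proportion normal approximation; the helper below is a sketch of that formula, not a substitute for an exact power calculation:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(p_base, rel_mde, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided
    two-proportion test."""
    p1 = p_base
    p2 = p_base * (1 + rel_mde)          # relative MDE: 10% -> 11%
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power=0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = sample_size_per_arm(0.10, 0.10)  # roughly 15,000 users per arm
```

Note how the answer scales: halving the relative MDE roughly quadruples the required sample size, since the effect size enters the denominator squared.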
Medium · System Design
46 practiced
You are defining the architecture for a mid-sized company's experimentation platform. Sketch high-level components and data flow that support: feature flags, online bucketing, event collection, metric computation, experiment registry, anomaly detection, and reporting. For each component, discuss storage choices (streaming vs batch), expected SLAs, and how to handle message loss or downstream reprocessing.
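As a sketch of the online bucketing component above, assignment is typically a deterministic hash of (experiment, user), so it is stable across sessions and independent across experiments without any storage lookup. The function and parameter names here are illustrative, not any specific platform's API:

```python
import hashlib

def bucket(user_id: str, experiment: str, n_buckets: int = 100) -> int:
    """Deterministically map a user into one of n_buckets for an experiment.
    Salting with the experiment name decorrelates assignments across tests."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def assign(user_id: str, experiment: str, treatment_pct: int = 50) -> str:
    """Assign a user to treatment or control at the given percentage."""
    return "treatment" if bucket(user_id, experiment) < treatment_pct else "control"
```

Because assignment is a pure function of its inputs, downstream metric jobs can recompute it during reprocessing instead of depending on a logged assignment event surviving message loss.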
Hard · Technical
54 practiced
A feature is expected to yield a tiny relative uplift of 0.1% on conversion. Your product sees 2M monthly users and conversions are highly noisy. Propose practical statistical strategies to detect such a small effect: variance-reduction techniques (e.g., CUPED), targeted experiments on high-sensitivity segments, pooling or meta-analysis across multiple tests, and trade-offs for business risk and time to decision.
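CUPED, the variance-reduction technique named in the question, adjusts the in-experiment metric by its pre-experiment counterpart. A minimal sketch on synthetic data (the distributions and coefficients are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
pre = rng.normal(loc=10.0, scale=3.0, size=n)  # pre-experiment covariate
y = 0.8 * pre + rng.normal(scale=3.0, size=n)  # in-experiment metric

# CUPED adjustment: theta is the OLS coefficient of y on the covariate.
theta = np.cov(y, pre)[0, 1] / np.var(pre)
y_cuped = y - theta * (pre - pre.mean())  # same mean, lower variance

reduction = 1 - np.var(y_cuped) / np.var(y)  # fraction of variance removed
```

The variance removed equals the squared correlation between metric and covariate, so a strong pre-period signal can shrink the required sample size substantially while leaving the treatment-effect estimate unbiased.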
Medium · Technical
50 practiced
You run experiments tracking 40 metrics per treatment. Explain the multiple comparisons problem and compare Bonferroni correction and False Discovery Rate (FDR). For a product team that needs actionable insights but wants to limit false positives, recommend a practical multiple-testing correction strategy and justify your choice.
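The Benjamini–Hochberg step-up procedure, the standard FDR control the question alludes to, is short enough to sketch directly; the p-values below are illustrative:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at FDR level q
    (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:  # compare k-th smallest p to q*k/m
            k_max = rank                  # step-up: keep the largest such k
    return sorted(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, q=0.05)
```

For these ten p-values, Bonferroni at alpha=0.05 would compare each against 0.005 and reject only the first, illustrating why FDR control usually retains more power for a team scanning 40 metrics.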
