InterviewStack.io

A/B Test Design Questions

Designing and running A/B tests and split tests to evaluate product and feature changes. Candidates should be able to form clear null and alternative hypotheses; select appropriate primary metrics and guardrail metrics that reflect both product goals and user safety; choose randomization and assignment strategies; and calculate sample size and test duration using power analysis and minimum-detectable-effect reasoning. They should understand applied statistical analysis concepts including p-values, confidence intervals, one-tailed and two-tailed tests, sequential monitoring and stopping rules, and corrections for multiple comparisons. Practical abilities include diagnosing inconclusive or noisy experiments; detecting and mitigating common biases such as peeking, selection bias, novelty effects, seasonality, instrumentation errors, and network interference; and deciding when experiments are appropriate versus alternative evaluation methods. Senior candidates should reason about trade-offs between speed and statistical rigor, plan safe rollouts and ramping, define rollback plans, and communicate uncertainty and business implications to technical and non-technical stakeholders. For developer-facing products, candidates should also consider constraints such as small populations, cross-team effects, ethical concerns, and special instrumentation needs.
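As a concrete reference for the power-analysis and minimum-detectable-effect reasoning mentioned above, here is a minimal sample-size sketch for a two-proportion z-test, using only the Python standard library. The function name and default parameters are illustrative, not part of any question.

```python
from statistics import NormalDist

def sample_size_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-tailed
    two-proportion z-test at significance level alpha."""
    p_treat = p_baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    n = (z_alpha + z_beta) ** 2 * variance / mde_abs ** 2
    return int(n) + 1  # round up to be conservative

# Detecting a 1pp absolute lift on a 10% baseline conversion rate
# requires on the order of 15k users per arm at alpha=0.05, power=0.8.
n = sample_size_per_arm(0.10, 0.01)
```

Note how the required sample size shrinks quadratically as the minimum detectable effect grows, which is the core speed-versus-rigor trade-off the description alludes to.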

Hard · Technical · 55 practiced
A long-running experiment experienced a traffic composition shift mid-test due to an external marketing event. Describe statistical adjustments (e.g., post-stratification, time-varying covariates, interaction models) and robustness checks to attempt to salvage valid inference.
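One of the adjustments this question names, post-stratification, can be sketched in a few lines: estimate the treatment effect within each stratum (for example, pre- and post-event time windows), then combine the per-stratum effects with fixed weights. This is an illustrative sketch, not a full answer; the data layout and function name are assumptions.

```python
from collections import defaultdict

def post_stratified_effect(rows, weights=None):
    """Post-stratified average treatment effect.
    rows: iterable of (stratum, arm, metric) with arm in {"control", "treatment"}.
    weights: optional {stratum: weight}; defaults to each stratum's share of units."""
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))  # stratum -> arm -> [sum, n]
    counts = defaultdict(int)
    for stratum, arm, y in rows:
        cell = sums[stratum][arm]
        cell[0] += y
        cell[1] += 1
        counts[stratum] += 1
    total = sum(counts.values())
    effect = 0.0
    for stratum, arms in sums.items():
        mean_t = arms["treatment"][0] / arms["treatment"][1]
        mean_c = arms["control"][0] / arms["control"][1]
        w = weights[stratum] if weights else counts[stratum] / total
        effect += w * (mean_t - mean_c)
    return effect
```

Weighting by a fixed reference distribution (rather than the realized post-shift mix) is what removes the confounding from the traffic composition change.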
Hard · Technical · 60 practiced
When standard analytic assumptions (normality, constant variance) fail—e.g., your metric is heavy-tailed or zero-inflated—explain how you'd use simulation (bootstrap or Monte Carlo) to estimate power and MDE. Outline the simulation steps and how you'd validate simulation realism.
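The simulation loop this question asks for can be sketched directly: resample a historical metric sample for both arms, inject a synthetic lift into the treatment arm, run the test of record many times, and report the rejection rate as estimated power. A minimal stdlib-only sketch, assuming a two-sample z-test on means as the test of record:

```python
import random
from statistics import NormalDist

def bootstrap_power(historical, lift, n_per_arm, alpha=0.05, n_sims=200, seed=0):
    """Estimate power by bootstrap: resample control and treatment arms
    with replacement from a historical sample, add a synthetic lift to
    the treatment arm, and count rejections of a two-sample z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.choice(historical) for _ in range(n_per_arm)]
        treat = [rng.choice(historical) + lift for _ in range(n_per_arm)]
        mc = sum(control) / n_per_arm
        mt = sum(treat) / n_per_arm
        vc = sum((x - mc) ** 2 for x in control) / (n_per_arm - 1)
        vt = sum((x - mt) ** 2 for x in treat) / (n_per_arm - 1)
        se = ((vc + vt) / n_per_arm) ** 0.5
        if se > 0 and abs(mt - mc) / se > z_crit:
            rejections += 1
    return rejections / n_sims
```

A useful realism check, as the question suggests: run the same loop with lift=0 and confirm the false-positive rate lands near alpha even on the heavy-tailed or zero-inflated sample.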
Easy · Technical · 60 practiced
You're evaluating a change to a conversational AI's response-generation pipeline. Propose a single primary metric that aligns with product goals, and choose at least three guardrail metrics that protect user safety and experience. For each metric, explain why it's primary or guardrail and any measurement caveats.
Medium · Technical · 57 practiced
Describe the problem of optional stopping (peeking) in A/B tests. Explain classical methods like alpha spending (O'Brien-Fleming, Pocock) and Bayesian alternatives that allow sequential monitoring. Give guidance on when to use each in a product environment.
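The alpha-spending idea in this question can be made concrete with the Lan-DeMets O'Brien-Fleming-type spending function, which spends almost no alpha at early looks and the full budget by the final analysis. The sketch below computes cumulative and incremental spend per look; converting spend into exact stopping boundaries additionally requires integrating over the joint distribution of the sequential test statistics, which is omitted here.

```python
from statistics import NormalDist

def obf_spending(alpha, t):
    """Lan-DeMets O'Brien-Fleming-type alpha-spending function:
    cumulative alpha spent at information fraction t in (0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

# Four equally spaced interim looks at 25%, 50%, 75%, 100% of planned sample.
looks = [0.25, 0.5, 0.75, 1.0]
spent = [obf_spending(0.05, t) for t in looks]
incremental = [spent[0]] + [b - a for a, b in zip(spent, spent[1:])]
```

The first-look spend is tiny (well under 0.001), which is exactly why O'Brien-Fleming-style rules tolerate frequent peeking: early stops require overwhelming evidence, and nearly all of the 0.05 budget is preserved for the final analysis.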
Hard · Technical · 44 practiced
Provide high-level pseudocode (Python-like) for a contextual Thompson Sampling algorithm for binary rewards with user context features. Explain how to handle high-dimensional contexts and discuss scalability concerns (memory, latency, feature hashing).
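For orientation, here is the non-contextual core of Thompson Sampling for binary rewards (Beta-Bernoulli), runnable as-is. The contextual version the question asks about replaces each arm's Beta posterior with a Bayesian regression over (hashed) context features; this simplified sketch and its class name are illustrative only.

```python
import random

class BetaBernoulliTS:
    """Thompson Sampling for binary rewards with Beta(1,1) priors.
    Contextual variants swap the per-arm Beta posterior for a
    Bayesian model over user context features."""

    def __init__(self, n_arms, seed=0):
        self.alpha = [1.0] * n_arms  # posterior successes + 1
        self.beta = [1.0] * n_arms   # posterior failures + 1
        self.rng = random.Random(seed)

    def select(self):
        # Sample one plausible reward rate per arm, play the argmax.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate Beta-Bernoulli update: O(1) memory per arm.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```

Because posterior sampling concentrates on the better arm as evidence accumulates, a simulation against two arms with very different true rates should show the bandit pulling the better arm far more often.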
