InterviewStack.io

A/B Test Design Questions

Designing and running A/B tests and split tests to evaluate product and feature changes. Candidates should be able to form clear null and alternative hypotheses; select appropriate primary metrics and guardrail metrics that reflect both product goals and user safety; choose randomization and assignment strategies; and calculate sample size and test duration using power analysis and minimum detectable effect (MDE) reasoning. They should understand applied statistical concepts, including p-values, confidence intervals, one-tailed and two-tailed tests, sequential monitoring and stopping rules, and corrections for multiple comparisons. Practical abilities include diagnosing inconclusive or noisy experiments; detecting and mitigating common biases such as peeking, selection bias, novelty effects, seasonality, instrumentation errors, and network interference; and deciding when experiments are appropriate versus alternative evaluation methods. Senior candidates should reason about trade-offs between speed and statistical rigor, plan safe rollouts and ramping, define rollback plans, and communicate uncertainty and business implications to technical and non-technical stakeholders. For developer-facing products, candidates should also consider constraints such as small populations, cross-team effects, ethical concerns, and special instrumentation needs.
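The sample size and duration reasoning described above can be sketched for a conversion-rate metric. This is a minimal sketch, assuming a two-sided test of two proportions with the normal approximation (unpooled variance), an absolute MDE, an even traffic split, and each eligible user enrolled once; the baseline of 10%, the MDE of 1 percentage point, and the 5,000 daily eligible users are hypothetical inputs.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided two-proportion test.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / MDE^2,
    a normal approximation with unpooled variance.
    """
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2
    return math.ceil(n)


def duration_days(n_per_arm: int, arms: int, daily_eligible_users: int) -> int:
    """Days needed to fill every arm, assuming an even split of new users."""
    return math.ceil(n_per_arm * arms / daily_eligible_users)


# Hypothetical inputs: 10% baseline conversion, detect +1 percentage point.
n = sample_size_per_arm(baseline=0.10, mde_abs=0.01)
days = duration_days(n, arms=2, daily_eligible_users=5000)
print(n, days)
```

With these inputs the sketch asks for roughly 14,750 users per arm, about 6 days of traffic. A common practice is to round the duration up to whole weeks so that weekly seasonality is covered, and a one-tailed test or a different power target would change the numbers.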

Hard · Technical
Developer-facing product: a change affects SDK behavior used by internal teams (small user base, cross-team impacts). Propose instrumentation, an experiment design, and a communication plan for running safe, informative tests while minimizing disruption to dependent teams.
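The instrumentation and guardrail parts of such a design can be made concrete with a small sketch. It assumes randomization per consuming team (so every service a team owns sees the same SDK behavior), a telemetry record emitted on every SDK call tagged with experiment and arm, and a pre-agreed error-rate guardrail that triggers rollback. The event schema, the experiment id `sdk-retry-v2`, the team names, the version strings, and the 0.5 percentage-point threshold are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


def assign_team(team_id: str, experiment_id: str, treatment_share: float) -> str:
    """Randomize at the team level: the team is the unit (cluster), so
    all of a team's services see consistent SDK behavior."""
    digest = hashlib.sha256(f"{experiment_id}:{team_id}".encode()).hexdigest()
    u = int(digest[:8], 16) / 2**32  # deterministic value in [0, 1)
    return "treatment" if u < treatment_share else "control"


@dataclass
class SdkCallEvent:
    """Hypothetical telemetry record emitted by the SDK on every call."""
    team_id: str
    experiment_id: str
    arm: str
    sdk_version: str
    latency_ms: float
    error: bool


def should_roll_back(events: list[SdkCallEvent],
                     max_error_rate_increase: float = 0.005) -> bool:
    """Guardrail: roll back if the treatment error rate exceeds the
    control error rate by more than the agreed absolute threshold."""
    def error_rate(arm: str) -> float:
        arm_events = [e for e in events if e.arm == arm]
        if not arm_events:
            return 0.0
        return sum(e.error for e in arm_events) / len(arm_events)
    return error_rate("treatment") - error_rate("control") > max_error_rate_increase


# Toy events with arms fixed for illustration: 0.5% vs 2.0% error rates.
events = (
    [SdkCallEvent("team-a", "sdk-retry-v2", "control", "3.1.0", 12.0, i < 5)
     for i in range(1000)]
    + [SdkCallEvent("team-b", "sdk-retry-v2", "treatment", "3.2.0-rc1", 11.0, i < 20)
       for i in range(1000)]
)
print(assign_team("team-a", "sdk-retry-v2", 0.5))
print(should_roll_back(events))  # True
```

Because the unit of randomization is the team, the effective sample size is closer to the number of teams than to the number of calls, so a raw rate difference like this is noisy. In practice you might pair the trigger with a confidence interval or a minimum event count, and share the threshold and rollback procedure with dependent teams before the test starts.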
Hard · Technical
Explain CUPED (Controlled-experiment Using Pre-Experiment Data), a control-variates technique for variance reduction in A/B tests. Describe the required pre-experiment data and the assumptions, and outline how to implement the CUPED adjustment in an analysis pipeline for a conversion-rate metric.
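The core CUPED adjustment can be sketched in a few lines. This minimal sketch assumes one pre-experiment covariate per user (here, whether the user converted in a pre-period of the same length), measured before randomization so the treatment cannot affect it, and a single theta estimated on data pooled across arms. The adjusted outcome is y_i - theta * (x_i - mean(x)) with theta = cov(x, y) / var(x); it keeps the same mean as y and its variance shrinks by a factor of (1 - rho^2), where rho is the correlation between x and y. The toy data is hypothetical.

```python
from statistics import fmean, variance


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """CUPED: y_i - theta * (x_i - mean(x)), theta = cov(x, y) / var(x).

    y: in-experiment outcome per user (1 if converted, else 0)
    x: the same user's pre-experiment covariate (pre-period conversion)
    theta is estimated on data pooled across all arms.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    if var_x == 0:
        return list(y)  # covariate is constant: no adjustment possible
    theta = cov_xy / var_x
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def arm_difference(values: list[float], arms: list[str]) -> float:
    """Treatment-minus-control difference in means."""
    t = [v for v, a in zip(values, arms) if a == "treatment"]
    c = [v for v, a in zip(values, arms) if a == "control"]
    return fmean(t) - fmean(c)


# Hypothetical toy data: x = converted in pre-period, y = converted during test.
x = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
arms = ["treatment", "control"] * 5

y_adj = cuped_adjust(y, x)
print("raw effect:", arm_difference(y, arms), "variance:", variance(y))
print("CUPED effect:", arm_difference(y_adj, arms), "variance:", variance(y_adj))
```

In a pipeline, the adjusted per-user values then feed the usual difference-in-means test and standard error. Users with no pre-period history need a policy for x (for example, filling with the pooled mean, optionally plus a missing-data indicator as a second covariate), and the gain depends entirely on how well pre-period behavior predicts in-experiment conversion.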
Medium · Technical
Describe how to adjust sample size calculations for a cluster-randomized experiment (e.g., randomizing by household or geographic region) using the intraclass correlation coefficient (ICC). Given 1,000 clusters, an average cluster size of 10, an ICC of 0.02, and a desired effective sample size of 2,000 users, estimate whether you have sufficient power.
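The arithmetic for this question follows from the design effect. This is a minimal sketch assuming equal-size clusters, so the design effect is 1 + (m - 1) * ICC with m the cluster size, and the effective sample size is the total number of users divided by that factor.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Design effect for equal-size clusters: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n_clusters: int, avg_cluster_size: float,
                          icc: float) -> float:
    """Number of independent users the clustered sample is worth."""
    total_users = n_clusters * avg_cluster_size
    return total_users / design_effect(avg_cluster_size, icc)


deff = design_effect(10, 0.02)                 # 1 + 9 * 0.02
n_eff = effective_sample_size(1000, 10, 0.02)  # 10,000 / 1.18
print(f"design effect={deff:.2f}, effective n={n_eff:.0f}, "
      f"meets 2,000 target: {n_eff >= 2000}")
# design effect=1.18, effective n=8475, meets 2,000 target: True
```

The 10,000 enrolled users are worth about 8,475 independent users, comfortably above 2,000 whether that target is read as a total or per arm (about 4,237 per arm with an even split). The conclusion assumes the 2,000 target itself came from a power calculation; unequal cluster sizes inflate the design effect further, and with cluster randomization the number of clusters per arm also matters.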
Hard · System Design
Design an experimentation platform (high-level architecture) that supports consistent bucketing, experiment configuration, metric ingestion, offline and online analysis, and safe rollouts for a company with 100M monthly users. List components, data flow, and scaling considerations.
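The consistent-bucketing component of such a platform can be illustrated with stateless hashing. This is a minimal sketch, assuming assignment is computed as hash(salt, user_id) mod a fixed number of buckets; the 10,000-bucket granularity, the salt `exp-checkout-v3`, and the user ids are hypothetical, and SHA-256 is used only because it ships with Python (production systems often choose a faster non-cryptographic hash).

```python
import hashlib

NUM_BUCKETS = 10_000  # allocation granularity: 0.01% of traffic per bucket


def bucket(user_id: str, experiment_salt: str) -> int:
    """Deterministically map a user to a bucket in [0, NUM_BUCKETS).

    Any server computes the same answer with no stored state, and a
    per-experiment salt decorrelates assignments across experiments.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign(user_id: str, experiment_salt: str, rollout_pct: float) -> str:
    """Expose users whose bucket is below the rollout threshold.

    The threshold only grows during a ramp (1% -> 10% -> 50%), so a user
    exposed at a lower percentage stays exposed at every higher one.
    """
    threshold = round(rollout_pct * NUM_BUCKETS / 100)
    return "treatment" if bucket(user_id, experiment_salt) < threshold else "control"


users = [f"user-{i}" for i in range(1000)]
exposed_10 = {u for u in users if assign(u, "exp-checkout-v3", 10) == "treatment"}
exposed_50 = {u for u in users if assign(u, "exp-checkout-v3", 50) == "treatment"}
print(len(exposed_10), len(exposed_50), exposed_10 <= exposed_50)  # last value: True
```

Keeping the salt fixed while only raising the threshold is what makes ramping safe: nobody flips from treatment back to control mid-experiment. Rolling back amounts to setting the threshold to zero in the experiment configuration, without redeploying clients.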
Medium · Technical
Compare Bayesian A/B testing to frequentist hypothesis testing for production experimentation. Discuss decision thresholds, interpretability for stakeholders, sample size flexibility, and how each handles optional stopping.
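Running both approaches on the same data makes the interpretability difference concrete. This minimal sketch assumes a pooled two-sided z-test for the frequentist side and independent Beta(1, 1) priors with Monte Carlo sampling from the Beta posteriors for the Bayesian side; the conversion counts are hypothetical.

```python
import math
import random


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Frequentist two-sided z-test for p_b != p_a (pooled SE under H0)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 20_000, seed: int = 0) -> float:
    """Bayesian P(p_b > p_a | data) with Beta(1, 1) priors, via Monte Carlo."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / draws


# Hypothetical data: 200/10,000 conversions in A vs 240/10,000 in B.
p = two_proportion_p_value(200, 10_000, 240, 10_000)
prob = prob_b_beats_a(200, 10_000, 240, 10_000)
print(f"two-sided p-value: {p:.3f}, P(B > A): {prob:.3f}")
```

Here the p-value lands just above 0.05, so a frequentist test at alpha = 0.05 does not reject, while the posterior puts roughly a 97% probability on B being better, a statement many stakeholders find easier to read. The two are different decision rules on the same evidence, not contradictory results. Checking P(B > A) repeatedly against a fixed threshold does not by itself control the frequentist false-positive rate; if that matters, the rule's error rates under optional stopping need to be checked separately, for example by simulation.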
