InterviewStack.io

A/B Test Design Questions

Designing and running A/B tests (split tests) to evaluate product and feature changes. Candidates should be able to form clear null and alternative hypotheses, select primary and guardrail metrics that reflect both product goals and user safety, choose randomization and assignment strategies, and calculate sample size and test duration using power analysis and minimum detectable effect (MDE) reasoning. They should understand applied statistical concepts, including p-values, confidence intervals, one-tailed and two-tailed tests, sequential monitoring and stopping rules, and corrections for multiple comparisons. Practical abilities include diagnosing inconclusive or noisy experiments; detecting and mitigating common biases such as peeking, selection bias, novelty effects, seasonality, instrumentation errors, and network interference; and deciding when experiments are appropriate versus alternative evaluation methods. Senior candidates should reason about trade-offs between speed and statistical rigor, plan safe rollouts and ramping, define rollback plans, and communicate uncertainty and business implications to technical and non-technical stakeholders. For developer-facing products, candidates should also consider constraints such as small populations, cross-team effects, ethical concerns, and special instrumentation needs.
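As a rough illustration of the sample-size reasoning mentioned above, the sketch below uses the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name `sample_size_per_arm` and the example baseline and MDE values are assumptions made for illustration, not figures from this page.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion test.

    baseline_rate: control conversion rate (e.g. 0.10)
    mde_abs: minimum detectable effect as an absolute difference (e.g. 0.01)
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2)


# Hypothetical inputs: 10% baseline conversion, 1 percentage point MDE.
n_per_arm = sample_size_per_arm(0.10, 0.01)
# Halving the MDE roughly quadruples the required sample.
n_per_arm_half_mde = sample_size_per_arm(0.10, 0.005)
```

Dividing the per-arm sample size by the expected daily eligible traffic per arm gives a first estimate of test duration, which is usually rounded up to whole weeks to cover weekly seasonality.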

Medium | Technical
47 practiced
You observe a statistically significant uplift in the primary metric but a deterioration in a safety guardrail (more unsafe content flags). Outline the investigation steps you would take to decide between rollout, rollback, or further experimentation. Include analysis, additional metrics, and stakeholder communication.
Hard | Technical
47 practiced
A/B test analysis shows heterogeneous effects that are likely driven by a confounder not balanced by randomization (e.g., campaign exposure). Describe how you would use post-stratification, inverse-probability weighting, or instrumental variables to adjust estimates and when each method is appropriate.
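One of the three methods named above, post-stratification, can be sketched as follows: estimate the treatment effect within each stratum of the confounder, then average the stratum effects using known population shares. The function name, the stratum labels, and the toy data are all hypothetical; inverse-probability weighting and instrumental variables are not shown.

```python
from collections import defaultdict
from statistics import mean


def post_stratified_effect(rows, population_shares):
    """Weighted average of per-stratum treatment effects.

    rows: iterable of (stratum, is_treated, outcome)
    population_shares: {stratum: share}, shares assumed to sum to 1
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for stratum, is_treated, outcome in rows:
        groups[stratum][bool(is_treated)].append(outcome)

    effect = 0.0
    for stratum, share in population_shares.items():
        arms = groups[stratum]
        if not arms[True] or not arms[False]:
            raise ValueError(f"stratum {stratum!r} is missing an arm")
        effect += share * (mean(arms[True]) - mean(arms[False]))
    return effect


# Toy data: campaign-exposed users convert more and are over-represented
# in the treatment arm, which inflates the naive difference in means.
rows = (
    [("campaign", True, y) for y in (1, 1, 0, 1)]
    + [("campaign", False, y) for y in (1, 0)]
    + [("organic", True, y) for y in (1, 0)]
    + [("organic", False, y) for y in (0, 0, 1, 0)]
)
naive_effect = (
    mean(y for _, t, y in rows if t) - mean(y for _, t, y in rows if not t)
)
adjusted_effect = post_stratified_effect(rows, {"campaign": 0.3, "organic": 0.7})
```

In this toy data the effect is 0.25 inside both strata, so the adjusted estimate is 0.25 while the naive estimate is larger. The approach only helps when the confounder is observed and each stratum contains both arms.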
Medium | Technical
50 practiced
Explain how covariate adjustment (e.g., ANCOVA or regression) can increase power in experiments. Provide a short example using pre-period engagement as a covariate and describe assumptions and pitfalls (e.g., post-treatment variables, model misspecification).
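A minimal sketch of the idea, using a CUPED-style adjustment (a close relative of the ANCOVA and regression approaches named in the question, swapped in here because it needs only the standard library). The simulated data, the true effect of 0.5, and the function names are assumptions for illustration. The covariate is measured before assignment, so it cannot be affected by treatment.

```python
import random
from statistics import mean, variance


def cuped_adjust(outcomes, pre_period):
    """Remove the part of each outcome that is predicted by the pre-period covariate."""
    x_bar, y_bar = mean(pre_period), mean(outcomes)
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(pre_period, outcomes)
    ) / (len(outcomes) - 1)
    theta = cov_xy / variance(pre_period)  # pooled regression slope of y on x
    return [y - theta * (x - x_bar) for x, y in zip(pre_period, outcomes)]


def diff_in_means(values, treated):
    return mean(v for v, t in zip(values, treated) if t) - mean(
        v for v, t in zip(values, treated) if not t
    )


# Simulated experiment: post-period engagement depends strongly on
# pre-period engagement, plus a true treatment effect of 0.5.
rng = random.Random(0)
n = 2000
pre = [rng.gauss(10, 3) for _ in range(n)]
treated = [rng.random() < 0.5 for _ in range(n)]
post = [0.8 * x + (0.5 if t else 0.0) + rng.gauss(0, 1) for x, t in zip(pre, treated)]

adjusted = cuped_adjust(post, pre)
raw_effect = diff_in_means(post, treated)
adjusted_effect = diff_in_means(adjusted, treated)
raw_variance, adjusted_variance = variance(post), variance(adjusted)
```

The adjusted outcome has the same overall mean as the raw outcome but much lower variance, which is where the extra power comes from. The gain shrinks as the correlation between the covariate and the outcome weakens, and adjusting for a variable measured after assignment would bias the estimate.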
Hard | System Design
42 practiced
In a two-sided marketplace (buyers and sellers), a ranking algorithm change affects both sides and cross-side externalities occur. Describe how you would design experiments to disentangle buyer-side and seller-side effects, define the estimands, and identify practical strategies to measure system-level impact.
Hard | Technical
44 practiced
Provide high-level pseudocode (Python-like) for a contextual Thompson Sampling algorithm for binary rewards with user context features. Explain how to handle high-dimensional contexts and discuss scalability concerns (memory, latency, feature hashing).
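One simplified way to approach this is sketched below: a bucketed Beta-Bernoulli variant that hashes the context into a fixed number of buckets and keeps an independent Beta posterior per (bucket, arm) pair. This is not a linear or logistic contextual model, which would share information across similar contexts. The class name, the `device` feature, and the reward rates in the simulation are hypothetical.

```python
import random
import zlib
from collections import defaultdict


class HashedContextThompsonSampler:
    """Thompson Sampling with one Beta posterior per (hashed context, arm)."""

    def __init__(self, n_arms, n_buckets=2 ** 16, seed=None):
        self.n_arms = n_arms
        self.n_buckets = n_buckets  # caps memory at n_buckets * n_arms posteriors
        self.rng = random.Random(seed)
        # Beta(1, 1) prior, stored as [alpha, beta] and created lazily.
        self.posteriors = defaultdict(lambda: [1.0, 1.0])

    def _bucket(self, context):
        # Stable hash of the sorted feature pairs (feature hashing).
        key = "|".join(f"{k}={context[k]}" for k in sorted(context))
        return zlib.crc32(key.encode("utf-8")) % self.n_buckets

    def choose_arm(self, context):
        bucket = self._bucket(context)
        samples = [
            self.rng.betavariate(*self.posteriors[(bucket, arm)])
            for arm in range(self.n_arms)
        ]
        # Play the arm whose sampled success rate is highest.
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, context, arm, reward):
        posterior = self.posteriors[(self._bucket(context), arm)]
        if reward:
            posterior[0] += 1.0  # one more observed success
        else:
            posterior[1] += 1.0  # one more observed failure


def simulate(rounds=3000, seed=0, tail=500):
    """Return the share of best-arm choices over the last `tail` rounds."""
    true_rates = {"mobile": [0.7, 0.3], "desktop": [0.3, 0.7]}  # hypothetical
    sampler = HashedContextThompsonSampler(n_arms=2, seed=seed)
    env = random.Random(seed + 1)
    best_picks = 0
    for t in range(rounds):
        device = env.choice(["mobile", "desktop"])
        context = {"device": device}
        arm = sampler.choose_arm(context)
        reward = env.random() < true_rates[device][arm]
        sampler.update(context, arm, reward)
        if t >= rounds - tail:
            rates = true_rates[device]
            best_picks += arm == rates.index(max(rates))
    return best_picks / tail
```

Hashing bounds memory and keeps per-request latency to a dictionary lookup plus one Beta draw per arm, at the cost of collisions and of learning each bucket from scratch. With many or continuous features, a parametric posterior over model weights is the more usual choice.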