InterviewStack.io

Experimentation and Product Validation Questions

Designing and interpreting experiments and validation strategies to test product hypotheses. Includes hypothesis formulation, experimental design, sample sizing considerations, metrics selection, interpreting results and statistical uncertainty, and avoiding common pitfalls such as peeking and multiple hypothesis testing. Also covers qualitative validation methods such as interviews and pilots, and using a mix of methods to validate product ideas before scaling.

Easy · Technical
70 practiced
Implement a Python function that performs a two-sample proportion z-test. The function should accept counts and sample sizes for control and treatment, return the z-statistic, two-sided p-value, difference in proportions, and a 95% confidence interval for the difference. Assume large-sample normal approximation is valid.
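A minimal sketch of one possible answer, using only the standard library and assuming the large-sample normal approximation the prompt grants: the test statistic uses the pooled standard error (as is conventional under the null of equal proportions), while the confidence interval uses the unpooled standard error.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided two-sample proportion z-test.

    x1/n1: conversions and sample size for control.
    x2/n2: conversions and sample size for treatment.
    Returns (z, two-sided p-value, p2 - p1, 95% CI for the difference).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1
    # Pooled proportion under H0: p1 == p2, used for the test statistic.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # The CI uses the unpooled standard error (no equality assumption).
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = 1.959963984540054  # Phi^{-1}(0.975) for a 95% interval
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, diff, ci
```

For example, 100/1000 conversions in control versus 130/1000 in treatment gives a difference of 0.03 with z near 2.1, significant at the 5% level.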
Medium · Technical
69 practiced
A test shows an increase in checkout conversion rate but also a drop in the number of users reaching the checkout step. Explain how Simpson's paradox or aggregation bias could cause this, and describe an analysis plan to disentangle the effect including conditional breakdown and causal reasoning.
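A toy numeric illustration of the aggregation effect, with hypothetical counts: if the treatment deters low-intent users from ever reaching checkout, the aggregate checkout conversion rate rises even though no segment converts any better, so conditioning on intent (or an equivalent pre-treatment covariate) is essential before attributing the lift to the treatment.

```python
# Hypothetical funnel data: {segment: (users_reaching_checkout, conversions)}.
# Per-segment conversion rates are identical across arms; only the mix changes.
control   = {"high_intent": (400, 200), "low_intent": (600, 60)}
treatment = {"high_intent": (380, 190), "low_intent": (220, 22)}

def summarize(arm):
    """Return per-segment rates, total users at checkout, aggregate rate."""
    seg_rates = {k: conv / users for k, (users, conv) in arm.items()}
    users = sum(u for u, _ in arm.values())
    conv = sum(c for _, c in arm.values())
    return seg_rates, users, conv / users

seg_c, users_c, agg_c = summarize(control)
seg_t, users_t, agg_t = summarize(treatment)
# Both arms convert high-intent at 50% and low-intent at 10%, yet the
# treatment's aggregate rate is higher because fewer low-intent users
# reach checkout -- the composition shifted, not the conversion behavior.
```

Here the aggregate rate moves from 26% to about 35% purely through the mix shift, which is exactly why the analysis plan should break the funnel down conditionally before drawing causal conclusions.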
Hard · Technical
67 practiced
Implement an alpha-spending schedule calculator in Python that outputs critical p-value thresholds for interim looks under Pocock and O'Brien-Fleming boundaries given a planned number of looks. The function should return thresholds per look and explain how each boundary behaves (early vs late conservatism).
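One hedged starting point: exact Pocock and O'Brien-Fleming critical values require numerical integration over the joint distribution of the correlated interim z-statistics, so a common practical substitute is the Lan-DeMets spending-function approximation. The sketch below returns the cumulative and incremental alpha spent at equally spaced looks; the increments are only an approximation to the nominal per-look p-value thresholds, not the exact boundaries.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def alpha_spending(num_looks, alpha=0.05, style="obf"):
    """Lan-DeMets alpha-spending schedule for equally spaced looks.

    Returns a list of (look, cumulative_alpha_spent, incremental_alpha).
    style="obf":    O'Brien-Fleming-like -- spends almost nothing early,
                    so early looks are very conservative.
    style="pocock": Pocock-like -- spends alpha roughly evenly, so early
                    looks are easier to cross but the final look pays for it.
    """
    z_half = _N.inv_cdf(1 - alpha / 2)
    schedule, cum = [], 0.0
    for k in range(1, num_looks + 1):
        t = k / num_looks  # information fraction at look k
        if style == "obf":
            spent = 2 * (1 - _N.cdf(z_half / math.sqrt(t)))
        elif style == "pocock":
            spent = alpha * math.log(1 + (math.e - 1) * t)
        else:
            raise ValueError(f"unknown style: {style}")
        schedule.append((k, spent, spent - cum))
        cum = spent
    return schedule
```

Both spending functions reach exactly alpha at the final look; with 5 looks the OBF-like schedule spends on the order of 1e-5 at the first look versus roughly 0.015 for the Pocock-like schedule, which captures the early-vs-late conservatism the prompt asks about.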
Medium · Technical
65 practiced
You have 10 experimental variants tested against control. Explain how to control family-wise error rate (FWER) and false discovery rate (FDR). Compare Bonferroni, Holm-Bonferroni, and Benjamini-Hochberg procedures and discuss which is appropriate when the business prefers to prioritize discovery vs. strict control of false positives.
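The three procedures the question names can each be sketched in a few lines; Bonferroni and Holm control FWER (Holm is uniformly more powerful), while Benjamini-Hochberg controls FDR and is the usual choice when the business wants to prioritize discovery over strict false-positive control.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i iff p_i <= alpha / m. Strict FWER control."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down Holm-Bonferroni: compare the k-th smallest p-value to
    alpha / (m - k + 1), stopping at the first failure. FWER control,
    never less powerful than plain Bonferroni."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH: find the largest k with p_(k) <= (k/m) * alpha and
    reject the k smallest p-values. Controls FDR, not FWER."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            reject[i] = True
    return reject
```

On p-values [0.001, 0.02, 0.03] Bonferroni rejects only one hypothesis while Holm and BH reject all three, illustrating how much power the naive correction can give up.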
Hard · System Design
54 practiced
Design an architecture for a global experimentation system that must support up to 1 billion events per day, multi-tenant experiments, low-latency feature flag evaluation at the edge, and reproducible offline analysis. Discuss consistent hashing/randomization, event ingestion, streaming vs batch metric computation, storage choices, and strategies to ensure deterministic bucketing across SDK versions.
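The deterministic-bucketing piece of this design can be made concrete with a short sketch (names and bucket count here are illustrative assumptions, not a specific product's API): hashing a versioned experiment salt together with the user ID via a well-specified cryptographic hash gives every SDK, in every language and version, the same assignment without coordination.

```python
import hashlib

NUM_BUCKETS = 10_000  # illustrative granularity: 0.01% traffic slices

def bucket(user_id: str, experiment_salt: str) -> int:
    """Map (salt, user_id) deterministically to [0, NUM_BUCKETS).

    SHA-256 is fully specified, so identical inputs bucket identically
    across SDK versions and languages; changing the salt reshuffles users.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def assign_variant(user_id: str, experiment_salt: str, allocations) -> str:
    """allocations: list of (variant_name, traffic_fraction), summing to <= 1.

    Buckets are carved into contiguous ranges per variant; unallocated
    traffic falls through to control, acting as a holdout.
    """
    b = bucket(user_id, experiment_salt)
    edge = 0.0
    for name, fraction in allocations:
        edge += fraction
        if b < edge * NUM_BUCKETS:
            return name
    return "control"
```

Because assignment is a pure function of (salt, user ID), the offline analysis pipeline can recompute every user's variant from raw logs, which is what makes the analysis reproducible without shipping assignment state around.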
