InterviewStack.io

Statistical Foundations for Experimentation Questions

Core statistical concepts and inference needed to design, analyze, and interpret experiments. Topics include hypothesis testing; p-values; confidence intervals; Type I and Type II errors; the relationship between sample size, variability, and interval width; statistical power; minimum detectable effect; and effect size versus practical significance. Candidates should be able to choose and explain common statistical tests such as t-tests and chi-square tests, contrast Bayesian and frequentist approaches at a conceptual level, and describe variance estimation and variance-reduction techniques. The topic covers corrections for multiple comparisons, sequential testing and the risks of peeking and p-hacking, common misconceptions about p-values, and limitations of inference such as confounding and selection bias. Candidates should also be able to translate statistical findings into clear language for non-technical stakeholders and explain uncertainty and limitations.

Medium · Technical
Implement in Python a bootstrap-based 95% confidence interval for the difference in means between treatment and control. Signature: def bootstrap_ci(treatment, control, n_resamples=10000, ci=0.95, random_seed=None): return (lower, upper, point_est). Explain in which situations bootstrapping is preferable to parametric CI and its limitations in online experiments.
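One possible answer is a percentile bootstrap, sketched below with NumPy: each group is resampled with replacement, the difference in means is recomputed per resample, and the interval is taken from the quantiles of those differences. This is an illustrative sketch, not the only valid implementation (a BCa or studentized bootstrap would also fit the signature).

```python
import numpy as np

def bootstrap_ci(treatment, control, n_resamples=10000, ci=0.95, random_seed=None):
    """Percentile-bootstrap CI for the difference in means (treatment - control)."""
    rng = np.random.default_rng(random_seed)
    treatment = np.asarray(treatment, dtype=float)
    control = np.asarray(control, dtype=float)
    point_est = treatment.mean() - control.mean()
    # Resample each group independently, with replacement, n_resamples times.
    t_idx = rng.integers(0, len(treatment), size=(n_resamples, len(treatment)))
    c_idx = rng.integers(0, len(control), size=(n_resamples, len(control)))
    diffs = treatment[t_idx].mean(axis=1) - control[c_idx].mean(axis=1)
    # Take the central ci mass of the bootstrap distribution.
    alpha = 1 - ci
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return (lower, upper, point_est)
```

Bootstrapping is attractive when the metric is skewed or the sampling distribution of the statistic is awkward to derive (ratios, quantiles); its main limitations in online experiments are cost at scale and its assumption of i.i.d. units, which breaks under clustered randomization.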
Easy · Technical
You are testing a new checkout flow. Which statistical test would you choose to compare (a) average time to purchase (continuous) and (b) conversion rate (binary/counts)? Explain the assumptions of a t-test and a chi-square test (or Fisher's exact test), when to prefer non-parametric alternatives, and how you would check assumptions with real production data.
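A sketch of how the two comparisons might look in code, assuming SciPy is available; the data here is synthetic and purely illustrative. Welch's t-test handles the continuous metric without assuming equal variances, the Mann-Whitney U test is the non-parametric fallback for skewed durations, and the binary metric uses a chi-square test on a 2x2 table, with Fisher's exact test when expected cell counts are small.

```python
import numpy as np
from scipy import stats

# (a) Continuous metric: Welch's t-test (no equal-variance assumption).
rng = np.random.default_rng(42)  # synthetic, illustrative data
time_control = rng.exponential(scale=60, size=1000)
time_treatment = rng.exponential(scale=55, size=1000)
t_stat, p_t = stats.ttest_ind(time_treatment, time_control, equal_var=False)

# Durations are right-skewed, so also check a non-parametric alternative.
u_stat, p_u = stats.mannwhitneyu(time_treatment, time_control)

# (b) Binary metric: chi-square on a 2x2 table of converted / not converted.
table = np.array([[130, 870],   # treatment: conversions, non-conversions
                  [100, 900]])  # control
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Assumption check: the chi-square approximation wants expected counts >= 5;
# with small cells, switch to Fisher's exact test instead.
if (expected < 5).any():
    odds_ratio, p_chi = stats.fisher_exact(table)
```

With real production data, the assumption checks would include inspecting the metric's distribution (e.g. Q-Q plots for normality of means via the CLT), confirming independence of units, and verifying that the table counts come from one user-level observation each.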
Hard · Technical
Tell me about a time you had to convince engineering or product stakeholders to delay a model or feature rollout because the experiment was underpowered or results were ambiguous. If you do not have a direct example, describe the hypothetical conversation: what evidence you would present (power calculation, confidence intervals, potential false positive/negative risks), how you'd quantify the business risk, and what concrete next steps you'd propose (increase sample, pilot, redefine metric).
Medium · Technical
Implement a Python function that computes required sample size per group for a two-sided two-sample proportion test using the normal approximation. Signature: def required_sample_size(p0, delta, alpha=0.05, power=0.8): where p0 is baseline conversion rate and delta is absolute lift to detect. Use z-scores for alpha and power. Explain assumptions and show result for p0=0.10, delta=0.015.
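A minimal implementation using only the standard library, under the usual normal-approximation assumptions (large n, independent Bernoulli outcomes, two-sided test). The variance term here sums each arm's own Bernoulli variance; a pooled-rate variant would give a slightly different n, so treat the exact constant as a design choice.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(p0, delta, alpha=0.05, power=0.8):
    """Per-group n for a two-sided two-sample proportion z-test (normal approx.)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    p1 = p0 + delta
    # Sum of Bernoulli variances under baseline and treated rates.
    var = p0 * (1 - p0) + p1 * (1 - p1)
    n = (z_alpha + z_beta) ** 2 * var / delta ** 2
    return ceil(n)
```

For p0=0.10 and delta=0.015 this lands in the high six-thousands per group, which is why small absolute lifts on low baseline rates demand large experiments.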
Hard · Technical
Describe how to perform a Bayesian A/B test for conversion rates using Beta(a,b) priors and Bernoulli likelihoods. Show the posterior update equations for each arm (Beta(a + successes, b + failures)), explain how to compute a 95% credible interval for the difference (via sampling or analytic approximations), and show how to compute the posterior probability that treatment conversion > control. Contrast this posterior probability with the frequentist p-value interpretation.
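The conjugate update and both quantities the question asks for can be computed by Monte Carlo sampling from the two posteriors; the sketch below assumes NumPy and uniform Beta(1, 1) priors by default. The function name and signature are illustrative, not a standard API.

```python
import numpy as np

def bayesian_ab(succ_t, n_t, succ_c, n_c, a=1, b=1, draws=100_000, seed=0):
    """Beta-Bernoulli A/B test with Beta(a, b) priors on each arm's rate."""
    rng = np.random.default_rng(seed)
    # Conjugate posterior update: Beta(a + successes, b + failures).
    post_t = rng.beta(a + succ_t, b + (n_t - succ_t), size=draws)
    post_c = rng.beta(a + succ_c, b + (n_c - succ_c), size=draws)
    diff = post_t - post_c
    lo, hi = np.quantile(diff, [0.025, 0.975])  # 95% credible interval
    p_better = (diff > 0).mean()                # P(treatment > control | data)
    return lo, hi, p_better
```

Note the interpretive contrast: `p_better` is a direct posterior probability that the treatment rate exceeds control given the data and prior, whereas a frequentist p-value is the probability of data at least this extreme assuming no difference; it is not the probability that either hypothesis is true.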
