Probability and Statistical Inference Questions

Covers fundamental probability theory and statistical inference from first principles to practical applications. Core probability concepts include sample spaces and events, independence, conditional probability, Bayes' theorem, expected value, variance, and standard deviation. Reviews common probability distributions such as normal, binomial, Poisson, uniform, and exponential, along with their parameters, typical use cases, computation of probabilities, and approximation methods. Explains sampling distributions and the Central Limit Theorem and their implications for estimation and confidence intervals. Presents descriptive statistics and data summary measures including mean, median, variance, and standard deviation. Details the hypothesis testing workflow including null and alternative hypotheses, p-values, statistical significance, Type I and Type II errors, power, effect size, and interpretation of results. Reviews commonly used tests and methods, with guidance on test selection and assumption checking, including z-tests, t-tests, chi-square tests, analysis of variance, and basic nonparametric alternatives. Emphasizes practical issues such as correlation versus causation, the impact of sample size and data quality, assumption validation, reasoning about rare events and tail risks, and communicating uncertainty. At more advanced levels, expect experimental design and interpretation at scale, including A/B tests, sample size and power calculations, multiple testing and false discovery rate adjustment, and design choices for robust inference in real-world systems.

Medium | Technical

62 practiced

You plan a two-sided A/B test comparing conversion proportions. Baseline p0 = 0.05 and you expect a 20% relative uplift (p1 = 0.06). Using alpha=0.05 and desired power 0.8, compute the required sample size per group. Show the formula you use, numeric steps, and discuss how the calculation changes for unequal allocation or continuous metrics.
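
A minimal sketch of one common way to approach this, assuming the normal-approximation formula for a two-sided two-proportion z-test (pooled variance under the null, unpooled under the alternative): n = (z_{1-alpha/2} * sqrt(2 * p_bar * (1 - p_bar)) + z_{power} * sqrt(p0 * (1 - p0) + p1 * (1 - p1)))^2 / (p1 - p0)^2, with p_bar the average of p0 and p1. The function names and the `ratio` parameter (treatment size divided by control size) are illustrative and not taken from any particular library; other textbooks use slightly different variance terms and give slightly different answers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_control_two_proportions(p0, p1, alpha=0.05, power=0.8, ratio=1.0):
    """Control-group size for a two-sided two-proportion z-test.

    ratio = n_treatment / n_control (hypothetical parameter name);
    ratio=1.0 means equal allocation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Pooled conversion rate under the null, weighted by allocation.
    p_bar = (p0 + ratio * p1) / (1 + ratio)
    null_sd = sqrt(p_bar * (1 - p_bar) * (1 + 1 / ratio))
    alt_sd = sqrt(p0 * (1 - p0) + p1 * (1 - p1) / ratio)
    n_control = ((z_alpha * null_sd + z_beta * alt_sd) / (p1 - p0)) ** 2
    return ceil(n_control)


def n_per_group_continuous(sigma, delta, alpha=0.05, power=0.8):
    """Per-group size for a two-sample z-test on means with a common sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


equal_n = n_control_two_proportions(0.05, 0.06)
unequal_n_control = n_control_two_proportions(0.05, 0.06, ratio=2.0)
print(equal_n, unequal_n_control, 2 * unequal_n_control)
```

Under these assumptions the equal-allocation answer comes out at roughly 8,160 users per group. With unequal allocation the smaller arm shrinks but the total sample grows, and for a continuous metric the binomial variance terms are replaced by the metric's variance, as in the second function.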

Hard | Technical

67 practiced

Explain extreme value theory (EVT) basics for modeling tail risk: describe the block maxima approach and the peaks-over-threshold (POT) method, introduce the Generalized Extreme Value (GEV) and Generalized Pareto Distribution (GPD), and discuss main practical challenges (threshold selection, small sample tail estimation, diagnostics).
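
As a rough illustration of the peaks-over-threshold idea only (block maxima and GEV fitting are not shown), the sketch below fits a GPD to threshold excesses by the method of moments and uses it to extrapolate a tail probability. It assumes synthetic exponential data, a threshold placed at the empirical 90th percentile, and a true shape parameter below 0.5 so that the moment estimators exist. The helper names are hypothetical, and in practice maximum likelihood fitting plus threshold diagnostics would usually replace the moment fit.

```python
import math
import random
from statistics import fmean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit; only meaningful when the true shape xi < 0.5."""
    m = fmean(excesses)
    v = variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)         # shape (xi > 0 means a heavy tail)
    sigma = 0.5 * m * (1.0 + ratio)  # scale
    return xi, sigma


def pot_tail_probability(data, threshold, x):
    """Estimate P(X > x) for x >= threshold via the POT decomposition."""
    excesses = [d - threshold for d in data if d > threshold]
    xi, sigma = fit_gpd_moments(excesses)
    exceed_rate = len(excesses) / len(data)  # empirical P(X > threshold)
    y = x - threshold
    if abs(xi) < 1e-9:
        survival = math.exp(-y / sigma)      # exponential limit as xi -> 0
    else:
        base = 1.0 + xi * y / sigma
        # base <= 0 means x lies beyond the fitted upper endpoint (xi < 0).
        survival = base ** (-1.0 / xi) if base > 0 else 0.0
    return exceed_rate * survival


# Synthetic demo: exponential data has a GPD tail with xi = 0, sigma = 1.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(20000)]
threshold = sorted(data)[int(0.9 * len(data))]
xi_hat, sigma_hat = fit_gpd_moments([d - threshold for d in data if d > threshold])
tail_prob = pot_tail_probability(data, threshold, 5.0)
print(xi_hat, sigma_hat, tail_prob)
```

The threshold choice drives the bias-variance trade-off here: a higher threshold makes the GPD approximation better but leaves fewer excesses to fit.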

Hard | Technical

86 practiced

Propose a practical strategy that combines hierarchical modeling and empirical Bayes to handle multiple comparisons across many correlated metrics and experiments, with the goal of detecting important effects while controlling false discoveries. Outline the model structure, estimation approach, how to compute calibrated posterior probabilities or local FDR, and how to present results to stakeholders.
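
One simple starting point is a normal-normal hierarchical model fitted by empirical Bayes, sketched below. It assumes each experiment reports an effect estimate with a known standard error, that true effects share a single normal prior whose mean and variance are estimated from the data by the method of moments, and that estimates are independent. It does not model correlation between metrics, and it reports a posterior probability of a positive effect rather than a local FDR, which would need a two-groups (null plus non-null) prior. The function name and return layout are illustrative.

```python
from math import sqrt
from statistics import NormalDist, fmean


def empirical_bayes_shrink(estimates, std_errors):
    """Return (posterior_mean, posterior_sd, prob_positive) per experiment."""
    mu = fmean(estimates)  # estimated prior mean
    total_var = fmean([(d - mu) ** 2 for d in estimates])
    # Prior variance: spread of estimates minus average sampling noise, floored at 0.
    tau2 = max(total_var - fmean([s * s for s in std_errors]), 0.0)
    results = []
    for d, s in zip(estimates, std_errors):
        weight = tau2 / (tau2 + s * s)       # 0 = full shrinkage, 1 = none
        post_mean = mu + weight * (d - mu)
        post_sd = sqrt(weight) * s
        if post_sd == 0.0:
            # Degenerate case: no between-experiment variance was detected.
            prob_positive = 1.0 if post_mean > 0 else 0.0
        else:
            prob_positive = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)
        results.append((post_mean, post_sd, prob_positive))
    return results


# Made-up effect estimates for illustration only.
estimates = [0.5, -0.1, 0.05, 2.0, 0.0, -0.3]
std_errors = [0.2] * 6
for row in empirical_bayes_shrink(estimates, std_errors):
    print(tuple(round(v, 3) for v in row))
```

Noisy estimates are pulled more strongly toward the shared mean, which is what tempers false discoveries among many comparisons; ranking by posterior probability and reporting shrunken effects with intervals is one way to present this to stakeholders.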

Medium | Technical

64 practiced

Show how Maximum A Posteriori (MAP) estimation with a Gaussian prior on linear regression weights leads to L2 (ridge) regularization. Derive the MAP estimator and compare it with the OLS/MLE solution. Discuss how the regularization parameter relates to the prior variance and implications for bias-variance tradeoff.
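
The derivation can be outlined as follows, assuming a linear model with i.i.d. Gaussian noise of variance \sigma^2 and an isotropic zero-mean Gaussian prior of variance \tau^2 on the weights (the symbols are chosen here for illustration):

```latex
\begin{aligned}
y &= Xw + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \quad w \sim \mathcal{N}(0, \tau^2 I) \\
\log p(w \mid X, y) &= -\frac{1}{2\sigma^2}\lVert y - Xw\rVert_2^2 - \frac{1}{2\tau^2}\lVert w\rVert_2^2 + \text{const} \\
\hat{w}_{\text{MAP}} &= \arg\min_w \; \lVert y - Xw\rVert_2^2 + \lambda \lVert w\rVert_2^2, \qquad \lambda = \frac{\sigma^2}{\tau^2} \\
\hat{w}_{\text{MAP}} &= (X^\top X + \lambda I)^{-1} X^\top y, \qquad \hat{w}_{\text{OLS}} = (X^\top X)^{-1} X^\top y
\end{aligned}
```

As \tau^2 grows the prior flattens, \lambda goes to zero, and the MAP estimate approaches the OLS/MLE solution (when X^\top X is invertible). A smaller prior variance means a larger \lambda, stronger shrinkage toward zero, more bias, and less variance.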

Hard | Technical

67 practiced

Implement a permutation test (in Python or detailed pseudocode) to assess whether the observed average clustering coefficient differs between two groups of networks. Networks vary in size and degree distribution. Explain how you would construct the null permutations to preserve exchangeability, and describe options to preserve within-network structural properties (e.g., degree-preserving swaps) when appropriate.
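
A minimal sketch of the label-permutation version, assuming each network is an undirected graph stored as a dict mapping a node to the set of its neighbors, and that whole networks are the exchangeable units under the null. The per-network statistic is the mean local clustering coefficient, with nodes of degree below 2 counted as 0 (one common convention). Function names are illustrative, and degree-preserving edge swaps within networks are not implemented here.

```python
import random
from itertools import combinations
from statistics import fmean


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        # Count edges among this node's neighbors (closed triangles).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return fmean(coeffs)


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided test on the difference in mean average clustering."""
    rng = random.Random(seed)
    stats = [average_clustering(g) for g in list(group_a) + list(group_b)]
    n_a = len(group_a)
    observed = fmean(stats[:n_a]) - fmean(stats[n_a:])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(stats)  # reassign group labels across whole networks
        diff = fmean(stats[:n_a]) - fmean(stats[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
observed, p_value = permutation_test([triangle] * 5, [path] * 5)
print(observed, p_value)
```

Because clustering depends on size and density, it may be worth replacing the raw coefficient with a version normalized against degree-preserving rewired copies of each network before permuting labels; that choice changes the question being tested and should be stated explicitly.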
