InterviewStack.io

Multi Armed Bandits and Experimentation Questions

Covers adaptive experimentation methods that trade off exploration and exploitation to optimize sequential decision making, and how they compare to traditional A/B testing. Core concepts include the exploration-versus-exploitation dilemma, regret minimization, reward modeling, and handling delayed or noisy feedback. Familiar algorithms and families to understand are epsilon-greedy, Upper Confidence Bound (UCB), Thompson sampling, and contextual bandit extensions that incorporate features or user context. Practical considerations include when to choose bandit approaches over fixed randomized experiments, designing reward signals and metrics, dealing with non-stationary environments and concept drift, safety and business constraints on exploration, offline evaluation and simulation, hyperparameter selection and tuning, deployment patterns for online learning, and reporting and interpretability of adaptive experiments. Applications include personalization, recommendation systems, online testing, dynamic pricing, and resource allocation.

Medium · Technical
Compare and contrast Thompson sampling and UCB for a typical online experimentation problem. Discuss practical trade-offs in convergence speed, ease of implementation, hyperparameter sensitivity, and interpretability for business stakeholders.
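
A minimal sketch of the Thompson sampling side of this comparison, assuming Bernoulli rewards, Beta(1, 1) priors, and per-arm success/failure counts maintained by the caller (a UCB1 sketch appears under the later UCB1 question):

    # Hedged sketch: Thompson sampling for Bernoulli rewards with Beta(1, 1) priors.
    # successes/failures are per-arm lists kept by the experimenter (assumed setup).
    import random

    def thompson_select(successes, failures):
        # Sample one plausible mean from each arm's Beta posterior, play the best.
        samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def thompson_update(successes, failures, arm, reward):
        # Bernoulli reward: 1 increments successes, 0 increments failures.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

Note that this version has no tunable exploration parameter, which is part of the practical contrast with UCB's confidence-bonus scaling.
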
Medium · Technical
Outline a reporting dashboard in Tableau or Power BI for stakeholders to monitor bandit experiments. List 6 widgets/visuals (e.g., per-arm cumulative reward, regret, propensity histogram), and provide an example SQL query to compute per-arm cumulative reward and confidence interval for display.
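
As a rough illustration of the aggregation behind such a dashboard, here is a pure-Python sketch that computes per-arm pulls, cumulative reward, and a normal-approximation 95% confidence interval from logged (arm, reward) events; the event format is an assumption rather than a fixed logging schema, and the same aggregation could be expressed in SQL for the BI tool:

    # Hedged sketch: per-arm summary from logged (arm, reward) events, reward in [0, 1].
    import math
    from collections import defaultdict

    def per_arm_summary(events):
        n = defaultdict(int)
        total = defaultdict(float)
        total_sq = defaultdict(float)
        for arm, r in events:
            n[arm] += 1
            total[arm] += r
            total_sq[arm] += r * r
        rows = []
        for arm in sorted(n):
            mean = total[arm] / n[arm]
            var = max(total_sq[arm] / n[arm] - mean * mean, 0.0)
            half = 1.96 * math.sqrt(var / n[arm]) if n[arm] > 1 else float("nan")
            rows.append((arm, n[arm], total[arm], mean - half, mean + half))
        return rows  # (arm, pulls, cumulative_reward, ci_low, ci_high)
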
Hard · System Design
Design a production architecture for serving a contextual bandit across multiple regions with low-latency action selection, periodic model retraining, and strong auditability. Include components for feature store, online model server, offline training, logging, and a rollback mechanism for failed models.
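
One small piece of such a design that benefits from being concrete is the decision log that supports auditability, rollback analysis, and off-policy evaluation. The sketch below shows one possible record layout; the field names and serialization target are illustrative assumptions, not a prescribed schema:

    # Hedged sketch: a possible decision-log record for a contextual bandit service.
    from dataclasses import dataclass, asdict
    import json, time, uuid

    @dataclass
    class DecisionLog:
        request_id: str      # joins the decision to the later reward event
        region: str          # serving region, for per-region analysis
        model_version: str   # identifies which model acted, enabling audit/rollback
        context: dict        # features available at decision time
        action: int          # arm chosen
        propensity: float    # probability the policy assigned to the chosen arm
        timestamp: float

    def serialize(record: DecisionLog) -> str:
        # Intended for an append-only sink (message queue or object store).
        return json.dumps(asdict(record))

    example = DecisionLog(str(uuid.uuid4()), "eu-west-1", "cb-2024-01",
                          {"device": "mobile"}, 3, 0.18, time.time())

Logging the propensity at decision time is what makes later inverse-propensity offline evaluation possible.
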
Easy · Technical
Describe the epsilon-greedy algorithm at a level suitable for an analytics peer and provide short Python-style pseudocode illustrating selection and update steps for Bernoulli rewards. Mention a reasonable default epsilon for initial exploration and how you might decay it.
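
A minimal Python-style sketch, assuming Bernoulli rewards, a running-mean value estimate per arm, and an initial epsilon of 0.1 with a slow decay (common starting points, not universal defaults):

    # Hedged sketch: epsilon-greedy selection and update for Bernoulli rewards.
    import random

    class EpsilonGreedy:
        def __init__(self, k, epsilon0=0.1):
            self.counts = [0] * k       # pulls per arm
            self.values = [0.0] * k     # running mean reward per arm
            self.epsilon0 = epsilon0
            self.t = 0

        def select(self):
            self.t += 1
            epsilon = self.epsilon0 / (1 + self.t / 1000)  # slow decay; tune to traffic
            if random.random() < epsilon:
                return random.randrange(len(self.counts))  # explore uniformly
            return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

        def update(self, arm, reward):
            self.counts[arm] += 1
            # Incremental mean update avoids storing the full reward history.
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
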
Medium · Technical
Implement the UCB1 selection step in Python: given arrays counts (n_i) and sum_rewards (s_i) for k arms and current time t, produce the arm index to pull next using the standard UCB1 formula. Explain numerical stability concerns and tie-breaking behavior.
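
A hedged reference sketch, assuming NumPy arrays and that t counts total pulls so far; the tie-breaking and log-argument handling shown are one reasonable choice, not the only one:

    # Hedged sketch: UCB1 selection step over k arms.
    import numpy as np

    def ucb1_select(counts, sum_rewards, t):
        counts = np.asarray(counts, dtype=float)
        sum_rewards = np.asarray(sum_rewards, dtype=float)
        # Pull any unplayed arm first; this also avoids division by zero below.
        unplayed = np.flatnonzero(counts == 0)
        if unplayed.size > 0:
            return int(unplayed[0])
        means = sum_rewards / counts
        bonus = np.sqrt(2.0 * np.log(max(t, 2)) / counts)  # max(t, 2) keeps log(t) > 0
        scores = means + bonus
        # Plain argmax always favors the lowest index on ties; randomize instead.
        best = np.flatnonzero(scores == scores.max())
        return int(np.random.choice(best))

Pulling each unplayed arm first removes the division-by-zero case, and randomizing over tied maxima avoids the positional bias of a bare argmax.
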
