InterviewStack.io

Multi Armed Bandits and Experimentation Questions

Covers adaptive experimentation methods that trade off exploration and exploitation to optimize sequential decision making, and how they compare to traditional A/B testing. Core concepts include the exploration-versus-exploitation dilemma, regret minimization, reward modeling, and handling delayed or noisy feedback. Familiar algorithms and families to understand are epsilon-greedy, Upper Confidence Bound (UCB), Thompson sampling, and contextual bandit extensions that incorporate features or user context. Practical considerations include when to choose bandit approaches over fixed randomized experiments, designing reward signals and metrics, dealing with non-stationary environments and concept drift, safety and business constraints on exploration, offline evaluation and simulation, hyperparameter selection and tuning, deployment patterns for online learning, and reporting and interpretability of adaptive experiments. Applications include personalization, recommendation systems, online testing, dynamic pricing, and resource allocation.
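As a concrete illustration of the exploration/exploitation trade-off named above, here is a minimal epsilon-greedy loop. All names and the reward simulation are illustrative assumptions, not part of the question bank:

```python
import random

def epsilon_greedy(n_arms, reward_fn, epsilon=0.1, horizon=1000, seed=0):
    """Run `horizon` rounds: with probability epsilon pick a random arm
    (explore), otherwise pick the arm with the highest empirical mean
    (exploit). Returns per-arm pull counts and reward sums."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore uniformly
        else:
            # Unpulled arms get an infinite mean so each is tried at least once.
            means = [s / c if c else float("inf") for s, c in zip(sums, counts)]
            arm = max(range(n_arms), key=means.__getitem__)  # exploit
        counts[arm] += 1
        sums[arm] += reward_fn(arm, rng)
    return counts, sums
```

With two Bernoulli arms of rates 0.2 and 0.8, the loop concentrates pulls on the better arm while still sampling the worse one at roughly the epsilon rate.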

Medium · Technical
45 practiced
Write a SQL snippet that computes an IPS-weighted conversion rate estimate for a candidate policy that deterministically chooses action 'B', given logs with schema logs(user_id, action, propensity, reward). Assume propensity is the probability that the logging policy chose the logged action. Show the weighted estimator and a basic bootstrap variance estimate approach in SQL or pseudocode.
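One way to sanity-check the SQL before writing it is a small Python sketch of the same estimator: the IPS value of the always-'B' policy is the mean over all rows of reward/propensity for rows where 'B' was logged (zero otherwise), and the variance can be bootstrapped by resampling rows. Function names and the bootstrap settings here are illustrative assumptions:

```python
import random

def ips_estimate(logs):
    """IPS-weighted value of the policy that always plays 'B'.
    logs: list of (user_id, action, propensity, reward) tuples."""
    n = len(logs)
    total = sum(r / p for _, a, p, r in logs if a == "B")
    return total / n

def bootstrap_var(logs, n_boot=1000, seed=0):
    """Basic nonparametric bootstrap: resample rows with replacement,
    recompute the IPS estimate, and take the sample variance."""
    rng = random.Random(seed)
    ests = []
    for _ in range(n_boot):
        sample = [rng.choice(logs) for _ in range(len(logs))]
        ests.append(ips_estimate(sample))
    m = sum(ests) / n_boot
    return sum((e - m) ** 2 for e in ests) / (n_boot - 1)
```

The SQL version mirrors this: a SUM of reward/propensity filtered to action = 'B', divided by a COUNT over all rows, with the bootstrap done by repeated sampled aggregation.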
Medium · Technical
44 practiced
You're seeing signs of nonstationarity: an arm's conversion rate drifts downward over weeks. As the data analyst owning experiments, propose a detection and remediation plan that includes statistical tests, windowing strategies, and adaptive algorithm choices to reduce regret under drift.
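A simple building block for such a detection plan is a two-window comparison: test whether the recent window's mean reward differs from the preceding window's. The z-test and window size below are illustrative choices, not the only reasonable ones:

```python
from math import sqrt
from statistics import mean, stdev

def drift_z_score(rewards, window=200):
    """Compare the most recent `window` observations against the
    preceding `window` with a two-sample z-test on means. A large
    negative z suggests downward drift. Returns None if there is
    not enough data."""
    if len(rewards) < 2 * window:
        return None
    old = rewards[-2 * window:-window]
    new = rewards[-window:]
    se = sqrt(stdev(old) ** 2 / window + stdev(new) ** 2 / window)
    return (mean(new) - mean(old)) / se if se > 0 else 0.0
```

A strongly negative score would then trigger remediation such as shrinking the effective history (sliding-window or discounted UCB) or resetting the drifting arm's posterior.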
Easy · Technical
69 practiced
As a data analyst, define regret in the context of multi-armed bandits. Given an arm reward sequence and a known optimal arm reward of 0.6, explain how to compute cumulative regret for a sequence of observed rewards [0.4, 0, 0.6, 0.5]. Show the formula and one interpretation of the result for stakeholders.
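For the question's per-observation version of regret, cumulative regret is the sum of per-round gaps between the optimal reward and what was actually earned. A minimal sketch (function name is illustrative):

```python
def cumulative_regret(rewards, optimal=0.6):
    """Regret_T = sum over t of (optimal - r_t): the total reward lost
    versus always playing the best arm."""
    return sum(optimal - r for r in rewards)
```

For [0.4, 0, 0.6, 0.5] with optimal 0.6 this gives 0.2 + 0.6 + 0.0 + 0.1 = 0.9, i.e. about 0.9 units of reward were left on the table over four rounds of learning.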
Medium · Technical
42 practiced
Implement the UCB1 selection step in Python: given arrays counts (n_i) and sum_rewards (s_i) for k arms and current time t, produce the arm index to pull next using the standard UCB1 formula. Explain numerical stability concerns and tie-breaking behavior.
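A minimal sketch of the selection step, assuming the standard UCB1 bound mean_i + sqrt(2 ln t / n_i); guarding the n_i = 0 case handles both numerical stability (division by zero) and the convention of playing each arm once first, and `max` with a key breaks ties toward the lowest index:

```python
from math import log, sqrt

def ucb1_select(counts, sum_rewards, t):
    """Return the index of the arm maximizing s_i/n_i + sqrt(2 ln t / n_i).
    Unpulled arms (n_i == 0) are selected first; among equal scores the
    lowest index wins because max() returns the first maximum."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play each arm once before trusting the bound
    scores = [s / n + sqrt(2.0 * log(t) / n)
              for n, s in zip(counts, sum_rewards)]
    return max(range(len(counts)), key=scores.__getitem__)
```

Note that log(t) assumes t >= 1; with heavily undersampled arms the exploration bonus dominates, so the arm pulled only once is chosen even when its empirical mean is higher elsewhere.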
Hard · Technical
41 practiced
You're responsible for ensuring fairness and regulatory compliance for a live bandit personalization system. As team lead, propose a governance plan that includes fairness metrics, auditing procedures, logging, and remediation pathways. Include how to present findings to legal and product teams.
