
Model Performance Analysis and Root Cause Analysis Questions

Techniques for diagnosing and troubleshooting production ML models, including monitoring metrics such as accuracy, precision, recall, ROC-AUC, latency, and throughput, and detecting data drift, feature drift, data-quality issues, and model drift. Covers root-cause analysis across data, features, model behavior, and infrastructure; instrumentation and profiling; error analysis; ablation studies; and reproducibility. Includes remediation strategies to improve model reliability, performance, and governance in production systems.
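As a warm-up for the questions below, here is a minimal sketch of how the quality metrics named above might be computed for one monitoring window of logged predictions. It assumes scikit-learn and hypothetical arrays `y_true` / `y_prob` joined from a prediction log (in production, ground-truth labels typically arrive with some delay):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

def window_metrics(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> dict:
    """Core classification metrics for one monitoring window of logged predictions."""
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }

# Synthetic example only; real windows would be joined from serving logs.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 1000), 0, 1)
print(window_metrics(y_true, y_prob))
```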

Medium · System Design
Design a monitoring dashboard for a classification model serving ~100k predictions/day. Specify which metrics to display (model/data/infra), what statistical tests or thresholds to use for alerts, sampling strategy for detailed logs, smoothing/windowing choices, and escalation steps when alerts fire.
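One possible building block for the drift alerts this question asks about is sketched below: a two-sample Kolmogorov-Smirnov test plus a Population Stability Index check for a single numeric feature, assuming NumPy and SciPy. The 0.05 p-value and 0.2 PSI thresholds are illustrative assumptions, not fixed rules:

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference window and a current window."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor proportions at a small epsilon to avoid log(0).
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def drift_alert(reference: np.ndarray, current: np.ndarray) -> dict:
    """Alert decision for one numeric feature window (assumed thresholds)."""
    ks_result = ks_2samp(reference, current)
    score = psi(reference, current)
    return {
        "ks_p_value": float(ks_result.pvalue),
        "psi": score,
        "alert": ks_result.pvalue < 0.05 or score > 0.2,
    }

# Example: the current window is shifted relative to the reference window.
rng = np.random.default_rng(0)
print(drift_alert(rng.normal(0.0, 1.0, 5000), rng.normal(0.3, 1.0, 5000)))
```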
Medium · Technical
Explain how you would use SHAP values to perform root-cause analysis when a model's error rate increases sharply for a particular user segment. Describe how to compute SHAP values, aggregate them across the segment, compare against a baseline population, and the caveats to keep in mind when interpreting shifts in SHAP values.
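A minimal sketch of the aggregation step, assuming the `shap` package, a fitted tree-based classifier `model`, and hypothetical DataFrames `segment_X` / `baseline_X` holding recent traffic for the affected segment and a reference population:

```python
import numpy as np
import pandas as pd
import shap

def mean_abs_shap(model, X: pd.DataFrame) -> pd.Series:
    """Mean |SHAP| per feature, i.e. the average attribution magnitude."""
    explainer = shap.TreeExplainer(model)
    values = explainer.shap_values(X)
    # Depending on shap version and model type this can be a single array,
    # a list of per-class arrays, or a 3-D (samples, features, classes) array.
    if isinstance(values, list):
        values = values[-1]
    values = np.asarray(values)
    if values.ndim == 3:
        values = values[..., -1]
    return pd.Series(np.abs(values).mean(axis=0), index=X.columns)

def segment_vs_baseline(model, segment_X: pd.DataFrame, baseline_X: pd.DataFrame) -> pd.DataFrame:
    """Rank features by how much their attribution magnitude moved in the segment."""
    out = pd.DataFrame({
        "segment": mean_abs_shap(model, segment_X),
        "baseline": mean_abs_shap(model, baseline_X),
    })
    out["delta"] = out["segment"] - out["baseline"]
    return out.sort_values("delta", ascending=False)
```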
Medium · System Design
Design a retraining pipeline that automatically retrains a model based on either a scheduled cadence or observed performance degradation. Include dataset snapshotting, model and data versioning, validation gates (offline and shadow), canary rollout strategy, automated rollback criteria, and cost/compute considerations.
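A minimal sketch of one offline validation gate such a pipeline might run before promoting a retrained candidate to shadow traffic; the metric names and thresholds are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class GateConfig:
    min_auc: float = 0.80            # absolute floor for the candidate
    max_auc_drop: float = 0.005      # allowed regression vs. the champion
    max_calibration_error: float = 0.05

def passes_gate(candidate: dict, champion: dict, cfg: GateConfig = GateConfig()) -> bool:
    """Decide whether a retrained candidate may proceed to shadow deployment."""
    if candidate["auc"] < cfg.min_auc:
        return False
    if champion["auc"] - candidate["auc"] > cfg.max_auc_drop:
        return False
    if candidate["calibration_error"] > cfg.max_calibration_error:
        return False
    return True

# Example: candidate slightly beats the champion and is well calibrated.
print(passes_gate({"auc": 0.86, "calibration_error": 0.03}, {"auc": 0.85}))
```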
Hard · Technical
Your p99 latency target is <50 ms at 1,000 QPS, but the current p99 is ~120 ms. Propose a prioritized optimization and rollout plan: how you'd profile the system, which low-risk changes you'd apply first (e.g., CPU/GPU tuning, concurrency), and which higher-risk changes you'd consider next (quantization, pruning, distillation, batching). For each technique, estimate the expected latency benefit and the potential risks to accuracy or stability.
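A minimal sketch of one of the higher-risk levers, post-training dynamic quantization in PyTorch, together with a crude p99 latency measurement. The tiny MLP is only a stand-in for the real model, and production profiling would use torch.profiler and live traffic rather than this micro-benchmark:

```python
import time
import numpy as np
import torch
import torch.nn as nn

def p99_latency_ms(model: nn.Module, x: torch.Tensor, iters: int = 200) -> float:
    """Rough per-call p99 latency in milliseconds over repeated forward passes."""
    timings = []
    with torch.no_grad():
        for _ in range(iters):
            start = time.perf_counter()
            model(x)
            timings.append((time.perf_counter() - start) * 1000)
    return float(np.percentile(timings, 99))

# Stand-in model and batch; the real system would replay production requests.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 2)).eval()
x = torch.randn(32, 512)

baseline = p99_latency_ms(model, x)
# Quantize Linear layers to int8 weights (CPU inference); accuracy must be
# re-validated on held-out data before any rollout.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(f"fp32 p99: {baseline:.2f} ms, int8 p99: {p99_latency_ms(quantized, x):.2f} ms")
```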
Hard · Technical
SHAP importance for several top features shifts significantly after a data pipeline change. How would you determine whether this is due to a real population change, a preprocessing bug (e.g., wrong scaling or encoding), newly introduced feature leakage, or a model training artifact? Provide a prioritized investigative checklist and automated tests to detect and prevent such regressions.
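A minimal sketch of one automated guard from such a checklist: a feature-statistics snapshot test that fails CI when a preprocessing change moves any transformed feature's mean or standard deviation beyond a tolerance. The snapshot path, column handling, and tolerance are hypothetical:

```python
import json
import pandas as pd

TOLERANCE = 0.10  # assumed relative tolerance on each tracked statistic

def feature_stats(df: pd.DataFrame) -> dict:
    """Mean and standard deviation of every numeric column after preprocessing."""
    return {c: {"mean": float(df[c].mean()), "std": float(df[c].std())}
            for c in df.select_dtypes("number").columns}

def check_against_snapshot(df: pd.DataFrame, snapshot_path: str) -> list:
    """Return (feature, statistic) pairs that drifted past tolerance vs. the stored snapshot."""
    with open(snapshot_path) as f:
        expected = json.load(f)
    actual = feature_stats(df)
    failures = []
    for col, stats in expected.items():
        for name, ref in stats.items():
            cur = actual.get(col, {}).get(name)
            if cur is None or abs(cur - ref) > TOLERANCE * (abs(ref) + 1e-9):
                failures.append((col, name))
    return failures
```

A CI job would write the snapshot from the known-good pipeline output and assert that `check_against_snapshot` returns an empty list after every pipeline change.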
