InterviewStack.io

Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium | Technical
Write a Python function that finds the best split threshold for a continuous feature using Gini impurity for binary classification. Signature: `def best_split_threshold(X_feature, y):` Return `(best_threshold, best_gain)`. The implementation should sort values and compute impurity in O(n log n) time and avoid repeated full scans.
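One possible approach, sketched under a few assumptions: labels are encoded as 0/1, candidate thresholds are midpoints between adjacent distinct sorted values, and `(None, 0.0)` is returned when no split improves impurity. The helper `gini` is a name introduced here for illustration. The values are sorted once, then a single sweep updates running class counts so each candidate is scored in O(1).

```python
def gini(pos, total):
    """Gini impurity of a binary node with `pos` positives out of `total` rows."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def best_split_threshold(X_feature, y):
    """Return (best_threshold, best_gain) for a 0/1 target (assumed encoding)."""
    n = len(X_feature)
    if n != len(y):
        raise ValueError("X_feature and y must have the same length")
    if n < 2:
        return None, 0.0

    # Single O(n log n) sort; everything after this is one linear sweep.
    pairs = sorted(zip(X_feature, y), key=lambda t: t[0])
    total_pos = sum(label for _, label in pairs)
    parent = gini(total_pos, n)

    best_threshold, best_gain = None, 0.0
    left_pos = 0  # running count of positives on the left side
    for i in range(n - 1):
        value, label = pairs[i]
        left_pos += label
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no valid threshold between equal values

        left_n = i + 1
        right_n = n - left_n
        weighted = (
            left_n * gini(left_pos, left_n)
            + right_n * gini(total_pos - left_pos, right_n)
        ) / n
        gain = parent - weighted
        if gain > best_gain:
            best_threshold = (value + next_value) / 2.0
            best_gain = gain
    return best_threshold, best_gain
```

For example, `best_split_threshold([1, 2, 3, 4], [0, 0, 1, 1])` separates the classes perfectly at the midpoint between 2 and 3. Skipping ties matters: a threshold placed between two equal feature values cannot be realized at prediction time.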
Hard | Technical
Case study: security event logs with sparse signals and adaptive attackers who try to evade detection. Propose a robust anomaly detection solution that combines unsupervised methods (autoencoders, isolation forest), semi-supervised learning from limited labeled incidents, behavioral modeling, adversarial-robust feature engineering, and a rigorous evaluation plan including red-team simulations and analyst-in-the-loop feedback.
Hard | Technical
Implement a scalable split-candidate routine for categorical features with extremely high cardinality. API: `def categorical_split_candidates(cat_values, y, max_groups=256):` The function should aggregate categories by frequency or target-encoded groups with smoothing, return up to max_groups candidate buckets, and discuss leakage risks and complexity.
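A minimal sketch of the target-encoded variant, with several assumptions stated up front: the target is numeric (for example 0/1), the `smoothing` keyword is a hypothetical addition to the signature above, and buckets are formed by cutting the categories, ordered by smoothed target mean, into contiguous groups of roughly equal category count. A tree learner could then consider splits only at boundaries between consecutive buckets.

```python
from collections import defaultdict


def categorical_split_candidates(cat_values, y, max_groups=256, smoothing=10.0):
    """Return up to `max_groups` buckets of categories, ordered by encoded mean.

    `smoothing` is an assumed extra parameter: the pseudo-count that pulls
    rare categories toward the global target mean.
    """
    if len(cat_values) != len(y):
        raise ValueError("cat_values and y must have the same length")
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    if len(cat_values) == 0:
        return []

    # One pass over the rows: per-category count and target sum, O(n).
    counts = defaultdict(int)
    sums = defaultdict(float)
    for category, target in zip(cat_values, y):
        counts[category] += 1
        sums[category] += target

    prior = sum(y) / len(y)  # global mean used as the shrinkage target
    encoded = {
        c: (sums[c] + smoothing * prior) / (counts[c] + smoothing)
        for c in counts
    }

    # Order categories by smoothed mean, O(k log k) for k distinct categories.
    ordered = sorted(counts, key=lambda c: (encoded[c], str(c)))

    # Cut the ordered list into contiguous, non-empty buckets.
    n_groups = min(max_groups, len(ordered))
    buckets = [[] for _ in range(n_groups)]
    for i, category in enumerate(ordered):
        buckets[i * n_groups // len(ordered)].append(category)
    return buckets
```

Total cost in this sketch is O(n + k log k) time and O(k) memory. On leakage: the encoded means here are computed from the same rows that would later be used to score the split, so rare categories can look more predictive than they are. Computing the encoding out-of-fold, or on a held-out slice, is a common mitigation; smoothing alone only reduces the effect.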
Hard | System Design
Design an ML monitoring system that detects data drift, concept drift, sudden performance drops, and label delays for deployed models. Specify which metrics to collect (feature distribution stats, model score distributions, calibration, label latency), alerting thresholds, automated triage rules, root-cause attribution methods (feature-level drift detection), and human escalation processes.
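As one illustration of a feature-level drift statistic, the sketch below computes the population stability index (PSI) between a reference sample and a recent sample of a single numeric feature. PSI is a choice made here, not something the question prescribes; the function name, the equal-width binning over the reference range, and the `eps` floor for empty bins are all assumptions of this sketch.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (`expected`) and a live sample (`actual`).

    Bins are equal-width over the reference range; live values outside that
    range are clamped into the first or last bin.
    """
    if len(expected) == 0 or len(actual) == 0:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            if hi == lo:
                idx = 0  # degenerate reference: everything falls in one bin
            else:
                idx = int((v - lo) / (hi - lo) * n_bins)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Running this per feature on a schedule gives a ranked list of drifting features for root-cause triage. A frequently quoted rule of thumb treats values under about 0.1 as stable and values above about 0.25 as significant shift, but alerting thresholds are better calibrated against the historical variation of each feature.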
Easy | Behavioral
Tell me about a time you diagnosed and fixed a model that was overfitting in production. Use the STAR method: describe the situation, what diagnostics you used (metrics and visualizations), the concrete steps you took (data, features, model changes, hyperparameters), and the measurable outcome after your intervention.
