InterviewStack.io

Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium · Technical
Compare L1 (Lasso) and L2 (Ridge) regularization. Explain the geometric intuition (constraint regions), impact on coefficient sparsity and variance, and when you would prefer Elastic Net. Describe how you would choose the regularization strength in a production workflow.
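For the last part of this question, a common production answer is to search both the regularization strength and the L1/L2 mix by cross-validation. A minimal sketch using scikit-learn's `ElasticNetCV`; the synthetic data and the `l1_ratio` grid here are illustrative assumptions, not part of the question:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic data: 20 features but only 5 informative, so an
# L1-heavy penalty should drive many coefficients to exactly zero.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# ElasticNetCV cross-validates both the overall strength (alpha)
# and the L1/L2 mix (l1_ratio) in one pass.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, random_state=0)
model.fit(X, y)

n_zero = int(np.sum(model.coef_ == 0))
print(f"alpha={model.alpha_:.3f}, l1_ratio={model.l1_ratio_}, "
      f"{n_zero}/20 coefficients exactly zero")
```

In production the same idea applies, but the cross-validation split should respect time or entity boundaries, and the chosen strength is typically re-validated whenever the data distribution shifts.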
Hard · Technical
Case study: False positive manual-review cost is $50, false negative lost revenue is $200. Design a cost-sensitive ML approach to minimize expected operational cost: define expected cost, propose ways to incorporate costs at training time (reweighting or custom loss), choose operating threshold, and describe an A/B testing plan to validate business impact.
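The expected-cost and threshold pieces of an answer can be made concrete: flag a case for review when the expected cost of reviewing, C_FP · (1 − p), is below the expected cost of ignoring, C_FN · p, which gives the threshold p ≥ C_FP / (C_FP + C_FN). A sketch with the question's costs; the toy labels and scores are illustrative assumptions:

```python
import numpy as np

C_FP = 50.0   # manual-review cost of a false positive
C_FN = 200.0  # lost revenue from a false negative

# Bayes-optimal operating threshold from the cost comparison above:
# 50 / (50 + 200) = 0.2, i.e. review anything with >= 20% risk.
t_star = C_FP / (C_FP + C_FN)

def expected_cost(y_true, p_scores, threshold):
    """Empirical expected cost per example at a given threshold."""
    y_pred = (p_scores >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return (C_FP * fp + C_FN * fn) / len(y_true)

# Toy sanity check: the cheap-FP / expensive-FN asymmetry makes the
# low threshold cheaper than a default 0.5 cutoff.
y_true = np.array([1, 0, 1, 0, 1])
p_scores = np.array([0.9, 0.1, 0.3, 0.4, 0.15])
print(expected_cost(y_true, p_scores, t_star))  # 50.0 per example
print(expected_cost(y_true, p_scores, 0.5))     # 80.0 per example
```

This threshold rule assumes calibrated probabilities, which is worth stating in an interview: with an uncalibrated model you would sweep thresholds on a validation set instead, and the A/B test then validates that the offline cost reduction holds in production.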
Hard · Technical
Provide a mathematically grounded explanation of gradient boosting: how boosting can be seen as gradient descent in function space, how the negative gradient corresponds to residuals, and how choice of loss function (squared error vs logistic) affects the updates. Explain common regularization techniques in boosting and their theoretical effect.
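The functional-gradient view for squared error fits in a few lines: with loss ½(y − F(x))², the negative gradient with respect to F(x) is exactly the residual y − F(x), so each boosting stage fits a small tree to the current residuals and takes a shrunken step in function space. A toy sketch (the data, depth, and shrinkage value are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

nu = 0.1                          # shrinkage: a key boosting regularizer
F = np.full_like(y, y.mean())     # F_0: best constant under squared error

for _ in range(100):
    residuals = y - F                                   # negative gradient of 1/2*(y - F)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += nu * tree.predict(X)                           # gradient step in function space

mse_before = np.mean((y - y.mean()) ** 2)
mse_after = np.mean((y - F) ** 2)
print(mse_before, mse_after)
```

Swapping in logistic loss changes only the gradient computation (residuals become y − sigmoid(F)), which is the point the question asks you to make; shrinkage, tree depth, and subsampling are the regularizers that control how far each functional step moves.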
Medium · Technical
Implement an early-stopping wrapper in Python for sklearn-like estimators that supports validation monitoring, patience, min_delta, and restoring the best model. The wrapper should accept either estimators with iterative `partial_fit`/`warm_start` or a `fit` that supports `n_iter`-style control. Focus on API design and correct restoration of the best model.
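A minimal sketch of the `partial_fit` path only, to show the shape of an answer; the class name is hypothetical, the `n_iter`/`warm_start` branch is omitted, and validation here uses accuracy rather than a configurable metric:

```python
import copy
import numpy as np
from sklearn.linear_model import SGDClassifier

class EarlyStoppingWrapper:
    """Sketch: early stopping around estimators exposing partial_fit."""

    def __init__(self, estimator, max_epochs=100, patience=5, min_delta=1e-4):
        self.estimator = estimator
        self.max_epochs = max_epochs
        self.patience = patience      # epochs without improvement before stopping
        self.min_delta = min_delta    # minimum improvement that counts

    def fit(self, X_train, y_train, X_val, y_val):
        classes = np.unique(y_train)
        best_err, best_model, bad_epochs = np.inf, None, 0
        for _ in range(self.max_epochs):
            self.estimator.partial_fit(X_train, y_train, classes=classes)
            val_err = 1.0 - self.estimator.score(X_val, y_val)
            if val_err < best_err - self.min_delta:
                best_err, bad_epochs = val_err, 0
                # Snapshot so we can restore the best model, not the last one.
                best_model = copy.deepcopy(self.estimator)
            else:
                bad_epochs += 1
                if bad_epochs >= self.patience:
                    break
        self.best_estimator_ = best_model
        return self

# Illustrative usage on synthetic data.
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
wrapper = EarlyStoppingWrapper(SGDClassifier(random_state=0))
wrapper.fit(X[:300], y[:300], X[300:], y[300:])
```

The `deepcopy` snapshot is the design point the question highlights: restoring "the best model" means keeping a copy taken at the best validation epoch, since the estimator object itself has moved on by the time patience runs out.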
Medium · Technical
Compare XGBoost, LightGBM, and CatBoost for production use on mixed-type datasets with many categorical features and millions of rows. Discuss tree growth strategies (leaf-wise vs level-wise), categorical handling, histogram binning, GPU support, typical hyperparameter choices, and production considerations like model size and inference speed.
