InterviewStack.io

Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium | Technical
26 practiced
Compare and contrast model-agnostic interpretability methods (LIME, SHAP) and model-specific methods (feature importances from trees, coefficients). For a research project, what experiments would you run to evaluate the faithfulness and stability of explanations produced by these methods?
Hard | Technical
30 practiced
Prove that the maximum-likelihood objective of logistic regression (the negative log-likelihood) is convex for binary classification, and extend the argument to multiclass softmax cross-entropy. Explain what convexity implies for optimization guarantees and the global optimum, and comment on the effect of non-linear feature maps.
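One possible outline of the argument, offered as a sketch and not a full proof. The notation is an assumption introduced here: labels $y_i \in \{0, 1\}$, features $x_i \in \mathbb{R}^d$, sigmoid $\sigma$, and $\sigma_i = \sigma(w^\top x_i)$.

```latex
% Binary case: negative log-likelihood, gradient, and Hessian
\begin{align}
  \ell(w) &= \sum_{i=1}^{n} \Bigl[ \log\bigl(1 + e^{w^\top x_i}\bigr) - y_i\, w^\top x_i \Bigr], \\
  \nabla \ell(w) &= \sum_{i=1}^{n} (\sigma_i - y_i)\, x_i, \\
  \nabla^2 \ell(w) &= \sum_{i=1}^{n} \sigma_i (1 - \sigma_i)\, x_i x_i^\top .
\end{align}
% Positive semidefiniteness: for any v in R^d
\begin{equation}
  v^\top \nabla^2 \ell(w)\, v = \sum_{i=1}^{n} \sigma_i (1 - \sigma_i)\, (x_i^\top v)^2 \;\ge\; 0 .
\end{equation}
% Multiclass case: logits z_k = w_k^\top x, p = softmax(z), true class y
\begin{align}
  \ell_i(z) &= \log \sum_{k=1}^{K} e^{z_k} - z_{y}, \\
  \nabla_z^2 \ell_i &= \operatorname{diag}(p) - p\, p^\top, \\
  v^\top \bigl(\operatorname{diag}(p) - p\, p^\top\bigr) v
    &= \sum_{k} p_k v_k^2 - \Bigl(\sum_{k} p_k v_k\Bigr)^{2}
     = \operatorname{Var}_{k \sim p}(v_k) \;\ge\; 0 .
\end{align}
```

Because the logits are linear in the parameters, convexity in $z$ carries over to the weights. The same holds for any fixed feature map $\phi(x)$, since the model stays linear in $w$; it generally fails once the feature map itself is learned, as in a neural network. Convexity here is not strict in general: on linearly separable data the unregularized objective has no finite minimizer.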
Easy | Technical
22 practiced
Describe Lloyd's k-means algorithm in detail and discuss its sensitivity to initialization. Explain the k-means++ initialization and why it helps. Also state the computational complexity per iteration and give practical tips for speeding up k-means on large datasets.
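As a reference point when answering, here is a minimal pure-Python sketch of Lloyd's iterations with k-means++ seeding. The function names (`sq_dist`, `kmeans_pp_init`, `lloyd`) and the choice to keep an empty cluster's previous centre are assumptions made for illustration, not a canonical implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: the first centre is uniform at random; each later
    centre is sampled with probability proportional to its squared distance
    to the nearest centre chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centre; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, k, max_iter=100, seed=0):
    """Alternate assignment and update steps until labels stop changing."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: O(n * k * d) distance evaluations per iteration.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centre
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels
```

Calling `lloyd(points, k)` on a list of coordinate tuples returns the final centres and one cluster label per point. The assignment step dominates the cost, which is why mini-batch updates and distance-pruning variants are the usual suggestions for large datasets.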
Medium | Technical
23 practiced
Derive and discuss the computational complexity of training a random forest with T trees on a dataset of n samples and d features. Include factors such as maximum tree depth, feature subsampling, and splitting criterion computation, and propose engineering optimizations to reduce runtime and memory usage.
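A rough cost model that an answer might start from, under stated assumptions: each tree is grown on a bootstrap sample of size $n$, $m \le d$ features are tried at each split, the best threshold for a feature is found by sorting the node's samples, and $D$ is the tree depth. The symbols $m$ and $D$ are introduced here for illustration.

```latex
% Cost of one node v holding n_v samples: sort each of m candidate features
\begin{equation}
  C_{\text{node}}(v) = O\bigl(m\, n_v \log n_v\bigr)
\end{equation}
% Nodes at one depth level partition at most n samples, so per level:
\begin{equation}
  C_{\text{level}} = \sum_{v \in \text{level}} O\bigl(m\, n_v \log n_v\bigr) \le O\bigl(m\, n \log n\bigr)
\end{equation}
% One tree of depth D, then T trees:
\begin{align}
  C_{\text{tree}}   &= O\bigl(m\, n\, D \log n\bigr), \\
  C_{\text{forest}} &= O\bigl(T\, m\, n\, D \log n\bigr).
\end{align}
% Balanced trees, D = O(log n):  O(T m n log^2 n)
% Degenerate trees, D = O(n):    O(T m n^2 log n)
```

Under this model, the runtime levers are the ones the question names: capping $D$, lowering $m$, and replacing per-node sorting with presorted or histogram-binned features. Trees are independent, so training parallelizes across $T$. A fully grown tree can hold $O(n)$ nodes, which gives roughly $O(Tn)$ nodes of model memory.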
Hard | Technical
45 practiced
Research problem: boosting algorithms can overfit noisy labels. Propose algorithmic modifications to gradient boosting to improve robustness to label noise, justify your choices theoretically or intuitively, and outline an experimental protocol to validate robustness on synthetic and real noisy-label datasets.
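For the synthetic part of such a protocol, one common setup is symmetric label noise: flip each binary label independently with a fixed probability, then compare models across noise rates. The helper below is a hypothetical sketch (the name `flip_labels` and its parameters are assumptions), covering only the noise-injection step.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary labels (0/1) in which each label is flipped
    independently with probability noise_rate (symmetric label noise)."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so each noise level is reproducible
    return [1 - y if rng.random() < noise_rate else y for y in labels]
```

A sweep such as `for rate in (0.0, 0.1, 0.2, 0.4): noisy = flip_labels(y_train, rate)` would corrupt only the training labels, with evaluation done on clean held-out labels so that the measured degradation reflects robustness and not a noisy metric.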
