InterviewStack.io

Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance (bagging) or bias (boosting), computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium · Technical
26 practiced
Implement PCA from scratch in Python using SVD. Your implementation should accept a data matrix X (n x d), optional n_components, and return transformed data and explained variance ratios. Explain why SVD on centered X is numerically preferred over eigendecomposition of the covariance for some shapes of X.
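One possible starting point for this question is sketched below. It is a minimal illustration, not a reference answer: the function name `pca_svd` and its return order are assumptions made here, and the sketch assumes X has at least two rows and is not constant. The usual argument for SVD is that forming the covariance matrix squares the condition number of the centered data, so small singular values lose precision; working on the centered X directly avoids that, and the thin SVD also avoids building a d x d matrix when d is much larger than n.

```python
import numpy as np

def pca_svd(X, n_components=None):
    """PCA via thin SVD of the centered data (hypothetical helper).

    Returns (X_transformed, explained_variance_ratio).
    Assumes X has shape (n, d) with n >= 2 and nonzero total variance.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    max_k = min(n, d)
    k = max_k if n_components is None else n_components
    if not 1 <= k <= max_k:
        raise ValueError(f"n_components must be in [1, {max_k}]")

    # Center each feature; PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0)

    # Thin SVD: Xc = U @ diag(S) @ Vt, singular values in descending order.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Eigenvalues of the sample covariance are S**2 / (n - 1).
    explained_variance = S**2 / (n - 1)
    ratio = explained_variance / explained_variance.sum()

    # Projection onto the top-k right singular vectors; U * S equals Xc @ Vt.T.
    X_transformed = U[:, :k] * S[:k]
    return X_transformed, ratio[:k]
```

Calling `pca_svd(X, n_components=2)` on an n x d array returns an n x 2 array of scores and the two leading variance ratios. The sign of each component is arbitrary, as with any SVD-based PCA.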
Hard · System Design
24 practiced
Design an end-to-end machine learning system for real-time fraud prediction that must serve predictions at <50ms latency at 10,000 requests/second for 1M active users. Cover offline training pipeline, feature store (online vs offline), feature freshness, model serving (batch vs streaming vs approximate), A/B testing, monitoring, and considerations to avoid training-serving skew.
Hard · Technical
24 practiced
Show how L2 regularization modifies the logistic regression objective and derive the Newton–Raphson (Newton) update for L2-regularized logistic regression. Explicitly write the gradient and Hessian including the regularization term and discuss how regularization affects Hessian conditioning and convergence.
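The quantities this question asks for can be written out as follows. This is a sketch under stated assumptions: labels y_i in {0, 1}, p_i the predicted probability, X the n x d design matrix, a penalty of (lambda/2) times the squared norm, and every coordinate of w regularized (in practice the intercept is often left unpenalized).

```latex
\begin{aligned}
J(w) &= -\sum_{i=1}^{n}\bigl[\,y_i \log p_i + (1-y_i)\log(1-p_i)\,\bigr]
        + \frac{\lambda}{2}\lVert w\rVert_2^2,
        \qquad p_i = \sigma(w^\top x_i) \\
\nabla J(w) &= X^\top (p - y) + \lambda w \\
H(w) &= X^\top S X + \lambda I,
        \qquad S = \operatorname{diag}\bigl(p_i(1-p_i)\bigr) \\
w^{(t+1)} &= w^{(t)} - H\bigl(w^{(t)}\bigr)^{-1}\,\nabla J\bigl(w^{(t)}\bigr)
\end{aligned}
```

Because X^T S X is positive semidefinite, every eigenvalue of H is at least lambda. For lambda > 0 the Hessian is therefore positive definite and its condition number is at most (mu_max + lambda) / lambda, where mu_max is the largest eigenvalue of X^T S X. This keeps the Newton step well defined even when the data are separable or the features are collinear.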
Medium · Technical
21 practiced
Explain conceptually how gradient boosting fits an additive model using gradient and (optionally) second-order derivative information. Describe how XGBoost leverages first and second derivatives of the loss during tree fitting and why second-order terms improve leaf-weight estimates and convergence.
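The second-order idea in this question can be illustrated with a short sketch. It uses the leaf-weight and split-gain formulas from the XGBoost objective (second-order Taylor expansion of the loss plus an L2 penalty on leaf weights) with logistic loss. The function names and the default `reg_lambda` and `gamma` values are illustrative assumptions, not the library's API.

```python
import numpy as np

def logistic_grad_hess(y, raw_score):
    """First and second derivatives of logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p - y, p * (1.0 - p)

def leaf_weight(g, h, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda) for the samples in a leaf."""
    return -g.sum() / (h.sum() + reg_lambda)

def split_gain(g, h, left_mask, reg_lambda=1.0, gamma=0.0):
    """Reduction in the second-order objective from splitting one node."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + reg_lambda)
    left = score(g[left_mask], h[left_mask])
    right = score(g[~left_mask], h[~left_mask])
    parent = score(g, h)
    return 0.5 * (left + right - parent) - gamma

# Toy example: four samples, current raw score 0 for all (p = 0.5).
y = np.array([0.0, 0.0, 1.0, 1.0])
g, h = logistic_grad_hess(y, np.zeros(4))
mask = np.array([True, True, False, False])
w_left = leaf_weight(g[mask], h[mask])  # negative: pushes the y=0 samples down
gain = split_gain(g, h, mask)           # positive: the split separates the classes
```

The Hessian sum in the denominator acts as a per-leaf step size: leaves whose samples have confident predictions (small p(1-p)) get their update scaled differently from leaves with uncertain ones, which a first-order method with a fixed learning rate cannot do.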
Easy · Technical
30 practiced
Define supervised, unsupervised, and semi-supervised learning with practical business examples for each. For every example, state the types of models you might use and the primary objective (prediction, grouping, representation).
