
Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium · Technical
Implement PCA using SVD in Python. Signature: `def pca_svd(X, k):` Return `(components, explained_variance_ratio)`. The function must center the data, choose an appropriate SVD variant (dense or randomized) depending on the shape of `X`, and document the computational complexity and memory trade-offs.
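For reference, a minimal sketch of one possible answer is below. It uses NumPy's dense SVD and falls back to scikit-learn's `randomized_svd` for large, low-rank problems; the switching heuristic and the normalization choices are assumptions, not part of the prompt.

```python
import numpy as np

def pca_svd(X, k):
    """One possible implementation: PCA via SVD of the centered data.

    Returns (components, explained_variance_ratio). The randomized
    branch and its threshold are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xc = X - X.mean(axis=0)          # center each feature

    # Dense SVD costs O(min(n*d^2, n^2*d)) time and materializes all
    # singular vectors; randomized SVD costs roughly O(n*d*k) and only
    # computes the top-k factors. The crossover below is a heuristic.
    if k < min(n, d) // 10 and min(n, d) > 500:
        from sklearn.utils.extmath import randomized_svd
        U, S, Vt = randomized_svd(Xc, n_components=k, random_state=0)
    else:
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Explained-variance ratio: component variances over total variance.
    # Total variance is computed from Xc directly so it is exact even
    # when the truncated/randomized path returns only k singular values.
    total_var = np.sum(Xc ** 2) / (n - 1)
    explained_var = (S[:k] ** 2) / (n - 1)
    components = Vt[:k]
    return components, explained_var / total_var
```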
Medium · Technical
Implement logistic regression with L2 regularization using batch gradient descent in Python. Signature: `def logistic_regression_train(X_train, y_train, X_val=None, y_val=None, lr=0.01, reg=1.0, max_iter=1000, tol=1e-6):` Return the learned weights and the training/validation loss history. Use a numerically stable log-loss and implement early stopping based on validation loss.
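A sketch of one possible answer follows. It assumes labels in {0, 1}, no bias term, and an early-stopping rule that fires when validation log-loss stops improving by at least `tol`; none of these details are fixed by the prompt.

```python
import numpy as np

def logistic_regression_train(X_train, y_train, X_val=None, y_val=None,
                              lr=0.01, reg=1.0, max_iter=1000, tol=1e-6):
    """Illustrative sketch: batch gradient descent with an L2 penalty.

    Assumptions (not fixed by the prompt): labels are in {0, 1}, no
    bias column, and the penalty is scaled by 1/n.
    """
    n, d = X_train.shape
    w = np.zeros(d)

    def log_loss(X, y, w):
        z = X @ w
        # log(1 + exp(z)) - y*z, computed stably via logaddexp
        return np.mean(np.logaddexp(0.0, z) - y * z)

    train_hist, val_hist = [], []
    best_val = np.inf
    for _ in range(max_iter):
        z = X_train @ w
        p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))   # clipped sigmoid
        grad = X_train.T @ (p - y_train) / n + reg * w / n
        train_hist.append(log_loss(X_train, y_train, w)
                          + 0.5 * reg * np.dot(w, w) / n)
        if X_val is not None:
            val_loss = log_loss(X_val, y_val, w)   # unpenalized for monitoring
            val_hist.append(val_loss)
            if best_val - val_loss < tol:          # early stopping
                break
            best_val = val_loss
        w -= lr * grad
    return w, train_hist, val_hist
```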
Hard · Technical
Provide a formal analysis of gradient boosting as stage-wise additive modeling. Show that at each iteration the algorithm fits the negative gradient (pseudo-residuals) of the loss with respect to the current predictions, derive the update equations for a generic differentiable loss, and discuss how the learning rate (shrinkage) and tree complexity affect convergence and generalization.
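A compressed form of the derivation the question is after, in the standard Friedman-style notation (the shrinkage factor ν and the squared-error fit to pseudo-residuals are the conventional choices):

```latex
% Stage m of stage-wise additive modeling: F_{m-1} is frozen, only h_m is fit.
\[
F_m(x) = F_{m-1}(x) + \nu\,\gamma_m\,h_m(x)
\]
% Pseudo-residuals: the negative gradient of the loss at the current
% predictions, one per training point.
\[
r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}
\]
% Fit the base learner to the pseudo-residuals, then line-search the step size:
\[
h_m = \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{im} - h(x_i)\bigr)^2,
\qquad
\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\bigl(y_i,\, F_{m-1}(x_i) + \gamma\,h_m(x_i)\bigr)
\]
% Sanity check: for L = (y - F)^2 / 2 the pseudo-residuals reduce to the
% ordinary residuals y_i - F_{m-1}(x_i). Shrinkage \nu < 1 damps each step
% and typically improves generalization at the cost of more iterations.
```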
Easy · Technical
Explain bagging vs boosting conceptually. For each method describe how base learners are trained, how predictions are combined, their typical effect on bias and variance, and give production examples where bagging (e.g., Random Forest) or boosting (e.g., XGBoost) is preferable.
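A quick way to make the contrast concrete is to run both on the same data. The sketch below uses scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier`; the hyperparameter values are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bagging: deep, decorrelated trees trained independently on bootstrap
# samples; averaging their votes mainly reduces variance.
bagged = RandomForestClassifier(n_estimators=300, random_state=0)

# Boosting: shallow trees trained sequentially, each fitting the errors of
# the current ensemble; mainly reduces bias, and learning_rate must be
# tuned jointly with n_estimators to avoid overfitting.
boosted = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                     max_depth=3, random_state=0)

for name, model in [("bagging", bagged), ("boosting", boosted)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```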
Medium · Technical
You are building a fraud detection model with a 0.1% fraud rate and a high cost for false negatives. Describe your end-to-end approach, including data collection and labeling, feature engineering, candidate models, techniques to handle class imbalance (resampling, class weighting, anomaly detection), evaluation metrics and thresholding strategy, and production deployment/monitoring considerations.
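One small, hedged illustration of the imbalance-handling piece: class weighting plus threshold selection on the precision-recall curve, using scikit-learn on synthetic data. The dataset sizes and the recall ≥ 0.9 target are illustrative assumptions; only the 0.1% fraud rate comes from the prompt.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a dataset with a ~0.1% positive (fraud) rate.
X, y = make_classification(n_samples=200_000, n_features=30,
                           weights=[0.999], flip_y=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights the minority class instead of resampling.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_tr, y_tr)

# Choose the operating threshold from the precision-recall curve rather
# than defaulting to 0.5; here we require recall >= 0.9 because false
# negatives are expensive (the 0.9 target is an illustrative assumption).
probs = clf.predict_proba(X_te)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_te, probs)
ok = recall[:-1] >= 0.9                   # thresholds align with [:-1]
best = np.argmax(precision[:-1] * ok)     # best precision at that recall
print(f"threshold={thresholds[best]:.4f}, "
      f"precision={precision[best]:.3f}, recall={recall[best]:.3f}")
```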
