InterviewStack.io

Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium · Technical
26 practiced
Implement logistic regression with L2 regularization using batch gradient descent in Python. Function signature: `def logistic_regression_train(X_train, y_train, X_val=None, y_val=None, lr=0.01, reg=1.0, max_iter=1000, tol=1e-6):` Return weights and training/validation loss history. Use a numerically stable log-loss and implement early stopping based on validation loss.
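One way to approach the question above is sketched below. It is a minimal sketch, not a reference answer, and it makes several assumptions that the prompt does not state: an intercept column is prepended and left unpenalized, the penalty is `0.5 * reg * ||w||^2` added to the mean log-loss, the history is returned as a dict, and early stopping uses a hypothetical module-level `PATIENCE` constant because the given signature has no patience parameter.

```python
import numpy as np

PATIENCE = 10  # assumption: epochs without validation improvement before stopping


def _add_bias(X):
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])


def _loss(Xb, y, w, reg):
    z = Xb @ w
    # -[y log p + (1 - y) log(1 - p)] equals logaddexp(0, z) - y * z,
    # which avoids overflow in exp and log(0).
    data_term = np.mean(np.logaddexp(0.0, z) - y * z)
    return data_term + 0.5 * reg * np.dot(w[1:], w[1:])  # intercept not penalized


def logistic_regression_train(X_train, y_train, X_val=None, y_val=None,
                              lr=0.01, reg=1.0, max_iter=1000, tol=1e-6):
    Xb, y = _add_bias(X_train), np.asarray(y_train, dtype=float)
    has_val = X_val is not None and y_val is not None
    if has_val:
        Xvb, yv = _add_bias(X_val), np.asarray(y_val, dtype=float)

    w = np.zeros(Xb.shape[1])  # w[0] is the intercept
    history = {"train": [], "val": []}
    best_w, best_val, stale, prev = w.copy(), np.inf, 0, np.inf

    for _ in range(max_iter):
        p = 0.5 * (1.0 + np.tanh(0.5 * (Xb @ w)))  # overflow-free sigmoid
        grad = Xb.T @ (p - y) / len(y)             # full-batch gradient
        grad[1:] += reg * w[1:]
        w = w - lr * grad

        train_loss = _loss(Xb, y, w, reg)
        history["train"].append(train_loss)

        if has_val:
            val_loss = _loss(Xvb, yv, w, 0.0)      # validation loss without penalty
            history["val"].append(val_loss)
            if val_loss < best_val - tol:
                best_w, best_val, stale = w.copy(), val_loss, 0
            else:
                stale += 1
                if stale >= PATIENCE:
                    break
        elif abs(prev - train_loss) < tol:         # no validation set: stop on plateau
            break
        prev = train_loss

    return (best_w if has_val else w), history
```

When a validation set is supplied, the sketch returns the weights from the best validation epoch rather than the last one, which is one common reading of "early stopping"; returning the final weights would be an equally defensible choice.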
Medium · Technical
43 practiced
Implement k-means++ initialization in Python. Signature: `def kmeans_pp_init(X, k, random_state=None):` Return an array of k initial centers. The implementation should sample each new center with probability proportional to its squared distance from the nearest center already chosen, run in expected O(n * k) time for n points, and handle degenerate cases such as duplicate points.
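A possible shape for this routine is sketched below, assuming NumPy and treating the feature dimension as a constant in the O(n * k) bound. The uniform fallback for the all-duplicates case and the `ValueError` on an out-of-range `k` are choices made for this sketch, not requirements stated in the question.

```python
import numpy as np


def kmeans_pp_init(X, k, random_state=None):
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    rng = np.random.default_rng(random_state)
    centers = np.empty((k, X.shape[1]))

    # First center: uniform over the data.
    centers[0] = X[rng.integers(n)]

    # d2[i] = squared distance from X[i] to its nearest chosen center.
    # Keeping this running minimum makes each round O(n) instead of O(n * j).
    d2 = np.sum((X - centers[0]) ** 2, axis=1)

    for j in range(1, k):
        total = d2.sum()
        if total <= 0.0:
            # Every point coincides with a chosen center (duplicates):
            # the weights are all zero, so fall back to uniform sampling.
            idx = rng.integers(n)
        else:
            idx = rng.choice(n, p=d2 / total)
        centers[j] = X[idx]
        d2 = np.minimum(d2, np.sum((X - centers[j]) ** 2, axis=1))

    return centers
```

Because an already-chosen point has squared distance zero, it gets zero sampling weight, so distinct points are never picked twice unless the data itself runs out of distinct values.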
Hard · System Design
28 practiced
Design a real-time model-serving architecture for multiple model types (XGBoost, logistic regression, small neural nets) with 1k QPS and a 50ms p95 latency requirement. Include feature store integration, online feature transforms, model versioning and A/B testing strategy, caching/batching techniques, autoscaling, and how to ensure offline/online feature parity.
Easy · Technical
21 practiced
Compare different cross-validation strategies: k-fold, stratified k-fold, leave-one-out, and time-series (rolling) cross-validation. Explain use cases, computational cost, risks such as data leakage, and how to select an appropriate CV strategy for time-dependent data.
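To make the leakage point concrete, the sketch below contrasts plain k-fold with a rolling (expanding-window) split using index lists only. The function names `kfold_indices` and `rolling_origin_indices` and the `gap` parameter are hypothetical helpers written for this illustration; they are not taken from any library.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for unshuffled k-fold over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        # Training rows come from both before AND after the test block.
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def rolling_origin_indices(n, n_splits, gap=0):
    """Yield expanding-window splits in which training rows always precede test rows.

    `gap` drops that many rows between train and test, which can help when
    features are built from lagged targets.
    """
    test_size = n // (n_splits + 1)
    for i in range(1, n_splits + 1):
        test_start = n - (n_splits - i + 1) * test_size
        train = list(range(0, max(0, test_start - gap)))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in rolling_origin_indices(10, 3):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

On time-ordered rows, the first k-fold split trains on indices that come after its test block, which is exactly the look-ahead leakage the question asks about; the rolling variant never does this, at the cost of smaller training sets in the early splits.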
Hard · Technical
29 practiced
Implement a scalable split-candidate routine for categorical features with extremely high cardinality. API: `def categorical_split_candidates(cat_values, y, max_groups=256):` The function should aggregate categories by frequency or by smoothed target encoding and return up to max_groups candidate buckets. Discuss leakage risks and complexity.
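The following is one possible sketch of the target-encoded variant, under assumptions the question leaves open: a numeric target, additive smoothing toward the global mean with a hypothetical `SMOOTHING` prior strength, and buckets formed by merging neighbours in encoded order so that each holds roughly the same number of rows. Candidate splits are then the boundaries between consecutive buckets.

```python
from collections import defaultdict

SMOOTHING = 10.0  # assumption: prior strength, in pseudo-rows, for the smoothed mean


def categorical_split_candidates(cat_values, y, max_groups=256):
    # One pass to collect per-category row counts and target sums: O(n).
    counts, sums = defaultdict(int), defaultdict(float)
    for c, t in zip(cat_values, y):
        counts[c] += 1
        sums[c] += t

    n = sum(counts.values())
    if n == 0:
        return []

    prior = sum(sums.values()) / n
    # Smoothed target mean: rare categories are pulled toward the global mean.
    encoded = {c: (sums[c] + SMOOTHING * prior) / (counts[c] + SMOOTHING)
               for c in counts}

    # Order categories by encoded value: O(C log C) for C distinct categories.
    ordered = sorted(counts, key=lambda c: (encoded[c], str(c)))
    if len(ordered) <= max_groups:
        return [[c] for c in ordered]

    # Merge neighbours into buckets of roughly n / max_groups rows each.
    # A bucket closes once the running row count reaches its quantile target;
    # the last category always reaches the final target, so nothing is left over.
    buckets, current, seen = [], [], 0
    for c in ordered:
        current.append(c)
        seen += counts[c]
        if seen * max_groups >= (len(buckets) + 1) * n:
            buckets.append(current)
            current = []
    return buckets
```

The total cost is O(n + C log C). On leakage: the encodings here are computed from the same rows whose targets they summarize, so in practice they would need to be fitted on training folds only (or out-of-fold) before being used to score splits on held-out data.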
