Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium · Technical


Implement a linear SVM trainer using SGD in Python with hinge loss and optional L2 regularization. Provide a class `LinearSVM` with `fit(X, y, lr=0.01, epochs=10, batch_size=64, C=1.0)` and `predict(X)` methods. Assume labels y in {-1, +1}. Shuffle the data each epoch and use mini-batch updates to aid convergence.
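One possible starting point for the question above is sketched here. It assumes the objective 0.5 * ||w||^2 + C * mean(hinge), so a larger `C` means weaker regularization; the question does not fix this form, and the `random_state` constructor argument is a hypothetical addition for reproducibility.

```python
import numpy as np


class LinearSVM:
    """Linear SVM trained with mini-batch SGD on an L2-regularized hinge loss.

    Assumed objective (not specified by the question):
        0.5 * ||w||^2 + C * mean(max(0, 1 - y * (X @ w + b)))
    """

    def __init__(self, random_state=None):
        # random_state is a hypothetical convenience parameter.
        self.rng = np.random.default_rng(random_state)
        self.w = None
        self.b = 0.0

    def fit(self, X, y, lr=0.01, epochs=10, batch_size=64, C=1.0):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        for _ in range(epochs):
            order = self.rng.permutation(n)  # reshuffle every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                margins = yb * (Xb @ self.w + self.b)
                active = margins < 1  # only these points have a nonzero hinge subgradient
                grad_w = self.w - C * (yb[active] @ Xb[active]) / len(idx)
                grad_b = -C * yb[active].sum() / len(idx)
                self.w -= lr * grad_w
                self.b -= lr * grad_b
        return self

    def predict(self, X):
        scores = np.asarray(X, dtype=float) @ self.w + self.b
        return np.where(scores >= 0, 1, -1)
```

The bias term is left unregularized here, which is a common choice but still a design decision; a fixed learning rate is used for brevity, whereas a decaying schedule is often preferred in practice.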

Medium · Technical


Compare Support Vector Machines (SVM) and logistic regression for binary classification. Cover the differences in objective (hinge vs log-loss), margin interpretation, the role of regularization, use and cost of kernels, typical complexity in training and memory, and practical advice on when to use linear SVMs, kernel SVMs, or logistic regression in production.
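To make the difference in objectives concrete, the two losses can be written as functions of the margin m = y * f(x) with y in {-1, +1}. This is only an illustrative sketch; the function names are hypothetical.

```python
import numpy as np


def hinge_loss(margin):
    """SVM hinge loss: max(0, 1 - m). Exactly zero once the margin reaches 1."""
    return np.maximum(0.0, 1.0 - margin)


def log_loss(margin):
    """Logistic loss: log(1 + exp(-m)), computed stably. Positive for every finite margin."""
    return np.logaddexp(0.0, -margin)


margins = np.array([-2.0, 0.0, 1.0, 3.0])
hinge_values = hinge_loss(margins)
logistic_values = log_loss(margins)
```

Because the hinge loss is flat beyond the margin, confidently classified points contribute no gradient and the solution depends only on points at or inside the margin. The logistic loss keeps a small gradient everywhere, which is one reason its outputs can be read as probabilities.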

Hard · System Design


Design a real-time model-serving architecture for multiple model types (XGBoost, logistic regression, small neural nets) with 1k QPS and a 50ms p95 latency requirement. Include feature store integration, online feature transforms, model versioning and A/B testing strategy, caching/batching techniques, autoscaling, and how to ensure offline/online feature parity.

Hard · Technical


Discuss probabilistic model calibration: define calibration and reliability diagrams, explain Brier score, and describe temperature scaling for calibrating a neural network. Provide the optimization objective for temperature scaling and discuss how to extend calibration methods to multiclass classifiers.
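As a rough illustration of the temperature-scaling part of this question: a single scalar T > 0 is chosen to minimize the negative log-likelihood of held-out labels under softmax(z / T), where z are the network's logits. The sketch below substitutes a simple log-spaced grid search for the gradient-based optimizer normally used, and the function names and grid bounds are assumptions.

```python
import numpy as np


def nll_at_temperature(logits, labels, T):
    """Mean negative log-likelihood of integer labels under softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_temperature(logits, labels, grid=None):
    """Pick T by grid search on validation NLL (a stand-in for a gradient-based fit)."""
    if grid is None:
        # Assumed search range; widen it if the minimum lands on a boundary.
        grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 400))
    losses = [nll_at_temperature(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])
```

Since dividing logits by a positive scalar never changes the argmax, this adjusts confidence without changing accuracy, and it applies to multiclass logits as written. T should be fitted on a held-out validation set, not on the training data.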

Medium · Technical


Implement k-means++ initialization in Python. Signature: `def kmeans_pp_init(X, k, random_state=None):`. Return an array of k initial centers. The implementation should sample with distance-squared weighting, run in expected O(n * k) time for n points, and handle degenerate cases such as duplicate points.
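A minimal sketch of one way to approach this question follows. It keeps a running array of squared distances to the nearest chosen center, so each new center costs one pass over the data (O(n * k) distance evaluations overall, each costing O(d) for d features). The uniform fallback for the all-duplicates case is an assumption about how degenerate inputs should be handled.

```python
import numpy as np


def kmeans_pp_init(X, k, random_state=None):
    """Return k initial centers chosen by distance-squared (D^2) sampling."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    rng = np.random.default_rng(random_state)

    centers = np.empty((k, X.shape[1]))
    centers[0] = X[rng.integers(n)]  # first center: uniform at random
    # Squared distance from each point to its nearest chosen center so far.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)

    for j in range(1, k):
        total = d2.sum()
        if total > 0:
            idx = rng.choice(n, p=d2 / total)
        else:
            # Every point coincides with a chosen center (e.g. all duplicates):
            # fall back to uniform sampling instead of dividing by zero.
            idx = rng.integers(n)
        centers[j] = X[idx]
        d2 = np.minimum(d2, ((X - centers[j]) ** 2).sum(axis=1))
    return centers
```

Points identical to an already chosen center have zero weight, so duplicates are never picked again while any distinct point remains.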
