Covers foundational and advanced concepts in deep learning and neural network training. Includes neural network architectures (feedforward, convolutional, and recurrent networks), activation functions (rectified linear unit, sigmoid, and hyperbolic tangent), and common loss objectives. Emphasizes the mechanics of forward and backward propagation for computing gradients, plus a detailed understanding of optimization algorithms: stochastic gradient descent, momentum methods, adaptive methods such as Adam and RMSprop, and earlier methods such as AdaGrad. Addresses practical training challenges and their remedies, including vanishing and exploding gradients, careful weight initialization, batch normalization, skip connections and residual architectures, learning rate schedules, regularization techniques, and hyperparameter tuning strategies. For senior roles, also covers large-scale and distributed training, convergence properties, computational efficiency, mixed precision training, memory constraints, and optimization strategies for models with very large parameter counts.
Easy Technical
65 practiced
Implement a single-step gradient computation and parameter update for logistic regression in Python/NumPy. Given x (N,D), y (N,) labels in {0,1}, weights w (D,), bias b, and learning rate lr, compute predictions, binary cross-entropy loss, gradients dw and db, and return updated parameters. Note numerical stability requirements.
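One way the requested update step might be sketched in NumPy (the function name and the clipping constant `eps` are illustrative choices, not part of the prompt):

```python
import numpy as np

def logistic_step(x, y, w, b, lr=0.1, eps=1e-12):
    """One gradient step for logistic regression with binary cross-entropy.

    x: (N, D) inputs, y: (N,) labels in {0, 1}, w: (D,) weights, b: scalar bias.
    Returns (loss, w_new, b_new).
    """
    N = x.shape[0]
    z = x @ w + b                        # logits, shape (N,)
    p = 1.0 / (1.0 + np.exp(-z))         # sigmoid probabilities
    # Clip probabilities before the log for numerical stability
    p_safe = np.clip(p, eps, 1.0 - eps)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    # Gradient of mean BCE w.r.t. the logits is (p - y) / N
    dz = (p - y) / N
    dw = x.T @ dz                        # shape (D,)
    db = dz.sum()                        # scalar
    return loss, w - lr * dw, b - lr * db
```

With zero-initialized parameters every prediction is 0.5, so the first loss equals ln 2; subsequent steps should decrease it on separable data.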
Medium Technical
87 practiced
Propose a design for detecting and handling NaN/Inf occurrences during training at scale. Include instrumentation to catch where NaNs are introduced (forward vs backward), automated mitigation actions (skip batch, scale gradients, reduce lr), and logging/metrics to surface to ML platform dashboards.
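A minimal sketch of the "skip batch" mitigation path such a design might include (all names here are hypothetical; a real system would hook framework-level gradient callbacks and export `stats` to a metrics backend):

```python
import numpy as np

def check_finite(name, arr, stats):
    """Record whether `arr` contains NaN/Inf; return True if it is clean."""
    clean = bool(np.all(np.isfinite(arr)))
    if not clean:
        # Names of offending tensors tell us *where* the NaN appeared
        stats.setdefault("bad_tensors", []).append(name)
    return clean

def guarded_sgd_update(params, grads, lr, stats):
    """Apply SGD only if every gradient is finite; otherwise skip the batch.

    `stats` accumulates counters a platform dashboard could scrape.
    """
    clean = all(check_finite(k, g, stats) for k, g in grads.items())
    if not clean:
        stats["skipped_batches"] = stats.get("skipped_batches", 0) + 1
        return params                     # mitigation: drop this batch
    stats["applied_batches"] = stats.get("applied_batches", 0) + 1
    return {k: p - lr * grads[k] for k, p in params.items()}
```

The same `check_finite` hook can be attached after the forward pass and after the backward pass separately, which disambiguates where the NaN was introduced.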
Easy Technical
73 practiced
Explain Xavier (Glorot) and He (Kaiming) weight initialization. Provide the formulas for the variance used for initializing weights for a layer with n_in inputs and n_out outputs for both uniform and normal variants. Explain why initialization matters for training deep nets and which activation types pair best with each scheme.
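The two schemes can be sketched directly from their variance formulas: Xavier/Glorot targets Var = 2 / (n_in + n_out) (pairs well with tanh/sigmoid), while He/Kaiming targets Var = 2 / n_in (pairs well with ReLU). Function names below are illustrative:

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    # Uniform on [-limit, limit] has variance limit^2 / 3,
    # so limit = sqrt(6 / (n_in + n_out)) gives Var = 2 / (n_in + n_out).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out, rng):
    # Var = 2 / n_in compensates for ReLU zeroing half the pre-activations.
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_in, n_out))
```

Sampling a large matrix and checking the empirical variance against the target is a quick sanity check for either scheme.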
Easy Technical
65 practiced
Compare ReLU, sigmoid, and tanh activation functions for hidden layers. For each activation, describe (1) mathematical form and derivative, (2) typical ranges of outputs, (3) advantages and disadvantages in deep networks (e.g., saturating behavior, sparsity), and (4) practical guidance for when to choose each activation in modern architectures.
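The mathematical forms and derivatives named in part (1) can be sketched in a few lines of NumPy; note the saturation behavior the question asks about is visible in the derivative magnitudes (sigmoid's derivative peaks at 0.25, tanh's at 1.0, ReLU's is exactly 0 or 1):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)            # outputs in [0, inf), sparse

def drelu(x):
    return (x > 0).astype(float)         # derivative is 0 or 1: no saturation for x > 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # outputs in (0, 1)

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)                 # peaks at 0.25 when x = 0, vanishes for |x| large

def dtanh(x):
    return 1.0 - np.tanh(x) ** 2         # peaks at 1.0 when x = 0, vanishes for |x| large
```

The small derivative ceilings of sigmoid and tanh are one mechanical reason deep sigmoid/tanh stacks suffer vanishing gradients, while ReLU's unit gradient on the active side does not shrink signals.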
Medium Technical
130 practiced
Implement from scratch (NumPy) the backward pass for a two-layer fully-connected network (input->hidden->output) using ReLU activation in the hidden layer and softmax cross-entropy at the output. Inputs: X (N,D), y (N,) integer labels, W1, b1, W2, b2. Return loss and gradients dW1, db1, dW2, db2. Emphasize vectorized operations and clarity rather than micro-optimizations.
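A vectorized sketch of the requested forward and backward pass (function name is illustrative; the max-subtraction in the softmax is the standard stability trick):

```python
import numpy as np

def two_layer_loss_and_grads(X, y, W1, b1, W2, b2):
    """Input -> ReLU hidden -> softmax cross-entropy. Returns loss + grads."""
    N = X.shape[0]
    # Forward pass
    h_pre = X @ W1 + b1                  # (N, H) pre-activations
    h = np.maximum(0.0, h_pre)           # ReLU
    scores = h @ W2 + b2                 # (N, C) logits
    scores = scores - scores.max(axis=1, keepdims=True)  # stability
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(N), y]))
    # Backward pass: d(mean CE)/d(scores) = (probs - one_hot(y)) / N
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1.0
    dscores /= N
    dW2 = h.T @ dscores
    db2 = dscores.sum(axis=0)
    dh = dscores @ W2.T
    dh[h_pre <= 0] = 0.0                 # ReLU gates the gradient
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    return loss, dW1, db1, dW2, db2
```

A finite-difference check on a single weight entry is the standard way to validate the analytic gradients before trusting them in training.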