InterviewStack.io

Neural Networks and Optimization Questions

Covers foundational and advanced concepts in deep learning and neural network training. Includes neural network architectures such as feedforward, convolutional, and recurrent networks; activation functions such as the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent; and common loss objectives. Emphasizes the mechanics of forward and backward propagation for computing gradients, along with a detailed understanding of optimization algorithms including stochastic gradient descent, momentum methods, adaptive methods such as Adam and RMSprop, and earlier methods such as AdaGrad. Addresses practical training challenges and their solutions: vanishing and exploding gradients, careful weight initialization, batch normalization, skip connections and residual architectures, learning rate schedules, regularization techniques, and hyperparameter tuning strategies. For senior roles, covers large-scale and distributed training, convergence properties, computational efficiency, mixed-precision training, memory constraints, and optimization strategies for models with very large parameter counts.

Easy · Technical
Describe common learning rate schedules used in deep learning: constant, step decay, exponential decay, cosine annealing (with/without restarts), cyclical learning rates, and linear warmup followed by decay. Explain why warmup is commonly used when training large models.
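As a concrete reference for the schedules named in this question, here is a minimal sketch of linear warmup followed by cosine annealing. The function name and default values are illustrative, not from any particular framework.

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-3, warmup_steps=1000, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Warmup: ramp linearly from ~0 up to base_lr. This avoids taking
        # large, noisy steps while adaptive statistics / batch norm stats
        # are still poorly estimated early in training.
        return base_lr * (step + 1) / warmup_steps
    # Cosine annealing over the remaining steps: smooth decay from
    # base_lr at progress=0 down to min_lr at progress=1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Step decay and exponential decay replace the cosine branch with a piecewise-constant or multiplicative rule; cyclical schedules and warm restarts re-run the cosine phase periodically.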
Hard · Technical
Compare first-order optimizers (SGD with momentum, Adam) with second-order or approximation methods (L-BFGS, K-FAC). Discuss per-iteration cost, memory requirements, suitability for large neural networks, convergence speed, and practical scenarios where second-order methods may be advantageous.
Hard · Technical
You are training with extremely large batch sizes (e.g., effective batch size 64k) and observe worse generalization compared to smaller batches. Discuss techniques to recover or approximate small-batch generalization: linear scaling rule, learning-rate warmup, LARS/LAMB optimizers, weight decay tuning, and stochastic regularization strategies.
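The linear scaling rule and warmup mentioned in this question can be sketched in a few lines; the function names and numbers below are illustrative assumptions, not a prescribed recipe.

```python
def scaled_lr(base_lr, base_batch, large_batch):
    """Linear scaling rule: scale the learning rate in proportion
    to the increase in batch size (Goyal et al.-style heuristic)."""
    return base_lr * large_batch / base_batch

def warmup_lr(step, warmup_steps, target_lr):
    """Ramp linearly toward the (possibly very large) scaled LR,
    since applying it from step 0 often diverges."""
    return target_lr * min(1.0, (step + 1) / warmup_steps)
```

For example, a base LR of 0.1 tuned at batch size 256 scales to 25.6 at an effective batch of 64k, which is exactly the regime where layer-wise methods such as LARS/LAMB become attractive.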
Medium · Technical
Provide pseudocode that implements gradient clipping by global norm. Make clear how you compute global norm, scale gradients if norm exceeds threshold, and apply the clipped gradients via an optimizer. Explain interaction with momentum-based optimizers.
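One possible answer sketch, using NumPy for clarity (names are illustrative): compute a single L2 norm over all gradient tensors jointly, and rescale every tensor by the same factor only when that norm exceeds the threshold.

```python
import math
import numpy as np

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Clip a list of gradient arrays so their joint L2 norm is <= max_norm.

    The global norm is the sqrt of the sum of squared entries across ALL
    tensors, so relative directions between parameters are preserved.
    """
    global_norm = math.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if global_norm <= max_norm:
        return grads, global_norm
    # Single shared scale factor; eps guards against division by zero.
    scale = max_norm / (global_norm + eps)
    return [g * scale for g in grads], global_norm
```

With momentum-based optimizers, clipping is normally applied to the raw gradients before the optimizer step, so the momentum buffer accumulates already-clipped gradients; clipping the optimizer's internal update instead changes the effective dynamics.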
Medium · System Design
Design a distributed training strategy to fine-tune a pretrained model on a new dataset using 8 multi-GPU nodes (4 GPUs each). Discuss data parallelism vs model parallelism, synchronized updates, learning-rate scaling, checkpointing, and minimizing downtime in case of node failures.
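The synchronized-update half of this question rests on one identity worth knowing cold: with equal-sized shards, averaging per-worker gradients (an all-reduce) equals the gradient of the full batch. A toy single-process illustration, with a made-up linear least-squares loss:

```python
import numpy as np

def local_grad(w, x, y):
    # Gradient of the mean loss 0.5*(w*x - y)^2 over one worker's shard.
    return np.mean((w * x - y) * x)

def allreduce_mean(grads):
    # Synchronous data parallelism: average per-worker gradients,
    # as an all-reduce would across the 32 GPUs (8 nodes x 4).
    return sum(grads) / len(grads)
```

Because the averaged update is mathematically a larger-batch step, the learning-rate scaling and warmup considerations from the large-batch question above apply directly here.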
