Broad coverage of modern and advanced neural network architectures, design principles, and components. Candidates should understand core structural elements such as neurons, layers, weights, biases, activation functions, and forward and backward passes, as well as how architecture choices influence learning. They should know a range of architecture families, including feedforward networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs) with their long short-term memory (LSTM) and gated recurrent unit (GRU) variants, transformer architectures with self-attention and multi-head attention, vision transformer (ViT) adaptations, and graph neural networks (GNNs). They should understand the inductive biases that make certain architectures appropriate for particular data modalities, the trade-offs between depth and width, parameter efficiency and computational complexity, and practical considerations such as initialization, normalization, optimization, and scaling strategies. They should be able to explain when to choose one architecture over another for a given problem, how to combine or adapt architectures for domain-specific needs, and how modern architectural advances address the limitations of earlier models.
Easy · Technical
Describe Xavier/Glorot and He/Kaiming initializations. Explain how they are derived (variance preservation intuition), which activation families they pair with, and practical defaults you would use for MLPs, ReLU-based CNNs, and RNNs. What happens if initialization is poorly chosen?
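As a rough illustration of the variance-preservation argument, here is a minimal NumPy sketch, assuming the normal-distribution variants (uniform variants also exist), dense layers with no biases, and fan-in/fan-out equal to a layer's input/output widths. Glorot sets Var(W) = 2 / (fan_in + fan_out) to balance forward and backward signal for roughly linear or tanh-like units; He sets Var(W) = 2 / fan_in to compensate for ReLU zeroing about half of its inputs. The helper names (xavier_normal, he_normal, tiny_init, relu_stack_second_moment) are hypothetical, and the experiment simply pushes a random batch through 20 ReLU layers.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    # Glorot/Xavier: Var(W) = 2 / (fan_in + fan_out)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng):
    # He/Kaiming: Var(W) = 2 / fan_in, the factor 2 offsets ReLU halving E[y^2]
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def tiny_init(fan_in, fan_out, rng):
    # Deliberately poor choice (hypothetical): fixed std 0.01 regardless of width
    return rng.normal(0.0, 0.01, size=(fan_in, fan_out))

def relu_stack_second_moment(init_fn, depth=20, width=256, seed=0):
    """Mean squared activation after `depth` bias-free ReLU layers."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(1024, width))          # input with E[x^2] = 1
    for _ in range(depth):
        x = np.maximum(0.0, x @ init_fn(width, width, rng))
    return float(np.mean(x ** 2))

for name, fn in [("he", he_normal), ("xavier", xavier_normal), ("tiny", tiny_init)]:
    print(name, relu_stack_second_moment(fn))
```

Under He scaling the mean squared activation should stay near its input value of about 1. Xavier scaling (with equal fan-in and fan-out) halves it at every ReLU layer, ending around 2^-20, and the fixed 0.01 scale collapses it almost to zero; that collapse, or the mirror-image blow-up from an overly large scale, is what a poorly chosen initialization looks like in practice.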
Medium · Technical
Implement the forward pass for a single LSTM cell timestep in Python/NumPy. Function signature: lstm_cell_forward(x_t, h_prev, c_prev, params) where params contains W_f, U_f, b_f, W_i, U_i, b_i, W_o, U_o, b_o, W_c, U_c, b_c. Return (h_next, c_next). Include brief docstrings and expected shapes.
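One possible answer, sketched under an assumed batch-first, row-vector convention: x_t is (B, D), the states are (B, H), each W_* is (D, H), each U_* is (H, H), and each b_* is (H,). The parameter names follow the question; the shape convention and the small sigmoid helper are assumptions.

```python
import numpy as np

def sigmoid(z):
    # Equivalent to 1 / (1 + exp(-z)) but avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def lstm_cell_forward(x_t, h_prev, c_prev, params):
    """One LSTM timestep (batch-first convention assumed).

    Shapes:
        x_t:    (B, D)  input at time t
        h_prev: (B, H)  previous hidden state
        c_prev: (B, H)  previous cell state
        params: W_* (D, H), U_* (H, H), b_* (H,) for * in {f, i, o, c}
    Returns:
        (h_next, c_next), both (B, H)
    """
    p = params
    f = sigmoid(x_t @ p["W_f"] + h_prev @ p["U_f"] + p["b_f"])        # forget gate
    i = sigmoid(x_t @ p["W_i"] + h_prev @ p["U_i"] + p["b_i"])        # input gate
    o = sigmoid(x_t @ p["W_o"] + h_prev @ p["U_o"] + p["b_o"])        # output gate
    c_tilde = np.tanh(x_t @ p["W_c"] + h_prev @ p["U_c"] + p["b_c"])  # candidate cell
    c_next = f * c_prev + i * c_tilde   # additive cell update
    h_next = o * np.tanh(c_next)        # exposed hidden state
    return h_next, c_next

# Usage: random small weights, run six timesteps
rng = np.random.default_rng(0)
B, D, H = 4, 3, 5
params = {}
for g in "fioc":
    params[f"W_{g}"] = rng.normal(0.0, 0.1, (D, H))
    params[f"U_{g}"] = rng.normal(0.0, 0.1, (H, H))
    params[f"b_{g}"] = np.zeros(H)
h, c = np.zeros((B, H)), np.zeros((B, H))
for t in range(6):
    h, c = lstm_cell_forward(rng.normal(size=(B, D)), h, c, params)
print(h.shape, c.shape)  # (4, 5) (4, 5)
```

Keeping four separate gate matrices mirrors the question's parameter names; many frameworks instead stack the four gates into one larger matrix so each timestep needs a single matrix multiply.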
Hard · Technical
Provide a rigorous explanation of vanishing and exploding gradients in deep feedforward and recurrent networks. Use the repeated Jacobian multiplication intuition, discuss eigenvalues and singular values, and list architectural and algorithmic mitigations (LSTM/GRU gating, residual connections, normalization, orthogonal initialization, gradient clipping) with why they help.
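The repeated-Jacobian intuition can be checked numerically. This is a minimal sketch assuming a linear recurrence h_t = W h_{t-1}, so that d h_T / d h_0 = W^T (the T-fold product) and its size is governed by the singular values of W. Building W as a scaled orthogonal matrix makes every singular value equal to the scale, which isolates the effect. A global-norm gradient clipping helper is included as one of the listed mitigations. All function names here are hypothetical.

```python
import numpy as np

def jacobian_product_norm(W, steps):
    """Spectral norm of W^steps, the backprop factor for h_t = W h_{t-1}."""
    J = np.eye(W.shape[0])
    for _ in range(steps):
        J = W @ J                     # chain rule: one more Jacobian per timestep
    return np.linalg.norm(J, 2)       # largest singular value

def orthogonal(n, rng):
    # QR of a Gaussian matrix; the sign fix gives a uniformly random orthogonal matrix
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    if total <= max_norm:
        return grads, total
    scale = max_norm / total          # same factor for every array keeps the direction
    return [g * scale for g in grads], total

rng = np.random.default_rng(0)
n, T = 64, 50
Q = orthogonal(n, rng)
for s in (0.9, 1.0, 1.1):
    # Every singular value of s * Q equals s, so the product norm is s ** T
    print(s, jacobian_product_norm(s * Q, T))
```

With all singular values at 0.9, 1.0, and 1.1, the 50-step product has norm of roughly 0.005, 1 (up to rounding), and 117: vanishing, preserved, and exploding. A nonlinear RNN also multiplies in diagonal activation-derivative factors (at most 1 for tanh), which pushes further toward vanishing; this is why orthogonal initialization, gating, and residual paths aim to keep the effective Jacobian near norm 1, while clipping only caps the exploding side.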
Medium · Technical
You have only 1,000 labeled medical images to build a cancer detection model. Describe a step-by-step transfer learning approach using a pre-trained CNN (e.g., ResNet): data preprocessing, augmentation, layer freezing/unfreezing schedule, optimizer and learning rates, regularization, cross-validation, and evaluation procedures to avoid overfitting.
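The freezing/unfreezing and learning-rate part of an answer could be expressed as a small schedule. This is a framework-free sketch under stated assumptions: the layer-group names only mirror torchvision's ResNet attributes (fc, layer1 to layer4, plus a stem for conv1/bn1), and the epoch counts, base learning rate, and decay factor are illustrative placeholders, not recommended values. It trains only the new head first, then unfreezes one deeper block at a time with geometrically smaller (discriminative) learning rates, and keeps the earliest blocks frozen, since low-level features usually transfer well when data is scarce.

```python
# Hypothetical layer groups, ordered from the new output head back toward the input stem.
GROUPS = ["fc", "layer4", "layer3", "layer2", "layer1", "stem"]

def unfreeze_schedule(epoch, head_epochs=5, epochs_per_group=3, max_groups=4,
                      base_lr=1e-3, decay=0.3):
    """Return {group: learning_rate} for trainable groups; absent groups stay frozen.

    Stage 0 (epoch < head_epochs): train only the head.
    Afterwards: unfreeze one more group every `epochs_per_group` epochs, up to
    `max_groups` groups, giving earlier groups smaller learning rates.
    """
    if epoch < head_epochs:
        n_trainable = 1
    else:
        n_trainable = min(max_groups, 2 + (epoch - head_epochs) // epochs_per_group)
    return {g: base_lr * decay ** k for k, g in enumerate(GROUPS[:n_trainable])}

for epoch in (0, 5, 8, 11, 20):
    print(epoch, unfreeze_schedule(epoch))
```

In PyTorch, a dictionary like this would typically be translated into `requires_grad` flags on each group's parameters plus per-group `lr` entries in the optimizer's parameter groups, rebuilt whenever the set of trainable groups changes.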
Medium · Technical
Explain scaled dot-product attention and multi-head attention algorithmically and mathematically. Provide the equations for Attention(Q,K,V), explain the sqrt(d_k) scaling factor, derive the time and memory complexity in terms of sequence length L and embedding dimension d, and discuss why multiple heads can be beneficial.
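A compact NumPy sketch of both pieces, assuming self-attention over a single unbatched sequence X of shape (L, d), bias-free (d, d) projection matrices, and d divisible by the number of heads; the function names and the boolean-mask convention are assumptions. The core line is Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The scaling matters because a dot product of two vectors with independent unit-variance entries has variance d_k, so dividing by sqrt(d_k) keeps the scores at unit scale and the softmax away from saturation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V.

    Q: (..., L_q, d_k), K: (..., L_k, d_k), V: (..., L_k, d_v).
    The (L_q, L_k) score matrix costs O(L_q * L_k * d_k) time and O(L_q * L_k) memory.
    """
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # mask == True means "may attend"
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V, weights

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Self-attention over X of shape (L, d); all projections are (d, d)."""
    L, d = X.shape
    d_h = d // num_heads

    def split(M):
        # (L, d) -> (num_heads, L, d_h): each head works in its own d_h-dim subspace
        return M.reshape(L, num_heads, d_h).transpose(1, 0, 2)

    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    heads, _ = scaled_dot_product_attention(Q, K, V)    # (num_heads, L, d_h)
    concat = heads.transpose(1, 0, 2).reshape(L, d)     # concatenate the heads
    return concat @ W_o                                 # output projection

rng = np.random.default_rng(0)
L, d, h = 6, 16, 4
X = rng.normal(size=(L, d))
W_q, W_k, W_v, W_o = (rng.normal(0.0, d ** -0.5, size=(d, d)) for _ in range(4))
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=h)
print(out.shape)  # (6, 16)
```

Summed over h heads with d_k = d / h, the score computations cost O(L^2 d) time and the attention weights need O(h L^2) memory, on top of O(L d^2) for the four projections; the quadratic dependence on L comes entirely from the score matrix. Splitting d into heads keeps the total cost about the same as one full-width head while letting each head learn a different attention pattern.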