Neural Network Architectures: Recurrent & Sequence Models Questions
Covers RNNs, LSTMs, GRUs, and Transformer architectures for sequential data. Understand the motivation for each (the vanishing gradient problem in plain RNNs, gating in LSTMs and GRUs), as well as attention mechanisms, self-attention, and multi-head attention. Know applications in NLP, time series, and other domains. Be ready to discuss Transformers in detail: they have reshaped NLP and underpin much of generative AI.
Medium | Technical
18 practiced
Implement a single-layer LSTM cell forward pass in PyTorch without using torch.nn.LSTM. Inputs: x_t (batch_size, input_size), h_prev (batch_size, hidden_size), c_prev (batch_size, hidden_size), weight tensors W_ih and W_hh, and bias tensors b_ih and b_hh. Return h_t and c_t. Focus on correct gate computations and shape handling; you may assume the weights and biases are provided as torch tensors.
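One possible shape for an answer is sketched below. The question does not specify how the weights are laid out, so this sketch assumes the torch.nn.LSTMCell convention: W_ih is (4*hidden_size, input_size), W_hh is (4*hidden_size, hidden_size), both biases are (4*hidden_size,), and the four gates are stacked in the order input, forget, cell candidate, output. The function name lstm_cell_forward is illustrative.

```python
import torch


def lstm_cell_forward(x_t, h_prev, c_prev, W_ih, W_hh, b_ih, b_hh):
    """One LSTM time step.

    Assumed layout (same as torch.nn.LSTMCell, not stated in the question):
      W_ih: (4*hidden, input), W_hh: (4*hidden, hidden), b_ih/b_hh: (4*hidden,)
      gate order along dim 0: input, forget, cell candidate, output.
    """
    hidden_size = h_prev.shape[1]
    if W_ih.shape[0] != 4 * hidden_size or W_hh.shape != (4 * hidden_size, hidden_size):
        raise ValueError("weights do not match the assumed (4*hidden, ...) layout")

    # All four gate pre-activations at once: shape (batch, 4*hidden).
    gates = x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh
    i, f, g, o = gates.chunk(4, dim=1)

    i = torch.sigmoid(i)  # input gate: how much new information to write
    f = torch.sigmoid(f)  # forget gate: how much of c_prev to keep
    g = torch.tanh(g)     # candidate cell values
    o = torch.sigmoid(o)  # output gate: how much of the cell to expose

    c_t = f * c_prev + i * g
    h_t = o * torch.tanh(c_t)
    return h_t, c_t
```

Under the stated layout assumption, the result can be checked against torch.nn.LSTMCell by passing the module's weight_ih, weight_hh, bias_ih, and bias_hh tensors to this function.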
Medium | Technical
25 practiced
You have variable-length sequences batched together with padding. Explain practical strategies to handle padding and masking during training and inference for both RNNs and Transformer models. Cover pack/pad utilities, attention masks, loss masking, avoiding wasted compute, and batching heuristics.
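Three of the techniques named in this question (packing for RNNs, a key padding mask for attention, and loss masking) can be sketched in PyTorch as follows. The token ids, layer sizes, and the choice of 0 as the padding id are made up for illustration, and the targets are a stand-in rather than real labels.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
PAD_ID = 0  # assumed padding id; real tokens must never use it

# Three made-up token sequences of different lengths.
seqs = [torch.tensor([5, 2, 9, 4]), torch.tensor([7, 3]), torch.tensor([8, 1, 6])]
lengths = torch.tensor([len(s) for s in seqs])
batch = pad_sequence(seqs, batch_first=True, padding_value=PAD_ID)  # (3, 4)
pad_mask = batch.eq(PAD_ID)  # True at padded positions

embed = nn.Embedding(10, 8, padding_idx=PAD_ID)
x = embed(batch)  # (3, 4, 8)

# RNN path: packing makes the LSTM skip padded time steps entirely,
# so h_n holds the state at each sequence's true last step.
lstm = nn.LSTM(8, 16, batch_first=True)
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, _) = lstm(packed)
rnn_out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Attention path: key_padding_mask (True = ignore) gives padded keys zero weight.
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
attn_out, attn_weights = attn(x, x, x, key_padding_mask=pad_mask)

# Loss masking: positions whose target is PAD_ID do not contribute to the loss.
logits = nn.Linear(16, 10)(rnn_out)  # (3, 4, 10)
targets = batch.clone()              # stand-in targets, PAD_ID where padded
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 10), targets.reshape(-1), ignore_index=PAD_ID
)
```

A common batching heuristic that complements this is to group sequences of similar length into the same batch, which reduces the amount of padding in the first place.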
Hard | Technical
23 practiced
Explain ZeRO optimizer stages (ZeRO-1, ZeRO-2, ZeRO-3) and how optimizer state/gradient/parameter sharding reduces memory bottlenecks for large transformer training. From an engineering viewpoint, when should you adopt ZeRO and what integration considerations (checkpointing, communication, failure recovery) should you plan for?
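The memory effect of each stage can be estimated with simple arithmetic. The sketch below assumes mixed-precision training with Adam, using the usual accounting of 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer state (fp32 master weights, momentum, and variance). It counts model states only; activations, temporary buffers, and fragmentation are ignored. The function name is illustrative.

```python
def zero_model_state_bytes(num_params, num_gpus, stage):
    """Approximate per-GPU bytes for model states under a given ZeRO stage.

    Assumes mixed-precision Adam: 2 B fp16 params, 2 B fp16 grads,
    12 B optimizer state per parameter. Stage 0 means plain data parallelism.
    """
    PARAM, GRAD, OPTIM = 2, 2, 12
    if stage == 0:    # everything replicated on every GPU
        per_param = PARAM + GRAD + OPTIM
    elif stage == 1:  # optimizer state sharded
        per_param = PARAM + GRAD + OPTIM / num_gpus
    elif stage == 2:  # optimizer state and gradients sharded
        per_param = PARAM + (GRAD + OPTIM) / num_gpus
    elif stage == 3:  # optimizer state, gradients, and parameters sharded
        per_param = (PARAM + GRAD + OPTIM) / num_gpus
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return num_params * per_param


for stage in range(4):
    gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

For a 7.5B-parameter model on 64 GPUs this works out to roughly 120 GB, 31.4 GB, 16.6 GB, and 1.9 GB per GPU for stages 0 through 3, which illustrates why the later stages matter once model states no longer fit on a single device.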
Medium | Technical
23 practiced
You must build a production time-series forecasting system. Compare RNN/LSTM/GRU based models to Transformer-based architectures for forecasting tasks. Discuss handling seasonality and trend, long-range dependencies, compute and latency implications, interpretability, and deployment trade-offs for each approach.
Easy | Technical
25 practiced
Explain Backpropagation Through Time (BPTT) for training recurrent models. Describe how truncated BPTT works, when to use it, and how it trades off the model's ability to learn very long-range dependencies against training efficiency and memory footprint.
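A minimal sketch of truncated BPTT in PyTorch is shown below. The long sequence is processed in fixed windows; the hidden state is carried forward between windows but detached, so gradients never flow back further than one window. The data, model sizes, window length k, and the use of a GRU are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(2, 60, 4)  # made-up inputs: (batch, time, features)
y = torch.randn(2, 60, 1)  # made-up per-step targets
k = 20                     # truncation window, in time steps

h = None
losses = []
for start in range(0, x.size(1), k):
    xb = x[:, start:start + k]
    yb = y[:, start:start + k]
    if h is not None:
        # Keep the hidden state's values but cut its history, so backward()
        # stops at the window boundary and memory stays bounded by k steps.
        h = h.detach()
    out, h = rnn(xb, h)
    loss = nn.functional.mse_loss(head(out), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

The forward state still carries information across windows, but the model receives no gradient signal for dependencies longer than k steps, which is the trade-off the question asks about.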