TensorFlow/PyTorch Framework Fundamentals Questions
Practical knowledge of a major deep learning framework, including tensors and tensor operations, building neural network layers, constructing models, and writing training loops. Also covers the ability to read and modify existing code in these frameworks and to work with pre-built layers and models.
Hard · Technical
A distributed training job that previously ran fine now fails intermittently with a CUDA out-of-memory (OOM) error after 10 to 50 steps. Describe a methodical approach to finding memory leaks using PyTorch/TensorFlow introspection tools, system utilities (nvidia-smi, ps), and Python's tracemalloc module, covering how to reproduce the failure, how to capture memory snapshots, and the common root causes.
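A minimal sketch of the snapshot-diff step, assuming the leak lives in host (CPU) memory managed by Python. tracemalloc only sees Python allocations, not CUDA memory, so on the GPU side you would log torch.cuda.memory_allocated() every step and watch nvidia-smi instead. `leaky_train_step` and `top_growth` are hypothetical names; the step leaks on purpose by keeping a reference to every step's buffer, the same accumulation pattern as appending `loss` tensors to a list instead of `loss.item()`.

```python
import tracemalloc

_history = []  # module-level container that outlives each step


def leaky_train_step(step):
    """Hypothetical step that leaks by keeping a reference to every step's buffer."""
    activations = [float(step)] * 10_000   # stand-in for a per-step buffer
    loss = sum(activations) / len(activations)
    _history.append(activations)           # the leak: reference is never released
    return loss


def top_growth(step_fn, warmup=3, steps=20, limit=5):
    """Diff two tracemalloc snapshots: one after warm-up, one after `steps` more steps."""
    tracemalloc.start(25)  # record up to 25 frames per allocation
    try:
        for i in range(warmup):  # let caches and allocators reach a steady state
            step_fn(i)
        before = tracemalloc.take_snapshot()
        for i in range(warmup, warmup + steps):
            step_fn(i)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Exclude tracemalloc's own bookkeeping from both snapshots.
    ignore = [tracemalloc.Filter(False, tracemalloc.__file__)]
    before, after = before.filter_traces(ignore), after.filter_traces(ignore)
    # Sorted by absolute size growth; a line that grows every step is a leak suspect.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in top_growth(leaky_train_step):
        print(stat)
```

Lines whose growth scales with the number of steps between snapshots are the leak candidates; rerunning with a larger `steps` and checking that the same line grows proportionally is a quick way to confirm.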
Medium · Technical
Write a TensorFlow tf.data pipeline for image classification that reads JPEG files from disk, decodes them, resizes them to 224x224, applies random horizontal flips and color jitter as augmentation, caches (when it is safe to do so), shuffles, batches, and prefetches. Explain what num_parallel_calls and the prefetch buffer size control, what values are recommended for each, and when to use cache().
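One possible pipeline, sketched under these assumptions: `paths` and `labels` are in-memory Python lists, the images are RGB JPEGs, and the brightness/contrast/saturation ranges stand in for "color jitter" (the values are illustrative, not tuned). `decode_and_resize`, `augment`, and `build_dataset` are hypothetical names. Deterministic decoding and resizing run before cache() so the cache holds finished 224x224 tensors, while the random flip and jitter run after it so each epoch sees fresh augmentations.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
IMG_SIZE = 224


def decode_and_resize(path, label):
    """Deterministic step: read bytes, decode RGB JPEG, scale to [0, 1], resize."""
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> [0, 1]
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    return image, label


def augment(image, label):
    """Random step: horizontal flip plus simple color jitter (illustrative ranges)."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    return tf.clip_by_value(image, 0.0, 1.0), label


def build_dataset(paths, labels, batch_size=32, training=True,
                  shuffle_buffer=1000, cache_path=""):
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    ds = ds.map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    ds = ds.cache(cache_path)  # "" = in memory; a file path = on-disk cache
    if training:
        # Shuffling after cache() gives a new order every epoch.
        ds = ds.shuffle(shuffle_buffer, reshuffle_each_iteration=True)
        ds = ds.map(augment, num_parallel_calls=AUTOTUNE)
    ds = ds.batch(batch_size)
    return ds.prefetch(AUTOTUNE)  # overlap input preparation with the training step
```

num_parallel_calls=tf.data.AUTOTUNE lets the runtime pick the parallelism for each map(), and prefetch(tf.data.AUTOTUNE) prepares upcoming batches while the model trains on the current one. The in-memory cache is only safe when the decoded dataset fits in RAM (here 224 x 224 x 3 x 4 bytes, about 0.6 MB per image) and nothing random runs before it; otherwise pass a local file path to cache(). Note that the shuffle buffer also holds decoded images, so a buffer of 1000 costs roughly 600 MB.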
Hard · Technical
Training loss keeps decreasing while validation F1 drops over time on a classification problem. Using TensorFlow or PyTorch tooling, design a prioritized diagnostics and mitigation plan that covers data validation, augmentation, regularization, learning-rate schedule adjustments, and early stopping, and explain how to use callbacks and profilers to gather evidence.
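Two pieces of evidence worth collecting early can be computed with plain Python: whether the validation label distribution differs from training (a data-validation check), and the epoch where validation macro-F1 peaked while training loss kept falling (the overfitting onset, which also suggests an early-stopping point). The helper names and the `patience` rule below are hypothetical; in practice the per-epoch values would come from a Keras callback's logs or from a PyTorch evaluation loop.

```python
from collections import Counter


def label_shift(train_labels, val_labels):
    """Per-class label frequency in validation minus frequency in training."""
    train_counts, val_counts = Counter(train_labels), Counter(val_labels)
    classes = set(train_counts) | set(val_counts)
    return {c: val_counts[c] / len(val_labels) - train_counts[c] / len(train_labels)
            for c in classes}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def overfitting_onset(train_losses, val_f1s, patience=2):
    """Epoch of peak val F1 if train loss kept falling for >= `patience` epochs after it."""
    best = max(range(len(val_f1s)), key=val_f1s.__getitem__)
    after = range(best + 1, len(val_f1s))
    if len(after) < patience:
        return None  # validation F1 is still at or near its peak
    if all(train_losses[i] < train_losses[i - 1] for i in after):
        return best
    return None
```

A non-None onset combined with a small label shift points more toward regularization, stronger augmentation, learning-rate decay, or an earlier stop than toward a train/validation data mismatch.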
Hard · System Design
You need to migrate a stale TensorFlow 1.x model (graph mode, custom ops) to PyTorch to accelerate developer iteration. Outline a migration plan covering: auditing operations, mapping layers, converting weights, writing layer-level unit tests, deciding between reimplementation vs exporting via ONNX, and validating numeric equivalence at multiple points.
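For the weight-conversion and numeric-equivalence steps, a minimal NumPy sketch, assuming the weights have already been read out of the TF 1.x checkpoint as arrays. TF Dense kernels are stored as (in, out) while torch.nn.Linear.weight is (out, in), and TF Conv2D kernels are (kh, kw, in, out) while torch.nn.Conv2d.weight is (out, in, kh, kw). The function names are hypothetical, and the tolerances are illustrative float32 defaults rather than recommended values.

```python
import numpy as np


def tf_dense_to_torch_linear(kernel, bias):
    """TF Dense kernel is (in, out); torch.nn.Linear.weight is (out, in)."""
    return np.ascontiguousarray(kernel.T), bias.copy()


def tf_conv2d_to_torch(kernel, bias):
    """TF Conv2D kernel is (kh, kw, in, out); torch.nn.Conv2d.weight is (out, in, kh, kw)."""
    return np.ascontiguousarray(np.transpose(kernel, (3, 2, 0, 1))), bias.copy()


def nhwc_to_nchw(x):
    """TF conv activations are channels-last by default; PyTorch's are channels-first."""
    return np.transpose(x, (0, 3, 1, 2))


def compare_activations(reference, candidate, rtol=1e-5, atol=1e-6):
    """Check named intermediate outputs; return {name: (matches, max_abs_err)}."""
    report = {}
    for name, ref in reference.items():
        out = candidate[name]
        if out.shape != ref.shape:
            report[name] = (False, float("inf"))
            continue
        err = float(np.max(np.abs(out.astype(np.float64) - ref.astype(np.float64))))
        report[name] = (bool(np.allclose(out, ref, rtol=rtol, atol=atol)), err)
    return report
```

Capturing the same named activations (after each block, before the head, final logits) from both models on a fixed batch turns "validating at multiple points" into a per-layer report, so the first mismatching layer localizes the conversion bug.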
Medium · Technical
Implement a simple callback system in a PyTorch training loop to support ModelCheckpoint, EarlyStopping, and ReduceLROnPlateau behaviors similar to Keras. Show how callbacks can hook into events (on_epoch_end, on_batch_end), save state, and be included in checkpoint artifacts.
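A framework-agnostic sketch of the hook mechanism, assuming a PyTorch-style optimizer (a `param_groups` list of dicts with an "lr" key, plus `state_dict()`) and a model exposing `state_dict()`. To keep it runnable without PyTorch, checkpoints go to an in-memory dict by default; in real code you would pass `save_fn=torch.save`. The class names mirror the Keras callbacks, but the patience rule (act once `wait >= patience`) is a simplification, and `Trainer`, `train_step`, and `evaluate` are hypothetical.

```python
import copy
import math


class Callback:
    """Base class: hooks are no-ops; `state_keys` names attributes to checkpoint."""
    state_keys = ()

    def on_batch_end(self, trainer, batch_idx, logs): pass
    def on_epoch_end(self, trainer, epoch, logs): pass

    def state_dict(self):
        return {k: getattr(self, k) for k in self.state_keys}

    def load_state_dict(self, state):
        for k, v in state.items():
            setattr(self, k, v)


class _Monitor(Callback):
    """Shared bookkeeping: best value of logs[monitor] and epochs since it improved."""
    state_keys = ("best", "wait")

    def __init__(self, monitor="val_loss", mode="min", patience=0):
        self.monitor, self.mode, self.patience = monitor, mode, patience
        self.best = math.inf if mode == "min" else -math.inf
        self.wait = 0

    def check(self, logs):
        value = logs[self.monitor]
        improved = value < self.best if self.mode == "min" else value > self.best
        if improved:
            self.best, self.wait = value, 0
        else:
            self.wait += 1
        return improved


class EarlyStopping(_Monitor):
    def on_epoch_end(self, trainer, epoch, logs):
        if not self.check(logs) and self.wait >= self.patience:
            trainer.stop_training = True


class ReduceLROnPlateau(_Monitor):
    def __init__(self, factor=0.1, min_lr=0.0, **kw):
        super().__init__(**kw)
        self.factor, self.min_lr = factor, min_lr

    def on_epoch_end(self, trainer, epoch, logs):
        if not self.check(logs) and self.wait >= self.patience:
            # Same layout as torch.optim.Optimizer.param_groups.
            for group in trainer.optimizer.param_groups:
                group["lr"] = max(group["lr"] * self.factor, self.min_lr)
            self.wait = 0


class ModelCheckpoint(_Monitor):
    def __init__(self, path, **kw):
        super().__init__(**kw)
        self.path = path

    def on_epoch_end(self, trainer, epoch, logs):
        if self.check(logs):  # save only when the monitored metric improves
            trainer.save_checkpoint(self.path, epoch)


class Trainer:
    def __init__(self, model, optimizer, train_step, evaluate,
                 callbacks=(), save_fn=None):
        self.model, self.optimizer = model, optimizer
        self.train_step, self.evaluate = train_step, evaluate
        self.callbacks = list(callbacks)
        self.stop_training, self.epoch = False, -1
        self.saved = {}  # in-memory stand-in for files written by torch.save
        self.save_fn = save_fn or (
            lambda obj, path: self.saved.__setitem__(path, copy.deepcopy(obj)))

    def save_checkpoint(self, path, epoch):
        self.save_fn({"epoch": epoch,
                      "model": self.model.state_dict(),
                      "optimizer": self.optimizer.state_dict(),
                      "callbacks": [cb.state_dict() for cb in self.callbacks]}, path)

    def fit(self, batches, epochs):
        for epoch in range(epochs):
            self.epoch = epoch
            for i, batch in enumerate(batches):
                loss = self.train_step(self.model, self.optimizer, batch)
                for cb in self.callbacks:
                    cb.on_batch_end(self, i, {"loss": loss})
            logs = self.evaluate(self.model)
            for cb in self.callbacks:
                cb.on_epoch_end(self, epoch, logs)
            if self.stop_training:
                break
```

Because every callback exposes state_dict()/load_state_dict(), resuming from a checkpoint restores patience counters and best values along with model and optimizer state. Placing ModelCheckpoint last in the list means the saved callback state already includes the other callbacks' updates for that epoch.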