
Computer Vision Fundamentals Questions

Core concepts and methods in computer vision, with an emphasis on both traditional image processing and modern deep learning approaches. Candidates should understand how images are represented as matrices or tensors; common preprocessing steps and augmentation techniques that improve generalization; and the fundamentals of convolutional neural networks, including convolution operations, receptive fields, pooling, and normalization. Familiarity with common vision tasks such as image classification, object detection, and semantic and instance segmentation, along with key model design patterns, is expected. Candidates should also know common architecture families such as residual networks (ResNet) and VGG-style (Visual Geometry Group) networks, the role of pretrained models and transfer learning, how to fine-tune models for new tasks, and practical tooling, including image processing libraries and deep learning frameworks for training and inference. Evaluation may cover trade-offs between accuracy, latency, and resource usage for deployment.
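As a quick refresher on several of the ideas above, the minimal PyTorch sketch below (shapes and channel counts are illustrative assumptions, not part of any specific question) shows an RGB image batch represented as an N x C x H x W tensor and passed through a typical conv / normalization / pooling stem.

import torch
import torch.nn as nn

# A batch of 8 RGB images, 224x224, as a float tensor: (N, C, H, W)
images = torch.rand(8, 3, 224, 224)

# One conv -> batch norm -> ReLU -> max-pool stage, as found in many CNN stems
stem = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

features = stem(images)
print(features.shape)  # torch.Size([8, 64, 56, 56]) -- spatial size reduced 4x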

Medium · Technical
Explain the difference between semantic segmentation and instance segmentation. Describe models commonly used for each (e.g., U-Net, DeepLabv3+ for semantic; Mask R-CNN for instance) and how label formats and loss functions differ between the tasks.
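A minimal sketch of the label-format difference this question targets (PyTorch; image size, class count, and instance count are illustrative assumptions): semantic segmentation uses one per-pixel class map trained with cross-entropy, while instance segmentation (Mask R-CNN style) pairs per-instance binary masks with boxes and class labels, with a per-instance binary cross-entropy mask loss alongside box and classification losses.

import torch
import torch.nn.functional as F

H, W, num_classes, num_instances = 256, 256, 21, 3

# Semantic segmentation: logits of shape (N, num_classes, H, W),
# target is a single (N, H, W) map of class indices, per-pixel cross-entropy loss.
logits = torch.randn(1, num_classes, H, W)
sem_target = torch.randint(0, num_classes, (1, H, W))
sem_loss = F.cross_entropy(logits, sem_target)

# Instance segmentation: each of K instances gets its own binary mask,
# plus a box and a class label; the mask loss is per-instance binary cross-entropy.
inst_target = {
    "boxes": torch.tensor([[10.0, 20.0, 120.0, 200.0]] * num_instances),  # (K, 4) xyxy
    "labels": torch.randint(1, num_classes, (num_instances,)),            # (K,)
    "masks": torch.randint(0, 2, (num_instances, H, W)),                  # (K, H, W) binary
}
mask_logits = torch.randn(num_instances, H, W)
inst_mask_loss = F.binary_cross_entropy_with_logits(
    mask_logits, inst_target["masks"].float()
)
print(sem_loss.item(), inst_mask_loss.item())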
Hard · Technical
You need to compress a ResNet101 model to fit a 100MB memory budget for mobile deployment while retaining at least 95% of original accuracy. Propose a step-by-step compression plan including structured pruning, low-rank factorization, per-channel quantization, knowledge distillation, and any architecture changes. Explain how you'd evaluate trade-offs at each step.
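One ingredient of such a plan, sketched below in PyTorch (the temperature, loss weighting, and class count are illustrative assumptions), is a knowledge-distillation loss that blends the soft targets of the large teacher with the hard-label loss of the compressed student.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target KL divergence (teacher -> student) with hard-label CE."""
    # Soft targets: KL between temperature-softened distributions, scaled by T^2
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Illustrative usage with random logits for a 1000-class problem
student_logits = torch.randn(32, 1000)
teacher_logits = torch.randn(32, 1000)
labels = torch.randint(0, 1000, (32,))
print(distillation_loss(student_logits, teacher_logits, labels).item())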
Easy · Technical
Compare pooling (max/average) versus strided convolution for spatial downsampling in CNNs. Discuss the effects on translation invariance, learnable parameters, information loss, and when modern architectures prefer one over the other.
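A small comparison sketch (PyTorch; the channel count is an illustrative assumption): both operations halve the spatial resolution, but the strided convolution learns its downsampling weights while max pooling is parameter-free.

import torch
import torch.nn as nn

x = torch.rand(1, 64, 56, 56)

pool = nn.MaxPool2d(kernel_size=2, stride=2)                      # fixed, parameter-free
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)   # learned downsampling

print(pool(x).shape, strided(x).shape)               # both: torch.Size([1, 64, 28, 28])
print(sum(p.numel() for p in pool.parameters()))     # 0
print(sum(p.numel() for p in strided.parameters()))  # 64*64*3*3 + 64 = 36928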
Medium · Technical
Explain the benefits and pitfalls of mixed-precision (FP16/FP32) training. Describe how Automatic Mixed Precision (AMP) works in PyTorch, the role of dynamic loss scaling, and how to debug precision-related instabilities such as gradient underflow or overflow.
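A minimal AMP training-step sketch in PyTorch (the linear model, optimizer, and random data are placeholder assumptions): autocast runs eligible ops in FP16 while GradScaler applies dynamic loss scaling so small FP16 gradients do not underflow.

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 10).to(device)            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(32, 128, device=device)
targets = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
# Forward pass in mixed precision: eligible ops run in FP16, the rest stay in FP32
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = nn.functional.cross_entropy(model(inputs), targets)

# Scale the loss before backward so small gradients survive FP16, then unscale to step
scaler.scale(loss).backward()
scaler.step(optimizer)   # skips the step if infs/NaNs are found in the gradients
scaler.update()          # adjusts the loss scale up or down dynamically
print(loss.item())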
Medium · Technical
Compare RandAugment and AutoAugment as methods for learning image augmentation policies. Discuss computational cost, ease of tuning, and when automatically searched policies produce gains over manual augmentation.
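Both policies ship in torchvision's transforms; a minimal usage sketch follows (the num_ops, magnitude, and image size values are illustrative assumptions). RandAugment exposes just two knobs and needs no policy search, while AutoAugment applies a policy previously searched on a reference dataset such as ImageNet.

import torch
from torchvision import transforms
from torchvision.transforms import AutoAugmentPolicy
from torchvision.transforms.functional import to_pil_image

# RandAugment: two hyperparameters (number of ops, global magnitude), no search needed
rand_augment = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.ToTensor(),
])

# AutoAugment: applies a policy searched offline on a reference dataset (here ImageNet)
auto_augment = transforms.Compose([
    transforms.AutoAugment(policy=AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
])

# Both expect a PIL image (or uint8 tensor); a random uint8 tensor stands in here
img = to_pil_image(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))
print(rand_augment(img).shape, auto_augment(img).shape)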
