InterviewStack.io

Computer Vision Fundamentals Questions

Core concepts and methods in computer vision, covering both traditional image processing and modern deep learning approaches. Candidates should understand how images are represented as matrices or tensors, common preprocessing steps, augmentation techniques that improve generalization, and the fundamentals of convolutional neural networks, including convolution operations, receptive fields, pooling, and normalization. Familiarity with common vision tasks (image classification, object detection, semantic and instance segmentation) and key model design patterns is expected. Candidates should know common architecture families such as residual networks (ResNet) and VGG-style networks (named after the Visual Geometry Group), the role of pretrained models and transfer learning, how to fine-tune models for new tasks, and practical tooling, including image processing libraries and deep learning frameworks for training and inference. Evaluation may include trade-offs between accuracy, latency, and resource usage in deployment.

Medium, Technical
Describe how Grad-CAM works for visualizing model decisions in CNNs. If asked to implement Grad-CAM for a PyTorch classification model, which layer would you hook into and what are the key steps to produce a heatmap overlay for a predicted class?
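
As a starting point for the heatmap step, here is a minimal sketch of the core Grad-CAM computation in NumPy. It assumes the feature maps of the hooked layer (commonly the last convolutional layer) and the gradients of the target class score with respect to those maps have already been captured, for example with forward and backward hooks in PyTorch. The function name `grad_cam_map` and the `(K, H, W)` array layout are illustrative assumptions, not part of any library.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Compute a normalized Grad-CAM heatmap for one image.

    activations: (K, H, W) feature maps from the hooked layer (assumed layout).
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps.
    Returns an (H, W) array scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=np.float64)
    gradients = np.asarray(gradients, dtype=np.float64)

    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))                # shape (K,)

    # Weighted sum of feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)     # shape (H, W)

    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)

    # Scale to [0, 1] so the map can be colorized and blended with the image.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam
```

To produce an overlay, the returned map would typically be resized to the input resolution, passed through a colormap, and alpha-blended with the original image.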
Medium, Technical
Design a monitoring strategy to detect data drift and model performance degradation for a production vision model. Specify which input statistics and model outputs you would log, sampling strategies, alert thresholds, and how to trigger retraining or human review.
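
One possible drift signal for an answer is the Population Stability Index (PSI), computed between a reference histogram and a recent production histogram of some logged input statistic, such as per-image mean brightness. The sketch below is one option among several (KL divergence and Kolmogorov-Smirnov tests are alternatives). The function name `psi` and the `eps` floor are illustrative assumptions, and any alert threshold would need tuning per deployment.

```python
import math

def psi(reference_counts, current_counts, eps=1e-6):
    """Population Stability Index between two histograms with identical bins.

    reference_counts: bin counts from the training/reference window.
    current_counts:   bin counts from the recent production window.
    eps: floor on bin fractions to avoid log(0) (an assumed choice).
    """
    if len(reference_counts) != len(current_counts):
        raise ValueError("histograms must use the same bins")
    ref_total = float(sum(reference_counts))
    cur_total = float(sum(current_counts))
    if ref_total <= 0 or cur_total <= 0:
        raise ValueError("histograms must be non-empty")

    total = 0.0
    for ref, cur in zip(reference_counts, current_counts):
        p = max(ref / ref_total, eps)   # reference fraction in this bin
        q = max(cur / cur_total, eps)   # current fraction in this bin
        total += (q - p) * math.log(q / p)
    return total
```

A monitoring job could compute this per statistic on a sampled window and raise an alert for human review when the value stays above a chosen threshold for several consecutive windows.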
Medium, Technical
Write a Python function using numpy that performs Non-Maximum Suppression (NMS) on a list of bounding boxes and scores. The function should accept an IoU threshold and return the indices of boxes to keep. Discuss time complexity and how to vectorize for speed.
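
A minimal sketch of one possible answer follows: greedy NMS in which the IoU of the current top-scoring box against all remaining boxes is computed in a single vectorized step. It assumes boxes are given as `[x1, y1, x2, y2]` corner coordinates; the function name `nms` and the default threshold are illustrative choices.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy Non-Maximum Suppression.

    boxes:  (N, 4) array-like of [x1, y1, x2, y2] (assumed corner format).
    scores: (N,) array-like of confidence scores.
    Returns indices of kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float64)
    x1, y1, x2, y2 = boxes.T
    areas = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)

    # Process boxes in order of decreasing score.
    order = np.argsort(-scores, kind="stable")
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]

        # Intersection of box i with every remaining box (vectorized).
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)

        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)  # guard against zero-area boxes

        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return keep
```

Sorting costs O(N log N), and the greedy loop is O(N^2) in the worst case, when no box suppresses any other. Each iteration is a handful of array operations rather than a Python loop over pairs, which is where most of the speedup over a naive implementation comes from.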
Medium, Technical
Explain self-supervised pretraining methods (e.g., SimCLR, MoCo, DINO) for computer vision. How do contrastive and clustering-based approaches learn useful representations, and when might self-supervised pretraining outperform supervised ImageNet pretraining for downstream tasks?
Medium, Technical
Design a sliding-window tiling strategy for running segmentation inference on very high-resolution images (for example, 4k satellite imagery). Explain how to handle overlaps, stitch tile predictions to reduce seam artifacts, and choices for tile size and stride given memory constraints.
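
To make the tile layout concrete, the following sketch computes top-left tile origins so that fixed-size tiles cover the whole image, shifting the last tile in each axis back to the image edge rather than padding. The names `tile_starts` and `tile_grid` are hypothetical helpers, and clamping the final tile is one assumed design choice (padding the image is another).

```python
def tile_starts(length, tile, stride):
    """Start offsets along one axis so tiles of size `tile` cover [0, length)."""
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    if stride > tile:
        raise ValueError("stride larger than tile would leave gaps")
    if length <= tile:
        return [0]  # a single (possibly padded) tile covers the axis
    starts = list(range(0, length - tile + 1, stride))
    # Add a final tile flush with the edge if the regular grid falls short.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

def tile_grid(height, width, tile, stride):
    """All (y, x) top-left tile origins for an image of the given size."""
    return [(y, x)
            for y in tile_starts(height, tile, stride)
            for x in tile_starts(width, tile, stride)]
```

For a 3840x2160 image with 1024-pixel tiles and a stride of 768, this yields a 5 by 3 grid of 15 tiles with at least 256 pixels of overlap between neighbors. When stitching, overlapping predictions can be accumulated into a sum buffer alongside a weight buffer and divided at the end; weighting tile centers more heavily than tile borders is a common way to reduce seams.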
