InterviewStack.io

Performance and Code Optimization Questions

Covers techniques and decision-making for improving application and code performance at every level, from algorithms and memory-access patterns to frontend bundling and runtime behavior. Candidates should be able to profile and identify bottlenecks; apply low-level optimizations such as loop unrolling, function inlining, cache-friendly access patterns, branch reduction, and smart memory layouts; and use compiler optimizations effectively. The topic also includes higher-level application and frontend optimizations: code splitting and lazy loading, tree shaking and dead-code elimination, minification and compression, dynamic imports, service-worker-based caching, prefetching strategies, server-side versus client-side rendering trade-offs, static-site-generation considerations, and bundler optimization with tools like webpack, Vite, and Rollup. Emphasize measurement first and avoiding premature optimization, and explain the trade-offs between performance gains and added complexity or maintenance burden. At senior levels, expect the ability to make intentional trade-off decisions and justify which optimizations are worth their complexity for a given system and workload.

Medium · Technical
28 practiced
A Python image preprocessing pipeline using PIL is the CPU bottleneck before GPU training. Describe and sketch two approaches to speed it up: (1) replacing Python loops with NumPy vectorized operations, and (2) parallelizing with multiprocessing or thread pools. Explain the trade-offs including memory and GIL concerns.
Easy · Technical
16 practiced
You need to run inference of an ML model inside the browser using WebAssembly. List concrete steps and optimizations to minimize initial download and startup time for users on slow networks, including model size, lazy loading, caching, and compilation strategies.
Hard · Technical
22 practiced
Explain trade-offs between RDMA (InfiniBand) and TCP for parameter synchronization in large-scale distributed training. When does RDMA provide clear benefits, and what are the operational and code-level considerations for adopting RDMA in your training stack?
Easy · Technical
23 practiced
Write a Python function that measures average and p95 inference latency for a given PyTorch model on CPU. The function should: set the model to eval mode, perform warm-up runs, run N timed inferences with random inputs of a specified shape, and return mean and p95 latency in milliseconds. Explain how you would adapt the code for GPU timing.
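One shape an answer could take; this sketch times any callable so it stays framework-agnostic, with the PyTorch-specific steps (eval mode, `no_grad`, CUDA synchronization) noted in comments. The name `measure_latency` and its signature are illustrative assumptions:

```python
import time
import statistics

def measure_latency(model, make_input, n_runs=100, n_warmup=10):
    """Return (mean_ms, p95_ms) for calling model(make_input()) n_runs times.

    For a PyTorch model, first call model.eval() and wrap the timed calls in
    torch.no_grad(). For GPU timing, call torch.cuda.synchronize() before
    reading the clock on each side of the call, because CUDA kernels launch
    asynchronously and would otherwise appear nearly free.
    """
    for _ in range(n_warmup):
        model(make_input())            # warm caches, allocator, any JIT
    samples = []
    for _ in range(n_runs):
        x = make_input()               # build input outside the timed region
        t0 = time.perf_counter()
        model(x)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    mean_ms = statistics.fmean(samples)
    p95_ms = samples[int(0.95 * (len(samples) - 1))]
    return mean_ms, p95_ms
```

Using `time.perf_counter()` rather than `time.time()` matters here: it is monotonic and has the highest available resolution for short intervals.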
Hard · Technical
16 practiced
You're asked to reduce end-to-end training time by roughly 3x for a large model. Propose an optimization plan across data loading, augmentation, mixed-precision, gradient checkpointing, distributed training strategies, and hardware choices. Provide rough expected speedup ranges for each change and justify assumptions.
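When budgeting a plan like this, a common back-of-envelope model treats each optimization as an independent multiplicative factor; the per-change numbers below are illustrative assumptions to be validated by profiling, not measurements:

```python
def compound_speedup(factors):
    # Naive model: total speedup is the product of independent per-stage
    # factors. Real systems interact (faster compute can re-expose a data
    # loading bottleneck), so treat the result as an optimistic estimate.
    total = 1.0
    for f in factors.values():
        total *= f
    return total

# Hypothetical plan with hedged per-change estimates:
plan = {
    "async data loading + prefetch": 1.2,
    "GPU-side augmentation":         1.15,
    "mixed precision (AMP)":         1.6,
    "larger batch via grad ckpt":    1.1,
    "2x data-parallel workers":      1.8,   # sub-linear due to comm overhead
}
```

Under these assumed factors the product lands in the ~3-4x range, which is how one can argue the 3x target is plausible before committing to any single change.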
