InterviewStack.io

Capacity Planning and Resource Optimization Questions

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service-level objectives. Key skills include monitoring resource utilization metrics such as CPU usage, memory consumption, storage I/O, and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing.

Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load-balancing algorithms, and the use of spot or preemptible capacity for interruptible workloads.

Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine-learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to verify that capacity decisions meet both performance and cost objectives.
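As a concrete illustration of trend analysis and safety margins, here is a minimal sketch (with made-up utilization numbers) that fits a linear trend to weekly peak CPU utilization and estimates how many weeks of headroom remain before a chosen alert ceiling is breached:

```python
# Toy capacity-headroom sketch; the utilization series and the 0.80 ceiling
# are illustrative assumptions, not recommendations.
weekly_peaks = [0.52, 0.55, 0.57, 0.61, 0.63, 0.66]  # fraction of total CPU

# Ordinary least-squares fit of a straight line to the weekly peaks.
n = len(weekly_peaks)
xs = range(n)
mean_x = sum(xs) / n
mean_y = sum(weekly_peaks) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weekly_peaks)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

ceiling = 0.80  # alert threshold leaving a 20% safety margin
# Week index at which the trend crosses the ceiling, minus weeks already elapsed.
weeks_until_breach = (ceiling - intercept) / slope - (n - 1)
print(f"trend: +{slope:.3f}/week, ~{weeks_until_breach:.1f} weeks of headroom")
```

Real forecasts would account for seasonality and growth inflections, but even this simple extrapolation turns a utilization dashboard into a lead-time number for ordering capacity.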

Medium · Technical · 30 practiced
Problem-solving: Suppose training time increased by 30% after switching dataset storage from local NVMe to network-attached storage. Describe steps to diagnose the bottleneck (compute, I/O, network), metrics you would collect, and short-term and long-term remediation strategies.
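A useful short-term diagnostic for this scenario is to instrument the training loop and separate time spent waiting on data from time spent computing; if data-wait dominates, the bottleneck is the input pipeline (storage or network) rather than compute. A minimal sketch, where `fetch_batch` and `run_compute` are generic stand-ins rather than any specific framework's API:

```python
import time

def profile_step(fetch_batch, run_compute, steps=100):
    """Split each training step into data-wait time and compute time.

    Returns (avg_wait_s, avg_compute_s). A wait share that jumped after
    moving to network-attached storage points at the I/O path.
    """
    wait, compute = 0.0, 0.0
    for _ in range(steps):
        t0 = time.perf_counter()
        batch = fetch_batch()      # blocks on storage / network I/O
        t1 = time.perf_counter()
        run_compute(batch)         # stand-in for the forward/backward pass
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
    return wait / steps, compute / steps
```

Pairing this with host-level metrics (disk/NFS latency, network throughput, CPU iowait) distinguishes a slow storage backend from insufficient read parallelism or prefetching.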
Hard · Technical · 25 practiced
Given a transformer model with 10B parameters, sequence length 1024, and target batch size 2 per GPU using fp16, estimate: (a) approximate FLOPs per forward pass, (b) memory needed for parameters and optimizer states (Adam), and (c) whether data parallelism, model parallelism, or pipeline parallelism is more appropriate. Show your assumptions and calculations.
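For parts (a) and (b), a common back-of-envelope approach can be written out directly; the 2 · params · tokens FLOP rule (which ignores the attention seq² term) and 16 bytes per parameter for mixed-precision Adam are standard approximations, not exact figures:

```python
# Back-of-envelope estimate under common simplifying assumptions.
params = 10e9
seq_len, batch = 1024, 2
tokens = seq_len * batch

# (a) Forward FLOPs ~= 2 * params * tokens (matmul-dominated approximation).
fwd_flops = 2 * params * tokens        # ~4.1e13 FLOPs per forward pass

# (b) Bytes per parameter for mixed-precision Adam:
#   fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
#   + fp32 Adam momentum (4) + fp32 Adam variance (4) = 16 bytes/param
state_bytes = 16 * params              # ~160 GB, excluding activations
print(f"forward ~ {fwd_flops:.2e} FLOPs, params+optimizer ~ {state_bytes/1e9:.0f} GB")
```

Since roughly 160 GB of parameter and optimizer state cannot fit on a single accelerator, pure data parallelism is ruled out without sharding; this motivates part (c)'s discussion of model/pipeline parallelism or optimizer-state sharding.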
Hard · Technical · 30 practiced
Design an experiment to quantify the effect of dataset sharding strategy and I/O parallelism on distributed training time and resource usage. Describe hypotheses, independent/dependent variables, controlled setup, metrics to collect, and how to interpret results to inform capacity decisions.
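One way to structure such an experiment is a full factorial grid over the two independent variables with repeated trials for variance estimation; a minimal sketch with hypothetical shard sizes and worker counts:

```python
import itertools

shard_sizes_mb = [64, 256, 1024]   # independent variable 1: shard size
io_workers = [2, 8, 32]            # independent variable 2: parallel readers

# Full factorial design, 3 repeats per cell. Dependent variables (epoch time,
# samples/sec, network and disk utilization) would be recorded per trial.
trials = [
    {"shard_mb": s, "workers": w, "repeat": r}
    for s, w in itertools.product(shard_sizes_mb, io_workers)
    for r in range(3)
]
print(len(trials))
```

Holding the model, hardware, and dataset fixed across all 27 runs keeps the setup controlled, so differences in the dependent variables can be attributed to the sharding and parallelism choices.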
Easy · Technical · 37 practiced
Explain the difference between vertical scaling and horizontal scaling for AI workloads (both training and inference). Give concrete examples when each approach is preferable, and list the trade-offs in terms of cost, latency, fault-tolerance, and operational complexity.
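On the horizontal-scaling side, sizing often comes down to simple replica-count arithmetic; a minimal sketch where the headroom fraction and throughput numbers are illustrative assumptions:

```python
import math

def replicas_needed(peak_rps, per_replica_rps, headroom=0.3):
    """Horizontal-scaling sizing: replicas to serve peak load plus headroom.

    headroom covers traffic spikes and the loss of a replica (N+1 thinking).
    """
    return math.ceil(peak_rps * (1 + headroom) / per_replica_rps)

print(replicas_needed(1200, 90))
```

Vertical scaling has no such formula: it trades this elasticity for simplicity, a single larger box, until the largest available instance becomes the ceiling.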
Medium · Technical · 40 practiced
Scenario: You need to place multiple versions of a large model (e.g., for A/B testing) while minimizing extra memory cost on GPU nodes. Propose a placement and memory-sharing strategy and explain how to avoid interference and ensure fair comparison metrics between variants.
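For the fair-comparison part of this scenario, deterministic hash-based traffic assignment keeps each request (or user) sticky to one variant, so metrics are not confounded by users bouncing between models; a minimal sketch, independent of any particular serving stack:

```python
import hashlib

def assign_variant(request_id: str, variants=("A", "B"), split=(0.5, 0.5)):
    """Deterministic A/B assignment: the same id always maps to the same
    variant, giving a stable traffic split without shared state."""
    # Map the id to a uniform value in [0, 1) via a cryptographic hash.
    h = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) / 2**256
    cum = 0.0
    for variant, share in zip(variants, split):
        cum += share
        if h < cum:
            return variant
    return variants[-1]
```

Stickiness also helps the memory side: requests for a given variant can be routed to the nodes hosting it, so each GPU node only needs the variants it actually serves resident in memory.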
