InterviewStack.io

Capacity Planning and Resource Optimization Questions

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives. Key skills include monitoring resource utilization metrics such as CPU usage, memory consumption, storage input/output (I/O), and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load balancing algorithms, and the use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to verify that capacity decisions meet both performance and cost objectives.

Easy | Technical
28 practiced
Compare spot/preemptible instances and on-demand instances for ML workloads. Which types of training or inference workloads are appropriate for spot instances? Describe failure modes and mitigation strategies.
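One mitigation commonly discussed for this question is periodic checkpointing, so that an interrupted job resumes from its last saved step instead of restarting. The following is a minimal sketch of that idea, not a production training loop: the function `train`, its parameters, the JSON checkpoint format, and the `preempt_at` argument (which only simulates an interruption) are all assumptions made for illustration.

```python
import json
import os


def train(total_steps, ckpt_path, checkpoint_every=100, preempt_at=None):
    """Toy training loop that resumes from ckpt_path if a checkpoint exists.

    Returns (step, state, finished). `preempt_at` is a hypothetical hook
    that simulates a spot interruption at the given step.
    """
    step, state = 0, 0.0
    if os.path.exists(ckpt_path):
        # Resume from the last durable checkpoint.
        with open(ckpt_path) as f:
            saved = json.load(f)
        step, state = saved["step"], saved["state"]

    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            # Simulated interruption: work since the last checkpoint is lost.
            return step, state, False
        state += 1.0  # stand-in for one optimizer step
        step += 1
        if step % checkpoint_every == 0 or step == total_steps:
            # Write to a temp file, then rename atomically so a crash
            # mid-write never leaves a truncated checkpoint behind.
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step, "state": state}, f)
            os.replace(tmp_path, ckpt_path)
    return step, state, True
```

Calling `train` a second time with the same `ckpt_path` picks up at the last checkpointed step, so the checkpoint interval bounds how much work one interruption can waste.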
Medium | Technical
29 practiced
Write a Python function select_instance(instances, required_gpu_mem_gb, required_cpu_cores, max_hourly_cost) that selects the cheapest instance meeting the requirements. 'instances' is a list of dicts like {'name':'p3.2xlarge','gpu_mem_gb':16,'cpu_cores':8,'hourly_cost':3.06}. Return None if none match. Aim for O(n) complexity and handle edge cases.
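A minimal sketch of one possible answer follows. It makes a single pass over the list and tracks the cheapest qualifying entry. The handling of edge cases is an assumption rather than part of the question: malformed entries are skipped, an empty or `None` list returns `None`, and ties go to the first instance seen. Apart from the `p3.2xlarge` entry quoted in the question, the instance names and prices in the usage example are hypothetical.

```python
def select_instance(instances, required_gpu_mem_gb, required_cpu_cores, max_hourly_cost):
    """Return the cheapest instance dict meeting all requirements, or None."""
    best = None
    best_cost = None
    for inst in instances or []:  # tolerate None or an empty list
        try:
            gpu_mem = float(inst["gpu_mem_gb"])
            cpu_cores = float(inst["cpu_cores"])
            cost = float(inst["hourly_cost"])
        except (KeyError, TypeError, ValueError):
            continue  # skip entries with missing or non-numeric fields
        if gpu_mem < required_gpu_mem_gb:
            continue
        if cpu_cores < required_cpu_cores:
            continue
        if cost < 0 or cost > max_hourly_cost:
            continue
        # Strict '<' keeps the first instance seen when costs tie.
        if best is None or cost < best_cost:
            best, best_cost = inst, cost
    return best


# Hypothetical catalogue; only the first entry comes from the question text.
catalogue = [
    {"name": "p3.2xlarge", "gpu_mem_gb": 16, "cpu_cores": 8, "hourly_cost": 3.06},
    {"name": "small-gpu", "gpu_mem_gb": 16, "cpu_cores": 4, "hourly_cost": 0.50},
    {"name": "large-gpu", "gpu_mem_gb": 64, "cpu_cores": 32, "hourly_cost": 12.00},
]
print(select_instance(catalogue, 16, 8, 5.0))  # the p3.2xlarge entry
```

The loop does constant work per entry and keeps only two variables, so it runs in O(n) time and O(1) extra space without sorting.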
Medium | Technical
21 practiced
Describe how you would measure and benchmark the effect of model quantization (fp32 -> int8) on inference throughput and latency on target hardware. Include dataset selection, representative workloads, statistical tests, and how you would quantify the trade-off between performance and accuracy.
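As a starting point for the measurement part of this question, the sketch below times a callable over a set of inputs, discards warmup iterations, and reports latency percentiles plus throughput. It is an illustrative harness only: `benchmark` and its parameters are hypothetical names, the percentile uses a simple nearest-rank index, and the model passed in would be your own fp32 or int8 inference function running on the target hardware.

```python
import time


def benchmark(fn, inputs, warmup=10, repeats=100):
    """Time fn over inputs and return latency percentiles and throughput."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    # Warmup runs let caches, JITs, and lazy initialization settle;
    # their timings are discarded.
    for i in range(warmup):
        fn(inputs[i % len(inputs)])

    latencies = []
    for i in range(repeats):
        x = inputs[i % len(inputs)]
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)

    latencies.sort()

    def percentile(p):
        # Nearest-rank style index into the sorted samples.
        idx = min(len(latencies) - 1, int(p * len(latencies)))
        return latencies[idx] * 1000.0  # seconds -> milliseconds

    total = sum(latencies)
    return {
        "p50_ms": percentile(0.50),
        "p95_ms": percentile(0.95),
        "p99_ms": percentile(0.99),
        "throughput_per_s": repeats / total if total > 0 else float("inf"),
    }
```

Running the same harness on the fp32 and int8 variants with identical inputs gives paired latency distributions, which can then be compared alongside accuracy measured on a held-out dataset.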
Medium | System Design
27 practiced
Design a policy for scheduling off-peak batch jobs (data preprocessing, offline training) and archival strategies for old datasets. Include time windows, priority vs preemption, cold storage decisions (e.g., Glacier), and how to coordinate with production inference capacity to avoid interference.
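One small piece of such a policy is the admission check that decides whether a batch job may start. The sketch below assumes an off-peak window of 22:00 to 06:00 and a 60% production utilization ceiling; both values, and the names `in_window` and `may_start_batch`, are hypothetical and would be tuned to the actual traffic pattern.

```python
from datetime import time as dtime


def in_window(now, start, end):
    """True if `now` falls in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return now >= start or now < end


def may_start_batch(now, prod_utilization,
                    start=dtime(22, 0), end=dtime(6, 0), max_prod_util=0.6):
    """Admit a batch job only off-peak and while production has headroom.

    prod_utilization is the current production load as a fraction (0.0 to 1.0).
    """
    return in_window(now, start, end) and prod_utilization < max_prod_util
```

Checking live utilization in addition to the clock means an unexpected late-night traffic spike still blocks new batch work, which helps avoid interference with production inference.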
Easy | Technical
37 practiced
Explain the difference between vertical scaling and horizontal scaling for AI workloads (both training and inference). Give concrete examples when each approach is preferable, and list the trade-offs in terms of cost, latency, fault-tolerance, and operational complexity.
