
Capacity Planning and Resource Optimization Questions

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives. Key skills include monitoring resource utilization metrics such as CPU usage, memory consumption, storage I/O, and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load-balancing algorithms, and use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to verify that capacity decisions meet both performance and cost objectives.
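For intuition on safety margins and buffer sizing, here is a minimal sketch (the function name and numbers are invented for this example) that turns an observed peak, a growth assumption, and a headroom fraction into an instance count.

```python
# Illustrative sketch (not from any specific tool): estimate how many
# instances to provision from observed demand plus a safety margin.
import math

def required_instances(peak_rps: float,
                       per_instance_rps: float,
                       growth_factor: float = 1.2,
                       headroom: float = 0.3) -> int:
    """Size a fleet for forecast peak load with a safety buffer.

    peak_rps          -- observed peak requests per second
    per_instance_rps  -- sustainable throughput of one instance at target latency
    growth_factor     -- expected demand growth over the planning horizon
    headroom          -- capacity fraction kept free for spikes and failures
    """
    forecast_peak = peak_rps * growth_factor
    usable_share = 1.0 - headroom          # run each instance below 100% busy
    return math.ceil(forecast_peak / (per_instance_rps * usable_share))

# Example: 4,000 rps peak, 250 rps per instance, 20% growth, 30% headroom
print(required_instances(4000, 250))       # -> 28 instances
```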

Easy · Technical
Compare spot/preemptible instances and on-demand instances for ML workloads. Which types of training or inference workloads are appropriate for spot instances? Describe failure modes and mitigation strategies.
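One common mitigation is to watch for the provider's interruption warning and checkpoint before shutdown. Below is a minimal sketch against the EC2 spot interruption notice; it assumes IMDSv1-style metadata access and a placeholder save_checkpoint(), so treat it as an outline rather than production code.

```python
# Minimal sketch: poll the EC2 instance metadata service for a spot
# interruption notice (the ~2-minute warning) and checkpoint before shutdown.
# Assumes IMDSv1 is reachable; production code should use IMDSv2 tokens and
# a real checkpoint routine instead of the save_checkpoint() placeholder.
import time
import requests

INTERRUPTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def save_checkpoint():
    # Placeholder for real checkpoint logic (model/optimizer state, data offset).
    print("checkpoint written")

def watch_for_interruption(poll_seconds: int = 5) -> None:
    while True:
        resp = requests.get(INTERRUPTION_URL, timeout=2)
        # 404 means no interruption is scheduled; 200 carries the action and time.
        if resp.status_code == 200:
            save_checkpoint()
            return
        time.sleep(poll_seconds)
```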
Medium · Technical
You want to use spot instances for distributed training to reduce costs. Describe a concrete operational plan covering checkpoint frequency, orchestration changes (Kubernetes / job scheduler), handling partial preemptions, and how you'd test the approach to ensure no significant training slowdowns or data loss.
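A typical ingredient of such a plan is periodic checkpointing to durable storage plus resume-on-restart. The PyTorch-flavored sketch below is illustrative only; the path, checkpoint interval, and model/optimizer objects are placeholders.

```python
# Sketch of periodic checkpointing for preemptible training (PyTorch shown;
# the same idea applies to other frameworks).
import os
import torch

CKPT_PATH = "/mnt/shared/ckpt.pt"   # durable storage that survives preemption
CKPT_EVERY = 500                    # steps between checkpoints: tune so that
                                    # expected lost work << checkpoint cost

def save(step, model, optimizer):
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, CKPT_PATH)

def restore(model, optimizer):
    if not os.path.exists(CKPT_PATH):
        return 0
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1

# The training loop resumes from the last checkpoint after a preemption:
#   start = restore(model, optimizer)
#   for step in range(start, total_steps):
#       ...train step...
#       if step % CKPT_EVERY == 0:
#           save(step, model, optimizer)
```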
Medium · System Design
Design a policy for scheduling off-peak batch jobs (data preprocessing, offline training) and archival strategies for old datasets. Include time windows, priority vs preemption, cold storage decisions (e.g., Glacier), and how to coordinate with production inference capacity to avoid interference.
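For the cold-storage piece, object-store lifecycle rules are a common mechanism. A hedged boto3 sketch follows; the bucket name, prefix, and day thresholds are invented for the example.

```python
# Sketch: an S3 lifecycle rule that moves stale training datasets to cold
# storage and then to deep archive. Placeholders: bucket, prefix, thresholds.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="ml-datasets-example",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-datasets",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},        # rarely read
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # retention only
            ],
        }]
    },
)
```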
Easy · Technical
What is dynamic batching for model inference? Explain how it balances throughput and latency, which systems/frameworks support it (e.g., TensorRT/TorchServe), and when you would avoid using it.
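A toy version of the idea, independent of any particular serving framework, is sketched below: requests are collected until either the batch fills or a short wait deadline passes. The max_batch and max_wait_s values are illustrative.

```python
# Toy dynamic batcher: collect requests until the batch is full or a timeout
# expires, whichever comes first. Real servers (e.g., TorchServe, Triton)
# implement this internally with their own tuning knobs.
import queue
import time

def batch_requests(request_queue: "queue.Queue",
                   max_batch: int = 32,
                   max_wait_s: float = 0.005):
    """Return a list of requests, balancing throughput (bigger batches)
    against latency (bounded wait after the first request arrives)."""
    batch = [request_queue.get()]            # block until at least one request
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```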
Medium · Technical
Discuss the performance and cost trade-offs between vertical scaling (bigger GPU instances) and horizontal scaling (more, smaller GPUs) for inference workloads. Consider latency SLOs, batching efficiency, models constrained by licensing or GPU memory, and scaling elasticity.
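A back-of-envelope throughput-per-dollar comparison can frame the discussion. The numbers below are invented for illustration and ignore latency SLOs and GPU memory fit, which often dominate the real decision.

```python
# Back-of-envelope comparison (illustrative numbers, not benchmarks): one big
# GPU instance vs several smaller ones at roughly equal hourly spend.
def throughput_per_dollar(rps_per_unit: float, units: int,
                          hourly_cost: float, efficiency: float = 1.0) -> float:
    return (rps_per_unit * units * efficiency) / hourly_cost

vertical   = throughput_per_dollar(rps_per_unit=900, units=1,
                                   hourly_cost=12.0)                   # 1 large GPU
horizontal = throughput_per_dollar(rps_per_unit=260, units=4,
                                   hourly_cost=12.0, efficiency=0.95)  # 4 small GPUs

print(f"vertical:   {vertical:.1f} rps/$")    # ~75.0
print(f"horizontal: {horizontal:.1f} rps/$")  # ~82.3
```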
