InterviewStack.io

Capacity Planning and Resource Optimization Questions

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives (SLOs). Key skills include monitoring resource utilization metrics such as CPU usage, memory consumption, storage input/output (I/O), and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load balancing algorithms, and the use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to verify that capacity decisions meet both performance and cost objectives.

Medium | Technical
Describe metrics and experiments you would run to determine whether a recent capacity change (e.g., resizing instances, changing instance counts) met both performance and cost objectives over a 30-day evaluation period. Which statistical tests, dashboards, and KPIs (SLO adherence, cost per request, latency percentiles, utilization) would you use to declare success and detect regressions?
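One way to start on this question is to compute the same KPIs for the window before and after the change and compare them. The sketch below is a minimal illustration, not a complete evaluation: the `Window` type, the `summarize` and `change_succeeded` helpers, and the success rule (adherence above a floor and lower cost per 1,000 requests) are all assumptions made for the example. A real evaluation would likely add a significance test or confidence intervals on top of these point estimates.

```python
import math
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical container for one evaluation window (e.g. 30 days)."""
    latencies_ms: list[float]  # sampled request latencies
    total_cost: float          # infrastructure spend for the window
    requests: int              # requests served in the window


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(window, slo_ms):
    """Compute latency percentiles, SLO adherence, and unit cost."""
    within = sum(1 for x in window.latencies_ms if x <= slo_ms)
    return {
        "p50_ms": percentile(window.latencies_ms, 50),
        "p95_ms": percentile(window.latencies_ms, 95),
        "p99_ms": percentile(window.latencies_ms, 99),
        "slo_adherence": within / len(window.latencies_ms),
        "cost_per_1k_requests": 1000 * window.total_cost / window.requests,
    }


def change_succeeded(before, after, slo_ms, min_adherence):
    """Assumed rule: SLO adherence stays above the floor and unit cost drops."""
    b = summarize(before, slo_ms)
    a = summarize(after, slo_ms)
    return (a["slo_adherence"] >= min_adherence
            and a["cost_per_1k_requests"] < b["cost_per_1k_requests"])
```

Comparing unit cost (cost per request) rather than total spend keeps the verdict meaningful when traffic differs between the two windows.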
Medium | Technical
Explain how to translate SLOs into concrete capacity targets. Given a requirement of 99.95% availability and p95 latency under 100 ms, describe how you'd set CPU/memory headroom, instance counts, redundancy levels (N+1/N+2), and load balancing choices to meet the SLOs, including under failure scenarios such as instance or AZ loss.
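The arithmetic behind one part of such an answer, sizing a fleet so it survives the loss of a zone, can be sketched as follows. This is an illustration under stated assumptions: the function name `instances_needed`, the idea that each instance sustains a known request rate while meeting the latency target, and the even spread of instances across zones are all hypothetical simplifications.

```python
import math


def instances_needed(peak_rps, rps_per_instance, target_utilization,
                     zones, tolerate_zone_loss=True):
    """Return (per_zone, total) instance counts.

    Assumes each instance can sustain rps_per_instance while still meeting
    the latency SLO, and that load is spread evenly across zones.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if tolerate_zone_loss and zones < 2:
        raise ValueError("need at least 2 zones to tolerate a zone loss")

    # Instances required to serve peak load at the target utilization,
    # i.e. with (1 - target_utilization) of each instance kept as headroom.
    base = math.ceil(peak_rps / (rps_per_instance * target_utilization))

    # If one zone can disappear, the remaining zones must carry `base`.
    surviving_zones = zones - 1 if tolerate_zone_loss else zones
    per_zone = math.ceil(base / surviving_zones)
    return per_zone, per_zone * zones


# Example: 9,000 RPS peak, 400 RPS per instance, 50% target utilization,
# three zones, survive the loss of any one zone.
per_zone, total = instances_needed(9000, 400, 0.5, zones=3)
print(per_zone, total)  # 23 69
```

With these example numbers, 45 instances cover peak load at 50% utilization; spreading that across the two surviving zones gives 23 per zone, or 69 in total across three zones.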
Hard | Technical
Your service's network egress quota is occasionally hit, causing provider throttling and availability loss. Propose methods to optimize network throughput and stay within quotas: protocol-level changes, response compression, request aggregation, caching/CDN usage, prioritization/traffic-shaping, and batching. Also explain monitoring and backpressure strategies and the cost implications of each approach.
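Traffic shaping with backpressure, one of the options named in the question, is often built on a token bucket. The sketch below is a minimal, single-threaded version for illustration; the class name `TokenBucket`, its parameters, and the choice to pass the clock in explicitly (so behavior is easy to test) are assumptions, not any provider's API.

```python
class TokenBucket:
    """Minimal egress limiter: tokens are bytes the caller may send."""

    def __init__(self, rate_bytes_per_s, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_s   # sustained refill rate
        self.capacity = burst_bytes    # maximum burst size
        self.tokens = burst_bytes      # start with a full bucket
        self.last = now                # timestamp of the last refill

    def try_send(self, nbytes, now):
        """Return True and spend tokens if nbytes fits the budget right now."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        # Caller should apply backpressure: queue, delay, or shed
        # lower-priority traffic instead of sending.
        return False


bucket = TokenBucket(rate_bytes_per_s=100, burst_bytes=200)
print(bucket.try_send(150, now=0.0))  # True, 50 tokens remain
print(bucket.try_send(100, now=0.0))  # False, only 50 tokens available
print(bucket.try_send(100, now=1.0))  # True, refilled to 150 after 1 s
```

Setting the sustained rate somewhat below the provider quota leaves margin for traffic that bypasses the limiter, and a `False` result gives a natural hook for prioritization: high-priority responses can retry first while bulk transfers wait.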
Easy | Technical
Describe the differences between horizontal and vertical scaling for cloud infrastructure. For each approach list pros and cons, provide three realistic scenarios where one is strongly preferable to the other, and explain practical limits that make one approach impossible or impractical in certain situations.
Hard | Technical
A proposed capacity reduction increases CPU p95 by 30% while keeping p50 unchanged. Propose a rigorous method to quantify how likely this change is to exhaust the monthly error budget for availability and latency. Include how you would design canary experiments, what statistical modeling you would use to extrapolate risk, and what conservative decision criteria you would apply for proceeding or rolling back.
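One conservative decision criterion is to take an upper confidence bound on the canary's bad-request rate and compare it with the rate the SLO allows. The sketch below uses the Wilson score interval for that bound. It is an illustration only: the function names, the 50% budget threshold, and the assumptions that requests fail independently and that canary traffic is representative of production are all choices made for the example, and it models bad requests directly rather than the CPU p95 shift itself.

```python
import math


def wilson_upper(bad, total, z=1.96):
    """Upper end of the Wilson score interval for a proportion.

    z=1.96 gives a two-sided 95% interval, so the upper end is roughly a
    one-sided 97.5% bound on the true bad-request rate.
    """
    if total <= 0 or not 0 <= bad <= total:
        raise ValueError("need total > 0 and 0 <= bad <= total")
    p = bad / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z * z / (4 * total * total)) / denom
    return center + half


def budget_fraction_consumed(bad, total, slo_target, z=1.96):
    """Pessimistic bad-request rate as a fraction of the allowed rate."""
    allowed = 1 - slo_target
    return wilson_upper(bad, total, z) / allowed


def should_proceed(bad, total, slo_target, max_budget_fraction=0.5):
    """Assumed rule: proceed only if the pessimistic estimate uses at most
    max_budget_fraction of the error budget."""
    return budget_fraction_consumed(bad, total, slo_target) <= max_budget_fraction


# A clean canary with too little traffic is not enough evidence:
print(should_proceed(bad=0, total=1_000, slo_target=0.9995))       # False
# A larger canary with a low observed failure rate is:
print(should_proceed(bad=10, total=1_000_000, slo_target=0.9995))  # True
```

The first call shows why canary size matters: with only 1,000 requests and zero failures, the upper bound on the failure rate is still several times the 0.05% that a 99.95% target allows.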
