InterviewStack.io

Capacity Planning and Resource Optimization Questions

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service-level objectives. Key skills include monitoring utilization metrics such as CPU usage, memory consumption, storage I/O, and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load-balancing algorithms, and the use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerting and dashboards, stress and validation testing of planned changes, and methods to verify that capacity decisions meet both performance and cost objectives.

Medium · System Design
Design an autoscaling policy for a Kubernetes Deployment serving HTTP traffic with diurnal patterns and occasional spikes. Specify the metrics to use (CPU, custom queue length, request latency), thresholds, target utilization, cooldowns, minimum and maximum replicas, and how you would combine reactive and predictive components to meet a p95 latency SLO while minimizing cost.
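One way to ground an answer is to sketch the reactive half of such a policy in Python. Everything below is an illustrative assumption rather than a reference implementation: the targets, bounds, and cooldown follow the Kubernetes HPA-style rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), and a predictive component (for example, scheduled pre-scaling ahead of the known diurnal peak) would be layered on top of it.

```python
# Sketch of the reactive component; all targets and bounds are example values.
import math
import time

def desired_replicas(current_replicas: int,
                     cpu_util: float,           # observed average CPU utilization, e.g. 0.72
                     queue_len_per_pod: float,  # observed custom metric (queue length)
                     target_cpu: float = 0.60,
                     target_queue: float = 30.0,
                     min_replicas: int = 4,
                     max_replicas: int = 60) -> int:
    """HPA-style rule: scale on whichever metric demands the most replicas."""
    by_cpu = math.ceil(current_replicas * cpu_util / target_cpu)
    by_queue = math.ceil(current_replicas * queue_len_per_pod / target_queue)
    return max(min_replicas, min(max_replicas, max(by_cpu, by_queue)))

class ScaleDownCooldown:
    """React to spikes immediately; hold scale-downs back to avoid oscillation."""
    def __init__(self, scale_down_seconds: float = 300.0):
        self.scale_down_seconds = scale_down_seconds
        self._last_change = float("-inf")

    def apply(self, current: int, proposed: int, now=None) -> int:
        now = time.monotonic() if now is None else now
        if proposed > current:                      # scale up right away
            self._last_change = now
            return proposed
        if proposed < current and now - self._last_change >= self.scale_down_seconds:
            self._last_change = now                 # scale down only after the window
            return proposed
        return current
```

Asymmetric behaviour (fast up, slow down) is what lets the same policy absorb spikes without thrashing during the diurnal ramp-down; the p95 latency SLO is protected indirectly by keeping the CPU and queue targets well below saturation.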
Easy · Technical
Explain what an autoscaler cooldown period is and why cooldowns are necessary. Describe one scenario where a cooldown that is too short leads to oscillations and another scenario where a cooldown that is too long causes SLA violations. Finally, outline a simple method for choosing an appropriate cooldown for a web application.
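A rough way to frame the "choosing a cooldown" part is to require that the cooldown cover the system's feedback delay: instance startup time plus the time for new capacity to show up in the metric the autoscaler reads. The heuristic and safety factor below are assumptions for illustration, not a standard formula.

```python
# Heuristic sketch: cooldown >= time for a scaling action to be visible in metrics.
def suggest_cooldown_seconds(instance_startup_s: float,
                             metric_scrape_interval_s: float,
                             metric_window_samples: int = 3,
                             safety_factor: float = 1.5) -> float:
    """Cover startup plus the metric averaging window, with a margin on top."""
    feedback_delay = instance_startup_s + metric_scrape_interval_s * metric_window_samples
    return feedback_delay * safety_factor

# Example: 60 s pod startup, 15 s scrape interval, 3-sample average
# -> roughly (60 + 45) * 1.5 ≈ 158 s before acting on the next scaling decision.
```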
Medium · Technical
You manage real-time ML model serving: 1,000 inference requests/sec with a p95 latency SLO of 200ms. Discuss practical capacity and optimization approaches including model compression (quantization, pruning), batching strategies, CPU vs GPU serving, autoscaling, cold-start mitigation, profiling, and cost trade-offs to achieve SLOs while minimizing infrastructure spend.
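A capacity answer here usually starts with a latency budget for dynamic batching. The sketch below uses an assumed latency model (a callable mapping batch size to measured inference time) to pick the largest batch that still fits the SLO and then sizes replicas for the offered load; the numbers in the example are made up.

```python
# Back-of-the-envelope sketch for dynamic batching under a latency SLO.
import math

def plan_batching(target_rps: float,
                  slo_p95_ms: float,
                  per_batch_latency_ms,          # callable: batch_size -> inference ms
                  max_batch_window_ms: float = 20.0,
                  max_batch_size: int = 64):
    """Find the largest batch size whose fill time plus inference time fits the SLO,
    then compute the replica count needed for the target throughput."""
    best = None
    for b in range(1, max_batch_size + 1):
        # time to accumulate b requests at the offered load, capped by the batch window
        fill_ms = min(max_batch_window_ms, (b / target_rps) * 1000.0)
        total_ms = fill_ms + per_batch_latency_ms(b)
        if total_ms <= slo_p95_ms:
            best = (b, total_ms)
    if best is None:
        return None
    b, total_ms = best
    per_replica_rps = b / (per_batch_latency_ms(b) / 1000.0)  # batches processed back to back
    replicas = math.ceil(target_rps / per_replica_rps)
    return {"batch_size": b, "est_latency_ms": total_ms, "replicas": replicas}

# Example with a made-up latency curve: 10 ms fixed cost + 1.5 ms per item.
print(plan_batching(1000, 200, lambda b: 10 + 1.5 * b))
```

The same budgeting exercise shows where compression helps: quantization or pruning shrinks per_batch_latency_ms, which either raises the feasible batch size or cuts the replica count for the same SLO.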
Hard · Technical
Explain how buffer pool sizing in an OLTP database affects read latency and IO amplification when the working set is slightly larger than available RAM. Using cache-miss curves and cost modeling, propose a method to choose buffer size that minimizes total cost (memory cost + IO cost), and describe experiments to measure the 'knee' in the hit-rate curve.
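The cost-modeling part can be made concrete with a small search over candidate buffer sizes, pricing memory against the physical reads implied by the measured miss rate at each size. The hit-rate curve and prices below are toy assumptions; in practice the curve would come from replaying a representative trace at several buffer sizes.

```python
# Minimal cost-model sketch; prices and the example hit-rate curve are assumptions.
def choose_buffer_gb(candidate_sizes_gb, hit_rate_at,   # callable: size_gb -> hit rate in [0, 1]
                     read_ops_per_s: float,
                     memory_cost_per_gb_month: float,
                     io_cost_per_million_reads: float):
    """Return the buffer size minimizing monthly memory cost plus physical-read cost."""
    seconds_per_month = 30 * 24 * 3600
    best_size, best_cost = None, float("inf")
    for size in candidate_sizes_gb:
        misses_per_s = read_ops_per_s * (1.0 - hit_rate_at(size))
        io_cost = misses_per_s * seconds_per_month / 1e6 * io_cost_per_million_reads
        total = size * memory_cost_per_gb_month + io_cost
        if total < best_cost:
            best_size, best_cost = size, total
    return best_size, best_cost

# Toy hit-rate curve with a visible "knee" around 48 GB of a ~64 GB working set.
curve = {32: 0.90, 40: 0.95, 48: 0.985, 56: 0.995, 64: 0.999}
print(choose_buffer_gb(sorted(curve), curve.get, read_ops_per_s=20000,
                       memory_cost_per_gb_month=5.0, io_cost_per_million_reads=0.10))
```

With these toy numbers the optimum lands just past the knee rather than at the maximum size, which is the point of the exercise: beyond the knee, extra memory buys very few additional hits.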
Easy · Technical
You operate a microservice that handles 500 requests/sec. Each request consumes 0.05 CPU cores and 20 MB memory. You want per-instance p95 CPU utilization <= 60% and 20% memory headroom. Implement (describe or write) a Python function signature compute_instances(requests_per_sec, cpu_per_request, mem_per_request_mb, cpu_target_util, mem_headroom_pct) that returns the integer number of instances required. Explain the calculation and rounding strategy.
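One reasonable interpretation is sketched below. The question's signature does not fix an instance shape, so the sketch adds cores_per_instance and mem_per_instance_mb as keyword defaults (an assumption), and it reads mem_per_request_mb as memory held for roughly one second of in-flight requests. Partial instances are rounded up with ceil, and the result is the maximum of the CPU-bound and memory-bound counts.

```python
# Sketch of one interpretation; the instance shape and the memory model are assumptions.
import math

def compute_instances(requests_per_sec: float,
                      cpu_per_request: float,        # cores consumed per request
                      mem_per_request_mb: float,
                      cpu_target_util: float,        # e.g. 0.60
                      mem_headroom_pct: float,       # e.g. 0.20
                      cores_per_instance: float = 4.0,
                      mem_per_instance_mb: float = 8192.0) -> int:
    """Instances needed so CPU stays at or below the target utilization
    and memory keeps the requested headroom."""
    total_cpu_cores = requests_per_sec * cpu_per_request         # 500 * 0.05 = 25 cores
    total_mem_mb = requests_per_sec * mem_per_request_mb         # assumes ~1 s of in-flight state
    usable_cores = cores_per_instance * cpu_target_util          # 4 * 0.60 = 2.4 cores
    usable_mem = mem_per_instance_mb * (1.0 - mem_headroom_pct)  # 8192 * 0.80 MB
    by_cpu = math.ceil(total_cpu_cores / usable_cores)           # always round UP:
    by_mem = math.ceil(total_mem_mb / usable_mem)                # partial instances don't exist
    return max(by_cpu, by_mem)

# Example from the question: 500 rps, 0.05 cores and 20 MB per request.
print(compute_instances(500, 0.05, 20, 0.60, 0.20))   # -> 11 with the assumed instance shape
```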
