InterviewStack.io

Capacity Planning and Resource Optimization Questions

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives (SLOs). Key skills include monitoring resource utilization metrics such as CPU usage, memory consumption, storage I/O, and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load balancing algorithms, and the use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods for measuring whether capacity decisions meet both performance and cost objectives.

Easy | Technical
20 practiced
What is 'right-sizing' in cloud capacity planning for ML workloads? Describe a simple, repeatable process to right-size a VM or container for a batch training job using historical CPU, memory, and wall-clock runtime metrics. Include how you would decide on a safety buffer and how to validate the right-sizing change.
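One possible shape for such a process, as a minimal sketch rather than a definitive answer: take per-run peak CPU and memory from past runs, size CPU to a high percentile and memory to the observed maximum, add a buffer, then validate by rerunning the job and comparing wall-clock runtime. The function names, the 95th percentile, the 20% buffer, and the 10% runtime tolerance are all illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def right_size(cpu_cores_peak, mem_gib_peak, buffer=0.2, pct=95):
    """Recommend (cpu_cores, mem_gib) from per-run peak usage history.

    Assumption: CPU is sized to a percentile because CPU pressure only slows
    the job, while memory is sized to the observed maximum because running
    out of memory kills it.
    """
    cpu = percentile(cpu_cores_peak, pct) * (1 + buffer)
    mem = max(mem_gib_peak) * (1 + buffer)
    return math.ceil(cpu), math.ceil(mem)


def runtime_regressed(baseline_s, candidate_s, tolerance=0.10):
    """Validation check: True if the resized run is slower than tolerated."""
    return candidate_s > baseline_s * (1 + tolerance)
```

For example, `right_size([3.1, 3.4, 2.9, 3.8, 3.3], [10.2, 11.0, 12.0, 11.7])` recommends 5 cores and 15 GiB. A wider buffer is reasonable when the history is short or the run-to-run variance is high.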
Easy | Technical
24 practiced
In Kubernetes, what is the difference between resource requests and limits? Explain how requests and limits affect bin-packing, QoS classes, and OOM killing. Provide an example set of CPU and memory request/limit values for a small model-serving container and justify your choices.
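To make the QoS part concrete, the sketch below classifies a pod from its containers' requests and limits using the documented Kubernetes rules: Guaranteed when every container has CPU and memory limits equal to its requests, BestEffort when nothing is set, Burstable otherwise. The function name and the example values are illustrative assumptions, and quantities are compared as plain strings, so "1" and "1000m" would not be treated as equal here.

```python
def qos_class(containers):
    """Return the QoS class for a pod given per-container resource dicts.

    Each container looks like:
    {"requests": {"cpu": "500m", "memory": "1Gi"},
     "limits": {"cpu": "1", "memory": "1Gi"}}
    Simplification: quantities are compared as strings, not parsed.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Illustrative values for a small model-serving container (assumed, not measured):
# memory request == limit so the scheduler reserves what the model actually needs,
# CPU limit above the request so the container can burst during traffic spikes.
small_model_server = {
    "requests": {"cpu": "500m", "memory": "1Gi"},
    "limits": {"cpu": "1", "memory": "1Gi"},
}
```

With these values the pod is Burstable. The scheduler bin-packs on the requests (500m CPU, 1Gi memory), and the container is OOM-killed if it exceeds its 1Gi memory limit.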
Hard | Technical
25 practiced
Create a strategy and test plan to compress model artifacts stored in a model registry and measure trade-offs between storage savings and retrieval latency for 100k downloads per month. Include compression formats, partial loading strategies, caching layers, and capacity planning for metadata DB and storage tiers.
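A test plan for this question would likely include a small benchmark harness. The sketch below uses only Python's standard-library `zlib` and `lzma` codecs to measure compression ratio and decompression time for a payload, and projects monthly transfer volume at 100k downloads. The helper names are hypothetical, and a real plan would substitute the formats and artifacts actually under evaluation.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, payload):
    """Return compression ratio and decompression time for one codec."""
    blob = compress(payload)
    start = time.perf_counter()
    restored = decompress(blob)
    elapsed = time.perf_counter() - start
    if restored != payload:
        raise RuntimeError(f"{name}: round trip did not restore the payload")
    return {
        "codec": name,
        "ratio": len(blob) / len(payload),  # lower means more storage saved
        "decompress_s": elapsed,            # added retrieval latency
    }


def monthly_transfer_gib(artifact_bytes, ratio, downloads_per_month=100_000):
    """Projected bytes served per month, in GiB, for a compressed artifact."""
    return artifact_bytes * ratio * downloads_per_month / 2**30


def compare(payload):
    """Benchmark two stdlib codecs on the same payload."""
    return [
        measure("zlib-6", lambda d: zlib.compress(d, 6), zlib.decompress, payload),
        measure("lzma", lzma.compress, lzma.decompress, payload),
    ]
```

Running `compare` on representative artifacts gives the ratio and latency pairs needed to weigh storage savings against retrieval time. For instance, a 1 GiB artifact at a ratio of 0.5 served 100k times a month comes to 50,000 GiB of transfer.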
Medium | Technical
23 practiced
Write a Python function that estimates the number of replicas required for an inference service given desired throughput T (req/s), per-replica max_concurrency C (requests served concurrently), and a safety buffer B (expressed as a decimal, e.g., 0.2 for 20%). Assume linear scaling and no queuing delays. Function signature: def estimate_replicas(T, C, B) -> int. Include handling of edge cases and a brief explanation of assumptions.
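A minimal sketch of one acceptable answer follows. It rests on an assumption the question leaves implicit: each replica sustains C requests per second, which holds only when a request takes about one second. With a different mean latency, per-replica throughput would be C divided by that latency (Little's law), and that value should be used in place of C. Returning 0 for zero traffic is also a choice made here; a production service would usually keep a minimum replica count.

```python
import math


def estimate_replicas(T, C, B):
    """Estimate replicas for throughput T (req/s), per-replica capacity C, buffer B.

    Assumptions: linear scaling, no queuing delays, and each replica
    sustains C req/s (about one second per request).
    """
    if T < 0:
        raise ValueError("T (throughput) must be non-negative")
    if C <= 0:
        raise ValueError("C (per-replica capacity) must be positive")
    if B < 0:
        raise ValueError("B (safety buffer) must be non-negative")
    if T == 0:
        return 0  # no traffic; callers may enforce their own minimum
    # Inflate demand by the buffer, then round up to whole replicas.
    return max(1, math.ceil(T * (1 + B) / C))
```

For example, `estimate_replicas(100, 7, 0.2)` returns 18: the buffered demand is 120 req/s, and 120 / 7 rounds up to 18 replicas.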
Easy | Technical
28 practiced
Explain the key resource utilization metrics you would monitor to ensure machine learning models in production are healthy. For a typical model-serving pod or container, list metrics for CPU, memory, disk I/O, network, and GPU (if applicable). For each metric: explain why it matters, what an early-warning threshold might look like, and one alert that you would configure to detect a degradation before SLOs are impacted.
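Early-warning alerts of the kind asked for here usually fire on a sustained breach rather than a single spike. The sketch below shows that pattern with a hypothetical helper. The threshold values in the table are illustrative assumptions, not recommendations, and should be derived from the service's own SLOs and baseline behavior.

```python
# Illustrative early-warning thresholds as fractions of capacity (assumed values).
EARLY_WARNING = {
    "cpu_utilization": 0.80,
    "memory_utilization": 0.85,
    "disk_io_utilization": 0.80,
    "network_utilization": 0.70,
    "gpu_utilization": 0.90,
}


def sustained_breach(samples, threshold, min_consecutive):
    """True if at least min_consecutive samples in a row meet or exceed threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


def firing_alerts(metrics, min_consecutive=3):
    """Return the names of metrics whose recent samples show a sustained breach.

    metrics maps a metric name from EARLY_WARNING to a list of recent samples.
    """
    return sorted(
        name
        for name, samples in metrics.items()
        if name in EARLY_WARNING
        and sustained_breach(samples, EARLY_WARNING[name], min_consecutive)
    )
```

Requiring several consecutive samples trades a little detection delay for fewer false alarms from short bursts.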
