Infrastructure Scaling and Capacity Planning Questions

Operational and infrastructure-level planning to ensure systems meet current demand and projected growth. Topics include demand forecasting, headroom planning, and three-to-five-year capacity roadmaps; autoscaling policies and metrics-driven scaling using CPU, memory, and custom application metrics; load testing, benchmarking, and performance validation methodologies; cost modeling and right-sizing in cloud environments, and trade-offs between managed services and self-hosted solutions; designing non-disruptive upgrade and migration strategies; multi-region and availability-zone deployment strategies and their implications for data placement and latency; instrumentation and observability for capacity metrics; and mapping business growth projections into infrastructure acquisition and scaling decisions. Candidates should demonstrate how to translate requirements into capacity plans and how to validate assumptions with experiments and measurements.

Medium | Technical

How would you detect and mitigate 'noisy neighbor' issues in a shared Kubernetes cluster where one tenant's workload causes CPU and IO spikes that affect other tenants? Describe detection signals, mitigation mechanisms (resource quotas, QoS classes, node isolation), and operational practices to prevent recurrence.
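
One possible detection signal is the CPU throttle ratio per container, computed from the CFS counters that cgroups expose (`nr_periods` and `nr_throttled` in `cpu.stat`, also surfaced by cAdvisor as `container_cpu_cfs_periods_total` and `container_cpu_cfs_throttled_periods_total`). The sketch below is a minimal illustration under assumptions: the function names, the input shape, and the 25% threshold are hypothetical, and a high ratio only shows which containers are pressing against their own CPU limits, so it would need to be correlated with latency and IO signals from other tenants.

```python
from typing import Dict, List, Tuple

# A sample is (total CFS periods, throttled CFS periods), both monotonic counters.
Sample = Tuple[int, int]


def throttle_ratio(prev: Sample, curr: Sample) -> float:
    """Fraction of CFS periods in which the container was throttled
    between two counter readings."""
    periods = curr[0] - prev[0]
    throttled = curr[1] - prev[1]
    if periods <= 0:
        # No elapsed periods (or a counter reset): treat as no evidence.
        return 0.0
    return throttled / periods


def flag_throttled(
    samples: Dict[str, Tuple[Sample, Sample]],
    threshold: float = 0.25,  # assumption: tune per cluster
) -> List[str]:
    """Return container names whose throttle ratio exceeds the threshold."""
    return sorted(
        name
        for name, (prev, curr) in samples.items()
        if throttle_ratio(prev, curr) > threshold
    )


if __name__ == "__main__":
    readings = {
        "tenant-a/api": ((100, 10), (200, 60)),
        "tenant-b/worker": ((100, 0), (200, 5)),
    }
    print(flag_throttled(readings))
```

In practice the counters would come from the metrics pipeline rather than hand-built tuples; the point is only that a rate over two readings, not the raw counter, is the quantity to alert on.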

Medium | Technical

Compare Cluster Autoscaler, Karpenter, and cloud provider Autoscaling Groups (ASG) for Kubernetes node provisioning. Discuss differences in provisioning latency, bin-packing efficiency, multi-instance-type support, ease of configuration, and suitability for large clusters with mixed workloads (batch, long-running, spot instances).

Easy | Technical

How do you validate that autoscaling is actually saving money and not degrading reliability? List the metrics, experiments (A/B, shadow, canary), and dashboards you would set up to quantify cost per successful user action and observe impact on SLOs.
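
The "cost per successful user action" metric can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the dictionary keys, and the example figures are hypothetical, and it assumes infrastructure cost and request counts are measured over the same time window for a control arm (static capacity) and an experiment arm (autoscaled).

```python
def cost_per_successful_action(total_cost: float, total_requests: int,
                               failed_requests: int) -> float:
    """Infrastructure cost divided by the number of successful requests."""
    successful = total_requests - failed_requests
    if successful <= 0:
        raise ValueError("no successful requests in the measurement window")
    return total_cost / successful


def availability(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests that succeeded (a simple availability SLI)."""
    if total_requests <= 0:
        raise ValueError("no requests in the measurement window")
    return 1 - failed_requests / total_requests


def compare_arms(control: dict, experiment: dict, slo: float) -> dict:
    """Compare cost efficiency of two arms and check the experiment arm
    against the availability SLO. Each arm is a dict with the hypothetical
    keys 'cost', 'requests', and 'failed'."""
    control_cpsa = cost_per_successful_action(
        control["cost"], control["requests"], control["failed"])
    experiment_cpsa = cost_per_successful_action(
        experiment["cost"], experiment["requests"], experiment["failed"])
    experiment_avail = availability(experiment["requests"], experiment["failed"])
    return {
        "control_cost_per_action": control_cpsa,
        "experiment_cost_per_action": experiment_cpsa,
        "saves_money": experiment_cpsa < control_cpsa,
        "meets_slo": experiment_avail >= slo,
    }


if __name__ == "__main__":
    # Made-up figures purely for illustration.
    static = {"cost": 1200.0, "requests": 1_000_000, "failed": 500}
    autoscaled = {"cost": 900.0, "requests": 1_000_000, "failed": 2_000}
    print(compare_arms(static, autoscaled, slo=0.999))
```

Reporting both fields together matters: an arm that is cheaper per action but misses the SLO is a reliability regression, not a saving.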

Medium | Technical

Implement a Python function (or pseudocode) desired_replicas(metrics: List[dict], target_cpu: float, min_replicas: int, max_replicas: int, cooldown_seconds: int) -> int. Each metric entry contains timestamp, cpu_percent, request_rate. Use an exponential moving average over the last N samples to smooth CPU, respect cooldown by comparing last scaling event time, and clamp result to min/max. Explain complexity and edge cases.
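
A minimal sketch of one possible answer follows. The signature in the question does not carry the current replica count or the time of the last scaling event, so `current_replicas`, `last_scale_time`, `window` (the N in "last N samples"), and `alpha` (the EMA smoothing factor) are added here as assumptions with arbitrary defaults. The sizing rule is the proportional form `ceil(current * observed / target)`, the same shape of rule the Kubernetes Horizontal Pod Autoscaler uses; "now" is taken to be the newest sample's timestamp.

```python
import math
from typing import List, Optional


def ema(values: List[float], alpha: float) -> float:
    """Exponential moving average over values ordered oldest to newest."""
    smoothed = values[0]
    for value in values[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


def desired_replicas(
    metrics: List[dict],
    target_cpu: float,
    min_replicas: int,
    max_replicas: int,
    cooldown_seconds: int,
    current_replicas: int = 1,                # assumption: not in the question's signature
    last_scale_time: Optional[float] = None,  # assumption: timestamp of last scaling event
    window: int = 5,                          # assumption: N samples to smooth over
    alpha: float = 0.5,                       # assumption: EMA smoothing factor
) -> int:
    clamped_current = max(min_replicas, min(max_replicas, current_replicas))

    # Edge cases: no data or a nonsensical target means "do not scale".
    if not metrics or target_cpu <= 0:
        return clamped_current

    # Sort defensively in case samples arrive out of order, then keep the last N.
    recent = sorted(metrics, key=lambda m: m["timestamp"])[-max(1, window):]
    now = recent[-1]["timestamp"]

    # Respect the cooldown: hold the current size if we scaled too recently.
    if last_scale_time is not None and now - last_scale_time < cooldown_seconds:
        return clamped_current

    smoothed_cpu = ema([m["cpu_percent"] for m in recent], alpha)
    proposed = math.ceil(current_replicas * smoothed_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    samples = [
        {"timestamp": t, "cpu_percent": 80, "request_rate": 100}
        for t in (0, 10, 20)
    ]
    print(desired_replicas(samples, target_cpu=50, min_replicas=1,
                           max_replicas=10, cooldown_seconds=60,
                           current_replicas=4))
```

As written, the cost is O(n log n) in the number of samples because of the defensive sort; if the caller guarantees time-ordered input, slicing the last N samples and smoothing them is O(N). Edge cases handled in this sketch are empty metrics, a non-positive target, out-of-order timestamps, an active cooldown, and a current replica count outside the min/max bounds. `request_rate` is carried in each sample but not used here.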

Hard | System Design

Design an end-to-end experiment to validate a Kubernetes cluster autoscaler's behavior at scale (e.g., supporting 10,000 pods). Include how you would generate load, inject realistic pod start times and custom metrics, simulate node provisioning delays and failures, collect relevant telemetry, and define pass/fail criteria for scaling latency and stability.
