InterviewStack.io

Infrastructure Scaling and Capacity Planning Questions

Operational and infrastructure-level planning to ensure systems meet current demand and projected growth. Topics include forecasting demand, headroom planning, and three- to five-year capacity roadmaps; autoscaling policies and metrics-driven scaling using CPU, memory, and custom application metrics; load testing, benchmarking, and performance validation methodologies; cost modeling and right-sizing in cloud environments, and trade-offs between managed services and self-hosted solutions; designing non-disruptive upgrade and migration strategies; multi-region and availability-zone deployment strategies and their implications for data placement and latency; instrumentation and observability for capacity metrics; and mapping business growth projections into infrastructure acquisition and scaling decisions. Candidates should demonstrate how to translate requirements into capacity plans and how to validate assumptions with experiments and measurements.

Hard | System Design
59 practiced
Design a sharding strategy for a high-write 'user-events' database. Explain how you choose a shard key, how to handle uneven distribution and hot shards, how to route queries, and outline a re-sharding plan with minimal downtime and data correctness guarantees.
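One possible starting point for the routing and re-sharding parts of this question is consistent hashing with virtual nodes. The sketch below is illustrative only: `HashRing`, `routing_key`, the shard names, and the salting scheme for hot users are all assumptions, not something the question prescribes.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, shards, vnodes=64):
        self._vnodes = vnodes
        self._ring = []    # sorted list of (point, shard)
        self._points = []  # ring points only, kept for bisect
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        # Each shard owns many small arcs, which evens out the distribution.
        for i in range(self._vnodes):
            self._ring.append((_hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


def routing_key(user_id: str, event_id: str, hot_users, buckets: int = 8) -> str:
    """Spread known-hot users over several sub-keys (assumed mitigation)."""
    if user_id in hot_users:
        return f"{user_id}:{_hash(event_id) % buckets}"
    return user_id
```

With this layout, adding a shard only moves keys onto the new shard, which keeps the re-sharding copy set small. The cost of salting hot users is that reads for those users must fan out across all of their buckets.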
Medium | Technical
76 practiced
Propose a tiered storage policy for logs and metrics where storage costs $X per TB/month. Include policies for hot/warm/cold tiers, retention windows, downsampling strategy for older data, archival to cheaper stores, and how to ensure compliance and alerting still work when raw data is moved or downsampled.
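A rough way to compare candidate tiering policies is a steady-state cost model. The sketch below assumes constant daily ingest, that data spends a fixed number of days in each tier, and that downsampling or compression shrinks it by a fixed ratio. The `Tier` class, its field names, and every price or retention value you pass in are hypothetical; the question leaves the actual price as $X.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    retention_days: int        # how long data stays in this tier
    size_ratio: float          # stored size relative to raw (1.0 = raw)
    price_per_tb_month: float  # assumed price; substitute your own $X


def monthly_cost(ingest_tb_per_day: float, tiers) -> dict:
    """Steady-state monthly cost per tier under the assumptions above."""
    return {
        t.name: ingest_tb_per_day
        * t.retention_days
        * t.size_ratio
        * t.price_per_tb_month
        for t in tiers
    }


def total_monthly_cost(ingest_tb_per_day: float, tiers) -> float:
    return sum(monthly_cost(ingest_tb_per_day, tiers).values())
```

A model like this makes it easy to see which lever matters most: shortening the hot window, downsampling harder in the warm tier, or moving the cold tier to a cheaper store.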
Hard | System Design
75 practiced
Architect a system to handle promotional traffic spikes that reach 100x baseline for short durations without keeping 100x capacity provisioned. Describe autoscaling strategy, buffer/pool strategies, caching and CDN pre-warm, queueing/backpressure, graceful degradation (feature gating), and cost controls. Define the metrics and acceptance criteria to validate the architecture.
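The queueing, backpressure, and graceful-degradation pieces of this question can be illustrated with a small admission controller. This is a minimal sketch under assumed rules: a bounded queue, a fill threshold above which non-critical requests get a degraded response, and outright rejection when the queue is full. The class name, thresholds, and return strings are hypothetical.

```python
from collections import deque


class AdmissionController:
    """Bounded queue with priority-based shedding (hypothetical names)."""

    def __init__(self, max_queue: int, degrade_at: float = 0.5):
        self.max_queue = max_queue
        self.degrade_at = degrade_at  # fill fraction that triggers degradation
        self.queue = deque()

    def offer(self, request_id: str, critical: bool) -> str:
        fill = len(self.queue) / self.max_queue
        if fill >= 1.0:
            # Backpressure: the caller should back off and retry later.
            return "rejected"
        if fill >= self.degrade_at and not critical:
            # Graceful degradation: serve a cached or reduced response
            # instead of consuming queue capacity.
            return "degraded"
        self.queue.append(request_id)
        return "queued"

    def drain(self, n: int) -> list:
        """Workers pull up to n requests in FIFO order."""
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]
```

The rejected and degraded rates from a controller like this are natural candidates for the acceptance criteria the question asks for, for example a cap on rejected critical requests during a 100x spike.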
Medium | Technical
58 practiced
Implement a Python function (or pseudocode) desired_replicas(metrics: List[dict], target_cpu: float, min_replicas: int, max_replicas: int, cooldown_seconds: int) -> int. Each metric entry contains timestamp, cpu_percent, request_rate. Use an exponential moving average over the last N samples to smooth CPU, respect cooldown by comparing last scaling event time, and clamp result to min/max. Explain complexity and edge cases.
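One way to approach this is sketched below. The signature in the question does not carry the current replica count or the last scaling time, so this sketch adds `current_replicas`, `last_scale_time`, `n_samples`, and `alpha` as keyword-only parameters; these are assumptions, as is treating `cpu_percent` as average per-replica utilization and timestamps as seconds. `request_rate` is not used here.

```python
import math
from typing import List, Optional


def desired_replicas(
    metrics: List[dict],
    target_cpu: float,
    min_replicas: int,
    max_replicas: int,
    cooldown_seconds: int,
    *,
    current_replicas: int,                    # assumed extra input
    last_scale_time: Optional[float] = None,  # assumed extra input
    n_samples: int = 5,                       # the "N" in the question
    alpha: Optional[float] = None,            # EMA smoothing factor
) -> int:
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    clamped_current = max(min_replicas, min(max_replicas, current_replicas))
    if not metrics:
        return clamped_current  # no data: hold steady

    # Sort defensively; samples may arrive out of order.
    window = sorted(metrics, key=lambda m: m["timestamp"])[-n_samples:]
    now = window[-1]["timestamp"]

    # Cooldown: ignore new recommendations shortly after a scaling event.
    if last_scale_time is not None and now - last_scale_time < cooldown_seconds:
        return clamped_current

    if alpha is None:
        alpha = 2.0 / (len(window) + 1)
    ema = window[0]["cpu_percent"]
    for sample in window[1:]:
        ema = alpha * sample["cpu_percent"] + (1 - alpha) * ema

    # Proportional rule; the tiny epsilon guards against float noise
    # pushing an exact integer up to the next replica.
    raw = math.ceil(current_replicas * ema / target_cpu - 1e-9)
    return max(min_replicas, min(max_replicas, raw))
```

For M input samples the sort costs O(M log M); if the caller guarantees ordered input, slicing the last N samples makes the work O(N). Edge cases handled here are empty metrics, a non-positive target, inverted bounds, and a current count outside the allowed range. Not handled: missing keys, NaN values, and stale samples.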
Medium | Technical
63 practiced
Explain the differences between the Kubernetes Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler (or Karpenter). How do the stabilization window, target utilization, and cooldown settings affect real-world behavior, and what trade-offs do they imply when running production workloads?
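To reason about target utilization and the stabilization window, it can help to model the HPA calculation directly. The Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below is a simplified model of that rule plus a scale-down stabilization window, not the controller's actual code; the function and class names are hypothetical.

```python
import math
from collections import deque


def hpa_recommendation(current_replicas: int, current_util: float,
                       target_util: float, tolerance: float = 0.1) -> int:
    """Proportional rule with a tolerance band around the target."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    return math.ceil(current_replicas * ratio)


class ScaleDownStabilizer:
    """Use the highest recommendation seen within the window.

    Scale-ups pass through immediately (a new high is the max), while
    scale-downs wait until older, higher recommendations age out.
    """

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self.history = deque()  # (timestamp, recommendation)

    def stabilize(self, now: float, recommendation: int) -> int:
        self.history.append((now, recommendation))
        while self.history and self.history[0][0] < now - self.window:
            self.history.popleft()
        return max(rec for _, rec in self.history)
```

In this model, a lower target utilization buys headroom at higher cost, and a longer window reduces flapping but keeps surplus replicas running longer after a spike.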
