InterviewStack.io

Infrastructure Scaling and Capacity Planning Questions

Operational and infrastructure-level planning to ensure systems meet current demand and projected growth. Topics include demand forecasting, headroom planning, and three-to-five-year capacity roadmaps; autoscaling policies and metrics-driven scaling using CPU, memory, and custom application metrics; load testing, benchmarking, and performance validation methodologies; cost modeling and right-sizing in cloud environments, and trade-offs between managed services and self-hosted solutions; designing non-disruptive upgrade and migration strategies; multi-region and availability-zone deployment strategies and their implications for data placement and latency; instrumentation and observability for capacity metrics; and mapping business growth projections into infrastructure acquisition and scaling decisions. Candidates should demonstrate how to translate requirements into capacity plans and how to validate assumptions with experiments and measurements.

Hard · System Design
68 practiced
Design autoscaling and capacity management for a multi-region real-time ingestion pipeline where data must be pre-processed in the nearest region but certain global aggregates must converge and be consistent within five minutes. Discuss local versus global scaling triggers, cross-region replication architecture, network cost and bandwidth implications, leader election for global aggregation, and how to respond when a single region experiences an extreme surge.
Hard · System Design
56 practiced
Architect a cross-region data placement and replication strategy that provides low-latency reads for EU and US customers while meeting data residency (GDPR-like) constraints and minimizing cross-region egress costs. Discuss strategies for selective replication, partitioning, encryption and KMS key separation, access control, and logging/audit to prove compliance.
Medium · Technical
59 practiced
Estimate monthly cloud costs for an ETL pipeline with the following: S3 ingest of 5 TB/day with 30-day retention, EMR nightly transforms running 4 hours/day on 200 m3.xlarge instances, and a Redshift analytics cluster holding 1 TB compressed. Describe how to model storage, compute, network, and data-transfer costs; list explicit assumptions to communicate; and explain how to present uncertainty ranges and levers for optimization.
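One way to structure the estimate is a small cost model that keeps the fixed quantities from the question (5 TB/day, 30-day retention, 200 instances for 4 hours/day) separate from the unit prices, which change over time and by region. The sketch below is illustrative only: `UnitPrices`, `monthly_cost`, the Redshift node count, the transfer volume, and every price shown are hypothetical placeholders rather than published rates, and it assumes decimal terabytes and steady-state storage.

```python
from dataclasses import dataclass

GB_PER_TB = 1000  # assumption: decimal terabytes


@dataclass(frozen=True)
class UnitPrices:
    """Hypothetical unit prices; substitute current published rates."""
    s3_per_gb_month: float
    emr_per_instance_hour: float   # instance price plus EMR surcharge, combined
    redshift_per_node_hour: float
    transfer_per_gb: float


def monthly_cost(prices, transfer_gb=0.0, redshift_nodes=1, days_in_month=30):
    """Return a per-category cost breakdown for one month."""
    # Steady state: 5 TB/day retained for 30 days is 150 TB resident.
    stored_gb = 5 * GB_PER_TB * 30
    # 200 instances running 4 hours per day.
    instance_hours = 200 * 4 * days_in_month
    # The analytics cluster is assumed to run around the clock.
    node_hours = redshift_nodes * 24 * days_in_month
    costs = {
        "storage": stored_gb * prices.s3_per_gb_month,
        "compute": instance_hours * prices.emr_per_instance_hour,
        "warehouse": node_hours * prices.redshift_per_node_hour,
        "transfer": transfer_gb * prices.transfer_per_gb,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # Placeholder low/high prices, used only to show an uncertainty range.
    low = UnitPrices(0.02, 0.25, 0.25, 0.01)
    high = UnitPrices(0.03, 0.40, 1.00, 0.09)
    for label, p in (("low", low), ("high", high)):
        print(label, monthly_cost(p, transfer_gb=10_000, redshift_nodes=2))
```

Running the model once with optimistic prices and once with pessimistic prices gives a range rather than a single figure, and the per-category breakdown shows which lever (retention, instance hours, node count, transfer volume) dominates.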
Easy · Technical
63 practiced
A product manager projects 30% quarter-over-quarter traffic growth for an ingestion API. Describe a repeatable process and simple formula you would use to translate that business projection into compute, storage, and network capacity requirements for the next 12 months. Include required inputs, assumptions about retention and replication, and how you would present uncertainty bands.
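A simple formula for this kind of projection is compound growth with headroom: load(t) = baseline × (1 + g)^(t/3) for t in months and quarterly growth g, scaled by (1 + headroom) and divided by per-unit capacity. The sketch below assumes growth compounds smoothly within each quarter and that the retention window is short relative to the growth rate; the function names, the 30% headroom default, and the 20% and 40% band edges are hypothetical choices for illustration.

```python
import math


def projected_load(baseline, qoq_growth, months):
    """Compound a quarter-over-quarter growth rate over a number of months."""
    return baseline * (1 + qoq_growth) ** (months / 3)


def required_units(baseline_peak, qoq_growth, months, per_unit_capacity, headroom=0.3):
    """Units (instances, links, shards) needed to serve projected peak load plus headroom."""
    load = projected_load(baseline_peak, qoq_growth, months)
    return math.ceil(load * (1 + headroom) / per_unit_capacity)


def required_storage_gb(daily_ingest_gb, qoq_growth, months, retention_days, replication):
    """Approximate resident storage: projected daily ingest x retention x replicas."""
    daily = projected_load(daily_ingest_gb, qoq_growth, months)
    return daily * retention_days * replication


if __name__ == "__main__":
    # Uncertainty band: the same model evaluated at low, expected, and high growth.
    for label, growth in (("low", 0.20), ("expected", 0.30), ("high", 0.40)):
        units = required_units(1000, growth, 12, per_unit_capacity=500)
        print(label, units)
```

At 30% per quarter the 12-month multiplier is 1.3^4, roughly 2.86, so a plan sized only for today's peak falls short well before the year ends. Presenting the low, expected, and high rows side by side makes the uncertainty band explicit.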
Medium · Technical
55 practiced
For an analytical database that frequently runs complex joins and heavy aggregations, discuss trade-offs between scaling up (bigger instances) and scaling out (adding nodes/horizontal scaling). Consider single-query performance, concurrency, cost per query, operational complexity, and failure domains, then recommend when to prefer each approach.
