InterviewStack.io

Load Balancing and Horizontal Scaling Questions

Covers principles and mechanisms for distributing traffic and scaling services horizontally. Includes load balancing algorithms such as round robin, least connections, and consistent hashing; health checks, connection draining, and sticky sessions; and session management strategies for stateless and stateful services. Explains when to scale horizontally versus vertically, capacity planning, and the trade-offs of each approach. Also includes infrastructure-level autoscaling concepts such as auto scaling groups, launch templates, target tracking and step scaling policies, and how load balancers and autoscaling interact to absorb traffic spikes. Reviews different load balancer types and selection criteria, integration with service discovery, and operational concerns for maintaining availability and performance at scale.
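Target tracking is often described as a proportional control rule. The sketch below uses the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), clamped to the group's minimum and maximum size. Cloud providers' target tracking policies (for example on AWS) are not guaranteed to compute exactly this, and the sketch omits cooldowns, instance warm-up, metric smoothing, and HPA's tolerance band; `desired_capacity` is a hypothetical helper.

```python
import math


def desired_capacity(current_instances: int,
                     current_metric: float,
                     target_metric: float,
                     min_size: int,
                     max_size: int) -> int:
    """Proportional target-tracking rule (hypothetical helper).

    Scales the instance count by current_metric / target_metric, rounds up,
    and clamps the result to the group's [min_size, max_size] bounds.
    Cooldowns, warm-up, and tolerance bands are deliberately not modeled.
    """
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    raw = math.ceil(current_instances * current_metric / target_metric)
    return max(min_size, min(max_size, raw))


# 4 instances at 90% average CPU against a 60% target
print(desired_capacity(4, 90.0, 60.0, min_size=2, max_size=10))  # prints 6
```

Rounding up biases toward over-provisioning, which is usually the safer error during a traffic spike; the max bound is what keeps a runaway metric from scaling the group without limit.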

Easy · Technical
Create a high-level capacity planning checklist and estimation approach for a stateless web service expected to handle 5,000 requests/sec with a p95 latency target of 200 ms. Include how to benchmark per-instance throughput, calculate required instance counts, account for autoscaling buffer and headroom, and size load balancer capacity, as well as the database/backend dependencies that affect sizing.
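One way to turn the checklist into numbers, assuming a load test has already found the highest per-instance throughput at which p95 stays under 200 ms: divide peak load by a derated per-instance rate, round up, and add spare instances for failure tolerance. The 70% target utilization, the single spare (N+1), and the 400 req/s benchmark figure below are illustrative assumptions, not measurements. Little's law (L = λW, where W is the mean latency, not the p95) gives the average number of in-flight requests, which helps when sizing connection pools and load balancer limits. Both function names are hypothetical.

```python
import math


def instances_needed(peak_rps: float,
                     per_instance_rps_at_slo: float,
                     target_utilization: float = 0.7,
                     spare_instances: int = 1) -> int:
    """Instances needed so each runs at target_utilization of the throughput
    it sustained in a benchmark while meeting the latency SLO, plus N+k spares."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = per_instance_rps_at_slo * target_utilization
    return math.ceil(peak_rps / usable_rps) + spare_instances


def in_flight_requests(rps: float, mean_latency_s: float) -> float:
    """Little's law: average concurrency L = arrival rate * mean time in system."""
    return rps * mean_latency_s


# Hypothetical benchmark: 400 req/s per instance with p95 <= 200 ms
print(instances_needed(5000, 400))       # 19
print(in_flight_requests(5000, 0.050))   # roughly 250 concurrent requests
```

With those assumptions: 400 × 0.7 = 280 usable req/s per instance, ceil(5,000 / 280) = 18, plus one spare gives 19 instances. The same fleet size then has to be checked against dependencies, for example whether 19 instances × the per-instance database connection pool stays under the database's connection limit.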
Easy · Technical
Explain the main differences between horizontal and vertical scaling. Provide concrete examples (e.g., adding CPU/RAM to an existing VM vs. adding more instances behind a load balancer) and discuss operational trade-offs, including downtime, complexity, cost, single points of failure, and the impact on stateful components such as databases and caches. Give scenarios where vertical scaling is still the right choice, and describe what to consider when migrating from vertical to horizontal scaling.
Medium · Technical
Explain how service discovery integrates with load balancers in microservices architectures. Compare DNS-based discovery, client-side discovery with registries (Consul/Eureka), and server-side discovery with gateways or service meshes. For each pattern, describe how instances register and deregister, how the LB learns about backends, and how to handle TTLs and stale entries.
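The TTL and heartbeat mechanics shared by registry-based patterns can be sketched with a toy in-memory registry. This is not the Consul or Eureka API: `TtlRegistry` and its methods are hypothetical. Instances register with a TTL, renew it with heartbeats, and are evicted lazily once the TTL lapses, which is how an instance that crashes without deregistering eventually disappears. The clock is injectable so expiry can be exercised without sleeping.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class _Entry:
    address: str
    expires_at: float


class TtlRegistry:
    """Toy service registry with TTL-based liveness (hypothetical API)."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        # service name -> instance id -> entry
        self._services: Dict[str, Dict[str, _Entry]] = {}

    def register(self, service: str, instance_id: str, address: str) -> None:
        self._services.setdefault(service, {})[instance_id] = _Entry(
            address, self.clock() + self.ttl_s)

    def heartbeat(self, service: str, instance_id: str) -> bool:
        entry = self._services.get(service, {}).get(instance_id)
        if entry is None:
            return False  # unknown or already evicted: caller should re-register
        entry.expires_at = self.clock() + self.ttl_s
        return True

    def deregister(self, service: str, instance_id: str) -> None:
        # Graceful shutdown path: remove immediately instead of waiting for TTL
        self._services.get(service, {}).pop(instance_id, None)

    def healthy_addresses(self, service: str) -> List[str]:
        now = self.clock()
        instances = self._services.get(service, {})
        # Lazily evict entries whose TTL has lapsed
        for iid in [i for i, e in instances.items() if e.expires_at <= now]:
            del instances[iid]
        return sorted(e.address for e in instances.values())
```

A client-side load balancer would call `healthy_addresses` (or cache its result briefly) before picking a backend, so the window in which a dead instance can still receive traffic is bounded by the TTL plus any client-side cache duration. Explicit deregistration on shutdown avoids that window for planned terminations.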
Easy · Technical
Describe how the following load balancing algorithms work and when to choose each: round robin, least connections, and consistent hashing (e.g., keyed on client IP). For each algorithm, explain the pros and cons, how it behaves with long-lived connections, heterogeneous backend capacities, and cache-oriented systems, and give at least one realistic use case where it is the preferred choice.
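A minimal sketch of the three selection strategies, assuming an in-process balancer with string backend names; the class names are hypothetical. The hash ring uses virtual nodes and MD5 only for a stable, even key spread (not for security), and 100 virtual nodes per backend is an arbitrary choice. Unlike a plain `hash(ip) % N`, removing a backend from the ring remaps only the keys that backend owned.

```python
import bisect
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    """Cycles through backends in a fixed order, ignoring their current load."""

    def __init__(self, backends: List[str]):
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Sends each new connection to the backend with the fewest active ones."""

    def __init__(self, backends: List[str]):
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def acquire(self) -> str:
        # min() returns the first minimum, so ties go to the earliest backend
        backend = min(self.active, key=lambda b: self.active[b])
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


def _hash(key: str) -> int:
    # MD5 is used only for a stable, well-spread 64-bit value, not for security
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes: each backend owns many points on the ring,
    and a key maps to the first point clockwise from its own hash."""

    def __init__(self, backends: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: List[int] = []        # sorted virtual-node hashes
        self._owner: Dict[int, str] = {}  # virtual-node hash -> backend
        for backend in backends:
            self.add(backend)

    def add(self, backend: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{backend}#{i}")
            bisect.insort(self._ring, h)
            self._owner[h] = backend

    def remove(self, backend: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{backend}#{i}")
            self._ring.remove(h)
            del self._owner[h]

    def pick(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no backends on the ring")
        # First virtual node strictly after the key's hash, wrapping around
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

Least connections depends on the caller invoking `release` when a connection closes; that bookkeeping is what lets it adapt to long-lived connections, whereas round robin keeps handing out new connections evenly regardless of how many each backend already holds. Neither sketch handles heterogeneous capacities; the usual extension is to weight backends (for example, more virtual nodes or a higher share of the rotation for larger instances).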
Medium · Technical
Estimate the number of server instances and the load balancer capacity needed for a streaming ingestion API that accepts 1,000 concurrent TCP connections, each sending 5 KB/s on average. Show calculations for aggregate bandwidth, headroom for spikes, per-connection CPU/memory assumptions (e.g., 0.5 KB/s → 1% CPU), and how you would size LB and network capacity. State your assumptions explicitly.
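A back-of-envelope calculator for this estimate. Every input is an assumption that the answer should state: the 2× spike factor, 64 KB of buffers per connection, 8-core instances run at 70% CPU, and one spare instance are illustrative, and the CPU ratio reads the question's example literally as 1% of one core per 0.5 KB/s (2% per KB/s). `size_ingest` and `IngestSizing` are hypothetical names, and the sketch uses decimal units throughout.

```python
import math
from dataclasses import dataclass


@dataclass
class IngestSizing:
    aggregate_mbit_s: float  # steady-state ingress across all connections
    peak_mbit_s: float       # ingress after applying the spike factor
    cpu_cores_peak: float    # total cores needed at peak ingress
    memory_mb: float         # per-connection buffers/state, all connections
    instances: int           # serving instances plus spares


def size_ingest(connections: int,
                kb_per_s_per_conn: float,
                spike_factor: float,
                cpu_pct_per_kb_s: float,
                mem_kb_per_conn: float,
                cores_per_instance: int,
                target_cpu_utilization: float,
                spare_instances: int = 1) -> IngestSizing:
    """Back-of-envelope sizing with decimal units (1 KB = 1,000 bytes,
    1 Mbit = 1,000,000 bits). CPU is assumed to scale linearly with bytes/s."""
    if not 0 < target_cpu_utilization <= 1:
        raise ValueError("target_cpu_utilization must be in (0, 1]")
    aggregate_kb_s = connections * kb_per_s_per_conn
    aggregate_mbit_s = aggregate_kb_s * 1000 * 8 / 1_000_000
    peak_mbit_s = aggregate_mbit_s * spike_factor
    # cpu_pct_per_kb_s is percent of ONE core per KB/s of ingress
    cpu_cores_peak = aggregate_kb_s * spike_factor * cpu_pct_per_kb_s / 100
    memory_mb = connections * mem_kb_per_conn / 1000
    usable_cores = cores_per_instance * target_cpu_utilization
    serving = math.ceil(cpu_cores_peak / usable_cores)
    return IngestSizing(aggregate_mbit_s, peak_mbit_s, cpu_cores_peak,
                        memory_mb, serving + spare_instances)


print(size_ingest(1000, 5, spike_factor=2.0, cpu_pct_per_kb_s=2.0,
                  mem_kb_per_conn=64, cores_per_instance=8,
                  target_cpu_utilization=0.7))
```

With those inputs: 1,000 × 5 KB/s = 5,000 KB/s, or 40 Mbit/s steady and 80 Mbit/s at a 2× spike. CPU is 10,000 KB/s × 2% = 200 cores at peak, so ceil(200 / 5.6) = 36 serving instances plus one spare gives 37, while memory (about 64 MB in total) is negligible. Under this CPU assumption compute dominates and bandwidth is trivial; the load balancer mainly needs to hold at least 1,000 concurrent long-lived TCP connections (more if connection count also spikes) at roughly 80 Mbit/s. Because the connections are long-lived, newly added instances only receive new or reconnecting clients, so scale-out does not immediately relieve existing instances.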
