Scaling Fundamentals and Concepts Questions

Core concepts required to reason about scaling decisions and to communicate a clear approach. Topics include the difference between vertical and horizontal scaling and their trade-offs; stateless versus stateful service design and why statelessness enables horizontal scaling; basic load balancing and request distribution strategies; when and how to apply caching, replication, and partitioning; simple autoscaling concepts and common metrics used to trigger scaling; how to identify common bottlenecks and apply pragmatic mitigations; and fundamental trade-offs between latency, throughput, cost, and complexity. This topic tests conceptual clarity and the ability to map requirements to simple scaling approaches.

Medium, Technical

58 practiced

Explain service-layer partitioning/sharding strategies (by user id, tenant, geographic region). Describe routing approaches to direct requests to partitions, how to detect and mitigate hot shards, and operational steps for re-sharding or migrating partitions with minimal downtime.
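
As a starting point for an answer, the routing and hot-shard parts of this question might be sketched as follows. This is a minimal illustration, not a production router: the function names `shard_for` and `hot_shards`, the `overrides` directory for keys that have been moved to a dedicated shard, and the 2x-mean threshold are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards, overrides=None):
    """Route a partition key (user id, tenant id, ...) to a shard index.

    `overrides` is a hypothetical lookup table for keys that were moved
    off their hashed shard, for example a hot tenant given its own shard.
    """
    if overrides and key in overrides:
        return overrides[key]
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Plain modulo hashing: simple, but changing num_shards remaps most keys.
    return int.from_bytes(digest[:8], "big") % num_shards


def hot_shards(request_counts, num_shards, threshold=2.0):
    """Return shards whose request count exceeds threshold * mean load."""
    total = sum(request_counts.values())
    if total == 0:
        return []
    mean = total / num_shards
    return sorted(s for s, c in request_counts.items() if c > threshold * mean)


# Usage: route a user, then pin a hot tenant to a dedicated shard.
shard = shard_for("user-42", num_shards=8)
pinned = shard_for("tenant-big", num_shards=8, overrides={"tenant-big": 7})
```

The override table doubles as a migration aid in this sketch: a key can be copied to its new shard first and the override flipped afterwards, so that reads switch over in a single step.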

Medium, Technical

70 practiced

Explain consistent hashing and how it reduces re-mapping when nodes are added/removed. Describe the role of virtual nodes, give pseudocode for mapping a key to a node, and list scenarios where consistent hashing might still produce hot spots.
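
One way to answer the key-to-node mapping part is a hash ring with virtual nodes, sketched below. The class name `HashRing`, the `vnodes` default of 100, and the use of SHA-256 as the ring hash are illustrative choices for this example, not requirements of the technique.

```python
import bisect
import hashlib


def _ring_hash(value):
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual nodes placed per physical node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns many small arcs, which evens out load.
        for i in range(self.vnodes):
            point = _ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (extremely unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _ring_hash(key))
        if idx == len(self._points):
            idx = 0
        return self._owner[self._points[idx]]
```

With this structure, adding a node only moves the keys that fall on arcs the new node takes over; every other key keeps its owner. A single very popular key still lands on one node, so skew in key popularity can produce a hot spot regardless of how evenly the ring is divided.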

Easy, Technical

56 practiced

Explain the difference between vertical (scale-up) and horizontal (scale-out) scaling for services. For each approach list concrete advantages, disadvantages, failure domains, cost and operational implications, and provide two real-world scenarios where you as an SRE would choose one approach over the other.

Medium, Technical

70 practiced

Implement a token bucket rate limiter in Python. The limiter should support parameters: capacity (max_tokens), refill_rate (tokens per second), and a method allow(n=1) that returns True if tokens were available and consumes them, otherwise False. Provide single-process code and explain how this design would need to change for a distributed deployment.
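
A minimal single-process sketch of such a limiter is shown below. The parameter names follow the question; the injectable `clock` argument and the lock for thread safety are additions made here so the example can be tested deterministically and used from multiple threads.

```python
import threading
import time


class TokenBucket:
    def __init__(self, max_tokens, refill_rate, clock=time.monotonic):
        if max_tokens <= 0 or refill_rate <= 0:
            raise ValueError("max_tokens and refill_rate must be positive")
        self.max_tokens = float(max_tokens)    # bucket capacity
        self.refill_rate = float(refill_rate)  # tokens added per second
        self._clock = clock                    # injectable for testing
        self._tokens = self.max_tokens         # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self, n=1):
        """Consume n tokens and return True, or return False if too few."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Lazy refill: credit tokens for the time since the last call.
            self._tokens = min(self.max_tokens,
                               self._tokens + elapsed * self.refill_rate)
            if n <= self._tokens:
                self._tokens -= n
                return True
            return False


# Usage: allow bursts of up to 10 requests, 5 requests per second sustained.
limiter = TokenBucket(max_tokens=10, refill_rate=5)
first_ok = limiter.allow()
```

In a distributed deployment the token count and last-refill timestamp can no longer live in process memory. One common option is to keep them in a shared store such as Redis and perform the refill-and-consume step atomically there (for example in a server-side script); another is to give each instance a local bucket holding a share of the global quota, trading accuracy for lower latency.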

Hard, Technical

114 practiced

Design a distributed sliding-window rate limiter using Redis. Describe the data structures/commands used and provide clear pseudocode for allow_request(user_id, limit, window_seconds). Explain correctness under concurrent requests and how to reduce memory/storage overhead for many users.
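
The sliding-window-log logic behind one possible answer can be sketched with an in-memory stand-in for Redis, as below. This is not a Redis client: the per-user deque plays the role of a sorted set keyed by user, and the comments name the Redis commands (`ZREMRANGEBYSCORE`, `ZCARD`, `ZADD`, `EXPIRE`) that each step would correspond to. The class name and the injectable `clock` are assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, clock=time.time):
        self._clock = clock
        # user_id -> timestamps of accepted requests, oldest first.
        # Stand-in for one Redis sorted set per user (score = timestamp).
        self._log = defaultdict(deque)

    def allow_request(self, user_id, limit, window_seconds):
        now = self._clock()
        log = self._log[user_id]
        # Drop entries that fell out of the window.
        # Redis analogue: ZREMRANGEBYSCORE key -inf (now - window_seconds)
        while log and log[0] <= now - window_seconds:
            log.popleft()
        # Count what remains. Redis analogue: ZCARD key
        if len(log) >= limit:
            return False
        # Record this request.
        # Redis analogue: ZADD key now <unique member>, then
        # EXPIRE key window_seconds so idle users' keys are reclaimed.
        log.append(now)
        return True
```

Against a real Redis, the trim, count, and add steps need to execute atomically (for example inside one server-side Lua script); otherwise two concurrent requests can both see `limit - 1` entries and both be admitted. The log stores one entry per accepted request, so memory grows with `limit` per active user; approximating the window with two fixed-window counters is a common way to cut that to constant space at some cost in precision.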
