Data Partitioning and Sharding Questions

Techniques and operational practices for horizontally partitioning data across multiple database instances or storage nodes to achieve scale, improve performance, and manage growth. Includes selecting and designing partition and shard keys that distribute load evenly and avoid hotspots, using range-based, hash-based, and directory-based approaches as well as consistent hashing. Covers handling uneven distribution and data skew, hotspot detection and mitigation, and the impact of partitioning on query patterns such as joins and cross-shard queries. Explains implications for transactions and consistency, including transactional boundaries that span partitions and approaches to distributed transactions and compensation. Describes resharding and online data migration strategies, rolling rebalances, and methods to minimize downtime and data movement. Emphasizes operational concerns including shard management, automation, monitoring and alerting, failure recovery, and performance tuning. Discusses trade-offs between simplicity, latency, throughput, and operational complexity, and highlights considerations for both transactional and analytical workloads, including routing, caching, and coordination patterns.

Easy | Technical

Compare hash-based, range-based, and directory-based sharding approaches. For each, list the typical use-cases, primary trade-offs, and one example of a workload where that approach would be a poor fit.

Easy | Technical

Implement a simple Python function partition_for_key(key: str, num_shards: int) -> int that returns the shard id using stable hash-modulo partitioning. Requirements: use a deterministic hash, support num_shards up to 10,000, and document any edge cases. Focus on correctness and clarity, not external dependencies.
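
One possible starting point is sketched below. It is only a sketch under stated assumptions, not a reference answer: SHA-256 is an arbitrary choice of deterministic hash, MAX_SHARDS is a hypothetical constant mirroring the 10,000 limit in the prompt, and Python's built-in hash() is avoided because string hashing is randomized per process by default.

```python
import hashlib

# Hypothetical constant mirroring the "up to 10,000 shards" requirement.
MAX_SHARDS = 10_000


def partition_for_key(key: str, num_shards: int) -> int:
    """Return a shard id in [0, num_shards) via a stable hash modulo num_shards.

    Edge cases:
    - The empty string is a valid key and always maps to the same shard.
    - num_shards must be an int in [1, MAX_SHARDS]; anything else raises.
    - Changing num_shards remaps most keys, which is the usual weakness
      of plain modulo partitioning.
    """
    if not isinstance(key, str):
        raise TypeError("key must be a str")
    if isinstance(num_shards, bool) or not isinstance(num_shards, int):
        raise TypeError("num_shards must be an int")
    if not 1 <= num_shards <= MAX_SHARDS:
        raise ValueError(f"num_shards must be between 1 and {MAX_SHARDS}")

    # SHA-256 gives the same digest on every machine and in every process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Eight bytes (64 bits) are plenty; modulo bias is negligible at this scale.
    return int.from_bytes(digest[:8], "big") % num_shards
```

For example, partition_for_key("user-42", 16) returns the same value in [0, 16) on every call and on every host, which is the property a router needs.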

Medium | Technical

How would you compute a global top-K (e.g., the top 100 most purchased products) when the underlying data is sharded across 200 nodes? Discuss algorithms and the trade-offs between accuracy, latency, and resource usage.
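
The accuracy trade-off can be illustrated with a small sketch. The function names top_k_exact and top_k_from_local are hypothetical, and each shard is assumed to expose its per-product counts as a plain mapping. Summing full counts is exact but ships every shard's data to the coordinator; merging only local top-K lists is cheaper but is exact only when each product's count lives entirely on one shard.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def _k_largest(totals: Mapping[str, int], k: int) -> List[Tuple[str, int]]:
    # Order by count descending, then by product id so ties are deterministic.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


def top_k_exact(shard_counts: Iterable[Mapping[str, int]], k: int) -> List[Tuple[str, int]]:
    """Sum every shard's full counts, then take the k largest (exact, costly)."""
    totals: Counter = Counter()
    for counts in shard_counts:
        totals.update(counts)  # Counter.update adds counts for a mapping
    return _k_largest(totals, k)


def top_k_from_local(shard_counts: Iterable[Mapping[str, int]], k: int) -> List[Tuple[str, int]]:
    """Merge only each shard's local top-k (cheap, possibly inexact)."""
    candidates: Counter = Counter()
    for counts in shard_counts:
        local = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
        for product, count in local:
            candidates[product] += count
    return _k_largest(candidates, k)


# Product "b" is second on both shards but first overall.
shards = [{"a": 5, "b": 4, "c": 4}, {"d": 5, "b": 4, "c": 4}]
exact = top_k_exact(shards, 1)
approx = top_k_from_local(shards, 1)
```

In this example the exact result is [("b", 8)] while the local-merge result is [("a", 5)], because "b" never appears in any shard's local top 1. Requesting more than K candidates per shard, or sharding by product id, narrows or removes that gap.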

Medium | Technical

How can a system enforce a globally unique constraint (e.g., unique usernames) when the data is sharded? Enumerate at least three approaches (synchronous and asynchronous), and explain their consistency guarantees, latency implications, and complexity.
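
One synchronous approach is to route every uniqueness check to a single owning shard chosen by hashing the value itself. The sketch below simulates this in memory. UsernameRegistry, claim, and the case-insensitive normalization are all assumptions made for illustration, and dict.setdefault stands in for whatever atomic insert-if-absent the owning shard would really provide (for instance, an insert into a table with a unique index).

```python
import hashlib
from typing import Dict, List


class UsernameRegistry:
    """In-memory stand-in for a uniqueness index sharded by username hash."""

    def __init__(self, num_shards: int) -> None:
        # One dict per shard: normalized username -> owning user id.
        self._shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def _owner(self, normalized: str) -> int:
        # Every request for the same username lands on the same shard,
        # so the uniqueness check becomes a single-shard operation.
        digest = hashlib.sha256(normalized.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def claim(self, username: str, user_id: str) -> bool:
        """Try to reserve username for user_id; return True on success."""
        normalized = username.casefold()  # assumption: names are case-insensitive
        index = self._shards[self._owner(normalized)]
        # setdefault simulates an atomic insert-if-absent on the owning shard.
        return index.setdefault(normalized, user_id) == user_id


registry = UsernameRegistry(num_shards=8)
first = registry.claim("Alice", "u1")
second = registry.claim("alice", "u2")
```

Here first is True and second is False. The design gives strong uniqueness at the cost of one extra hop to the owning shard, and the user record itself may live on a different shard, so creating the user and claiming the name still need a saga or two-phase step to stay in sync.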

Hard | Technical

A shard is receiving a large volume of malicious requests for a small set of keys (DDoS/hot keys). Propose detection logic, immediate mitigations, and long-term architectural changes that reduce vulnerability to such attacks, including rate-limiting and sharding tactics.
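
The detection part can be sketched as a per-key sliding-window counter. HotKeyDetector, its parameters, and the injectable clock are hypothetical names chosen for this example. The sketch keeps one timestamp per request, which would not scale to real traffic; a production system would more likely use sampling or an approximate counter.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class HotKeyDetector:
    """Flags a key once it exceeds `threshold` requests within the window."""

    def __init__(
        self,
        window_seconds: float,
        threshold: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window = window_seconds
        self.threshold = threshold
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, key: str) -> bool:
        """Record one request for key; return True if the key is now hot."""
        now = self.clock()
        hits = self._hits[key]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        cutoff = now - self.window
        while hits and hits[0] <= cutoff:
            hits.popleft()
        return len(hits) > self.threshold


# Usage with a fake clock: four requests at t=0 against a threshold of 3.
fake_now = [0.0]
detector = HotKeyDetector(window_seconds=10, threshold=3, clock=lambda: fake_now[0])
results = [detector.record("sku-1") for _ in range(4)]
```

With these inputs, results is [False, False, False, True]. A True result could feed an immediate mitigation such as per-key rate limiting or serving the key from a cache, while longer-term fixes like key salting or replicating hot keys change how the key is sharded.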
