InterviewStack.io

Database Architecture and Partitioning Questions

Design database architecture and partitioning strategies appropriate to the workload and its access patterns. Evaluate database types, including relational and the various NoSQL models, schema design and indexing strategies, and when to use a monolithic database versus sharding. Cover sharding approaches such as range-based, hash-based, consistent-hashing, and directory-based sharding, as well as replica topologies, read replicas, replication lag, and handling cross-shard queries. Address operational concerns at scale: resharding, mitigating hot partitions, balancing data distribution, transactional and consistency guarantees, and the trade-offs between availability, consistency, and partition tolerance. Include monitoring, migration strategies, and the impact on application logic and joins.

Easy · Technical
56 practiced
Describe replica topologies commonly used for read scaling (single primary with read replicas, multi-primary, follower reads). As a data scientist responsible for near-real-time features, explain how replication lag can impact feature freshness and what operational signals you would monitor.
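One way to reason about the freshness part of this question is a small routing rule: read from a replica only when its measured lag is within the staleness a feature can tolerate. The sketch below is illustrative only; the replica names, the lag values, and the `pick_read_target` helper are hypothetical and not tied to any particular database.

```python
from typing import Optional


def pick_read_target(replica_lag_s: dict[str, float],
                     max_staleness_s: float) -> Optional[str]:
    """Return the least-lagged replica within the staleness budget.

    Returns None when every replica is too stale, so the caller can
    fall back to the primary (an assumed policy, not the only option).
    """
    fresh = {name: lag for name, lag in replica_lag_s.items()
             if lag <= max_staleness_s}
    if not fresh:
        return None
    return min(fresh, key=fresh.get)


# Hypothetical lag readings in seconds, e.g. sampled from a monitoring system.
lags = {"replica-a": 0.4, "replica-b": 3.0}
target = pick_read_target(lags, max_staleness_s=1.0)
```

The same lag readings that drive the routing decision are natural monitoring signals: their percentiles over time, and how often the fallback to the primary is triggered.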
Hard · System Design
51 practiced
Design a schema and partitioning strategy for a clickstream event store used for sessionization. Requirements: write-heavy ingestion (millions of events per minute), efficient session assembly for nightly model training, and a one-year retention policy. Explain your choice of file formats, partition granularity, and compaction considerations.
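As a starting point for discussing partition granularity, the sketch below derives a Hive-style partition path from an event timestamp and session ID. The date/hour/bucket layout, the bucket count of 64, and the `partition_path` helper are assumptions chosen for illustration, not a prescribed answer.

```python
import zlib
from datetime import datetime, timezone


def partition_path(event_ts: float, session_id: str, buckets: int = 64) -> str:
    """Build a dt=/hour=/bucket= path for one event.

    Time partitions make retention a matter of dropping old directories;
    the session bucket keeps all events of a session within an hour in
    the same bucket, which helps nightly session assembly.
    """
    ts = datetime.fromtimestamp(event_ts, tz=timezone.utc)
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = zlib.crc32(session_id.encode("utf-8")) % buckets
    return f"dt={ts:%Y-%m-%d}/hour={ts:%H}/bucket={bucket:03d}"


path = partition_path(0.0, "session-123")
```

Under this layout a session that crosses an hour or day boundary spans several partitions, so the assembly job still has to read adjacent partitions; hourly directories also tend to produce many small files, which is where compaction comes in.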
Easy · Technical
64 practiced
Explain range-based sharding versus hash-based sharding. Provide one clear example workload where range-based sharding is better and one where hash-based sharding is better. As a data scientist, why would the choice matter for analytic queries and model fairness?
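The contrast in this question can be made concrete with a minimal sketch of both routing functions. The shard boundaries, the shard count, and the helper names below are hypothetical; real systems typically manage boundaries in metadata rather than constants.

```python
import bisect
import hashlib

# Hypothetical split points: keys < "h" go to shard 0, keys < "p" to shard 1,
# everything else to shard 2.
RANGE_BOUNDS = ["h", "p"]
NUM_SHARDS = 3


def range_shard(key: str) -> int:
    """Route by sorted position, so neighbouring keys share a shard."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def hash_shard(key: str) -> int:
    """Route by a stable hash, so neighbouring keys scatter across shards."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_for_range_scan(lo: str, hi: str) -> list[int]:
    """Shards a range query [lo, hi] must touch under range sharding."""
    return list(range(range_shard(lo), range_shard(hi) + 1))
```

Under range sharding, a scan from `"a"` to `"c"` touches a single shard, whereas under hash sharding the same scan has to fan out to every shard. The flip side is that sequential keys (timestamps, auto-increment IDs) pile onto one range shard while a hash spreads them out.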
Medium · Technical
50 practiced
A table develops hot partitions because recent timestamps concentrate writes in the newest partition, and write throughput degrades to the point of failure. Propose three strategies (at the storage and application levels) to alleviate hot partitions for write-heavy time-series data, and discuss their impact on the read queries used for model features.
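One commonly discussed application-level strategy is key salting: appending a bucket suffix to the time-based partition key so that writes for the current hour spread over several partitions. The sketch below assumes a hypothetical `hour#bucket` key format, eight buckets, and a device ID as the salting input; none of these come from the question itself.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical; more buckets spread writes but widen reads


def write_partition_key(hour: str, device_id: str) -> str:
    """Derive the salted partition key for one write.

    The bucket is a stable function of device_id, so all rows for a
    device within an hour stay in one partition.
    """
    bucket = zlib.crc32(device_id.encode("utf-8")) % NUM_BUCKETS
    return f"{hour}#{bucket:02d}"


def read_partition_keys(hour: str) -> list[str]:
    """All partitions a time-window read must query and merge."""
    return [f"{hour}#{b:02d}" for b in range(NUM_BUCKETS)]


key = write_partition_key("2024-05-01T13", "device-42")
fanout = read_partition_keys("2024-05-01T13")
```

The read-side cost is visible in `read_partition_keys`: a feature query over one hour becomes a scatter-gather over `NUM_BUCKETS` partitions, while a query for a single device still hits exactly one.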
Hard · Technical
46 practiced
You must reshard a terabyte-scale dataset because its hash distribution became skewed after a marketing campaign produced an uneven key distribution. Describe the algorithms and steps for rebalancing data dynamically with minimal downtime, including how to ensure idempotency and how to resume after partial failures. Estimate the required network I/O and discuss the time complexity.
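For the network I/O estimate, it helps to compare how many keys move when a shard is added under plain modulo hashing versus consistent hashing. The sketch below is a back-of-the-envelope model under stated assumptions: a hypothetical `HashRing` with 100 virtual nodes per shard, synthetic keys, uniform row sizes, and a single 125 MB/s (about 1 Gbit/s) link.

```python
import bisect
import hashlib


def _h(value: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative, not production)."""

    def __init__(self, shards, vnodes=100):
        self._points = sorted((_h(f"{s}#{v}"), s)
                              for s in shards for v in range(vnodes))
        self._hashes = [p[0] for p in self._points]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._hashes, _h(key)) % len(self._points)
        return self._points[i][1]


def moved_fraction(keys, old_route, new_route) -> float:
    """Fraction of keys whose shard changes between two routing functions."""
    return sum(1 for k in keys if old_route(k) != new_route(k)) / len(keys)


def transfer_hours(dataset_bytes, moved_frac, bandwidth_bytes_per_s) -> float:
    """Lower bound on copy time, ignoring verification and throttling."""
    return dataset_bytes * moved_frac / bandwidth_bytes_per_s / 3600


keys = [f"user-{i}" for i in range(10_000)]
ring4 = HashRing(["s0", "s1", "s2", "s3"])
ring5 = HashRing(["s0", "s1", "s2", "s3", "s4"])
ring_frac = moved_fraction(keys, ring4.shard_for, ring5.shard_for)
mod_frac = moved_fraction(keys, lambda k: _h(k) % 4, lambda k: _h(k) % 5)
hours = transfer_hours(1e12, 0.2, 125e6)  # 1 TB, 20% moved, 125 MB/s
```

Going from four to five shards, modulo hashing keeps a key in place only when `h % 4 == h % 5`, so roughly 80% of keys move, while the ring moves roughly one fifth of them, all to the new shard. Note that this addresses only how much data moves: a skew caused by a few very hot keys is not fixed by any hash function and needs separate handling, such as splitting or salting those keys.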
