InterviewStack.io

Database Architecture and Partitioning Questions

Design database architecture and partitioning strategies appropriate to the workload and access patterns. Evaluate database types, including relational and various NoSQL models, schema design and indexing strategies, and when to use a monolithic database versus sharding. Cover sharding approaches such as range-based, hash-based, consistent-hashing, and directory-based sharding, as well as replica topologies, read replicas, replication lag, and handling cross-shard queries. Address operational concerns at scale: resharding, mitigating hot partitions, balancing data distribution, transactional and consistency guarantees, and the trade-offs between availability, consistency, and partition tolerance. Include monitoring, migration strategies, and the impact on application logic and joins.
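
As a rough illustration of one of the sharding approaches named above, the following is a minimal sketch of consistent hashing with virtual nodes. The class name, the shard names, and the choice of 100 virtual nodes per shard are assumptions made for the example, not part of the question bank.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a stable 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Toy consistent-hash ring (illustrative names, not a library API)."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes      # virtual nodes per shard (assumed value)
        self._positions = []      # sorted ring positions
        self._owners = []         # shard that owns the position at the same index
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        """Place `vnodes` virtual nodes for the shard on the ring."""
        for i in range(self.vnodes):
            pos = _position(f"{shard}#{i}")
            idx = bisect.bisect(self._positions, pos)
            self._positions.insert(idx, pos)
            self._owners.insert(idx, shard)

    def shard_for(self, key):
        """Return the first shard clockwise from the key, wrapping around."""
        idx = bisect.bisect(self._positions, _position(key)) % len(self._positions)
        return self._owners[idx]
```

The property worth pointing out in an interview is that adding a shard only moves the keys the new shard takes over; keys never shuffle between the existing shards, which is what makes resharding cheaper than with plain modulo hashing.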

Hard | Technical
Create a monitoring plan with specific metrics, SLOs, and dashboard panels to ensure an online feature store meets a p95 read latency SLA of 30ms. Include how to differentiate between database, network, and application causes when p95 is breached and what automated mitigations you would apply.
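
One way to approach the attribution part of this question is to record per-component timings for each request and inspect the slow tail. The sketch below assumes a hypothetical request record with `db`, `network`, and `app` timings in milliseconds; only the 30 ms p95 target comes from the question.

```python
import math

SLA_P95_MS = 30.0  # p95 read latency target stated in the question
COMPONENTS = ("db", "network", "app")  # hypothetical timing breakdown


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def diagnose(requests):
    """Return (p95_ms, dominant_component), or (p95_ms, None) if within the SLA.

    `requests` is a list of dicts holding per-component timings in ms.
    """
    totals = [sum(r[c] for c in COMPONENTS) for r in requests]
    p95 = percentile(totals, 95)
    if p95 <= SLA_P95_MS:
        return p95, None
    # Look only at the slow tail and blame the component with the most time there.
    tail = [r for r, total in zip(requests, totals) if total >= p95]
    blame = max(COMPONENTS, key=lambda c: sum(r[c] for r in tail))
    return p95, blame
```

An automated mitigation could then branch on the returned component, for example shifting reads to another replica when the database dominates the tail.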
Medium | Technical
Design a monitoring and alerting plan for a feature ingestion pipeline that writes data into partitioned tables used for training and online serving. Define at least five alerts (with thresholds) for replication lag, partition skew, ingestion failures, and query latency, and explain the actions triggered by each alert.
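
As a sketch of how the partition-skew alert might be computed, the snippet below measures skew as the hottest partition's row count divided by the mean and compares metrics against thresholds. The metric names and any threshold values passed in are hypothetical; an actual plan would tune them to the workload.

```python
def partition_skew(row_counts):
    """Return (hottest_partition, ratio of its row count to the mean)."""
    mean = sum(row_counts.values()) / len(row_counts)
    hottest = max(row_counts, key=row_counts.get)
    return hottest, row_counts[hottest] / mean


def firing_alerts(metrics, thresholds):
    """Return the names of metrics that exceed their threshold."""
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0) > limit]
```

A skew ratio near 1.0 means rows are evenly spread; a ratio of 2.0 means one partition holds twice its fair share and is a candidate for splitting or re-keying.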
Medium | System Design
Design a replica topology for a global ML prediction service that needs low-latency reads in three continents while ensuring model updates propagate quickly enough for acceptable staleness. Discuss leader placement, replication mode (synchronous/asynchronous), and strategies to handle read-after-write consistency for critical features.
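
One common way to get read-after-write consistency with asynchronous replicas is to track the log position of a session's last write and fall back to the leader when the local replica has not caught up. The sketch below assumes a hypothetical `applied_position` counter exposed by each replica; real systems expose this differently (for example as a log sequence number or a timestamp).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_position: int  # hypothetical: last replication-log position replayed


def choose_read_target(session_last_write, local_replica, leader_name="leader"):
    """Serve from the local replica only if it has replayed the session's last write."""
    if local_replica.applied_position >= session_last_write:
        return local_replica.name
    return leader_name
```

Applying this check only to critical features keeps most reads local and low-latency, while accepting a cross-region round trip for the few that cannot tolerate staleness.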
Hard | Technical
How would you detect and mitigate silent data corruption or split-brain scenarios in a replicated database feeding ML training jobs? Propose detection mechanisms, automated mitigation steps, and offline repair procedures that preserve model training correctness.
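
A possible detection mechanism is to compare chunked checksums between the primary and a replica so that a mismatch narrows the repair to a small key range. This is a minimal sketch assuming both sides can return rows as `(key, value)` pairs in the same key order; the function names and chunk size are illustrative.

```python
import hashlib


def chunk_digests(rows, chunk_size):
    """Hash consecutive chunks of (key, value) rows, ordered by key."""
    digests = []
    for start in range(0, len(rows), chunk_size):
        h = hashlib.sha256()
        for key, value in rows[start:start + chunk_size]:
            h.update(repr((key, value)).encode())
        digests.append(h.hexdigest())
    return digests


def mismatched_chunks(primary_rows, replica_rows, chunk_size=2):
    """Return indexes of chunks whose digests differ or exist on one side only."""
    a = chunk_digests(primary_rows, chunk_size)
    b = chunk_digests(replica_rows, chunk_size)
    longest = max(len(a), len(b))
    return [i for i in range(longest)
            if i >= len(a) or i >= len(b) or a[i] != b[i]]
```

Training jobs could refuse to read a snapshot until this comparison returns an empty list, which is one way to keep corrupted rows out of a model.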
Easy | Technical
You're evaluating whether to keep a monolithic database or move to sharding for the company's training dataset used by ML models. The dataset is 5 TB and grows 50% annually; queries are a mix of batch training and interactive feature exploration. As a data scientist, list the decision criteria and recommend which approach to take, explaining the trade-offs for model development and serving.
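
A quick growth projection helps ground the recommendation. The sketch below compounds the 5 TB starting size at 50% per year, both figures from the question; the capacity ceiling passed to `years_until_exceeds` is a hypothetical input, not a stated limit.

```python
def projected_size_tb(initial_tb, annual_growth, years):
    """Dataset size after compounding growth for a whole number of years."""
    return initial_tb * (1 + annual_growth) ** years


def years_until_exceeds(initial_tb, annual_growth, limit_tb):
    """Whole years until the dataset first exceeds limit_tb."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years, size = 0, initial_tb
    while size <= limit_tb:
        size *= 1 + annual_growth
        years += 1
    return years
```

For example, `projected_size_tb(5, 0.5, 3)` gives 16.875 TB, so the dataset more than triples in three years; whether that still fits comfortably on one node is the key input to the monolith-versus-sharding decision.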
