Database Selection and Trade Offs Questions

How to evaluate and choose data storage systems and architectures based on workload characteristics and business constraints. Coverage includes differences between relational and nonrelational families such as document stores, key-value stores, wide-column stores, graph databases, time-series databases, and search engines; mapping query patterns and latency requirements to storage options; trade-offs between strong consistency and eventual consistency and their impact on availability and complexity; partition key design, replication strategies, and high-availability considerations; operational concerns including backups, monitoring, vendor and cost trade-offs, migration or hybrid strategies, and when to adopt polyglot persistence. Senior-level discussion includes selecting specific managed services and reasoning about expected load patterns, failure modes, and operational burden.

Medium | Technical

42 practiced

Compare managed Redshift, Snowflake, and BigQuery for storing 1 PB compressed analytical data with monthly batch reports and bursty interactive queries. Consider storage vs compute separation, concurrency scaling, cost predictability, data egress, and integration with ETL/BI tools.

Medium | Technical

62 practiced

Compare leader-follower, multi-master, and quorum-based replication strategies. For each approach explain the impact on read/write latency, conflict resolution complexity, and which workloads (analytics vs transactional) are best suited to each.
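
For the quorum-based part of this question, a common starting point is the overlap condition R + W > N: with N replicas, W write acknowledgements, and R read responses, every read set shares at least one replica with the latest acknowledged write set. The sketch below only illustrates that arithmetic; the function names and the returned fields are hypothetical and not taken from any particular database.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Return True when every read quorum intersects every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return r + w > n


def describe_quorum(n: int, w: int, r: int) -> dict:
    """Summarize the trade-off for one (N, W, R) setting (illustrative only)."""
    return {
        "overlap": quorums_overlap(n, w, r),
        # Writes still succeed while at most n - w replicas are unreachable.
        "write_tolerates_down": n - w,
        # Reads still succeed while at most n - r replicas are unreachable.
        "read_tolerates_down": n - r,
    }


if __name__ == "__main__":
    for setting in [(3, 2, 2), (3, 1, 1), (5, 3, 3)]:
        print(setting, describe_quorum(*setting))
```

Raising W or R strengthens the overlap guarantee but makes each operation wait for more replicas, which is the latency versus consistency trade-off the question asks about.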

Hard | Technical

64 practiced

Implement a Python module (pseudocode OK) for streaming deduplication: read events from Kafka, deduplicate by event_id within a 10-minute sliding window, and emit unique events downstream. Describe your approach to in-memory state management, persistence for restart, scaling across consumers, and handling late arrivals.
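
One possible shape for the in-memory part of an answer is sketched below. It is not a full solution: Kafka consumption, persistence for restart, and scaling across consumers are left out, and events are assumed to arrive as `(event_id, timestamp_seconds, payload)` tuples. The class and function names are hypothetical. The window is tracked against a watermark (the highest timestamp seen so far), and events older than the window are dropped as late, which is one policy choice among several.

```python
from collections import OrderedDict

WINDOW_SECONDS = 600  # the 10-minute window from the question


class WindowDeduplicator:
    """Remembers event_ids for `window_seconds` of watermark time."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        # event_id -> watermark value when the id was first accepted.
        # Watermarks never decrease, so insertion order is also age order.
        self._seen = OrderedDict()
        self._watermark = float("-inf")

    def _evict_expired(self):
        cutoff = self._watermark - self.window_seconds
        while self._seen:
            oldest_id = next(iter(self._seen))
            if self._seen[oldest_id] > cutoff:
                break
            self._seen.popitem(last=False)

    def is_unique(self, event_id, timestamp):
        """Return True if the event should be emitted downstream."""
        self._watermark = max(self._watermark, timestamp)
        self._evict_expired()
        # Assumed late-arrival policy: ids this old may already have been
        # evicted, so uniqueness cannot be decided; drop the event.
        if timestamp <= self._watermark - self.window_seconds:
            return False
        if event_id in self._seen:
            return False
        self._seen[event_id] = self._watermark
        return True


def dedupe(events, window_seconds=WINDOW_SECONDS):
    """Yield only the first occurrence of each event_id within the window."""
    dedup = WindowDeduplicator(window_seconds)
    for event_id, timestamp, payload in events:
        if dedup.is_unique(event_id, timestamp):
            yield event_id, timestamp, payload
```

In a real deployment the `events` iterable would be fed by a Kafka consumer, and partitioning the topic by `event_id` would keep all copies of an event on the same consumer so that each instance can hold its own window state.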

Easy | Technical

43 practiced

You're choosing between a columnar data warehouse (Redshift, BigQuery, ClickHouse) and a row-oriented OLTP database for analyzing event logs and building ML features. Describe the factors you would use (query shape, latency, ingestion characteristics, storage cost, concurrency) and give a recommendation for nightly model training versus low-latency feature lookups.

Easy | Technical

37 practiced

Explain the CAP theorem in the context of distributed databases. For a geo-distributed analytics platform and for a customer-facing checkout service, which guarantees (consistency, availability, partition tolerance) would you prioritize and why? Give concrete examples of trade-offs you might accept in each use case.
