Principles of database schema design and performance optimization, including relational and non-relational trade-offs, normalization and denormalization, indexing strategies and index types, clustered and non-clustered indexes, query execution plans, common table expressions for readable complex queries, detecting missing or redundant indexes, sharding and partitioning strategies, and consistency and availability trade-offs. Candidates should demonstrate knowledge of optimizing reads and writes, diagnosing slow queries, and selecting the appropriate database model for scale and consistency requirements.
Hard | System Design
Your company's primary OLTP database is also receiving analytical queries for reporting, causing production performance degradation. Architect a solution to separate analytical workloads while preserving near-real-time analytics: compare options such as read replicas with async replication, streaming CDC into a data warehouse (BigQuery/Redshift), using an HTAP database, or using a lambda/kappa architecture. Discuss data freshness, resource isolation, cost, and implementation complexity.
Hard | Technical
You observe a query plan with badly wrong cardinality estimates that lead to poor join choices. The data distribution is skewed and statistics are stale. Explain how you'd diagnose the problem (which system tables or tools to inspect) and enumerate concrete fixes: updating statistics, raising the statistics target (for example, PostgreSQL's default_statistics_target or a per-column target), creating extended statistics for correlated columns, generating histograms, or using optimizer hints. Discuss pros and cons of each approach.
Easy | Technical
Explain ACID and BASE consistency models in the context of transactional and distributed databases. For each property (Atomicity, Consistency, Isolation, Durability) give a short definition and typical implementation considerations. Then explain what BASE (Basically Available, Soft state, Eventual consistency) means and provide two production scenarios where BASE is acceptable and two where ACID is required. Discuss how these models influence database selection and pipeline design.
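One way to ground the atomicity and consistency parts of an answer is a small runnable sketch. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than a production database, and the `accounts` table and `transfer` helper are hypothetical names invented for the demo. It shows a two-statement transfer that is rolled back as a unit when a CHECK constraint rejects the second statement.

```python
import sqlite3

# Hypothetical demo schema: the CHECK constraint is the "consistency" rule.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
INSERT INTO accounts VALUES (1, 100), (2, 50);
""")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            # Credit first so the first statement succeeds even when the
            # debit will later violate the CHECK constraint.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The credit above was rolled back together with the failed debit.
        return False


ok_small = transfer(conn, 1, 2, 30)    # within balance, commits
ok_large = transfer(conn, 1, 2, 500)   # would overdraw, rolls back
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

After both calls, the failed transfer leaves no partial credit behind, which is the behavior a BASE-style store would not guarantee without extra application logic.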
Hard | Technical
Design a performant SCD Type 2 implementation for a large dimension table customers_dim(customer_sk bigint PK, business_key text, name text, address text, valid_from date, valid_to date, is_current boolean). Provide PostgreSQL/ANSI SQL pseudocode to upsert a source customer record into the dimension: (a) insert new row if changed, (b) expire previous row by setting valid_to and is_current=false, and (c) handle first-time inserts efficiently. Explain indexing strategy and how to optimize for batch upserts.
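The expire-then-insert logic in steps (a) through (c) can be sketched as follows. This is a minimal, row-at-a-time illustration, not a tuned batch implementation: it runs against Python's built-in sqlite3 module instead of PostgreSQL, stores dates as ISO text, uses '9999-12-31' as an assumed open-ended `valid_to` sentinel, and the `apply_scd2` helper and index name are hypothetical. A partial unique index enforces at most one current row per business key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers_dim (
    customer_sk  INTEGER PRIMARY KEY,   -- surrogate key, auto-assigned
    business_key TEXT    NOT NULL,
    name         TEXT,
    address      TEXT,
    valid_from   TEXT    NOT NULL,      -- ISO dates stored as text in this sketch
    valid_to     TEXT    NOT NULL,
    is_current   INTEGER NOT NULL       -- 1 = current version, 0 = expired
);
-- At most one current row per business key; also serves the lookup below.
CREATE UNIQUE INDEX customers_dim_current
    ON customers_dim (business_key) WHERE is_current = 1;
"""


def apply_scd2(conn, business_key, name, address, as_of):
    """Apply one source record with SCD Type 2 semantics."""
    row = conn.execute(
        "SELECT customer_sk, name, address FROM customers_dim "
        "WHERE business_key = ? AND is_current = 1",
        (business_key,),
    ).fetchone()
    if row is not None and (row[1], row[2]) == (name, address):
        return "unchanged"  # no tracked attribute changed, nothing to write
    with conn:  # expire + insert commit together or not at all
        if row is not None:
            # (b) expire the previous version
            conn.execute(
                "UPDATE customers_dim SET valid_to = ?, is_current = 0 "
                "WHERE customer_sk = ?",
                (as_of, row[0]),
            )
        # (a) new version for a change, or (c) first-time insert
        conn.execute(
            "INSERT INTO customers_dim "
            "(business_key, name, address, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, ?, '9999-12-31', 1)",
            (business_key, name, address, as_of),
        )
    return "inserted" if row is None else "versioned"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
r1 = apply_scd2(conn, "C-1", "Ada", "1 Main St", "2024-01-01")
r2 = apply_scd2(conn, "C-1", "Ada", "1 Main St", "2024-02-01")
r3 = apply_scd2(conn, "C-1", "Ada", "9 Oak Ave", "2024-03-01")
history = conn.execute(
    "SELECT address, valid_from, valid_to, is_current "
    "FROM customers_dim ORDER BY customer_sk"
).fetchall()
```

For large batches, the same two steps would normally be expressed set-based against a staging table (one UPDATE joined to the staged rows, then one INSERT ... SELECT), so that the database does the change detection instead of a per-row loop.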
Medium | System Design
For a SaaS multi-tenant application, compare three multi-tenant schema approaches: (a) single shared table with tenant_id column, (b) separate schemas per tenant, and (c) separate databases per tenant. Discuss trade-offs for operational complexity, security/isolation, backup and restore, scaling, and query performance. Provide recommendations for small, medium, and large tenants and a migration path between models.
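Approach (a) can be illustrated with a small sketch. It is only an assumption-laden example: it uses Python's built-in sqlite3 module, and the `invoices` table and `tenant_total` helper are hypothetical. The point it shows is that putting `tenant_id` first in the primary key lets every tenant-scoped query seek on the key, and that routing all access through a helper that always filters by `tenant_id` is the application-level isolation this model relies on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    tenant_id    INTEGER NOT NULL,
    invoice_id   INTEGER NOT NULL,
    amount_cents INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, invoice_id)  -- tenant_id leads the key
);
""")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, 1, 500), (1, 2, 700), (2, 1, 900)],
)


def tenant_total(conn, tenant_id):
    """Sum invoices for one tenant; the tenant filter is never optional."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()
    return row[0]


# Ask SQLite how it would run a tenant-scoped query; the last column of each
# row is a human-readable description of the chosen access path.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM invoices WHERE tenant_id = ?", (1,)
).fetchall()
plan_details = [row[-1] for row in plan]

totals = {t: tenant_total(conn, t) for t in (1, 2, 3)}
```

In the schema-per-tenant and database-per-tenant models the filter disappears from the query and isolation moves into connection routing instead, which is where the operational cost shows up.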