InterviewStack.io

Database Fundamentals and Storage Engines Questions

Core principles and components of data storage and persistence systems. Topics include storage engine architectures and how they affect query processing and performance; transactions and the ACID properties (atomicity, consistency, isolation, and durability); concurrency control and isolation levels; indexing strategies and how indexes affect read and write amplification; physical versus logical storage and the characteristics of object, block, and file storage; caching layers and cache invalidation patterns; replication basics and how replication affects durability and read performance; backup and recovery techniques, including snapshots and point-in-time recovery; trade-offs captured by CAP (consistency, availability, and partition tolerance) reasoning; and compression, cost-versus-performance trade-offs, data retention, archival, and compliance concerns. Candidates should be able to reason about durability, persistence guarantees, operational recovery, and storage choices that affect latency, throughput, and cost.

Hard | System Design
50 practiced
Design a point-in-time recovery (PITR) architecture for a multi-tenant BI system with 50 TB of data. Requirements: support restores to any second in the last 14 days, ensure minimal impact on production performance, and allow tenant-scoped restores without restoring the entire cluster. Outline WAL/log shipping, snapshot layering, metadata mappings, and an efficient restore workflow.
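One piece of the restore workflow can be sketched in code: given a target second, pick the newest base snapshot at or before it and the WAL segments that must be replayed on top. This is a minimal sketch, not a complete design. The `WalSegment` type, the `plan_restore` function, and the epoch-second timestamps are illustrative assumptions, and tenant scoping is assumed to happen earlier, by filtering the snapshot and segment lists per tenant via the metadata mapping.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WalSegment:
    """Hypothetical catalog entry for one archived WAL segment."""
    start_ts: int  # inclusive, epoch seconds
    end_ts: int    # exclusive, epoch seconds
    uri: str       # where the segment is archived (illustrative)


def plan_restore(
    snapshot_ts: List[int], wal: List[WalSegment], target_ts: int
) -> Tuple[int, List[WalSegment]]:
    """Return (base snapshot time, WAL segments to replay up to target_ts)."""
    snapshots = sorted(snapshot_ts)
    idx = bisect_right(snapshots, target_ts)
    if idx == 0:
        raise ValueError("target precedes the oldest retained snapshot")
    base = snapshots[idx - 1]

    # Keep only segments that overlap the window [base, target_ts].
    needed = [
        seg for seg in sorted(wal, key=lambda s: s.start_ts)
        if seg.end_ts > base and seg.start_ts <= target_ts
    ]

    # Refuse to plan a restore if the log has a hole or stops short.
    covered_until = base
    for seg in needed:
        if seg.start_ts > covered_until:
            raise ValueError(f"WAL gap before {seg.start_ts}")
        covered_until = max(covered_until, seg.end_ts)
    if covered_until <= target_ts:
        raise ValueError("WAL does not reach the target time")
    return base, needed
```

Checking log continuity before starting the restore is a deliberate choice here: failing fast on a missing segment is cheaper than discovering the gap midway through a replay.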
Medium | System Design
40 practiced
Design a data lifecycle policy for BI storage that satisfies a legal retention of 7 years, requires encrypted-at-rest storage, and aims to minimize cost while allowing occasional rehydration for audit. Include tiers (hot/warm/cold/deep-archive), retention and deletion mechanics, encryption and key management considerations, and how you would automate compliance reporting.
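The tiering and deletion mechanics can be expressed as a small age-based rule. The sketch below is one possible shape, with assumptions stated up front: the day thresholds in `TIERS` are made-up illustrative values, `tier_for` and `retention_end` are hypothetical helper names, and only the 7-year retention comes from the question. Encryption and key management are out of scope for this snippet.

```python
from datetime import date

RETENTION_YEARS = 7  # legal retention stated in the question

# Upper age bound in days for each tier. These thresholds are assumptions
# chosen for illustration, not recommendations.
TIERS = [(30, "hot"), (180, "warm"), (730, "cold")]


def retention_end(created: date) -> date:
    """First day on which the object may be deleted."""
    try:
        return created.replace(year=created.year + RETENTION_YEARS)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year.
        return date(created.year + RETENTION_YEARS, 3, 1)


def tier_for(created: date, today: date) -> str:
    """Return the storage tier for an object, or "delete" once retention ends."""
    if today >= retention_end(created):
        return "delete"
    age_days = (today - created).days
    for max_days, name in TIERS:
        if age_days < max_days:
            return name
    return "deep-archive"
```

Computing the retention end on calendar years rather than `7 * 365` days avoids deleting data a day or two early across leap years, which matters when the retention period is a legal requirement.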
Medium | Technical
51 practiced
A cross-functional team needs a single source-of-truth metric for 'active users' that is used in executive dashboards. Discuss how storage choices (hot transactional DB vs aggregated materialized view in a warehouse vs streaming pre-aggregation) affect the accuracy, latency, and cost of that metric. Propose an implementation that balances freshness and reliability for leadership metrics.
Hard | Technical
47 practiced
Given a slow analytical query and the EXPLAIN ANALYZE output showing a nested-loop join with 10M inner rows, propose a full plan to rewrite the query or change indexes to achieve an order-of-magnitude speedup. Use example SQL:
SELECT c.country, SUM(o.total) FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE o.created_at >= '2025-01-01' GROUP BY c.country;
Explain possible rewritten queries, indexes, and expected plan changes.
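One candidate rewrite is to aggregate orders per customer before joining, so the join touches one row per customer instead of one row per order, and to add a covering index on the filter column. The sketch below uses Python's built-in `sqlite3` module only to check that the rewrite returns the same result on a tiny made-up dataset. The index name, the sample rows, and the added `ORDER BY` (for a deterministic comparison) are assumptions, and the actual plan change and speedup depend on the engine and data distribution, so they should be confirmed with EXPLAIN ANALYZE on the real system.

```python
import sqlite3

# Query from the question, with ORDER BY added so results are comparable.
ORIGINAL = """
SELECT c.country, SUM(o.total)
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.created_at >= '2025-01-01'
GROUP BY c.country
ORDER BY c.country;
"""

# Rewrite: filter and pre-aggregate orders first, then join the small result.
REWRITTEN = """
WITH per_customer AS (
    SELECT customer_id, SUM(total) AS total
    FROM orders
    WHERE created_at >= '2025-01-01'
    GROUP BY customer_id
)
SELECT c.country, SUM(p.total)
FROM per_customer p JOIN customers c ON c.customer_id = p.customer_id
GROUP BY c.country
ORDER BY c.country;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with illustrative sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            total INTEGER,
            created_at TEXT
        );
        -- Hypothetical covering index: range filter column first, then the
        -- columns the query reads, so the base table need not be visited.
        CREATE INDEX idx_orders_created_cust
            ON orders (created_at, customer_id, total);
    """)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "DE"), (2, "DE"), (3, "US")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total, created_at) VALUES (?, ?, ?)",
        [
            (1, 100, "2025-01-05"),
            (1, 50, "2024-12-31"),  # excluded by the date filter
            (2, 70, "2025-02-01"),
            (3, 30, "2025-03-01"),
            (3, 20, "2025-03-02"),
        ],
    )
    return conn


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()
```

Calling `run(build_demo_db(), ORIGINAL)` and `run(build_demo_db(), REWRITTEN)` should return identical rows. The rewrite is only equivalent because `customer_id` is unique in `customers`; with duplicate customer rows both forms would still match each other, but the pre-aggregation would no longer shrink the join as much.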
Easy | Technical
43 practiced
Describe the difference between logical and physical storage in a database. Give concrete examples of logical constructs (tables, schemas, indexes, views) and physical constructs (pages/blocks, files, objects, containers). Explain how choosing block-based storage vs object storage (e.g., EBS vs S3) affects BI query latency, throughput, and cost for a reporting workload.
