InterviewStack.io

Database Fundamentals and Storage Engines Questions

Core principles and components of data storage and persistence systems. This includes storage engine architectures and how they affect query processing and performance; transactions and the ACID properties (atomicity, consistency, isolation, and durability); concurrency control and isolation levels; indexing strategies and how indexes affect read and write amplification; physical versus logical storage and the characteristics of object, block, and file storage; caching layers and cache invalidation patterns; replication basics and how replication affects durability and read performance; backup and recovery techniques, including snapshots and point-in-time recovery; the trade-offs captured by consistency, availability, and partition tolerance (CAP) reasoning; and compression, cost-versus-performance trade-offs, data retention, archival, and compliance concerns. Candidates should be able to reason about durability, persistence guarantees, operational recovery, and storage choices that affect latency, throughput, and cost.

Medium | Technical
50 practiced
Explain write amplification in LSM-tree based storage engines. How does compaction create write amplification, how does it affect SSD endurance, and what operational metrics would you collect to measure it? Describe mitigation strategies at both the algorithm level (compaction policy, Bloom filters) and the operational level (SSD overprovisioning, compaction scheduling).
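As a starting point for the metrics part of this question, write amplification is commonly expressed as the ratio of bytes physically written to storage to bytes logically written by the application. The sketch below assumes you can read four byte counters from your engine; the function name and parameter names are hypothetical and do not correspond to any specific engine's statistics API.

```python
def write_amplification(user_bytes: int,
                        flush_bytes: int,
                        compaction_bytes: int,
                        wal_bytes: int = 0) -> float:
    """Ratio of physical bytes written to logical bytes written.

    All parameters are hypothetical counter names:
      user_bytes       - bytes the application asked to write
      flush_bytes      - bytes written when memtables are flushed to disk
      compaction_bytes - bytes rewritten by compaction
      wal_bytes        - bytes written to the write-ahead log, if counted
    """
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    physical_bytes = wal_bytes + flush_bytes + compaction_bytes
    return physical_bytes / user_bytes


# Example: 100 bytes of user data, flushed once, then rewritten 9 times over
# by compaction, gives a write amplification of 10.
wa = write_amplification(user_bytes=100, flush_bytes=100, compaction_bytes=900)
```

Tracking this ratio over time, rather than as a single number, makes it easier to see whether a change in compaction policy actually reduced the bytes hitting the SSD.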
Hard | Technical
41 practiced
Production Postgres write latencies spike and fsync appears to be the bottleneck. Explain the possible root causes (fsync frequency, small writes, group commit misconfiguration, disk controllers, write barriers, virtualization), and propose a prioritized set of mitigations ranging from DB configuration changes (synchronous_commit, wal_buffers), OS tuning (I/O scheduler, noatime), to hardware fixes (NVMe, battery-backed write cache). Include how you'd measure the impact of each change safely.
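To make the "small writes" and "group commit" causes concrete, the sketch below times `os.fsync` on a temporary file, once after every small write and once after every batch of writes. It is only an illustration of why batching syncs reduces the number of expensive flushes; it is not how PostgreSQL implements group commit, and the function name, record size, and batch sizes are assumptions chosen for the example.

```python
import os
import tempfile
import time


def fsync_latencies(n_writes: int = 50, record_size: int = 128, batch: int = 1):
    """Write n_writes small records, calling fsync once every `batch` writes.

    Returns the list of observed fsync durations in seconds.
    """
    latencies = []
    payload = b"x" * record_size
    with tempfile.NamedTemporaryFile() as f:
        fd = f.fileno()
        for i in range(1, n_writes + 1):
            os.write(fd, payload)
            if i % batch == 0:
                start = time.perf_counter()
                os.fsync(fd)  # force the data to stable storage
                latencies.append(time.perf_counter() - start)
    return latencies


# One fsync per write versus one fsync per 10 writes.
per_write = fsync_latencies(n_writes=50, batch=1)
batched = fsync_latencies(n_writes=50, batch=10)
print(f"per-write: {len(per_write)} fsyncs, total {sum(per_write):.4f}s")
print(f"batched:   {len(batched)} fsyncs, total {sum(batched):.4f}s")
```

The absolute numbers depend heavily on the device, filesystem, and virtualization layer, which is why measurements like this should be taken on the same class of hardware as production.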
Easy | Technical
50 practiced
Describe the common transaction isolation levels: read uncommitted, read committed, repeatable read, and serializable. For each level, list which anomalies it permits (dirty reads, non-repeatable reads, phantom reads, write skew). Which isolation level does PostgreSQL provide by default, and what guarantees does it offer in practice for concurrent readers and writers?
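One way to organize an answer is as a lookup table. The sketch below encodes the minimum guarantees in the SQL standard's isolation table, which covers only dirty reads, non-repeatable reads, and phantom reads. Write skew is not part of that table and is deliberately left out; real engines may also be stricter than the standard requires. The names `SQL_STANDARD_PERMITS` and `permits` are made up for this example.

```python
# Anomalies each level is allowed to exhibit under the SQL standard.
# A specific database may prevent more than this table requires.
SQL_STANDARD_PERMITS = {
    "read uncommitted": {"dirty read", "non-repeatable read", "phantom read"},
    "read committed": {"non-repeatable read", "phantom read"},
    "repeatable read": {"phantom read"},
    "serializable": set(),
}


def permits(level: str, anomaly: str) -> bool:
    """Return True if the SQL standard allows `anomaly` at `level`."""
    return anomaly in SQL_STANDARD_PERMITS[level.lower()]


for level, anomalies in SQL_STANDARD_PERMITS.items():
    print(f"{level:17} permits: {', '.join(sorted(anomalies)) or 'none'}")
```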
Medium | System Design
47 practiced
Design a backup and restore strategy for a 10TB PostgreSQL database with sustained write throughput and constrained cross-region bandwidth. Requirements: daily full or incremental backups, 30-day retention, ability to do PITR to any second within the last 24 hours, and testable restores. Include storage targets, retention policy, snapshot coordination, and considerations for restore time and network cost.
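The restore-time and network-cost parts of this question usually start with back-of-envelope arithmetic. The sketch below estimates how long it takes to move a given amount of data over a link, assuming decimal terabytes, a fully utilized link, and no compression. The function name and the 1 Gbps figure are assumptions for illustration; the question does not specify a bandwidth.

```python
def transfer_hours(size_tb: float, bandwidth_gbps: float) -> float:
    """Hours needed to move size_tb terabytes over a bandwidth_gbps link.

    Assumes 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s, and that the
    link is fully available to the transfer with no compression.
    """
    if size_tb < 0 or bandwidth_gbps <= 0:
        raise ValueError("size must be non-negative and bandwidth positive")
    bits = size_tb * 10**12 * 8
    seconds = bits / (bandwidth_gbps * 10**9)
    return seconds / 3600


# A full 10 TB copy over an assumed 1 Gbps cross-region link.
hours = transfer_hours(10, 1.0)
print(f"10 TB at 1 Gbps: about {hours:.1f} hours")
```

An estimate like this shows quickly whether a daily full backup fits in the available window or whether incrementals and WAL shipping are required to stay within the bandwidth budget.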
Hard | Technical
43 practiced
A WAL segment is corrupted and PostgreSQL fails to recover, preventing startup. As the on-call SRE, outline the step-by-step recovery plan to get the service back with minimal data loss: options include restoring from last good base backup and WAL, attempting to salvage partial WAL, using pg_resetwal as last resort, and validating data integrity post-recovery. Discuss how you'd communicate impact to stakeholders and plan a postmortem to prevent recurrence.
