InterviewStack.io

Infrastructure and Database Systems Questions

Fundamental infrastructure and database engineering concepts relevant to analytics platforms and general backend systems. Topics include relational and non-relational database architecture, indexing strategies, query optimization, replication and consistency trade-offs, sharding and partitioning approaches, caching system design, message queues and event streaming systems, and how these components integrate to meet performance, reliability, and cost objectives. Candidates should be able to reason about capacity planning, high availability, disaster recovery, backup strategies, and operational concerns such as monitoring, alerting, and graceful degradation under load.

Easy | Technical
51 practiced
List and justify essential monitoring metrics you would instrument for a production PostgreSQL primary and its read replicas. Include metrics for performance (latency, slow queries), replication health (lag), capacity (connections, disk usage), errors, and any custom analytics-oriented signals you would add.
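
As one illustration of the replication-health signal mentioned above, the following is a minimal sketch of computing replica lag in bytes from PostgreSQL LSN strings. It assumes the LSN values were already fetched elsewhere (for example from `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on each replica); the function names, replica names, and the 64 MiB threshold are hypothetical choices for illustration, not part of the question.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not yet replayed (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_replay_lsn))


def lagging_replicas(primary_lsn, replica_lsns, max_lag_bytes=64 * 1024 * 1024):
    """Return {replica_name: lag_bytes} for replicas over the threshold.

    `replica_lsns` maps a replica name to its last replayed LSN.
    The default threshold is an assumed value; tune it per workload.
    """
    lags = {
        name: replication_lag_bytes(primary_lsn, lsn)
        for name, lsn in replica_lsns.items()
    }
    return {name: lag for name, lag in lags.items() if lag > max_lag_bytes}
```

Byte lag complements time-based lag: a replica can be only a few seconds behind while still being many megabytes of WAL behind during a write burst, so alerting on both is a reasonable default.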
Medium | System Design
31 practiced
Design a resharding strategy to move from 10 shards to 100 shards for a key-value store with minimal downtime. Outline data copy/streaming approach, client routing updates, consistent hashing considerations, throttling and progress monitoring, and rollback plan in case of failure.
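
The consistent hashing consideration can be sketched with a small hash ring. This is a minimal sketch, not a production router: the `HashRing` class, the `shard-N` naming scheme, the virtual node count, and the `plan_moves` helper are all assumptions made for illustration. It shows that when the 10 existing shards keep their ring positions and 90 new shards are added, every key that moves goes to a new shard and no data is shuffled between existing shards.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, shards, vnodes=64):
        # Each shard is placed at `vnodes` pseudo-random points on the ring.
        self._points = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def shard_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._points[idx][1]


def plan_moves(keys, old_ring, new_ring):
    """Return {key: (old_shard, new_shard)} for keys whose owner changes."""
    moves = {}
    for key in keys:
        src, dst = old_ring.shard_for(key), new_ring.shard_for(key)
        if src != dst:
            moves[key] = (src, dst)
    return moves


old_shards = [f"shard-{i}" for i in range(10)]
new_shards = [f"shard-{i}" for i in range(100)]
old_ring, new_ring = HashRing(old_shards), HashRing(new_shards)
keys = [f"user:{i}" for i in range(2000)]
moves = plan_moves(keys, old_ring, new_ring)
```

Growing from 10 to 100 shards still relocates most keys, since the old shards end up owning roughly a tenth of the ring. What the ring buys is a predictable copy plan: each move has one source and one destination, which makes throttling, progress tracking, and rollback (keep routing to the old ring until a range is verified) easier to reason about.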
Medium | Technical
32 practiced
Implement an in-memory LRU cache in Python with O(1) get and put operations. The cache should accept a capacity parameter, evict the least-recently-used item when full, and support optional TTL per key. You can use Python 3 standard libraries; include brief comments about concurrency considerations.
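
One possible shape for an answer is sketched below, built on `collections.OrderedDict` from the standard library. It is a minimal sketch under stated assumptions rather than a reference solution: the class name `LRUCache`, the injectable `clock` parameter (added here to make TTL behaviour testable), and the choice of lazy expiry on read are all illustrative decisions.

```python
import threading
import time
from collections import OrderedDict


class LRUCache:
    """LRU cache with O(1) get/put and an optional per-key TTL in seconds."""

    def __init__(self, capacity, clock=time.monotonic):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._clock = clock  # injectable so tests can control time
        # key -> (value, expires_at or None); order is least to most recent.
        self._data = OrderedDict()
        # Concurrency: one coarse lock keeps each operation atomic. Even get()
        # mutates recency order, so reads need the lock as well.
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return default
            value, expires_at = item
            if expires_at is not None and self._clock() >= expires_at:
                del self._data[key]  # lazy expiry: drop stale entry on read
                return default
            self._data.move_to_end(key)  # mark as most recently used
            return value

    def put(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        with self._lock:
            self._data[key] = (value, expires_at)
            self._data.move_to_end(key)
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)  # evict least recently used
```

Because expiry is lazy, an expired entry that is never read again still occupies a slot until it is evicted by LRU order; a periodic sweep or an expiry heap would reclaim space sooner at the cost of extra bookkeeping. Under heavy contention, sharding the cache across several locks is a common next step.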
Hard | Technical
25 practiced
Explain how to create consistent backups in a leaderless distributed database like Cassandra. Discuss coordinated snapshot approaches, commit-log or CDC archiving, repair and anti-entropy considerations, and how to reconstruct a consistent point-in-time view from per-node artifacts.
Medium | Technical
35 practiced
You observe increased p95 query latency for ad-hoc joins in a Redshift/BigQuery-like data warehouse. Walk through a step-by-step diagnostic and remediation plan: what metrics and system views would you examine, common causes such as distribution/sort keys, WLM queues, disk spills, statistics, and practical mitigations.
