InterviewStack.io

High Availability and Disaster Recovery Questions

Designing systems that remain available and recoverable in the face of infrastructure failures, outages, and disasters. Candidates should be able to define and reason about Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets, and to translate service level agreement (SLA) goals, from 99.9 percent up to 99.999 percent availability, into architecture choices. Core topics include redundancy strategies such as N+1 and N+2, active-active and active-passive deployment patterns, multi-availability-zone and multi-region topologies, and the trade-offs between same-region high availability and cross-region disaster recovery. Discuss load balancing and traffic shaping, redundant load balancer design, and algorithms such as round robin, least connections, and consistent hashing. Explain failover detection, health checks, automated versus manual failover, convergence and recovery timing, and the orchestration of failover and rerouting. Cover backup, snapshot, and restore strategies; replication and consistency trade-offs for stateful components; leader election and split-brain mitigation; runbooks and recovery playbooks; disaster recovery testing and drills; and cost and operational trade-offs. Include capacity planning, autoscaling, network redundancy, and security and infrastructure hardening, so that identity, key management, and logging services themselves remain available and recoverable. Emphasize monitoring, observability, alerting on availability signals, and validation through chaos engineering and regular failover exercises.
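As a worked illustration of translating an availability target into a concrete budget, the sketch below converts an SLA percentage into the downtime it allows per year. It assumes a 365-day year and counts all downtime against the SLA; the function name is illustrative, not taken from any particular tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, that an SLA allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% -> {downtime_budget_minutes(sla):.2f} minutes/year")
```

Each additional nine shrinks the budget by a factor of ten: from roughly 8.8 hours per year at 99.9 percent to about 5 minutes per year at 99.999 percent, which is why manual failover is rarely compatible with the higher targets.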

Medium | Technical
71 practiced
Compare synchronous and asynchronous replication for stateful components such as databases and message queues. As a Systems Administrator, recommend which approach to use for a financial ledger system and for a large object store (user media), and explain performance, availability, and RPO/RTO trade-offs.
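The RPO side of this comparison can be made concrete with a toy model. The sketch below is an in-memory illustration only, not a real database or queue: the class and method names are hypothetical, and it ignores network latency, which is the main cost of synchronous replication. It shows that a synchronous write is acknowledged only once the replica holds it, while an asynchronous write is acknowledged immediately and can be lost if the primary fails before shipping.

```python
class ReplicatedLog:
    """Toy primary/replica pair (hypothetical; for illustration only)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []
        self.replica = []
        self._pending = []  # acknowledged but not yet on the replica (async only)

    def write(self, record):
        """Acknowledge a write according to the replication mode."""
        self.primary.append(record)
        if self.synchronous:
            # Acknowledge only after the replica has the record.
            self.replica.append(record)
        else:
            # Acknowledge immediately; the record ships in a later batch.
            self._pending.append(record)

    def ship(self):
        """Deliver queued records to the replica (async mode)."""
        self.replica.extend(self._pending)
        self._pending.clear()

    def fail_over(self):
        """Simulate losing the primary; return acknowledged records that are lost."""
        lost = list(self._pending)
        self._pending.clear()
        self.primary = list(self.replica)  # the replica becomes the new primary
        return lost


ledger = ReplicatedLog(synchronous=True)
ledger.write("debit:100")
print("sync lost:", ledger.fail_over())

media = ReplicatedLog(synchronous=False)
media.write("photo-1")
media.ship()
media.write("photo-2")  # primary fails before this one ships
print("async lost:", media.fail_over())
```

In this model the synchronous log loses nothing on failover (an RPO of zero), while the asynchronous log loses whatever was written since the last shipment, so its RPO is bounded by the replication lag.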
Medium | System Design
119 practiced
You are a Systems Administrator asked to design an HA and DR architecture for a customer-facing web application that must meet an SLA of 99.99% and an RPO of under 5 minutes. Describe the components, redundancy strategy, storage and database replication approach, failover mechanics, and trade-offs you would choose to meet these targets.
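One way to check a candidate design against the 99.99% target is to combine component availabilities. The sketch below uses the standard formulas for components in series (all must be up) and redundant replicas in parallel (at least one must be up). It assumes failures are independent, which real AZ or region outages often are not, and the per-component figures are made-up placeholders rather than vendor numbers.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def parallel_availability(replicas):
    """Availability of redundant replicas where any single one suffices.

    Assumes independent failures, which is optimistic in practice.
    """
    return 1 - prod(1 - a for a in replicas)


# Hypothetical per-AZ stack: load balancer, app tier, database.
single_az = serial_availability([0.9999, 0.999, 0.9995])

# The same stack deployed in two AZs, either of which can serve traffic.
two_az = parallel_availability([single_az, single_az])

target = 0.9999
print(f"single AZ: {single_az:.6f}, meets target: {single_az >= target}")
print(f"two AZs:   {two_az:.6f}, meets target: {two_az >= target}")
```

The calculation covers availability only. The RPO of under 5 minutes is a separate constraint that depends on replication lag and backup frequency, not on the number of redundant copies.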
Medium | Technical
76 practiced
You're configuring redundant load balancers for an HA service across two AZs. Discuss placement options (edge vs internal), redundancy models (active-passive vs active-active), state and algorithm choices when backends scale rapidly, and the operational considerations of using consistent hashing to preserve cache locality during scale events.
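To make the cache-locality argument concrete, here is a minimal sketch of a consistent hash ring with virtual nodes. It is an illustration under stated assumptions, not the implementation used by any specific load balancer: the `HashRing` class, the backend names, and the choice of MD5 with 100 virtual nodes per backend are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 used for spread, not security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted hash points on the ring
        self._owners = {}  # hash point -> backend name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` points for this backend on the ring."""
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if point not in self._owners:
                bisect.insort(self._points, point)
            self._owners[point] = node

    def remove(self, node):
        """Remove every point owned by this backend."""
        self._points = [p for p in self._points if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def lookup(self, key):
        """Return the backend owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


ring = HashRing(["backend-1", "backend-2", "backend-3"])
keys = [f"session-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("backend-4")  # scale-out event
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding one backend")
```

When a backend is added, the only keys that change owner are those that now map to the new backend; every other key keeps its previous backend and its warm cache. With a plain modulo scheme (hash of the key modulo the backend count), most keys would be remapped on the same scale event.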
Easy | Technical
83 practiced
Describe the differences between active-active and active-passive deployment patterns for ensuring availability. As a Systems Administrator, list three operational trade-offs (cost, complexity, consistency) and give one concrete example of when active-passive is preferable over active-active.
Hard | Technical
85 practiced
Your startup has limited budget but requires reasonable uptime for its SaaS product. Propose a pragmatic HA/DR plan to balance cost and availability: when to use multi-AZ versus cross-region DR, backup frequency, RTO/RPO targets, and operational practices (runbooks, automation, testing) to keep costs low while ensuring recoverability.
