InterviewStack.io

Disaster Recovery and Business Continuity Questions

This topic covers designing and maintaining plans, architectures, and processes that ensure service continuity and recoverability after major incidents or disasters. Topics include defining Recovery Time Objective (RTO) and Recovery Point Objective (RPO), conducting business impact analysis and tiering services by criticality, dependency mapping and recovery ordering, selecting replication and backup strategies (synchronous and asynchronous replication, active-active and active-passive topologies, snapshots, and transaction-log-based point-in-time recovery), and planning cold, warm, and hot recovery sites. It also covers failover and failback procedures, orchestration and automation of recovery workflows, runbook creation, stakeholder roles and communications, regular disaster recovery testing and exercises (tabletop, simulated failover, full recovery drills, and chaos engineering), metrics tracking such as mean time to recovery and the RTO actually achieved, off-site and geographic redundancy considerations, cloud versus on-premises trade-offs, regulatory and data residency requirements, and post-exercise reviews to close recovery gaps.
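
As a rough illustration of the recovery metrics named above, the following Python sketch computes the achieved RPO, the achieved RTO, and the mean time to recovery from incident timestamps. The function names, timestamps, and targets are all hypothetical and chosen only for this example; a real program would pull these values from incident and backup records.

```python
from datetime import datetime, timedelta
from statistics import mean


def achieved_rpo(last_recoverable_point: datetime, failure_time: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable point and the failure."""
    return failure_time - last_recoverable_point


def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the failure and the service being restored."""
    return service_restored - failure_time


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored - failed) over a list of (failed, restored) pairs."""
    durations = [(restored - failed).total_seconds() for failed, restored in incidents]
    return timedelta(seconds=mean(durations))


# Hypothetical drill data (assumed values, not taken from any real incident).
failure = datetime(2024, 1, 1, 12, 0)
last_backup = datetime(2024, 1, 1, 11, 45)
restored = datetime(2024, 1, 1, 12, 50)

# Hypothetical targets for this service tier.
rpo_target = timedelta(minutes=30)
rto_target = timedelta(hours=1)

rpo_met = achieved_rpo(last_backup, failure) <= rpo_target
rto_met = achieved_rto(failure, restored) <= rto_target

# Two hypothetical incidents: 50 minutes and 30 minutes of downtime.
history = [
    (failure, restored),
    (datetime(2024, 2, 1, 9, 0), datetime(2024, 2, 1, 9, 30)),
]
mttr = mean_time_to_recovery(history)
```

Comparing achieved values against targets after each drill, as sketched here, is one simple way to surface recovery gaps during a post-exercise review.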

Easy · Technical
Explain the differences between cold, warm, and hot recovery sites. For each model, name one common use case (type of application) and one key operational trade-off (cost, complexity, or RTO/RPO). Assume the company runs a mix of public web services and batch analytics jobs.
Hard · System Design
You are designing DR for a petabyte-scale database where synchronous replication is impractical due to latency and cost. Propose a hybrid architecture that provides near-zero RTO for read operations and bounded RPO for writes. Discuss storage tiering, replication lag mitigation, and how to achieve fast failover for read-heavy load while preserving write durability.
Hard · Technical
Design metrics and a dashboard to track disaster recovery readiness and performance for the executive team and engineering teams. Include at least 8 metrics (a mix of operational and business metrics), data sources, target thresholds, and alerting rules. Explain which metrics are leading indicators and which are lagging indicators.
Hard · Technical
Create a chaos engineering experiment to validate disaster recovery readiness for a distributed microservices platform with a central database. Define hypothesis, blast radius, KPIs to observe (e.g., RTO, failed transaction rate), safety controls, and how to roll back the experiment. Include how you'd scale experiments from staging to limited production.
Easy · Technical
Describe three storage-consistency concerns when taking backups of a multi-tier application (e.g., web front-end, application servers, relational DB, cache). For each concern, propose a mitigation strategy to ensure recoverable, consistent backups across tiers.
