InterviewStack.io

Disaster Recovery and Business Continuity Questions

Designing and maintaining plans, architectures, and processes to ensure service continuity and recoverability after major incidents or disasters. Topics include defining Recovery Time Objective (RTO) and Recovery Point Objective (RPO); conducting business impact analysis and tiering services by criticality; dependency mapping and recovery ordering; selecting replication and backup strategies, including synchronous and asynchronous replication, active-active and active-passive topologies, snapshots, and transaction-log-based point-in-time recovery; and planning cold, warm, and hot recovery sites. Also covers failover and failback procedures; orchestration and automation of recovery workflows; runbook creation, stakeholder roles, and communications; regular disaster recovery testing and exercises, including tabletop exercises, simulated failover, full recovery drills, and chaos engineering; metrics such as mean time to recovery and the RTO actually achieved; off-site and geographic redundancy considerations; cloud versus on-premises trade-offs; regulatory and data residency requirements; and post-exercise reviews to close recovery gaps.

Medium · Technical
21 practiced
Consider the following budget scenario: a hot-active multi-region setup costs $150k per month, a warm standby costs $50k per month, and a cold site costs $10k per month. The business expects losses of $200k/hour during downtime. For a service with an acceptable outage cost of up to $20k/hour, recommend the appropriate site model and justify your recommendation with quantitative reasoning.
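One way to structure the quantitative reasoning is to compare the total expected annual cost of each model: twelve months of standing cost plus the expected downtime loss. The sketch below is illustrative only. The monthly costs come from the question, but the recovery time per model (`ASSUMED_RTO_HOURS`) and the outage frequency are assumptions that the question does not supply, and a real answer would state its own.

```python
# Monthly standing costs (USD) are taken from the question.
MONTHLY_COST = {"hot": 150_000, "warm": 50_000, "cold": 10_000}

# ASSUMPTION: typical time to recover, in hours, for each site model.
# These figures are NOT given in the question; replace them with your own.
ASSUMED_RTO_HOURS = {"hot": 0.25, "warm": 2.0, "cold": 48.0}


def annual_cost(model, loss_per_hour, outages_per_year):
    """Twelve months of standing cost plus expected downtime loss."""
    standing = 12 * MONTHLY_COST[model]
    downtime_loss = outages_per_year * ASSUMED_RTO_HOURS[model] * loss_per_hour
    return standing + downtime_loss


def cheapest_model(loss_per_hour, outages_per_year):
    """Return the site model with the lowest expected annual cost."""
    return min(
        MONTHLY_COST,
        key=lambda m: annual_cost(m, loss_per_hour, outages_per_year),
    )


if __name__ == "__main__":
    for model in MONTHLY_COST:
        print(model, annual_cost(model, loss_per_hour=20_000, outages_per_year=1))
```

Under these assumed recovery times, one outage per year at $20k/hour makes warm standby the cheapest option (about $640k per year), while the hot setup only pays for itself when the loss rate and outage frequency are both much higher. The conclusion is sensitive to the assumed inputs, so an answer should say which ones it relies on.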
Medium · Technical
25 practiced
Calculate the expected RPO for a system that uses hourly backups plus WAL shipping every 5 minutes, when an outage occurs 7 minutes after the last WAL shipment. Given that WAL segments are pushed asynchronously every 5 minutes, explain the worst-case data loss and how you would change the design to achieve an RPO of under 1 minute.
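The arithmetic behind this question can be sketched as follows. With asynchronous shipping, everything written after the last successfully shipped WAL segment is lost, so the data loss equals the time between that shipment and the outage. The function names and the `transfer_lag_minutes` parameter are hypothetical and used only for illustration; times are in minutes from an arbitrary origin.

```python
def data_loss_minutes(ship_times, outage_time):
    """Minutes of writes lost: time since the last shipment that
    completed at or before the outage."""
    shipped = [t for t in ship_times if t <= outage_time]
    if not shipped:
        raise ValueError("no WAL shipment completed before the outage")
    return outage_time - max(shipped)


def worst_case_rpo_minutes(ship_interval_minutes, transfer_lag_minutes=0.0):
    """Worst case on a healthy schedule: the failure hits just before
    the next shipment lands, so a full interval (plus any transfer lag,
    an assumed extra term) is lost."""
    return ship_interval_minutes + transfer_lag_minutes


if __name__ == "__main__":
    # Reading the question literally: the last shipment was at t=0 and
    # the outage is at t=7, so the t=5 shipment never arrived.
    print(data_loss_minutes([0], 7))
    # If the t=5 shipment had succeeded, only 2 minutes would be lost.
    print(data_loss_minutes([0, 5], 7))
    print(worst_case_rpo_minutes(5))
```

The hourly backups do not bound the loss here; the WAL shipping interval does. The same arithmetic shows what an RPO under 1 minute requires: either a shipping interval (plus lag) below 60 seconds, or streaming WAL continuously or synchronously instead of in batches.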
Hard · System Design
23 practiced
Design a cross-cloud DR strategy for a SaaS provider where some customers require eventual failover to Cloud-A in Europe and others to Cloud-B in the US due to contractual obligations. Discuss data synchronization, identity and access management, network/topology, and how you would orchestrate failover and failback between clouds while preserving tenant isolation and compliance.
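One small piece of such a design, grouping tenants by their contracted failover target so that each group can be failed over independently, might be sketched as below. This is a minimal illustration under stated assumptions, not a full orchestration design: the target labels `cloud-a-eu` and `cloud-b-us`, the `Tenant` type, and `plan_failover` are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# ASSUMPTION: hypothetical labels for the two contractual DR targets.
ALLOWED_TARGETS = {"cloud-a-eu", "cloud-b-us"}


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    contracted_target: str  # where the contract says this tenant may fail over


def plan_failover(tenants):
    """Group tenant IDs by contracted target, rejecting any tenant whose
    target is not an approved location, so that no tenant's data is
    moved somewhere its contract does not allow."""
    plan = {}
    for tenant in tenants:
        if tenant.contracted_target not in ALLOWED_TARGETS:
            raise ValueError(
                f"tenant {tenant.tenant_id} has unapproved target "
                f"{tenant.contracted_target!r}"
            )
        plan.setdefault(tenant.contracted_target, []).append(tenant.tenant_id)
    return plan


if __name__ == "__main__":
    tenants = [
        Tenant("t1", "cloud-a-eu"),
        Tenant("t2", "cloud-b-us"),
        Tenant("t3", "cloud-a-eu"),
    ]
    print(plan_failover(tenants))
```

Keeping the contractual target as explicit per-tenant data, and failing closed on anything unrecognized, is one way to make the compliance constraint checkable before any failover step runs.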
Easy · Technical
29 practiced
What essential elements must a technical recovery runbook contain to be actionable during an incident? List at least eight items (for example: purpose, prerequisites, steps, rollback, owners, contact lists) and explain why each is important for a cross-functional incident team.
Medium · Technical
45 practiced
Create a DR test plan for an enterprise application that includes tabletop exercises, automated simulated failover tests, and an annual full failover drill. For each test type, provide objectives, participants, success criteria, and a sample schedule over a 12-month period.
