Deployment Risk Management & Rollback Strategy Questions
Discuss strategies for managing deployment risk: canary deployments (detect issues on a subset of traffic), feature flags (disable a change quickly without a rollback), and post-deployment smoke testing. Understand rollback procedures: full rollback (restore the previous version) and partial rollback (revert specific services). Know how to handle complications such as database schema changes that cannot simply be rolled back.
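As a minimal sketch of the feature-flag idea above, the following shows a flag used as a kill switch: the new code path is disabled at runtime without redeploying. The `FlagStore` class, the `new_fee_model` flag name, and the fee formulas are all hypothetical and exist only for illustration; a real system would typically read flags from a shared, dynamically updated store.

```python
class FlagStore:
    """In-memory stand-in for a flag service (hypothetical, for illustration)."""

    def __init__(self, initial=None):
        self._flags = dict(initial or {})

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)

    def is_enabled(self, name, default=False):
        # Unknown flags fall back to the default, so a missing flag
        # keeps the old, known-good behavior.
        return self._flags.get(name, default)


def quote_fee_cents(amount_cents, flags):
    """Return a fee using the new path only while its flag is on."""
    if flags.is_enabled("new_fee_model"):
        # New (risky) path: 2.9% plus a fixed 30 cents, integer math.
        return amount_cents * 29 // 1000 + 30
    # Old (known-good) path: flat 3%.
    return amount_cents * 3 // 100


flags = FlagStore({"new_fee_model": True})
fee_with_flag = quote_fee_cents(1000, flags)

# Incident response: flip the flag off instead of rolling back the deploy.
flags.set("new_fee_model", False)
fee_after_kill = quote_fee_cents(1000, flags)
```

Both code paths ship in the same build, which is what makes the disable fast; the trade-off is that the old path has to stay maintained until the flag is retired.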
Medium · Technical
You're on call: a deployment made via GitOps one hour ago has caused a spike in p99 latency for the payments service. Walk through the immediate actions you take (tactical mitigation), the communications you make, and how you decide whether to revert the Git commit or apply a hotfix. Include short-term and medium-term follow-ups.
Medium · Technical
During a canary rollout (5% of traffic), latency has increased by 1.5x while the error rate remains unchanged. As the on-call SRE, list the step-by-step diagnostic actions you would take to decide whether to continue the rollout, pause it, or roll back. Which logs, traces, metrics, and infrastructure signals would you check, and in what order?
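One diagnostic step that fits this scenario is comparing trace span durations between baseline and canary to see which operation accounts for the slowdown. The sketch below is one possible way to do that, assuming span durations have already been exported as plain lists of milliseconds keyed by operation name; the function name, operation names, and sample values are hypothetical.

```python
from statistics import median


def rank_span_regressions(baseline, canary):
    """Rank operations by canary/baseline median-duration ratio, worst first.

    baseline, canary: dict mapping operation name -> list of durations (ms).
    Only operations present in both sets are compared.
    """
    rows = []
    for op in sorted(set(baseline) & set(canary)):
        b = median(baseline[op])
        c = median(canary[op])
        ratio = c / b if b > 0 else float("inf")
        rows.append((op, b, c, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Hypothetical span samples for illustration only.
baseline_spans = {"db.query": [10, 12, 11], "auth.check": [5, 5, 6]}
canary_spans = {"db.query": [20, 22, 21], "auth.check": [5, 6, 5]}

ranked = rank_span_regressions(baseline_spans, canary_spans)
worst_operation = ranked[0][0]
```

A ranking like this narrows where to look next (for example, database metrics versus a downstream dependency), but it does not by itself settle whether to pause or roll back.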
Hard · Technical
As an SRE leader you must balance a fast-moving product team (nightly releases) and a conservative reliability team insisting on strict rollback gates. Propose a policy and process that balances velocity and reliability, covers enforcement mechanisms, and includes measurable success criteria.
Hard · System Design
Architect a deployment platform that supports strict availability targets (e.g., 99.999% uptime) and must handle schema changes that cannot be rolled back. The platform should support canaries, feature flags, traffic routing, SLO-driven gates, and rollback automation. Provide components, data flows, and how you would handle a failed schema migration.
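A common way to cope with schema changes that cannot be rolled back is the expand/contract (parallel change) pattern: make an additive, backward-compatible change first, backfill, and drop the old shape only after every reader and writer has moved over. The sketch below illustrates the expand and backfill steps with Python's built-in `sqlite3` module; the `payments` table, the `currency` column, and the `"USD"` default are hypothetical, and this is not a description of any particular platform's migration tooling.

```python
import sqlite3


def expand(conn):
    # Additive change: the new column is nullable, so writers that do not
    # know about it keep working and the application can still be rolled back.
    conn.execute("ALTER TABLE payments ADD COLUMN currency TEXT")


def backfill(conn, default="USD"):
    # Fill rows written before (or during) the expand step.
    conn.execute(
        "UPDATE payments SET currency = ? WHERE currency IS NULL", (default,)
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)
conn.execute("INSERT INTO payments (amount_cents) VALUES (1250)")

expand(conn)

# Old application version: unaware of the new column, still succeeds.
conn.execute("INSERT INTO payments (amount_cents) VALUES (999)")
# New application version: writes the new column.
conn.execute("INSERT INTO payments (amount_cents, currency) VALUES (500, 'EUR')")

backfill(conn)

rows = conn.execute(
    "SELECT amount_cents, currency FROM payments ORDER BY id"
).fetchall()
```

The destructive "contract" step (dropping old columns or adding NOT NULL constraints) is deliberately left out: it is the part that cannot be undone, so it would normally run as a separate, later deployment once the old version is fully retired.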
Medium · System Design
Design an automated canary-analysis system (a "canary judge") that integrates with CI/CD to decide whether to promote a canary to full production. Requirements: ingest metrics at scale, compare baseline vs. canary over configurable windows, support custom metrics and thresholds, trigger automated rollback or promotion, and expose an API for CI. Provide a high-level component diagram and data flow.
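The core comparison step of such a judge can be sketched in a few lines. The example below assumes the metric samples for one analysis window have already been collected into plain lists; `MetricCheck`, `judge`, the nearest-rank percentile, the ratio thresholds, and the `min_samples` guard are illustrative assumptions rather than a reference design, and ingestion, windowing, and the CI-facing API are out of scope.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricCheck:
    name: str         # metric key, e.g. "latency_ms"
    quantile: float   # e.g. 0.99 for p99
    max_ratio: float  # highest tolerated canary/baseline ratio


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def judge(baseline, canary, checks, min_samples=30):
    """Return ("promote" | "rollback" | "inconclusive", reasons)."""
    failures = []
    for check in checks:
        b = baseline.get(check.name, [])
        c = canary.get(check.name, [])
        if len(b) < min_samples or len(c) < min_samples:
            # Too little data to decide; let the caller extend the window.
            return "inconclusive", [f"{check.name}: not enough samples"]
        b_val = percentile(b, check.quantile)
        c_val = percentile(c, check.quantile)
        if b_val > 0:
            regressed = c_val / b_val > check.max_ratio
        else:
            regressed = c_val > 0
        if regressed:
            failures.append(
                f"{check.name}: canary {c_val} vs baseline {b_val} "
                f"exceeds ratio {check.max_ratio}"
            )
    return ("rollback", failures) if failures else ("promote", [])


# Hypothetical window: canary latency is 1.5x the baseline.
checks = [MetricCheck(name="latency_ms", quantile=0.99, max_ratio=1.2)]
baseline = {"latency_ms": list(range(100, 200))}
canary = {"latency_ms": [x * 1.5 for x in range(100, 200)]}

decision, reasons = judge(baseline, canary, checks)
```

Returning "inconclusive" instead of guessing on thin data keeps a low-traffic canary from being promoted or rolled back on noise; a production judge would likely also apply a statistical test rather than a fixed ratio.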