InterviewStack.io

Debugging and Recovery Under Pressure Questions

Covers systematic approaches to finding and fixing bugs in time-pressured situations such as interviews, plus techniques for verifying correctness and recovering gracefully when an initial approach fails. Topics include reproducing the failure, isolating the minimal failing case, stepping through the logic mentally or with print statements, and using binary search or divide and conquer to narrow down the fault. The emphasis is on careful assumption checking, invariant validation, and common error classes such as off-by-one mistakes, null or boundary conditions, integer overflow, and index errors. Verification practices include creating and running representative test cases: normal inputs, edge cases, empty and single-element inputs, duplicates, boundary values, large inputs, and randomized or stress tests when feasible. Time management and recovery strategies are also covered: prioritize the smallest fix that restores correctness, preserve working state, revert to a simpler correct solution if necessary, communicate reasoning aloud, avoid blind or random edits, and troubleshoot calmly and methodically rather than panicking. The goal is to show a rigorous debugging methodology, build trust in the final solution through targeted verification, and display resilience under interview pressure.
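As an illustration of the randomized stress testing mentioned above, here is a minimal sketch that compares a candidate solution against a slow but obviously correct reference on many small random inputs. The problem (maximum subarray sum) and all function names are assumptions chosen for the example; they are not taken from the question bank.

```python
import random

def brute_max_subarray(nums):
    """O(n^3) reference: try every non-empty contiguous slice."""
    n = len(nums)
    return max(sum(nums[i:j]) for i in range(n) for j in range(i + 1, n + 1))

def kadane(nums):
    """Candidate solution under test (assumes a non-empty list)."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def buggy_kadane(nums):
    """Deliberately wrong variant: starting from 0 breaks all-negative inputs."""
    best = cur = 0
    for x in nums:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

def find_counterexample(candidate, reference, trials=500, seed=0):
    """Return the first random input where the two functions disagree, else None."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        n = rng.randint(1, 6)  # small sizes keep a failing case easy to read
        nums = [rng.randint(-5, 5) for _ in range(n)]
        if candidate(nums) != reference(nums):
            return nums
    return None

if __name__ == "__main__":
    print("kadane:", find_counterexample(kadane, brute_max_subarray))
    print("buggy_kadane:", find_counterexample(buggy_kadane, brute_max_subarray))
```

Keeping the random inputs small is a deliberate choice: a failing case with a handful of elements can usually be traced by hand, which matters when time is short.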

Medium, Technical
You're in a live incident and have two straightforward options: roll back the latest deploy or toggle a feature flag that should revert the behavior. Walk through your decision-making process: how you assess risk, preserve data and logs, perform the rollback or toggle, and verify that service behavior is restored. Include communication steps and explain how you would avoid causing further disruption.

Hard, Technical
A Go microservice uses a global map to count events and occasionally crashes or produces wrong counts under high concurrency. Given this code snippet, identify the issue, fix it, and write a small test that reproduces the race:
```go
var counts = map[string]int{}
func Increment(k string) {
    counts[k]++
}
```
Explain the trade-offs of your chosen fix.

Medium, Technical
Write a Python script that parses an HTTP access log and identifies per-user request sequences that lead to an HTTP 500 within 30 seconds. For each user_id, output the minimal contiguous request sequence that ends with a 500. Describe your algorithm and edge cases (clock drift, missing user_id, concurrent sessions).
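One possible approach is a per-user sliding window, sketched below. The question does not specify a log format, so the sketch assumes a hypothetical whitespace-separated line of the form `<ISO-8601 timestamp> <user_id> <method> <path> <status>`, with `-` standing in for a missing user_id. It also reads "minimal contiguous sequence" as the user's requests within the 30 seconds up to and including the 500, which is an interpretation rather than something the question states.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=30)

def parse_line(line):
    """Parse one line of the assumed format; return None if malformed or anonymous."""
    parts = line.split()
    if len(parts) != 5 or parts[1] == "-":
        return None
    try:
        ts = datetime.fromisoformat(parts[0])
        status = int(parts[4])
    except ValueError:
        return None
    return ts, parts[1], parts[2], parts[3], status

def sequences_before_500(lines):
    """Return a list of (user_id, requests) pairs, one per 500 response."""
    records = [r for r in map(parse_line, lines) if r is not None]
    # Sorting by timestamp tolerates lines that were written slightly out of order.
    records.sort(key=lambda r: r[0])

    recent = defaultdict(deque)  # user_id -> that user's requests inside the window
    results = []
    for ts, user, method, path, status in records:
        window = recent[user]
        window.append((ts, method, path, status))
        # Drop requests that are more than 30 seconds older than the current one.
        while ts - window[0][0] > WINDOW:
            window.popleft()
        if status == 500:
            results.append((user, list(window)))
            window.clear()  # start a fresh sequence after each failure
    return results

if __name__ == "__main__":
    sample = [
        "2024-01-01T00:00:00 alice GET /a 200",
        "2024-01-01T00:00:40 alice GET /c 200",
        "2024-01-01T00:00:50 alice POST /d 500",
    ]
    for user, reqs in sequences_before_500(sample):
        print(user, [path for _, _, path, _ in reqs])
```

This sketch skips lines without a user_id and does not separate concurrent sessions for the same user. Handling those would need a session identifier in the log, and clock drift between hosts would need a tolerance added to the window comparison.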

Easy, Technical
You're paged for an IndexError seen in production logs. Below is a simplified Python function reported to crash intermittently. Identify the bug, provide the minimal code change to fix it, and describe one or two targeted tests (edge cases) you'd add.
```python
def process(items):
    res = []
    for i in range(len(items)):
        x = items[i+1]
        res.append(x * 2)
    return res
```
Explain why the crash happens and why your change is minimal and safe.
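A minimal sketch of one possible answer follows. It assumes the intent is to double every element; the question does not say so. On the last iteration `i + 1` equals `len(items)`, so `items[i+1]` reads one past the end for any non-empty list, while an empty list never enters the loop. That difference may be why the crash looks intermittent in production.

```python
def process(items):
    res = []
    for i in range(len(items)):
        x = items[i]  # was items[i+1], which is out of range when i == len(items) - 1
        res.append(x * 2)
    return res

# Targeted edge-case checks: empty input, single element, and a normal list.
assert process([]) == []
assert process([7]) == [14]
assert process([1, 2, 3]) == [2, 4, 6]
```

If the intent were instead to skip the first element, the equally small change would be `range(len(items) - 1)` with the index left as `i+1`. Confirming the intended behavior before patching is part of the answer.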

Medium, Technical
You have 30 minutes to validate a risky patch under time pressure. Describe a practical 'fast-check' strategy: which tests and assertions to run, how to select representative inputs, what smoke checks and monitoring to enable, and how to package an emergency rollback if the patch causes regressions.
