InterviewStack.io

Systematic Troubleshooting and Debugging Questions

Covers structured methods for diagnosing and resolving software defects and technical problems at the code and system level. Candidates should demonstrate methodical debugging practices such as reading and reasoning about code, tracing execution paths, reproducing issues, collecting and interpreting logs, metrics, and error messages, forming and testing hypotheses, and iterating toward root cause. The topic includes use of diagnostic tools and commands, isolation strategies, instrumentation and logging best practices, regression testing and validation, trade-offs between quick fixes and long-term robust solutions, rollback and safe testing approaches, and clear documentation of investigative steps and outcomes.

Hard, Technical
Given a flamegraph that shows a hot function path in your service's request handler, describe concrete code and configuration changes you might make to reduce CPU usage. Explain how you would measure and validate the improvement and ensure you don't regress tail latency or throughput.
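One way to approach the validation step is to compare latency percentiles from a baseline run and a candidate run recorded under the same load. The following is a minimal sketch, not a full benchmarking harness. The function names `summarize` and `regressions` and the 5% tolerance are assumptions made for this example.

```python
import statistics


def summarize(latencies_ms):
    """Return p50, p95 and p99 for a list of request latencies in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def regressions(baseline_ms, candidate_ms, tolerance=0.05):
    """Return the percentiles where the candidate is slower than baseline by more than `tolerance`."""
    base = summarize(baseline_ms)
    cand = summarize(candidate_ms)
    return {
        name: (base[name], cand[name])
        for name in base
        if cand[name] > base[name] * (1 + tolerance)
    }
```

An empty result means no percentile regressed beyond the tolerance. The same comparison can be applied to throughput or CPU time per request, with the inequality reversed where higher values are better.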
Medium, Technical
A pod in Kubernetes is in CrashLoopBackOff. Detail the kubectl commands and investigative steps you would run to determine whether the crash is due to an application error, image problems, misconfigured liveness/readiness probes, or node resource exhaustion. Include how to collect logs and events, and describe an approach to reproducing the crash locally.
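A first triage pass usually starts from `kubectl describe pod <name>`, `kubectl logs <name> --previous`, and the pod's status object. As an illustration, the sketch below classifies the output of `kubectl get pod <name> -o json` using the standard `containerStatuses` fields. The function name `classify_crash` and the category labels are assumptions for this example, and the heuristic is deliberately coarse: a non-zero exit can also come from a liveness-probe kill, which only the events will confirm.

```python
IMAGE = "image-pull-failure"
OOM = "oom-killed"
NONZERO = "nonzero-exit"        # app error or probe kill; check logs and events
UNKNOWN = "inconclusive"


def classify_crash(pod):
    """Guess a crash category per container from a parsed pod JSON object."""
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        name = cs.get("name", "?")
        waiting = cs.get("state", {}).get("waiting", {})
        terminated = cs.get("lastState", {}).get("terminated", {})
        if waiting.get("reason") in ("ImagePullBackOff", "ErrImagePull"):
            findings.append((name, IMAGE))
        elif terminated.get("reason") == "OOMKilled":
            findings.append((name, OOM))
        elif terminated.get("exitCode", 0) != 0:
            findings.append((name, NONZERO))
        else:
            findings.append((name, UNKNOWN))
    return findings
```

The input can be obtained with `json.load` on the saved command output. Whatever the category, the result is a starting hypothesis to test against logs and events, not a diagnosis.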
Hard, Technical
You have a core dump from a crashed native service. Describe the steps you would take to analyze it: how to obtain matching symbols, load the core into gdb, inspect threads/stack frames/heap, and identify likely root causes. Mention common pitfalls such as mismatched binaries or stripped symbols.
Hard, Technical
Implement (or provide runnable pseudocode) a command-line Python tool that reads newline-delimited JSON logs from stdin, groups log events by trace_id, and prints latency percentiles (p50, p95, p99) for each trace_id as streaming output. The tool should handle an unbounded stream and limit memory usage by expiring trace groups after 10 minutes of inactivity.
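One possible shape for such a tool is sketched below. It assumes each log event is a JSON object carrying a `trace_id` and a numeric `latency_ms` field; the field name `latency_ms`, the class name `TraceStats`, and the nearest-rank percentile definition are assumptions for this example, not part of the question. Groups are kept in an `OrderedDict` ordered by last activity, so expiry only inspects the oldest entries.

```python
import bisect
import json
import math
import sys
import time
from collections import OrderedDict

TTL_SECONDS = 600  # expire a trace group after 10 minutes of inactivity


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


class TraceStats:
    def __init__(self, ttl=TTL_SECONDS, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        # trace_id -> (last_seen, sorted latencies); least recently active first
        self.groups = OrderedDict()

    def expire(self, now):
        """Drop groups that have been inactive for at least `ttl` seconds."""
        while self.groups:
            trace_id, (last_seen, _) = next(iter(self.groups.items()))
            if now - last_seen < self.ttl:
                break
            del self.groups[trace_id]

    def add(self, trace_id, latency_ms):
        """Record one latency and return the group's current p50/p95/p99."""
        now = self.clock()
        self.expire(now)
        _, latencies = self.groups.pop(trace_id, (now, []))
        bisect.insort(latencies, latency_ms)
        self.groups[trace_id] = (now, latencies)  # re-insert as most recent
        return {p: percentile(latencies, p) for p in (50, 95, 99)}


def main(stream=sys.stdin, out=sys.stdout):
    stats = TraceStats()
    for line in stream:
        try:
            event = json.loads(line)
            trace_id = event["trace_id"]
            latency = float(event["latency_ms"])
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed or incomplete lines
        p = stats.add(trace_id, latency)
        print(f"{trace_id} p50={p[50]} p95={p[95]} p99={p[99]}", file=out, flush=True)

# To run as a CLI, call main() under an `if __name__ == "__main__":` guard.
```

Expiry bounds the number of live groups, but a single very active trace can still grow without limit. A bounded sample or a quantile sketch per group would address that at the cost of exact percentiles. Expiry here is driven by arrival time; using a timestamp field from the events instead would be a reasonable alternative when replaying old logs.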
Medium, Technical
Your product team asks to release a new feature tomorrow, but the service is nearing its error budget. Describe the decision process for choosing between (A) quickly patching a small bug to reduce errors, (B) delaying the release to implement a full fix, or (C) proceeding with the release and mitigating risk via feature flags. Explain which stakeholders you would consult and which trade-offs you would weigh.
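It can help to put a number on "nearing its error budget" before the discussion starts. The sketch below computes the fraction of budget left for a request-based availability SLO; the function name `error_budget_remaining` and the example figures are assumptions for illustration only.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent in the current window.

    slo_target is a success ratio such as 0.999. A negative result means
    the budget is already exhausted.
    """
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return (allowed_failures - failed_requests) / allowed_failures


# Example: 99.9% SLO, 1,000,000 requests so far, 900 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 900)
print(f"{remaining:.0%} of the error budget remains")
```

With those example figures roughly a tenth of the budget is left, which gives the stakeholders a concrete basis for weighing the three options.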
