InterviewStack.io

Performance Debugging and Latency Investigation Questions

Finding the root cause of latency spikes: checking CPU/memory/disk/network utilization, profiling applications, querying slow logs, and identifying bottlenecks. Understanding the difference between resource exhaustion and an algorithmic problem. Using monitoring and tracing tools to narrow down where time is spent.

Medium · Technical · 67 practiced
Implement exponential backoff with full jitter for client retries in either Go or Python. Your implementation should accept a base delay, max delay, number of attempts, and a jitter strategy, and must avoid the synchronized retries that cause a thundering herd. Provide code and explain why your jitter choice reduces coordination.
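A minimal Python sketch of what an answer might look like (function names, defaults, and the bare `except` are illustrative choices, not part of the question). Full jitter draws each sleep uniformly from [0, min(cap, base · 2^attempt)], so independent clients spread out instead of retrying in lockstep:

```python
import random
import time


def full_jitter_delays(base: float, cap: float, attempts: int):
    """Yield one sleep duration per attempt using full jitter.

    Full jitter: sleep = uniform(0, min(cap, base * 2**attempt)).
    Randomizing over the whole window decorrelates clients that
    failed at the same moment, avoiding synchronized retry waves.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield random.uniform(0, ceiling)


def call_with_retries(op, base=0.1, cap=5.0, attempts=5):
    """Run op(), retrying on failure with full-jitter backoff."""
    last_exc = None
    for delay in full_jitter_delays(base, cap, attempts):
        try:
            return op()
        except Exception as exc:  # real code should catch only retryable errors
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```

Compared with equal (no-jitter) backoff, full jitter trades a sometimes-shorter wait for much lower coordination: two clients that fail together almost never pick the same retry instant.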
Medium · Technical · 71 practiced
You observe only tail latency (p99) rising, while average CPU and memory metrics across hosts are unchanged. Application logs show an increase in request retries. Describe a diagnostic approach to confirm retry amplification and outline three immediate and two long-term mitigations.
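One concrete way to confirm amplification is to compare total attempts against distinct logical requests in the logs. The sketch below assumes a hypothetical structured-log shape with a `request_id` field per attempt; the function name and threshold interpretation are illustrative:

```python
from collections import Counter


def amplification_factor(log_records):
    """Return total attempts divided by distinct logical requests.

    log_records: iterable of dicts with a 'request_id' key (one record
    per attempt). A factor well above 1.0 means retries, not new
    traffic, are inflating load -- consistent with retry amplification.
    """
    attempts_per_request = Counter(r["request_id"] for r in log_records)
    if not attempts_per_request:
        return 0.0
    return sum(attempts_per_request.values()) / len(attempts_per_request)
```

Tracking this ratio over time (and per downstream dependency) separates a genuine traffic increase from the same requests being tried repeatedly.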
Hard · System Design · 57 practiced
Design a retention and sampling policy for traces and metrics that allows SREs to debug latency spikes from the last 30 days, while keeping storage and egress costs within a fixed budget. Include tiered retention (hot/cold), sampling/sketching approaches, aggregation rollups, and how to keep high-fidelity data for critical windows.
Easy · Technical · 64 practiced
Given this snippet of nginx access.log lines (combined format with the request duration appended), write a one-line awk or Python command to extract URIs with request durations > 2.0 seconds and print the top 5 slowest URIs with counts.
Example log lines:
127.0.0.1 - - [22/Nov/2025:12:00:01 +0000] "GET /api/v1/search HTTP/1.1" 200 1234 "-" "ua" 0.215
127.0.0.1 - - [22/Nov/2025:12:00:02 +0000] "POST /api/v1/upload HTTP/1.1" 200 5678 "-" "ua" 2.435
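One possible awk answer, assuming (as in the sample lines) that the duration is the final field and the URI is the seventh. The sample file here just reproduces the lines above so the pipeline runs standalone:

```shell
# Sample combined-format lines with the request duration as the last field
cat > access.log <<'EOF'
127.0.0.1 - - [22/Nov/2025:12:00:01 +0000] "GET /api/v1/search HTTP/1.1" 200 1234 "-" "ua" 0.215
127.0.0.1 - - [22/Nov/2025:12:00:02 +0000] "POST /api/v1/upload HTTP/1.1" 200 5678 "-" "ua" 2.435
EOF

# $NF is the trailing duration; $7 is the request URI.
# Keep requests slower than 2.0s, then count and rank the URIs.
awk '$NF > 2.0 {print $7}' access.log | sort | uniq -c | sort -rn | head -5
```

awk compares the string-typed `$NF` numerically against the numeric constant 2.0, so no explicit conversion is needed.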
Medium · Technical · 49 practiced
You have an aggregated sampled CPU flamegraph for a web handler showing top stacks with these percentages:
- Handler→ProcessRequest→DBQuery: 30%
- Handler→ProcessRequest→Serialize: 25%
- Handler→Auth→Decrypt: 20%
- Kernel→write: 15%
- Others: 10%

Which component should you optimize first to reduce tail latency, and what concrete steps and validations would you run to verify the improvement?
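A back-of-the-envelope check that often accompanies an answer here is Amdahl's law: the fraction of time a component occupies caps the overall win from optimizing it. The helper below is an illustrative sketch, not part of the question:

```python
def max_speedup(fraction: float, component_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of total time
    is accelerated by `component_speedup`x and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


# Doubling DBQuery throughput (30% of samples) caps the overall win:
# max_speedup(0.30, 2.0) ~= 1.18x
```

This is why validation matters: a 2x improvement to the largest stack still yields well under a 2x end-to-end gain, and tail latency may be dominated by variance rather than mean cost.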
