InterviewStack.io

Performance Fundamentals and Troubleshooting Questions

Core skills for identifying, diagnosing, and resolving general performance problems across applications and systems. Topics include establishing baselines and metrics, using monitoring and profiling tools to determine whether an issue is CPU-bound, memory-bound, I/O-bound, or network-bound, and applying systematic troubleshooting workflows. Candidates should be able to prioritize fixes, recommend temporary mitigations and long-term solutions, and explain when to escalate to specialists. This canonical topic covers general performance awareness, common diagnostic tools, and basic remediation approaches for slow systems and resource exhaustion.

Hard · Technical
A service uses three cache layers: an in-process memory cache, a local disk cache, and a remote object store. Occasionally, request latency spikes from a few milliseconds to hundreds of milliseconds. Design experiments and telemetry to determine which cache layer is responsible for the spikes and how to attribute each request's latency to a specific layer. Describe the instrumentation (timing spans, counters) and the sampling strategy.
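A minimal sketch of per-layer attribution, in Python: each request carries a trace that times every cache layer it touches and records hit or miss, and a tail-biased sampler keeps every slow request plus a small fraction of fast ones. The names (`RequestTrace`, `lookup`, `should_keep`), the dict-like `get()` interface for each layer, and the 50 ms / 1% sampling parameters are hypothetical; a real service would more likely emit the same data as child spans in its tracing system.

```python
import random
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class RequestTrace:
    """Per-request record of time spent in each cache layer (hypothetical schema)."""
    spans: dict = field(default_factory=dict)    # layer name -> elapsed seconds
    outcome: dict = field(default_factory=dict)  # layer name -> "hit" or "miss"

    @contextmanager
    def span(self, layer):
        # Time the enclosed block and accumulate it under the layer's name.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.spans[layer] = self.spans.get(layer, 0.0) + elapsed

    def dominant_layer(self):
        # Attribute the request to the layer that consumed the most wall time.
        return max(self.spans, key=self.spans.get) if self.spans else None


def lookup(key, memory, disk, remote, trace):
    """Read through the three layers in order, timing each one and stopping at the first hit."""
    for name, layer in (("memory", memory), ("disk", disk), ("remote", remote)):
        with trace.span(name):
            value = layer.get(key)
        trace.outcome[name] = "hit" if value is not None else "miss"
        if value is not None:
            return value
    return None


def should_keep(total_seconds, slow_threshold=0.05, base_rate=0.01, rng=random.random):
    """Tail-biased sampling: always keep slow requests, keep a small random share of fast ones."""
    return total_seconds >= slow_threshold or rng() < base_rate
```

Attributing a request to its largest span is a simple rule; comparing per-layer span distributions for spiking versus normal requests usually says more about which layer is at fault.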
Easy · Technical
What is a performance baseline and why is it important? Describe a pragmatic process for establishing a baseline for a newly deployed service: list the metrics you would capture (at least 6), the time window and aggregation strategies you would choose, and how you would document and version the baseline for future comparison.
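One way to make a baseline concrete and versionable, sketched in Python under assumed inputs (raw latency samples, request and error counts for the window, and CPU and memory samples): summarize the window into percentiles and rates, stamp it with a version and a timestamp, and store the resulting JSON next to the service's code. The function names, field names, and the nearest-rank percentile definition are assumptions for illustration.

```python
import json
import math
from datetime import datetime, timezone


def percentile(samples, p):
    """Nearest-rank percentile (0 <= p <= 100) of a non-empty sequence."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def build_baseline(service, version, window_seconds, latencies_ms,
                   request_count, error_count, cpu_pct, mem_mb):
    """Summarize one observation window into a versioned baseline record (hypothetical schema)."""
    return {
        "service": service,
        "baseline_version": version,  # bump on every intentional re-baseline
        "window_seconds": window_seconds,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "latency_ms": {name: percentile(latencies_ms, p)
                       for name, p in (("p50", 50), ("p95", 95), ("p99", 99))},
        "throughput_rps": request_count / window_seconds,
        "error_rate": error_count / request_count if request_count else 0.0,
        "cpu_pct_p95": percentile(cpu_pct, 95),
        "mem_mb_p95": percentile(mem_mb, 95),
    }


# Usage: serialize and commit alongside the release it describes, e.g.
# json.dumps(build_baseline(...), indent=2) written to a hypothetical baselines/ directory.
```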
Medium · Technical
You are given the following sampled outputs taken at the same time:
iostat -x 1 2 (relevant line):
Device    r/s    w/s   rkB/s   wkB/s  avgrq-sz  await  svctm  %util
sda      50.0  200.0  1024.0  8192.0      45.0   25.0   2.50   62.5

vmstat 1 2:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 4  0      0  50000  20000 100000    0    0   200   150  1000  2000 40 10 10 40  0
Based on these snippets, which resource is currently the most likely bottleneck and why? What immediate action would you take and what additional metrics or commands would you collect to confirm your diagnosis?
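The reasoning this question asks for can be encoded as a rough checklist. The Python sketch below is illustrative, not a standard rule set: the thresholds are assumed rules of thumb, the CPU count (4) is assumed because the snippets do not show it, and the `await`/`svctm` ratio is only a hint, since recent sysstat documentation warns that `svctm` is unreliable. Fed the sampled values above, it flags disk I/O (high iowait and queueing) rather than CPU or memory.

```python
def bottleneck_signals(vm, io, ncpus):
    """Flag likely bottlenecks from one vmstat sample and one iostat -x device line.

    Thresholds are illustrative rules of thumb (assumptions), not standards.
    """
    signals = []
    if vm["si"] > 0 or vm["so"] > 0:
        signals.append("memory: swapping")
    if vm["us"] + vm["sy"] >= 80 and vm["r"] > ncpus:
        signals.append("cpu: saturated run queue")
    if vm["wa"] >= 20:
        signals.append("disk: high iowait")
    # await includes time spent queued; a large multiple of svctm suggests requests wait in the queue.
    if io["svctm"] > 0 and io["await"] / io["svctm"] >= 4:
        signals.append("disk: queueing")
    if io["%util"] >= 80:
        signals.append("disk: near saturation")
    return signals


# Values from the vmstat and iostat snippets above; ncpus=4 is an assumption.
vm_sample = {"r": 4, "b": 0, "si": 0, "so": 0, "us": 40, "sy": 10, "id": 10, "wa": 40}
io_sample = {"await": 25.0, "svctm": 2.50, "%util": 62.5}
print(bottleneck_signals(vm_sample, io_sample, ncpus=4))
# → ['disk: high iowait', 'disk: queueing']
```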
Hard · Technical
You suspect subtle lock contention in a high-performance C++ service that causes p99 latency spikes at high concurrency. Propose a detailed methodology using perf, lock profiling tools, and code instrumentation to pinpoint the contention. Explain how you'd validate the root cause and evaluate fixes such as lock sharding, lock-free queues, or redesigning hot paths.
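Code-level lock instrumentation largely comes down to measuring how long each acquisition blocks. The Python sketch below shows the pattern with a hypothetical `TimedLock` wrapper (try a non-blocking acquire first, then block and time the wait) and a hypothetical `ShardedCounter` that spreads keys across several such locks, so contention counts can be compared before and after sharding. Python's GIL makes it unsuitable for measuring real C++ contention; in the C++ service the same idea would wrap `std::mutex` using `try_lock()` and `std::chrono::steady_clock`.

```python
import threading
import time


class TimedLock:
    """Wrap a lock and record how long each acquisition waited (instrumentation sketch)."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self.wait_times = []  # seconds spent acquiring, one entry per acquisition
        self.contended = 0    # acquisitions where the lock was already held

    def __enter__(self):
        start = time.perf_counter()
        contended = not self._lock.acquire(blocking=False)
        if contended:
            self._lock.acquire()  # block until the current holder releases
        # The lock is held here, so updating the stats needs no extra locking.
        self.wait_times.append(time.perf_counter() - start)
        self.contended += int(contended)
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False

    def summary(self):
        # Read only after worker threads have finished; not synchronized.
        waits = sorted(self.wait_times)
        return {"lock": self.name, "acquisitions": len(waits),
                "contended": self.contended, "max_wait_s": waits[-1] if waits else 0.0}


class ShardedCounter:
    """Lock sharding: keys hash to one of N independently locked shards."""

    def __init__(self, shards=16):
        self._locks = [TimedLock(f"shard-{i}") for i in range(shards)]
        self._data = [{} for _ in range(shards)]

    def _shard(self, key):
        return hash(key) % len(self._locks)

    def incr(self, key):
        i = self._shard(key)
        with self._locks[i]:
            self._data[i][key] = self._data[i].get(key, 0) + 1

    def get(self, key):
        i = self._shard(key)
        with self._locks[i]:
            return self._data[i].get(key, 0)
```

Constructing the counter with `shards=1` reproduces a single global lock, which gives a direct before/after comparison of the `contended` counts and wait times under the same workload.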
Hard · Technical
With constrained CPU and memory budgets, design a caching strategy and eviction policy for an in-memory key-value store to maximize hit rate for skewed, changing access patterns. Discuss algorithms (LRU, LFU, TinyLFU, ARC), admission filters, approximate counting, and how you would measure and adapt the policy in production.
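A simplified sketch of TinyLFU-style admission in Python: a count-min sketch tracks approximate access frequencies (halving all counters periodically so stale popularity decays), and a new key displaces the LRU victim only if its estimated frequency is higher. Class names, sizes, and the reset interval are assumptions; production designs such as W-TinyLFU also add a small admission window and a segmented main LRU, which this sketch omits.

```python
import hashlib
from collections import OrderedDict


class CountMinSketch:
    """Approximate frequency counter with periodic halving (aging)."""

    def __init__(self, width=1024, depth=4, sample_size=10_000):
        self.width, self.depth = width, depth  # depth <= 8 so the digest fits in 64 bytes
        self.rows = [[0] * width for _ in range(depth)]
        self.additions, self.sample_size = 0, sample_size

    def _indexes(self, key):
        # One blake2b digest, split into 8-byte chunks, gives an independent-ish index per row.
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8 * self.depth).digest()
        for row in range(self.depth):
            chunk = digest[8 * row: 8 * row + 8]
            yield row, int.from_bytes(chunk, "little") % self.width

    def add(self, key):
        for row, col in self._indexes(key):
            self.rows[row][col] += 1
        self.additions += 1
        if self.additions >= self.sample_size:  # reset: halve every counter
            self.rows = [[c // 2 for c in r] for r in self.rows]
            self.additions //= 2

    def estimate(self, key):
        # The minimum across rows is the least collision-inflated count.
        return min(self.rows[row][col] for row, col in self._indexes(key))


class TinyLFUCache:
    """LRU cache guarded by a TinyLFU-style admission filter (simplified sketch)."""

    def __init__(self, capacity, sketch=None):
        self.capacity = capacity
        self.sketch = sketch or CountMinSketch()
        self.data = OrderedDict()  # LRU order: least recently used first

    def get(self, key):
        self.sketch.add(key)  # every access, hit or miss, updates the frequency estimate
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        """Insert or update; returns False if the admission filter rejects the key."""
        if key in self.data:
            self.data[key] = value
            self.data.move_to_end(key)
            return True
        if len(self.data) < self.capacity:
            self.data[key] = value
            return True
        victim = next(iter(self.data))
        # Admit only if the newcomer looks more popular than the entry it would evict.
        if self.sketch.estimate(key) > self.sketch.estimate(victim):
            del self.data[victim]
            self.data[key] = value
            return True
        return False
```

The admission check is what protects frequently used keys from one-hit wonders in a scan, which plain LRU cannot do; the halving step lets the filter follow shifting popularity.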
