
System Resource Management and Monitoring Questions

Monitor and manage operating system and hardware-level resources to ensure application performance and stability. Topics include CPU utilization and context switching, system load trends, memory usage including heap and stack behavior, paging and swapping effects, disk I/O operations and free space, and network bandwidth utilization and packet loss. Know the diagnostic tools and commands for observing these signals, recognize patterns of resource contention and exhaustion such as out-of-memory conditions and high I/O wait, and understand mitigation techniques including tuning, resource limits, throttling, caching, capacity planning, and vertical or horizontal scaling.

Medium | Technical
55 practiced
Write an on-call runbook outline for a high disk-latency incident. Include initial triage commands to gather evidence, criteria to determine if the incident is service-impacting, immediate mitigations (throttling, moving workloads, pausing backups), stakeholder communication steps, and post-incident actions to prevent recurrence.
Medium | Technical
54 practiced
Write a script (Bash or Python) that periodically reads iostat -x output and raises an alert if a device has avgqu-sz > 10 and await > 100 ms for more than 5 consecutive samples. Describe how the script tracks state over time and provide an example alert output format.
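One possible shape for the state tracking, as a minimal Python sketch rather than a reference answer. The names `parse_iostat_block` and `StreakTracker` and the alert line format are hypothetical. The sketch assumes the device table has a header row starting with `Device`, and it falls back to `aqu-sz` and `r_await`/`w_await` on the assumption that newer sysstat releases use those column names instead of `avgqu-sz` and `await`.

```python
from collections import defaultdict

QUEUE_LIMIT = 10.0      # avgqu-sz threshold from the question
AWAIT_LIMIT_MS = 100.0  # await threshold from the question
MIN_STREAK = 6          # "more than 5 consecutive samples"


def parse_iostat_block(lines):
    """Parse one `iostat -x` device table into {device: {column: value}}."""
    header = None
    stats = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]  # column names after the device column
            continue
        if header and len(fields) == len(header) + 1:
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # not a data row
            stats[fields[0]] = dict(zip(header, values))
    return stats


class StreakTracker:
    """Keeps one counter per device: consecutive samples over both limits."""

    def __init__(self):
        self.streaks = defaultdict(int)

    def update(self, stats):
        """Feed one parsed sample; return alert lines for streaks that just crossed."""
        alerts = []
        for device, cols in stats.items():
            # Assumed fallbacks for differing sysstat column names.
            queue = cols.get("avgqu-sz", cols.get("aqu-sz", 0.0))
            wait = cols.get(
                "await", max(cols.get("r_await", 0.0), cols.get("w_await", 0.0))
            )
            if queue > QUEUE_LIMIT and wait > AWAIT_LIMIT_MS:
                self.streaks[device] += 1
            else:
                self.streaks[device] = 0  # any healthy sample resets the streak
            if self.streaks[device] == MIN_STREAK:  # fire once per streak
                alerts.append(
                    f"ALERT device={device} avgqu-sz={queue:.1f} "
                    f"await_ms={wait:.1f} consecutive_samples={MIN_STREAK}"
                )
        return alerts
```

A caller would run `iostat -x` at a fixed interval, split its output into per-interval blocks, and pass each parsed block to `update`. Resetting the counter on the first healthy sample is what makes the check "consecutive" rather than cumulative, and comparing with `==` instead of `>=` avoids re-alerting on every later sample of the same streak.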
Easy | Technical
43 practiced
Explain paging and swapping on Linux. Describe the performance impacts of heavy paging/swapping and how to detect it using commands like free, vmstat, and sar -W. Explain vm.swappiness and when tuning it may improve or worsen latency.
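As a complement to the commands named above, swap activity can also be sampled directly from the kernel counters. The sketch below assumes a Linux host where `/proc/vmstat` exposes the cumulative `pswpin` and `pswpout` page counters. The function names are hypothetical, and `read_live` is defined but not called.

```python
def parse_vmstat(text):
    """Turn /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Pages swapped in and out per second between two snapshots."""
    return {
        "swap_in_per_s": (after["pswpin"] - before["pswpin"]) / interval_s,
        "swap_out_per_s": (after["pswpout"] - before["pswpout"]) / interval_s,
    }


def read_live(path="/proc/vmstat"):
    """Take one snapshot on a Linux host (assumed path)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())
```

Taking two snapshots a few seconds apart and passing them to `swap_rates` gives a rate. Sustained non-zero values in both directions suggest the working set no longer fits in RAM, whereas a large but static amount of used swap on its own is usually less of a concern.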
Hard | Technical
47 practiced
At the kernel level, what commonly causes spikes in context-switch rates? Describe kernel parameters or user-space strategies to reduce context switching, including thread models, futex usage, lock contention, and potential side effects of scheduler changes.
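When investigating a spike, it can help to first separate voluntary switches (the task blocked, for example on a lock or on I/O) from nonvoluntary ones (the scheduler preempted it). The sketch below assumes a Linux host where `/proc/<pid>/status` reports `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches`. The helper names are hypothetical.

```python
def parse_ctxt_switches(status_text):
    """Extract the two context-switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def switch_deltas(before, after):
    """Return (voluntary, nonvoluntary) switches between two snapshots."""
    voluntary = (
        after["voluntary_ctxt_switches"] - before["voluntary_ctxt_switches"]
    )
    nonvoluntary = (
        after["nonvoluntary_ctxt_switches"] - before["nonvoluntary_ctxt_switches"]
    )
    return voluntary, nonvoluntary


def read_status(pid):
    """Read the counters for one process on a Linux host (assumed path)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_ctxt_switches(handle.read())
```

A delta dominated by voluntary switches points toward blocking behavior such as lock contention or many short waits, while a delta dominated by nonvoluntary switches points toward more runnable threads than available CPU time.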
Easy | Technical
44 practiced
Explain the meaning and interpretation of Linux load average (the 1-, 5-, and 15-minute values). How does load average relate to CPU cores and runnable processes? Describe how you would interpret a load average of 24 on an 8-core machine and what commands you would run to find contributing processes.
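The arithmetic behind the 24-on-8-cores case can be sketched as a per-core ratio. This is an illustration only; `load_per_core` and `current_load_per_core` are hypothetical helper names, and the second relies on `os.getloadavg()`, which is available on Unix-like systems.

```python
import os


def load_per_core(load, cores):
    """Average number of tasks demanding service per core."""
    return load / cores


def current_load_per_core():
    """Per-core ratio for the live 1-, 5-, and 15-minute load averages."""
    cores = os.cpu_count() or 1
    return tuple(round(load_per_core(v, cores), 2) for v in os.getloadavg())


print(load_per_core(24, 8))  # 3.0
```

A ratio of 3.0 means that, on average, about three tasks were competing for each core. On Linux the load average also counts tasks in uninterruptible sleep, so a high ratio does not by itself prove CPU saturation and should be checked against CPU utilization and I/O wait.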
