System Resource Management and Monitoring Questions

Monitor and manage operating-system and hardware-level resources to ensure application performance and stability. Topics include CPU utilization and context switching, system load trends, memory usage (including heap and stack behavior), paging and swapping effects, disk I/O operations and free space, and network bandwidth utilization and packet loss. Know the diagnostic tools and commands for observing these signals, recognize patterns of resource contention and exhaustion such as out-of-memory (OOM) conditions and high I/O wait, and understand mitigation techniques including tuning, resource limits, throttling, caching, capacity planning, and vertical or horizontal scaling.
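
As a small illustration of observing one of these signals programmatically, the sketch below summarizes memory headroom and swap usage from `/proc/meminfo`-style text. It assumes a Linux host, and the function names are hypothetical. When `MemAvailable` is absent it falls back to `MemFree`, which understates headroom because it ignores reclaimable page cache.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def memory_headroom(text: str) -> dict[str, float]:
    """Percent of RAM still available and percent of swap in use."""
    m = parse_meminfo(text)
    # MemAvailable estimates memory usable without swapping; MemFree ignores reclaimable cache.
    available = m.get("MemAvailable", m["MemFree"])
    swap_total = m.get("SwapTotal", 0)
    swap_used = swap_total - m.get("SwapFree", 0)
    return {
        "available_pct": 100.0 * available / m["MemTotal"],
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }

# Usage on Linux: memory_headroom(open("/proc/meminfo").read())
```

A steadily shrinking `available_pct` combined with growing swap use is an early warning that OOM kills or heavy paging may follow.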

Medium · Technical

Write an on-call runbook outline for a high disk-latency incident. Include initial triage commands to gather evidence, criteria to determine if the incident is service-impacting, immediate mitigations (throttling, moving workloads, pausing backups), stakeholder communication steps, and post-incident actions to prevent recurrence.
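
For the triage step, a minimal sketch of gathering latency evidence when sysstat is not installed: it derives per-device average wait, average queue depth, and utilization from two `/proc/diskstats` snapshots taken `interval_s` seconds apart, approximating the quantities `iostat -x` reports as `await`, `aqu-sz`/`avgqu-sz`, and `%util`. The function names are hypothetical, and any threshold these numbers are compared against (for example, a per-service latency SLO) is an assumption to fill in per runbook.

```python
from __future__ import annotations


def parse_diskstats(text: str) -> dict[str, dict[str, int]]:
    """Map device name -> cumulative counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or truncated lines
        # f[0]=major f[1]=minor f[2]=name; counters follow (see kernel iostats docs).
        stats[f[2]] = {
            "ios": int(f[3]) + int(f[7]),      # reads + writes completed
            "io_ms": int(f[6]) + int(f[10]),   # ms spent on reads + writes
            "busy_ms": int(f[12]),             # ms the device had I/O in flight
            "weighted_ms": int(f[13]),         # in-flight time weighted by queue depth
        }
    return stats


def disk_latency(before: str, after: str, interval_s: float) -> dict[str, dict[str, float]]:
    """Per-device await, average queue depth and %util between two snapshots."""
    b, a = parse_diskstats(before), parse_diskstats(after)
    interval_ms = interval_s * 1000.0
    report = {}
    for dev, cur in a.items():
        prev = b.get(dev)
        if prev is None:
            continue  # device appeared between snapshots
        d_ios = cur["ios"] - prev["ios"]
        report[dev] = {
            "await_ms": (cur["io_ms"] - prev["io_ms"]) / d_ios if d_ios else 0.0,
            "avg_queue": (cur["weighted_ms"] - prev["weighted_ms"]) / interval_ms,
            "util_pct": 100.0 * (cur["busy_ms"] - prev["busy_ms"]) / interval_ms,
        }
    return report
```

In practice the two snapshots would be read with `open("/proc/diskstats").read()` one interval apart. Devices with high `await_ms` are the first candidates to correlate with application latency; `util_pct` near 100 is a weaker saturation signal on SSDs and NVMe drives, which serve many requests in parallel.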

Medium · Technical

Write a script (Bash or Python) that reads iostat -x output periodically and raises an alert if a device has avgqu-sz > 10 and await > 100 ms for more than 5 consecutive samples. Describe how the script tracks state over time and provide an example alert output format.
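
One way to structure the state tracking, sketched in Python under stated assumptions: each run of `iostat -x -d INTERVAL 2` yields a fresh per-interval report (the first report is averaged since boot, so only the last block is parsed), and a per-device counter records consecutive samples where both thresholds are exceeded. Any healthy sample resets the counter, and the alert fires once when the counter first exceeds 5, so a sustained condition does not page on every sample. The column lookup accepts both `avgqu-sz` and the newer `aqu-sz`, and falls back to the larger of `r_await`/`w_await` when no combined `await` column is printed, as in recent sysstat releases. The class and function names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import time
from collections import defaultdict

QUEUE_COLS = ("aqu-sz", "avgqu-sz")  # newer sysstat prints aqu-sz, older avgqu-sz
SPLIT_AWAIT_COLS = ("r_await", "w_await")


def parse_report(text: str) -> list[tuple[str, float, float]]:
    """Extract (device, queue_size, await_ms) from iostat -x device blocks."""
    samples, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends a device block
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        queue_col = next((c for c in QUEUE_COLS if c in row), None)
        if queue_col is None:
            continue
        if "await" in row:
            await_ms = float(row["await"])
        else:  # no combined column: use the worse of read/write latency
            await_ms = max((float(row[c]) for c in SPLIT_AWAIT_COLS if c in row), default=0.0)
        samples.append((fields[0], float(row[queue_col]), await_ms))
    return samples


class IostatAlerter:
    """Counts consecutive bad samples per device (hypothetical helper)."""

    def __init__(self, queue_limit=10.0, await_limit_ms=100.0, min_samples=5):
        self.queue_limit = queue_limit
        self.await_limit_ms = await_limit_ms
        self.min_samples = min_samples
        self.streak = defaultdict(int)  # device -> consecutive bad samples

    def observe(self, device: str, queue: float, await_ms: float) -> str | None:
        if queue > self.queue_limit and await_ms > self.await_limit_ms:
            self.streak[device] += 1
        else:
            self.streak[device] = 0  # any healthy sample resets the streak
        # "More than min_samples": fire exactly once, on sample min_samples + 1.
        if self.streak[device] == self.min_samples + 1:
            return (f"ALERT device={device} aqu_sz={queue:.1f} "
                    f"await_ms={await_ms:.1f} consecutive={self.streak[device]}")
        return None


def main(interval_s: int = 5) -> None:
    alerter = IostatAlerter()
    env = {**os.environ, "LC_ALL": "C"}  # force '.' as the decimal separator
    while True:
        # Two reports per run: the first averages since boot, the last covers interval_s.
        out = subprocess.run(["iostat", "-x", "-d", str(interval_s), "2"],
                             capture_output=True, text=True, env=env, check=True).stdout
        for device, queue, await_ms in parse_report(out[out.rfind("Device"):]):
            alert = alerter.observe(device, queue, await_ms)
            if alert:
                print(time.strftime("%Y-%m-%dT%H:%M:%S"), alert, flush=True)
```

Calling `main()` on a host with sysstat installed prints one key=value line per alert, in the form `<timestamp> ALERT device=sda aqu_sz=12.0 await_ms=150.0 consecutive=6`, which is easy to grep or forward to a log shipper. Keeping the streak in memory means a restart clears the state; persisting it is only worth the complexity if restarts are frequent.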

Easy · Technical

Explain paging and swapping on Linux. Describe the performance impacts of heavy paging/swapping and how to detect it using commands like free, vmstat, and sar -W. Explain vm.swappiness and when tuning it may improve or worsen latency.
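
To make the detection step concrete: the `pswpin`/`pswpout` counters in `/proc/vmstat` are the cumulative page counts that `sar -W` turns into `pswpin/s` and `pswpout/s`. A minimal sketch computing those rates, plus major page faults per second (`pgmajfault`), from two snapshots; the function names are hypothetical. Sustained non-zero swap-in rates usually matter more for latency than swap usage alone, since they mean processes are stalling while pages are read back from disk.

```python
from __future__ import annotations

COUNTERS = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text: str) -> dict[str, int]:
    """Keep only the paging counters of interest from /proc/vmstat text."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in COUNTERS:
            values[parts[0]] = int(parts[1])
    return values


def paging_rates(before: str, after: str, interval_s: float) -> dict[str, float]:
    """Per-second rates: pages swapped in/out and major faults."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return {k: (a[k] - b[k]) / interval_s for k in COUNTERS if k in a and k in b}

# Current swappiness on Linux: open("/proc/sys/vm/swappiness").read()
```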

Hard · Technical

At the kernel level, what commonly causes spikes in context-switch rates? Describe kernel parameters or user-space strategies to reduce context switching, including thread models, futex usage, lock contention, and potential side effects of scheduler changes.
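
A rough way to tell which kind of switching dominates, sketched under assumptions: Linux exposes a system-wide `ctxt` counter in `/proc/stat` and per-task `voluntary_ctxt_switches`/`nonvoluntary_ctxt_switches` counters in `/proc/<pid>/status` (the figures `pidstat -w` reports as `cswch/s` and `nvcswch/s`). Mostly voluntary switches suggest threads blocking on locks, futex waits, or I/O; mostly involuntary switches suggest more runnable threads than CPUs, so the scheduler preempts them. The 50% cut-off and the function names are hypothetical heuristics, not kernel-defined thresholds.

```python
from __future__ import annotations


def system_ctxt_rate(stat_before: str, stat_after: str, interval_s: float) -> float:
    """Context switches per second across all CPUs, from two /proc/stat snapshots."""
    def ctxt(text: str) -> int:
        for line in text.splitlines():
            if line.startswith("ctxt "):
                return int(line.split()[1])
        raise ValueError("no 'ctxt' line found")
    return (ctxt(stat_after) - ctxt(stat_before)) / interval_s


def task_switches(status_text: str) -> dict[str, int]:
    """Pull voluntary/nonvoluntary counters from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)  # int() tolerates the leading tab
    return counts


def dominant_switch_kind(status_before: str, status_after: str,
                         involuntary_share: float = 0.5) -> str:
    """Hypothetical heuristic: classify the switches between two snapshots."""
    b, a = task_switches(status_before), task_switches(status_after)
    vol = a["voluntary_ctxt_switches"] - b["voluntary_ctxt_switches"]
    invol = a["nonvoluntary_ctxt_switches"] - b["nonvoluntary_ctxt_switches"]
    if vol + invol == 0:
        return "idle"
    # Involuntary: preempted while still runnable (CPU oversubscription, slice expiry).
    # Voluntary: the task blocked itself (lock/futex wait, I/O, sleep).
    return "involuntary" if invol / (vol + invol) > involuntary_share else "voluntary"
```

A voluntary-heavy profile points toward reducing lock contention or batching blocking calls; an involuntary-heavy one points toward fewer threads (for example, a pool sized near the core count) rather than scheduler tuning.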

Easy · Technical

Explain the meaning and interpretation of Linux load average (the 1, 5 and 15 minute values). How does load average relate to CPU cores and runnable processes? Describe how you would interpret a load average of 24 on an 8-core machine and what commands you would run to find contributing processes.
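
A minimal sketch of the normalization and the follow-up process search, assuming Linux `/proc` layouts: `/proc/loadavg` holds the three averages plus a `running/total` task count, and the Linux load figure counts tasks in both R (runnable) and D (uninterruptible, often I/O-blocked) states, so the sketch lists both. For example, a 1-minute load of 24 on 8 cores normalizes to 3.0 per core: on average about three tasks were running or waiting per core, and whether that means CPU saturation or I/O blocking depends on how many of them are in D state. The function names are hypothetical.

```python
from __future__ import annotations

import os


def parse_loadavg(text: str) -> dict[str, float]:
    """Parse /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'."""
    one, five, fifteen, running_total, _last_pid = text.split()
    running, total = running_total.split("/")
    return {"1m": float(one), "5m": float(five), "15m": float(fifteen),
            "runnable": int(running), "tasks": int(total)}


def per_core(load: dict[str, float], cores: int) -> dict[str, float]:
    """Load per core; sustained values above 1.0 mean tasks are queuing."""
    return {k: load[k] / cores for k in ("1m", "5m", "15m")}


def proc_state(stat_text: str) -> str:
    # comm (field 2) is in parentheses and may contain spaces or ')';
    # the state letter follows the last ')' and one space.
    return stat_text[stat_text.rfind(")") + 2]


def contributing_pids(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return [(pid, state)] for processes in R or D state (process level only)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = proc_state(f.read())
        except OSError:
            continue  # process exited between listdir and open
        if state in ("R", "D"):
            found.append((int(entry), state))
    return found
```

`per_core(parse_loadavg(open("/proc/loadavg").read()), len(os.sched_getaffinity(0)))` gives per-core figures for the CPUs this process may use, whereas `os.cpu_count()` counts all logical CPUs. Only process-level states are read here; individual threads appear under `/proc/<pid>/task`.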