InterviewStack.io

Operating System Internals and Administration Questions

Fundamental and advanced operating system concepts that underlie system administration across platforms. Topics include process and thread management, scheduling and concurrency, memory management and swapping, virtual memory and page replacement, input/output (I/O) and disk performance, file system architecture and semantics, system call interfaces, kernel parameters and tuning, authentication and permission models, boot and initialization sequences, monitoring and system performance analysis, and general techniques for debugging and diagnosing system-wide operating system issues. Candidates should be able to explain not only how to perform administrative tasks but also why the underlying mechanisms behave as they do and how design choices affect performance and reliability.

Medium | System Design
60 practiced
Define SLOs and error budgets for OS-level metrics relevant to services, such as process crash rate, disk I/O p99 latency, and kernel OOM incidents. Explain how you would pick thresholds, map them to user impact, set alert burn rates, and use the error budget to prioritize mitigations vs feature work.
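As a starting point for the burn-rate part of this question, the arithmetic can be sketched as follows. This is a minimal illustration, not a prescribed answer: the function names, the 99.9% target, and the 30-day (720-hour) window are assumptions chosen for the example. The burn rate is the observed bad-event ratio divided by the error budget (1 minus the SLO target), so a burn rate of 1 consumes the budget exactly over the SLO window.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to be bad, e.g. 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed bad-event ratio divided by the error budget.

    A value of 1.0 means the budget is consumed exactly over the SLO window;
    a value of 10.0 means it is consumed ten times faster.
    """
    if total_events == 0:
        return 0.0
    observed_error_ratio = bad_events / total_events
    return observed_error_ratio / error_budget(slo_target)


def hours_to_exhaustion(rate: float, window_hours: float) -> float:
    """Hours until the whole budget is gone if the current burn rate persists."""
    if rate <= 0:
        return float("inf")
    return window_hours / rate


if __name__ == "__main__":
    # Hypothetical numbers: 10 disk I/O requests out of 1000 exceeded the
    # p99 latency threshold, measured against a 99.9% SLO over 30 days.
    rate = burn_rate(bad_events=10, total_events=1000, slo_target=0.999)
    print(f"burn rate: {rate:.1f}")
    print(f"hours to exhaustion: {hours_to_exhaustion(rate, 720):.0f}")
```

An alert threshold can then be expressed as a burn rate sustained over a short window, which ties the page directly to how quickly the budget would run out.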
Hard | Technical
78 practiced
Several machines in a datacenter experience kernel panics following a maintenance window. Describe the incident response: how to collect crash dumps (kdump/crashkernel), configure a crash dump host, analyze vmcore with the 'crash' utility or gdb to extract panic stack traces, correlate with package/firmware changes, and determine mitigation/rollback steps.
Hard | Technical
63 practiced
Sketch an eBPF or bpftrace program that counts syscalls per PID and records syscall latencies (histogram). Describe probe points you would attach to, how you would aggregate to avoid high-cardinality explosion, how to export metrics to Prometheus, and techniques to limit overhead in production.
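The aggregation concern in this question can be illustrated on the user-space side. The sketch below does not attach any probes; it assumes latency samples already arrive from some tracing front end, and the class name, the `max_series` cap, and the `("other", "other")` overflow key are all hypothetical. It keys histograms by process name and syscall rather than by PID, uses power-of-two buckets (similar in spirit to a log2 histogram), and folds new keys into a single overflow series once the cap is reached.

```python
from collections import defaultdict


def log2_bucket(latency_ns: int) -> int:
    """Return bucket index b such that 2**b <= latency_ns < 2**(b + 1).

    Latencies of 0 and 1 ns both land in bucket 0.
    """
    return max(latency_ns, 1).bit_length() - 1


class SyscallLatencyAggregator:
    """Bounded set of log2 latency histograms keyed by (comm, syscall)."""

    OVERFLOW_KEY = ("other", "other")  # hypothetical catch-all series

    def __init__(self, max_series: int = 1000):
        self.max_series = max_series
        # key -> {bucket index -> count}
        self.hist = defaultdict(lambda: defaultdict(int))

    def observe(self, comm: str, syscall: str, latency_ns: int) -> None:
        key = (comm, syscall)
        # Once the cap is hit, unseen keys are folded into the overflow
        # series, so at most max_series + 1 series ever exist.
        if key not in self.hist and len(self.hist) >= self.max_series:
            key = self.OVERFLOW_KEY
        self.hist[key][log2_bucket(latency_ns)] += 1


if __name__ == "__main__":
    agg = SyscallLatencyAggregator(max_series=2)
    agg.observe("nginx", "read", 1500)
    agg.observe("nginx", "write", 900)
    agg.observe("postgres", "fsync", 2_000_000)  # folded into overflow
    for key, buckets in agg.hist.items():
        print(key, dict(buckets))
```

Keying by process name instead of PID keeps the series count stable across restarts, at the cost of merging distinct instances of the same binary.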
Medium | Technical
75 practiced
Write a Python script (pseudocode acceptable) that monitors a given PID's RSS over time. The script should sample every 30 seconds, record RSS, detect if RSS grows by more than 20% over a 10-minute window, handle PID reuse, and print an alert with the process command line and owner. Describe limitations and how you'd run this at scale.
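One possible shape for such a script is sketched below, assuming a Linux host with `/proc` mounted. The helper and class names are illustrative, not part of the question. PID reuse is detected by comparing the process start time (field 22 of `/proc/<pid>/stat`) against the value captured at startup, and growth is measured against the oldest sample still inside the 10-minute window.

```python
import os
import pwd
import time
from collections import deque

SAMPLE_INTERVAL_S = 30
WINDOW_S = 600           # 10 minutes
GROWTH_THRESHOLD = 0.20  # 20%


def read_start_time(pid: int) -> int:
    """Process start time in clock ticks (field 22 of /proc/<pid>/stat)."""
    with open(f"/proc/{pid}/stat") as f:
        # comm (field 2) may contain spaces or ')', so split after the last ')'.
        rest = f.read().rsplit(")", 1)[1].split()
    return int(rest[19])  # rest[0] is field 3, so field 22 is rest[19]


def read_rss_kb(pid: int) -> int:
    """Resident set size in kB from /proc/<pid>/status (0 if absent)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


def describe(pid: int) -> str:
    """Command line and owning user name of the process."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        cmdline = f.read().replace(b"\0", b" ").decode(errors="replace").strip()
    owner = pwd.getpwuid(os.stat(f"/proc/{pid}").st_uid).pw_name
    return f"cmd={cmdline!r} owner={owner}"


class GrowthDetector:
    """Flags growth beyond a threshold relative to the oldest in-window sample."""

    def __init__(self, window_s: float = WINDOW_S,
                 threshold: float = GROWTH_THRESHOLD):
        self.window_s = window_s
        self.threshold = threshold
        self.samples = deque()  # (timestamp, rss_kb)

    def add(self, ts: float, rss_kb: int) -> bool:
        self.samples.append((ts, rss_kb))
        while ts - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        oldest = self.samples[0][1]
        return oldest > 0 and rss_kb > oldest * (1 + self.threshold)


def monitor(pid: int) -> None:
    start = read_start_time(pid)
    detector = GrowthDetector()
    while True:
        try:
            if read_start_time(pid) != start:
                print(f"PID {pid} was reused by another process; stopping")
                return
            rss = read_rss_kb(pid)
            if detector.add(time.monotonic(), rss):
                print(f"ALERT pid={pid} rss={rss} kB grew more than "
                      f"{GROWTH_THRESHOLD:.0%} in {WINDOW_S}s {describe(pid)}")
        except (FileNotFoundError, ProcessLookupError):
            print(f"PID {pid} exited; stopping")
            return
        time.sleep(SAMPLE_INTERVAL_S)
```

Calling `monitor(1234)` with a real PID would start the loop. A per-process polling loop like this does not scale to many hosts, which is the limitation the question asks candidates to discuss.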
Easy | Technical
56 practiced
Explain RAM vs swap on Linux in detail. When does swapping occur, what are soft (minor) vs hard (major) page faults, how does the page cache interact with application memory, and what performance implications does swapping have for latency-sensitive services? Mention vm.overcommit_memory and vm.swappiness.
