InterviewStack.io

Systems Engineering Coding and Problem Solving Questions

Practical coding and algorithmic skills applied to systems and infrastructure tasks. Candidates should be able to write syntactically correct, maintainable code or scripts that automate operations, parse logs, collect metrics, implement health checks, and perform diagnostics. They should also choose appropriate simple data structures and algorithms, reason about time and space complexity at a practical level, apply defensive programming and error handling, debug effectively, and write tests to validate behavior. In timed technical problems, they should prioritize core functionality, correctness, and maintainability. Interviewers commonly use small coding exercises, scripting tasks, or live problems that emphasize operational automation and system-oriented problem solving.

Easy · Technical
66 practiced
Write a Python script that reads a potentially large log file line by line (streaming; do not load the entire file into memory) and counts occurrences of distinct error codes that match the pattern ERR followed by digits (for example, ERR1234). The script should handle missing or unreadable files gracefully, accept the file path as a command-line argument, and print the top 5 error codes and their counts in descending order of count.
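One possible shape for an answer, offered as a minimal sketch and not a reference solution. The names `count_error_codes` and `main`, the word-boundary regex, the UTF-8 decoding with replacement, and the exit codes (2 for bad usage, 1 for an unreadable file) are all assumptions chosen for illustration.

```python
"""Count ERR<digits> codes in a log file, streaming line by line."""
import re
import sys
from collections import Counter

# Assumption: a code is "ERR" plus digits, delimited by word boundaries.
ERR_PATTERN = re.compile(r"\bERR\d+\b")


def count_error_codes(lines):
    """Return a Counter of error codes found in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(ERR_PATTERN.findall(line))
    return counts


def main(argv):
    """Print the top 5 codes for the file named in argv[1]; return an exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} LOGFILE", file=sys.stderr)
        return 2
    try:
        # Iterating the file object reads one line at a time, so memory use
        # stays flat. errors="replace" keeps a stray non-UTF-8 byte from
        # aborting the run.
        with open(argv[1], "r", encoding="utf-8", errors="replace") as handle:
            counts = count_error_codes(handle)
    except OSError as exc:
        # Covers missing files, permission errors, and directories.
        print(f"error: cannot read {argv[1]}: {exc}", file=sys.stderr)
        return 1
    for code, count in counts.most_common(5):
        print(f"{code}\t{count}")
    return 0
```

To run this as a script, end the file with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard. Keeping the counting logic in a function that accepts any iterable of lines makes it easy to unit test without touching the filesystem.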
Easy · Technical
61 practiced
You observe sustained high CPU on a production host. Describe a step-by-step debugging plan to identify the root cause without causing more disruption. Include commands or tools you would run, how to capture evidence for offline analysis, and how to act if the host is critical to customer traffic.
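For the evidence-capture part of such a plan, a low-overhead sampler can be handy when standard tools are unavailable. The sketch below assumes a Linux host and the documented layout of the aggregate `cpu` line in `/proc/stat`. The function names `parse_cpu_line` and `busy_fraction` are hypothetical, and treating iowait as idle time is a design choice, not a rule.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal.
    # Guest time is already counted inside user/nice, so the trailing
    # guest fields are skipped to avoid double counting.
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))  # very old kernels expose fewer fields
    total = sum(values)
    idle = values[3] + values[4]  # idle + iowait (assumption: iowait is not busy)
    return total - idle, total


def busy_fraction(before, after):
    """Fraction of CPU time spent busy between two samples of the 'cpu' line."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
```

In practice the two samples would be the first line of `/proc/stat` read a few seconds apart, logged with timestamps so the trend can be analyzed offline.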
Hard · Technical
53 practiced
You are on-call and a production service suffers a cascading failure causing high latency and increased error rates. Draft a concise runbook (steps an on-call engineer should follow) and propose an automated mitigation script that implements a circuit breaker and traffic shifting to a healthy region. Include safety checks and a rollback path in your automation plan.
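The core of the automation could be sketched as below. This is an illustrative three-state circuit breaker plus a region picker, not a production design. The class and function names, the default thresholds, and the injectable `clock` parameter are assumptions, and the actual traffic shift (DNS, load balancer, or service mesh call) is deliberately left out because it depends on the environment.

```python
import time


class CircuitBreaker:
    """Minimal three-state circuit breaker (closed, open, half-open)."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def allow_request(self):
        """Closed and half-open circuits let traffic through; open ones do not.

        A stricter variant would admit only a single probe while half-open.
        """
        return self.state != "open"

    def record_success(self):
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        if self.state == "half-open":
            # The trial request failed: reopen and restart the timer.
            self._opened_at = self._clock()
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def pick_region(breakers, preferred_order):
    """Return the first region whose breaker allows traffic, or None.

    Returning None is the safety check: with no healthy target, the caller
    should not shift traffic at all and should escalate to a human.
    """
    for region in preferred_order:
        if breakers[region].allow_request():
            return region
    return None
```

A rollback path falls out of the same structure: once the original region's breaker closes again, `pick_region` returns it first, provided it is listed first in `preferred_order`.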
Medium · Technical
70 practiced
Explain the practical differences between concurrency models in Python: threading, multiprocessing, and asyncio. For each model, state when it is appropriate (network-bound, CPU-bound, I/O-bound), how the Global Interpreter Lock (GIL) affects behavior, and how you would pick one for a systems scripting task that polls thousands of sockets.
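For the socket-polling case, asyncio is a common choice because the work is almost entirely waiting on I/O, so a single thread can multiplex many connections without GIL contention. The sketch below shows one way to cap the number of in-flight probes with a semaphore. The names `probe` and `probe_all`, the default limit of 500, and the injectable `probe_fn` parameter are assumptions made for illustration.

```python
import asyncio


async def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, DNS failure, or too slow.
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass  # the peer may reset during shutdown; the probe still succeeded
    return True


async def probe_all(targets, max_in_flight=500, probe_fn=probe):
    """Probe (host, port) pairs concurrently, capped at max_in_flight.

    Returns a dict mapping each (host, port) tuple to a bool.
    """
    limiter = asyncio.Semaphore(max_in_flight)

    async def bounded(target):
        # The semaphore keeps open sockets below file-descriptor limits.
        async with limiter:
            return target, await probe_fn(*target)

    results = await asyncio.gather(*(bounded(t) for t in targets))
    return dict(results)
```

A caller would run it with something like `asyncio.run(probe_all([("10.0.0.1", 22), ("10.0.0.2", 22)]))`, where the addresses are placeholders. Threads would also work for a few hundred sockets, but per-thread memory and scheduling overhead tends to dominate at thousands, and multiprocessing mainly helps when the per-probe work is CPU-bound.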
Hard · System Design
70 practiced
Design a scalable concurrent health-checker service that runs health probes across thousands of hosts, respects a global rate limit of R checks/sec, supports exponential backoff for flapping hosts, and can be horizontally sharded across workers. Sketch the architecture, data flow, failure modes, and a worker-level implementation strategy in a language of your choice.
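Three worker-level building blocks that such a design might use are sketched below: a token bucket for rate limiting, capped exponential backoff with jitter, and stable hash-based sharding. All names and defaults here are hypothetical. The sketch also assumes the global limit R is split statically, with each worker's bucket configured at R divided by the number of workers; a shared limiter backed by a central store is an alternative with different failure modes.

```python
import hashlib
import random
import time


class TokenBucket:
    """Sustain about `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self):
        """Take one token if available; never blocks."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def next_delay(consecutive_failures, base=5.0, cap=300.0, jitter=0.1, rng=random.random):
    """Seconds until the next probe: base * 2**failures, capped, with +/- jitter."""
    # Clamp the exponent so a long-dead host cannot overflow the float math.
    exponent = min(max(consecutive_failures, 0), 30)
    delay = min(cap, base * (2 ** exponent))
    spread = delay * jitter
    # Jitter spreads retries out so flapping hosts do not probe in lockstep.
    return delay + (2 * rng() - 1) * spread


def shard_for(host, num_workers):
    """Map a host name to a worker index in [0, num_workers).

    Uses SHA-256 because Python's built-in hash() for str is randomized per
    process, so different workers would disagree on the assignment.
    """
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers
```

Modulo sharding reshuffles most hosts whenever the worker count changes; consistent hashing is a common refinement if workers are added or removed often.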
