InterviewStack.io

Python Scripting for Infrastructure Automation Questions

Applied Python skills for building reusable, production-grade automation for infrastructure and operations. Topics include designing modular automation code and libraries; using relevant third-party libraries for systems administration and remote management; invoking and controlling subprocesses; interacting with APIs and cloud platform endpoints; robust error handling and structured logging; automated testing of scripts and modules; packaging and distributing tools for reuse; secure credential management; integration with configuration management and orchestration tooling; and designing multi-step workflows and idempotent operations. Candidates should demonstrate experience writing maintainable automation, reasoning about failure modes, and selecting appropriate abstractions and libraries for operational tasks.

Hard · Technical
Design the core algorithm and provide pseudocode for a Python file sync tool that reconciles a large local directory tree to remote object storage (e.g., S3). Requirements: avoid re-uploading unchanged files by using checksums, support chunked multipart uploads for large files, resume interrupted uploads, run within concurrency limits, and be resilient to partial failures. Explain the data structures for tracking state, how to distinguish renames from new files, and how to implement resume tokens.
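The planning core of such a tool can be sketched in Python as runnable pseudocode. This is a minimal sketch under stated assumptions, not a complete design: it assumes the remote side can produce a manifest mapping object key to SHA-256 (for example, stored as object metadata at upload time, since S3 ETags for multipart uploads are not a plain MD5 of the content), and that resume state is a locally persisted record of the multipart upload ID plus completed part numbers and their ETags. The names `scan_local`, `reconcile`, `SyncPlan`, `plan_parts` and `pending_parts` are hypothetical; the actual S3 calls and the concurrency limit (for example, a bounded `concurrent.futures.ThreadPoolExecutor`) are left out.

```python
import hashlib
import os
from dataclasses import dataclass, field

# Hypothetical part size. S3 requires every part except the last to be at
# least 5 MiB, and allows at most 10,000 parts per upload.
CHUNK_SIZE = 8 * 1024 * 1024


def sha256_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def scan_local(root: str) -> dict[str, str]:
    """Map each regular file's POSIX-style relative path to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = sha256_file(full)
    return manifest


@dataclass
class SyncPlan:
    upload: list[str] = field(default_factory=list)               # new or changed files
    rename: list[tuple[str, str]] = field(default_factory=list)  # (old_key, new_key)
    delete: list[str] = field(default_factory=list)               # remote-only keys
    unchanged: list[str] = field(default_factory=list)


def reconcile(local: dict[str, str], remote: dict[str, str]) -> SyncPlan:
    """Diff two {path: sha256} manifests into upload/rename/delete actions."""
    plan = SyncPlan()
    # Index remote keys that no longer exist locally by checksum, so a local
    # file with matching content can be treated as a rename, not an upload.
    orphans: dict[str, list[str]] = {}
    for key, digest in remote.items():
        if key not in local:
            orphans.setdefault(digest, []).append(key)
    for path, digest in sorted(local.items()):
        if path in remote:
            target = plan.unchanged if remote[path] == digest else plan.upload
            target.append(path)
        elif orphans.get(digest):
            plan.rename.append((orphans[digest].pop(), path))
        else:
            plan.upload.append(path)
    # Orphans not consumed by a rename are true deletions.
    plan.delete = sorted(key for keys in orphans.values() for key in keys)
    return plan


def plan_parts(size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int, int]]:
    """Split a file into 1-based (part_number, offset, length) parts."""
    return [(offset // chunk_size + 1, offset, min(chunk_size, size - offset))
            for offset in range(0, size, chunk_size)]


def pending_parts(size: int, completed: dict[int, str],
                  chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int, int]]:
    """Resume: return only parts missing from the persisted {part_number: etag} state."""
    return [part for part in plan_parts(size, chunk_size) if part[0] not in completed]
```

Renames are detected by content: a local path missing remotely whose checksum matches a remote-only key becomes a server-side copy plus delete instead of a re-upload. Hashing every file on every run is expensive for large trees; a common refinement is to cache (size, mtime) to checksum in the state store and rehash only when those change.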
Hard · System Design
Design a scalable Python automation platform ('job-runner') that schedules and executes both time-based and event-driven jobs across many workers. Requirements: exactly-once or at-least-once execution semantics as appropriate, distributed locking or leader election, persistent job checkpoints, retries with backoff, role-based permissions for job definitions, multi-tenant isolation, metrics and tracing, and the ability to scale to 10,000 job executions per hour. Sketch the architecture, component responsibilities, storage choices (relational database, NoSQL, Redis), the locking approach, and how to handle partial failures and recovery.
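One common way to get at-least-once execution with safe ownership is a lease with a fencing token kept in the relational store. The sketch below is an assumption-laden illustration, not a full platform: SQLite stands in for the relational database (on PostgreSQL one would typically claim rows with `SELECT ... FOR UPDATE SKIP LOCKED`), and the table layout and the `claim_job`/`complete_job` helpers are hypothetical. A worker claims a runnable job with a conditional `UPDATE` that acts as compare-and-set, every claim increments the fence, and completion is accepted only if the fence still matches, so a worker whose lease expired cannot overwrite the outcome recorded by the worker that took over.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id           TEXT PRIMARY KEY,
    tenant       TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
    attempts     INTEGER NOT NULL DEFAULT 0,
    max_attempts INTEGER NOT NULL DEFAULT 5,
    run_after    REAL NOT NULL DEFAULT 0,          -- epoch seconds; drives retry backoff
    lease_owner  TEXT,
    lease_until  REAL NOT NULL DEFAULT 0,
    fence        INTEGER NOT NULL DEFAULT 0,       -- fencing token, bumped on every claim
    checkpoint   TEXT                              -- opaque JSON the job may persist
)
"""

# A job is claimable if it is pending, or running under an expired lease.
_CLAIMABLE = "(status = 'pending' OR (status = 'running' AND lease_until < :now))"


def claim_job(conn, worker_id, lease_s=30.0, now=None):
    """Lease one runnable job; return (job_id, fence) or None."""
    now = time.time() if now is None else now
    params = {"now": now, "worker": worker_id, "until": now + lease_s}
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            f"SELECT id FROM jobs WHERE run_after <= :now AND {_CLAIMABLE} "
            "ORDER BY run_after LIMIT 1", params).fetchone()
        if row is None:
            return None
        params["id"] = row[0]
        # Conditional UPDATE = compare-and-set: if another worker claimed the
        # row after our SELECT, the WHERE clause no longer matches.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', lease_owner = :worker, "
            "lease_until = :until, fence = fence + 1, attempts = attempts + 1 "
            f"WHERE id = :id AND {_CLAIMABLE}", params)
        if cur.rowcount != 1:
            return None
        fence = conn.execute("SELECT fence FROM jobs WHERE id = :id", params).fetchone()[0]
    return row[0], fence


def complete_job(conn, job_id, fence, ok, base_delay=1.0, now=None):
    """Record the outcome only if the caller still holds the lease (fence matches)."""
    now = time.time() if now is None else now
    with conn:
        row = conn.execute(
            "SELECT attempts, max_attempts FROM jobs WHERE id = ? AND fence = ?",
            (job_id, fence)).fetchone()
        if row is None:
            return False  # stale fence: another worker now owns this job
        attempts, max_attempts = row
        if ok:
            status, run_after = "done", now
        elif attempts >= max_attempts:
            status, run_after = "failed", now
        else:  # exponential backoff before the job becomes claimable again
            status, run_after = "pending", now + base_delay * 2 ** (attempts - 1)
        cur = conn.execute(
            "UPDATE jobs SET status = ?, run_after = ?, lease_owner = NULL, "
            "lease_until = 0 WHERE id = ? AND fence = ?",
            (status, run_after, job_id, fence))
        return cur.rowcount == 1
```

Because an expired lease is reclaimed, a job can run more than once; jobs should therefore be idempotent or pass the fence token to downstream systems that reject stale writers. Failed attempts are rescheduled with exponential backoff through `run_after` until `max_attempts` is reached.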
Easy · Technical
Implement a Python function 'http_get_with_retry(url: str, max_attempts: int = 5, timeout: int = 10)' that performs an HTTP GET using the 'requests' library and retries on transient network errors and 5xx responses. Use exponential backoff with jitter between retries, and do not retry permanent 4xx client errors. If 'tenacity' is available, you may outline how to use it; otherwise, implement the retry loop yourself. Keep the function idempotent and let the caller configure max_attempts and timeout.
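A minimal sketch of the hand-rolled loop follows, using "full jitter" (a uniform random delay between 0 and the capped exponential bound). Assumptions beyond the question: the keyword-only `base_delay` and `max_delay` parameters and the helper names are hypothetical; only connection errors, timeouts and 5xx statuses are retried (429 Too Many Requests, often worth retrying, is left out to stay within the question's 5xx rule); and `requests` is imported inside the function so the helpers work without it installed. With `tenacity`, a similar policy can be expressed with its `retry` decorator combined with `stop_after_attempt`, `wait_random_exponential` and a retry predicate.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] for 0-based attempt."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def is_retryable_status(status: int) -> bool:
    """Only 5xx server errors are treated as transient."""
    return 500 <= status <= 599


def http_get_with_retry(url: str, max_attempts: int = 5, timeout: int = 10,
                        *, base_delay: float = 0.5, max_delay: float = 30.0):
    """GET url, retrying connection errors, timeouts and 5xx with jittered backoff.

    4xx responses raise requests.HTTPError immediately; after the last failed
    attempt the most recent error is raised.
    """
    import requests  # lazy import: the helpers above do not need it

    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, timeout=timeout)
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:
            last_error = exc  # transient network failure
        else:
            if not is_retryable_status(resp.status_code):
                resp.raise_for_status()  # raises for 4xx; no-op for 2xx/3xx
                return resp
            last_error = requests.exceptions.HTTPError(
                f"{resp.status_code} Server Error for url: {url}", response=resp)
        if attempt < max_attempts - 1:
            time.sleep(backoff_delay(attempt, base_delay, max_delay))
    raise last_error
```

GET has no side effects, so repeating it is safe; the function itself keeps no state between calls, which keeps it idempotent from the caller's point of view.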
Easy · Technical
What execution metadata and metrics should an automation script emit to help SREs monitor its health and performance? Provide concrete examples such as run_count, success_rate, duration_ms, retries, last_error, and queue_wait_ms. Show how to push metrics to a Prometheus Pushgateway or StatsD from Python, and describe useful alerting rules based on those metrics.
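StatsD is the simpler of the two to show with the standard library, because its wire format is plain UDP text of the form `name:value|type` (`c` counter, `ms` timer, `g` gauge). The sketch below is a minimal, hypothetical client: `StatsdEmitter`, `instrumented_run` and the metric names are illustrative, not a library API. For Pushgateway, the `prometheus_client` package provides `push_to_gateway(gateway, job=..., registry=...)`. Sending is fire-and-forget so that a missing metrics agent never fails the job.

```python
import socket
import time
from contextlib import contextmanager


class StatsdEmitter:
    """Minimal fire-and-forget StatsD client using the plaintext UDP protocol."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125, prefix: str = "automation"):
        self.addr = (host, port)
        self.prefix = prefix
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name: str, value, kind: str) -> str:
        line = f"{self.prefix}.{name}:{value}|{kind}"
        try:
            self.sock.sendto(line.encode("utf-8"), self.addr)
        except OSError:
            pass  # metrics are best-effort and must never break the job
        return line

    def incr(self, name: str, n: int = 1) -> str:
        return self._send(name, n, "c")

    def timing(self, name: str, ms: float) -> str:
        return self._send(name, int(ms), "ms")

    def gauge(self, name: str, value: float) -> str:
        return self._send(name, value, "g")


@contextmanager
def instrumented_run(stats: StatsdEmitter, job: str):
    """Emit run_count, success/failure counts, duration_ms and last_success_ts."""
    start = time.monotonic()
    stats.incr(f"{job}.run_count")
    try:
        yield stats  # the job body can add its own metrics, e.g. retries
    except Exception:
        stats.incr(f"{job}.failure_count")
        raise
    else:
        stats.incr(f"{job}.success_count")
        stats.gauge(f"{job}.last_success_ts", int(time.time()))
    finally:
        stats.timing(f"{job}.duration_ms", (time.monotonic() - start) * 1000)
```

Inside the `with instrumented_run(StatsdEmitter(), "nightly_backup") as stats:` block, `retries` maps naturally to `stats.incr(...)` and `queue_wait_ms` to `stats.timing(...)`. success_rate is derived at query time as success_count / run_count rather than emitted directly, and a free-text last_error is better kept in structured logs, since unbounded values make poor metric names or labels. Typical alerts on these metrics: no successful run within twice the expected interval (from `last_success_ts`), a failure ratio above a threshold over a window, and duration well above its usual baseline.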
Medium · Technical
Implement a Python decorator '@instrument' that logs function entry and exit with JSON fields: function, args_hash, start_ts, end_ts, duration_ms, status, and exception info on failure. Use the standard logging module and design the decorator so overhead is minimal when instrumentation is disabled. Show code for the decorator and an example usage.
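A minimal sketch using only the standard library follows. Assumptions: arguments are hashed (a truncated SHA-256 of their `repr`) rather than logged, since they may contain secrets; entry and exit are logged as two JSON lines on a logger named `instrument` (a hypothetical name); and "disabled" means that logger is not enabled for INFO. In that case the wrapper does one `isEnabledFor` check, which the logging module caches, and calls straight through without hashing, timing or serializing.

```python
import functools
import hashlib
import json
import logging
import time

log = logging.getLogger("instrument")  # hypothetical logger name


def _args_hash(args: tuple, kwargs: dict) -> str:
    """Short, stable fingerprint of the call arguments (raw values are never logged)."""
    raw = repr((args, sorted(kwargs.items()))).encode("utf-8", "backslashreplace")
    return hashlib.sha256(raw).hexdigest()[:16]


def instrument(func):
    """Log JSON entry/exit records for each call when INFO is enabled on `log`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Fast path when disabled: one cached level check, then a direct call.
        if not log.isEnabledFor(logging.INFO):
            return func(*args, **kwargs)
        record = {
            "function": func.__qualname__,
            "args_hash": _args_hash(args, kwargs),
            "start_ts": time.time(),
        }
        log.info(json.dumps({**record, "event": "enter"}))
        t0 = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except BaseException as exc:
            record["status"] = "error"
            record["exception"] = {"type": type(exc).__name__, "message": str(exc)}
            raise
        else:
            record["status"] = "ok"
            return result
        finally:
            # Runs on both paths, after status is set, so one exit line per call.
            record["event"] = "exit"
            record["end_ts"] = time.time()
            record["duration_ms"] = round((time.perf_counter() - t0) * 1000, 3)
            log.info(json.dumps(record))

    return wrapper
```

For example, after `logging.basicConfig(level=logging.INFO)`, each call to a function decorated with `@instrument` produces one `enter` and one `exit` line; raising the `instrument` logger to WARNING turns instrumentation off without redeploying. If even the extra wrapper call is too costly, the decorator could instead return `func` unchanged when instrumentation is disabled at import time, at the cost of not being able to toggle it at runtime.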
