InterviewStack.io

Configuration Management and Operational Rigor Questions

Practices and processes for managing system and network configurations with operational discipline. Topics include version control for configurations, secure configuration backups, automated testing of configuration changes, rollback and recovery mechanisms, detecting and remediating configuration drift, documentation and runbook development, change windows and impact assessment, stakeholder communication for changes, and balancing operational rigor with deployment velocity. Interviewers may probe tooling, automation strategies, validation and testing approaches, and how the candidate ensures repeatability, auditability, and safe change promotion across environments.

Easy · Technical
32 practiced
Provide a simple shell snippet or small Python program that computes a stable sha256 checksum of a text configuration file while ignoring comments and whitespace differences so that semantically equivalent files produce the same checksum. Specify assumptions about comment syntax and edge cases like order sensitivity.
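One possible starting point for an answer is sketched below in Python. It is a minimal sketch under stated assumptions, not a definitive implementation: comments are assumed to start with `#` and run to the end of the line, runs of whitespace are collapsed to a single space, blank lines are dropped, and line order is preserved. The function name `normalized_checksum` and the sample settings are hypothetical.

```python
import hashlib
import re


def normalized_checksum(text: str, comment_prefix: str = "#") -> str:
    """Return the sha256 hex digest of config text after normalization."""
    normalized = []
    for raw in text.splitlines():
        # Assumption: a comment starts at the first comment_prefix and runs
        # to end of line. A prefix inside a quoted value would be cut too.
        line = raw.split(comment_prefix, 1)[0]
        # Collapse runs of whitespace to one space and trim both ends.
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            normalized.append(line)
    # Line order is preserved, so reordered files hash differently.
    payload = "\n".join(normalized).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


a = "# main settings\nport = 8080\n\nhost   = example.local   # primary\n"
b = "port = 8080\r\nhost = example.local\r\n"
print(normalized_checksum(a) == normalized_checksum(b))  # True
```

Edge cases worth stating in an answer: this sketch is order sensitive (sort the normalized lines first if order carries no meaning in the format), it treats `port=8080` and `port = 8080` as different because whitespace is collapsed rather than removed, and it does not understand quoting or multi-line values.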
Easy · Technical
49 practiced
Explain configuration drift in the context of long-lived server fleets. Provide concrete examples of how drift can occur, the operational risks it creates, and at least three different detection strategies (including both agent-based and agentless approaches). Finally, describe safe remediation patterns and tradeoffs between automated remediation and human review.
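To make one detection strategy concrete, the sketch below compares settings observed on each host against a single desired baseline. It assumes the observed values have already been collected somehow (for example over SSH in an agentless setup, or reported by an agent); the collection step is out of scope. The host names, setting keys, and the `DriftItem` and `detect_drift` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DriftItem:
    host: str
    key: str
    expected: Optional[str]  # None: the baseline does not define this key
    actual: Optional[str]    # None: the key is missing on the host


def detect_drift(desired: Dict[str, str],
                 observed: Dict[str, Dict[str, str]]) -> List[DriftItem]:
    """Compare each host's observed settings against one desired baseline."""
    drift = []
    for host in sorted(observed):
        actual = observed[host]
        # Check keys from both sides so unmanaged extras are reported too.
        for key in sorted(set(desired) | set(actual)):
            if desired.get(key) != actual.get(key):
                drift.append(
                    DriftItem(host, key, desired.get(key), actual.get(key)))
    return drift


desired = {"sshd.PermitRootLogin": "no", "ntp.server": "ntp.example.internal"}
observed = {
    "web-01": {"sshd.PermitRootLogin": "no",
               "ntp.server": "ntp.example.internal"},
    "web-02": {"sshd.PermitRootLogin": "yes",
               "ntp.server": "ntp.example.internal",
               "sysctl.vm.swappiness": "10"},
}
for item in detect_drift(desired, observed):
    print(item)
```

In this sample data, `web-02` is reported twice: once for a changed value and once for a setting the baseline does not manage. Whether each finding is remediated automatically or routed to a human is the tradeoff the question asks about.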
Medium · Technical
40 practiced
Explain the concepts of convergence and eventual consistency in configuration management. Provide a concrete example of a converge-loop (reconciliation) system and describe failure modes where convergence may never be achieved. Suggest mitigations for those failure modes.
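A reconciliation loop of the kind the question describes can be sketched in a few lines. This is an illustrative sketch only: state is modeled as a flat dictionary, `read_actual` and `apply_change` are hypothetical callbacks standing in for real system access, and the bounded pass count is one possible mitigation for loops that never settle.

```python
from typing import Callable, Dict

State = Dict[str, str]


def converge(desired: State,
             read_actual: Callable[[], State],
             apply_change: Callable[[str, str], None],
             max_passes: int = 5) -> int:
    """Reconcile until actual matches desired; return passes that applied changes.

    Raises RuntimeError if differences remain after max_passes apply passes.
    """
    for applied_passes in range(max_passes + 1):
        actual = read_actual()
        diff = {k: v for k, v in desired.items() if actual.get(k) != v}
        if not diff:
            return applied_passes
        if applied_passes == max_passes:
            break  # budget spent; report instead of looping forever
        for key, value in diff.items():
            apply_change(key, value)
    raise RuntimeError(
        f"no convergence after {max_passes} passes; still differing: {sorted(diff)}")


state = {"mode": "0644"}


def read() -> State:
    return dict(state)


def write(key: str, value: str) -> None:
    state[key] = value


print(converge({"mode": "0600", "owner": "root"}, read, write))  # 1


def fighting_read() -> State:
    # Failure mode: a second tool reverts the value before every read.
    state["mode"] = "0644"
    return dict(state)


try:
    converge({"mode": "0600"}, fighting_read, write, max_passes=3)
except RuntimeError as exc:
    print(exc)
```

The second call illustrates one non-convergence case, two writers fighting over the same key. Bounding the passes and surfacing the still-differing keys turns a silent flap into an alertable error.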
Hard · Technical
48 practiced
Case study: an organization uses Puppet manifests checked into a central repo but experiences inconsistent environments and frequent emergency manual fixes. Create a phased plan to migrate to an IaC approach with testing and Git-based promotion, including tooling choices, pilot selection, policy changes, and KPIs to measure improvement in consistency and incident reduction.
Medium · Technical
27 practiced
As an SRE manager, propose a policy that balances operational rigor and deployment velocity using SLOs and error budgets. Define a tiered approval system for changes based on risk, explain how error-budget burn can throttle changes, and include metrics you would track to ensure the policy is effective.
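The throttling idea can be illustrated with a small sketch that turns error-budget consumption into an approval level per risk tier. Everything specific here is an assumption made for illustration: the tier names, the approval ladder, the 50% and 100% thresholds, and the function names `budget_consumed` and `approval_for`. A real policy would set these with stakeholders.

```python
# Hypothetical approval ladder, from least to most restrictive.
LADDER = ["auto", "peer-review", "change-board", "frozen"]
BASE_LEVEL = {"low": 0, "medium": 1, "high": 2}


def budget_consumed(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget used in the window (1.0 = fully spent)."""
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures <= 0:
        return float("inf") if failed else 0.0
    return failed / allowed_failures


def approval_for(risk: str, consumed: float) -> str:
    """Map a change's risk tier and current budget burn to an approval level."""
    if risk not in BASE_LEVEL:
        raise ValueError(f"unknown risk tier: {risk}")
    if consumed < 0.5:       # assumed threshold: budget is healthy
        escalation = 0
    elif consumed < 1.0:     # assumed threshold: budget is burning down
        escalation = 1
    else:                    # budget spent: freeze every tier
        escalation = len(LADDER) - 1
    return LADDER[min(BASE_LEVEL[risk] + escalation, len(LADDER) - 1)]


# 99.9% SLO over 1,000,000 requests allows about 1,000 failures; 600 occurred.
consumed = budget_consumed(0.999, 1_000_000, 600)
print(round(consumed, 2))
print(approval_for("low", consumed))   # peer-review
print(approval_for("high", consumed))  # frozen
```

With roughly 60% of the budget spent, each tier moves one step up the ladder. A complete policy would also need an exception path for reliability fixes during a freeze, which this sketch leaves out.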
