InterviewStack.io

Error Handling and Defensive Programming Questions

Covers designing and implementing defensive, fault-tolerant code and system behaviors that prevent and mitigate production failures. Topics include input validation and sanitization, null and missing-data handling, overflow and boundary protections, exception handling and propagation patterns, clear error reporting and structured logging for observability, graceful degradation and fallback strategies, and retry and backoff policies with idempotency for safe retries. Also addresses concurrency and synchronization concerns, resource and memory management to avoid exhaustion, security-related input checks, and how to document and escalate residual risks. Candidates should discuss pragmatic trade-offs between robustness and complexity, show concrete defensive checks and assertions, describe test strategies for error paths (including unit and integration tests), and explain how monitoring and operational responses tie into robustness.
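As an illustration of the retry and backoff policies mentioned above, here is a minimal Python sketch of capped exponential backoff with full jitter. The names `TransientError` and `retry_with_backoff`, and all default values, are assumptions made for this example; they are not taken from any particular library. The `sleep` and `rng` parameters are injectable so the error path can be unit tested without real delays.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(), retrying TransientError with capped exponential backoff.

    Any other exception propagates immediately, and the last TransientError
    is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Delay ceiling doubles each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random fraction of the ceiling.
            sleep(rng() * cap)
```

Retrying like this is only safe when `operation` is idempotent, or is protected by a deduplication key, because a retried call may already have taken effect on the remote side.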

Hard · System Design
Design a distributed streaming inference system that handles per-record retries with at-most-once, at-least-once, and exactly-once semantics as selectable modes. The system must also handle backpressure, bounded memory, and service restarts. Describe components, message delivery guarantees, storage choices, and how to implement safe retries without duplicating side-effects.
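One common building block for safe retries under at-least-once delivery is an idempotent sink keyed by record ID. The sketch below is a hedged, in-memory illustration only: `IdempotentSink` and its methods are hypothetical names, and a real system would need the check and the write to be atomic in a durable store that survives restarts.

```python
class IdempotentSink:
    """In-memory stand-in for a durable result store keyed by record ID."""

    def __init__(self):
        self._results = {}      # record_id -> stored result
        self.side_effects = 0   # counts how many times work was actually done

    def apply(self, record_id, compute):
        """Run compute() at most once per record_id and return the stored result."""
        if record_id in self._results:
            # Redelivery or retry: return the earlier result, do no new work.
            return self._results[record_id]
        result = compute()
        self.side_effects += 1
        self._results[record_id] = result
        return result


sink = IdempotentSink()
first = sink.apply("record-1", lambda: 42)
retry = sink.apply("record-1", lambda: 99)  # duplicate delivery of the same record
```

In this sketch the duplicate delivery returns the first result and does not run the second computation, which is the property that lets at-least-once delivery behave like effectively-once processing at the sink.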
Hard · System Design
Design a safe model deployment and rollback strategy for frequent model updates in production. Include canary deployments, health and correctness checks, automatic rollback triggers, data and label shadowing for verification, and procedures to handle model-related incidents during business-critical hours.
Medium · Technical
Design a small runbook for on-call engineers to follow when a model-serving cluster begins returning frequent 500 errors or timing out. The runbook should include immediate mitigation steps, diagnosis commands, metrics to inspect, safe rollbacks, and communication templates for affected stakeholders.
Medium · Technical
Explain how garbage collection and reference cycles in Python can cause memory leaks in long-running ML serving processes that hold large NumPy or tensor buffers. Describe detection techniques and defensive coding patterns to avoid or mitigate such leaks.
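A small sketch of the mechanism, using only the standard library: objects in a reference cycle are not freed by reference counting and stay alive until the cyclic garbage collector runs, whereas a weak back-reference avoids the cycle entirely. `Node`, `make_cycle`, and `make_weak_link` are hypothetical names for this example, and a `bytearray` stands in for a NumPy array or tensor buffer. The behaviour described assumes CPython.

```python
import gc
import weakref


class Node:
    """Hypothetical request-scoped object holding a large buffer."""

    def __init__(self, size):
        self.buffer = bytearray(size)  # stand-in for a numpy array or tensor
        self.parent = None


def make_cycle(size):
    """Build two nodes that reference each other; return a weak ref for probing."""
    a, b = Node(size), Node(size)
    a.parent, b.parent = b, a  # cycle: refcounts never drop to zero on their own
    return weakref.ref(a)


def make_weak_link(size):
    """Same shape, but the back-reference is weak, so no cycle is formed."""
    a, b = Node(size), Node(size)
    a.parent = b
    b.parent = weakref.proxy(a)
    return weakref.ref(a)


gc.disable()  # make the demonstration deterministic
try:
    cyclic = make_cycle(1024)
    alive_before_collect = cyclic() is not None  # buffers still held
    gc.collect()                                 # cyclic collector frees them
    alive_after_collect = cyclic() is not None
    weak = make_weak_link(1024)
    weak_alive = weak() is not None              # freed by refcounting alone
finally:
    gc.enable()
```

For detection in a running service, the standard-library `tracemalloc` module and `gc.get_objects()` are common starting points; whether the cost of periodic `gc.collect()` calls is acceptable depends on the workload.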
Medium · Technical
A production model serving cluster is seeing increased tail latency due to resource contention. Describe concurrency and timeout controls you would add at the serving layer, including worker pool sizing, request timeouts, queue limits, and backpressure mechanisms to prevent cascading failures.
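The controls named in this question (a fixed worker pool, a queue limit, request timeouts, and load shedding as backpressure) can be sketched with the standard library. `BoundedExecutor` and `Overloaded` are hypothetical names and the sizes are arbitrary assumptions; a production server would typically map `Overloaded` to a fast rejection such as HTTP 429 or 503.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class Overloaded(Exception):
    """Raised when the admission limit is reached; callers should shed load."""


class BoundedExecutor:
    """Thread pool with a hard cap on running plus queued requests."""

    def __init__(self, workers=4, max_pending=8):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # One slot per request that is running or waiting in the queue.
        self._slots = threading.BoundedSemaphore(workers + max_pending)

    def submit(self, fn, *args):
        # Reject immediately instead of letting the queue grow without bound.
        if not self._slots.acquire(blocking=False):
            raise Overloaded("admission limit reached")
        try:
            future = self._pool.submit(fn, *args)
        except Exception:
            self._slots.release()
            raise
        # Free the slot when the work finishes or is cancelled.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def call(self, fn, *args, timeout=1.0):
        """Submit and wait up to `timeout` seconds for the result."""
        future = self.submit(fn, *args)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            future.cancel()  # only succeeds if the task has not started yet
            raise

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

Note that a Python thread that has already started cannot be interrupted by the timeout, so the timeout here bounds how long the caller waits, not how long the worker runs. Bounding the work itself needs cooperative deadlines inside `fn` or process-level isolation.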
