InterviewStack.io

Error Handling and Defensive Programming Questions

Covers designing and implementing defensive, fault-tolerant code and system behaviors that prevent and mitigate production failures. Topics include input validation and sanitization; null and missing data handling; overflow and boundary protections; exception handling and propagation patterns; clear error reporting and structured logging for observability; graceful degradation and fallback strategies; and retry and backoff policies, with idempotency for safe retries. Also covers concurrency and synchronization concerns, resource and memory management to avoid exhaustion, security-related input checks, and how to document and escalate residual risks. Candidates should discuss pragmatic trade-offs between robustness and complexity, show concrete defensive checks and assertions, describe test strategies for error paths (including unit and integration tests), and explain how monitoring and operational responses tie into robustness.

Hard, Technical
As an AI engineering lead, draft the skeleton of a policy to document residual risks after defensive measures are applied. The policy should state how to record residual risks, acceptance criteria, monitoring requirements, who must approve accepted risks, and the escalation path to security, legal, and senior leadership. Include timelines and review cadence.
Hard, Technical
Design and implement (pseudocode is fine) a high-throughput, low-latency mechanism to maintain per-model inference counters across threads without heavy locking. Consider techniques like per-thread/sharded counters, atomic primitives, and periodic aggregation. Provide trade-off analysis regarding memory overhead, eventual accuracy, and how to handle counter overflow.
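One possible starting point for an answer, sketched in Python. It shows only the sharding and periodic aggregation ideas: each shard has its own lock, and threads are mapped to shards by thread id so concurrent writers rarely contend. The class name `ShardedCounter` and the `num_shards` parameter are illustrative assumptions. Python has no user-level atomic integers and its integers do not overflow, so atomic primitives and overflow handling (for example, draining shards into a wider aggregate before a fixed-width counter wraps) are not shown here and would need a language with fixed-width atomics.

```python
import threading
from collections import defaultdict


class ShardedCounter:
    """Per-model counters split across shards to reduce lock contention."""

    def __init__(self, num_shards: int = 16) -> None:
        self._num_shards = num_shards
        # Each shard has its own lock and its own {model_id: count} map.
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._shards = [defaultdict(int) for _ in range(num_shards)]

    def increment(self, model_id: str, amount: int = 1) -> None:
        # Threads map to shards by thread id, so concurrent writers
        # usually take different locks.
        idx = threading.get_ident() % self._num_shards
        with self._locks[idx]:
            self._shards[idx][model_id] += amount

    def snapshot(self) -> dict:
        # Periodic aggregation: sum shard by shard. Shards are not locked
        # together, so the total is only eventually accurate while
        # writers are still running.
        totals = defaultdict(int)
        for lock, shard in zip(self._locks, self._shards):
            with lock:
                for model_id, count in shard.items():
                    totals[model_id] += count
        return dict(totals)
```

The trade-off is visible in the structure: memory grows with the number of shards times the number of models, and `snapshot()` costs one lock acquisition per shard, in exchange for cheaper increments on the hot path.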
Medium, Technical
Compare eager validation (validate all data upfront during ingestion) versus lazy validation (validate on access or when data is used) for a large ML dataset ingestion pipeline. Discuss performance implications, error localization, cost of reprocessing, operational complexity, and scenarios where one approach is preferable.
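To make the comparison concrete, here is a minimal sketch of the two approaches. The record shape (a non-empty `text` string and an integer `label`) and the function names are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def validate_record(record: dict) -> dict:
    """Hypothetical rule: a record needs a non-empty 'text' and an int 'label'."""
    text = record.get("text")
    if not isinstance(text, str) or not text:
        raise ValueError(f"bad text in record {record!r}")
    if not isinstance(record.get("label"), int):
        raise ValueError(f"bad label in record {record!r}")
    return record


def ingest_eager(records: Iterable[dict]) -> List[dict]:
    """Validate everything up front; fail before any downstream work starts."""
    return [validate_record(r) for r in records]


def ingest_lazy(records: Iterable[dict]) -> Iterator[dict]:
    """Validate each record only when the consumer pulls it."""
    for r in records:
        yield validate_record(r)
```

With the eager version, a bad record stops ingestion before anything is consumed, which localizes errors early at the cost of a full pass over the data. With the lazy version, valid records ahead of the bad one are already consumed when the error surfaces, so the consumer needs a plan for partial progress.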
Easy, Technical
Provide a short Python example that uses context managers (with-statements) to manage external resources (files, DB sessions, GPU contexts) during preprocessing. Explain how context managers and finally blocks prevent resource leaks and what you'd do to ensure resources are cleaned up even when exceptions occur.
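A minimal sketch of one way to answer, using only the standard library. The helper `db_session`, the function `preprocess`, and the `rows` table are hypothetical names for illustration; SQLite stands in for a real database, and GPU contexts are not shown. The custom context manager is needed because using a `sqlite3` connection directly in a `with` statement only commits or rolls back and does not close the connection.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def db_session(path: str):
    """Open a connection, commit on success, roll back on error, always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        # Runs on both the success and the failure path.
        conn.close()


def preprocess(input_path: str, db_path: str) -> int:
    """Lowercase non-empty lines from a file and store them; return the row count."""
    count = 0
    # Both resources are released when the block exits, even on an exception.
    with open(input_path, encoding="utf-8") as src, db_session(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS rows (text TEXT)")
        for line in src:
            cleaned = line.strip().lower()
            if cleaned:
                conn.execute("INSERT INTO rows (text) VALUES (?)", (cleaned,))
                count += 1
    return count
```

If an insert fails midway, the `except` branch rolls back the partial writes, the `finally` branch closes the connection, and the file's own context manager closes the file handle, so nothing leaks regardless of where the exception occurs.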
Medium, Technical
List practical strategies to avoid GPU out-of-memory (OOM) during training and inference in deep learning: gradient checkpointing, mixed-precision, model sharding, dynamic batch sizing, activation offloading, and monitoring. For each strategy, explain typical failure modes, implementation complexity, and which metrics you'd monitor to detect memory pressure.
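Of the strategies listed, dynamic batch sizing is the easiest to sketch without a specific framework. The example below is a hedged illustration, not a framework API: `run_with_adaptive_batch` and its parameters are hypothetical, and `oom_error` defaults to Python's built-in `MemoryError` as a stand-in. In a real training loop you would pass whatever out-of-memory exception your framework raises and likely free cached memory before retrying.

```python
from typing import Callable, Sequence, Type


def run_with_adaptive_batch(
    items: Sequence,
    step: Callable[[Sequence], None],
    initial_batch_size: int,
    oom_error: Type[BaseException] = MemoryError,
    min_batch_size: int = 1,
) -> int:
    """Process all items, halving the batch size whenever `step` raises oom_error.

    Returns the batch size in use when processing finished.
    """
    batch_size = initial_batch_size
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            step(batch)
        except oom_error:
            if batch_size <= min_batch_size:
                raise  # cannot shrink further; surface the failure
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same position with a smaller batch
        start += len(batch)
    return batch_size
```

A typical failure mode of this approach is silent throughput loss: the loop keeps working at a much smaller batch size, so the final batch size and the number of retries are worth exporting as metrics alongside allocated and reserved memory.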
