InterviewStack.io

Error Handling and Defensive Programming Questions

Covers designing and implementing defensive, fault-tolerant code and system behaviors that prevent and mitigate production failures. Topics include input validation and sanitization, null and missing data handling, overflow and boundary protections, exception handling and propagation patterns, clear error reporting and structured logging for observability, graceful degradation and fallback strategies, and retry and backoff policies with idempotency for safe retries. Also covers concurrency and synchronization concerns, resource and memory management to avoid exhaustion, security-related input checks, and how to document and escalate residual risks. Candidates should discuss pragmatic trade-offs between robustness and complexity, show concrete defensive checks and assertions, describe test strategies for error paths (including unit and integration tests), and explain how monitoring and operational responses tie into robustness.
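
As an illustration of the retry and backoff topic above, here is a minimal sketch of retrying with capped exponential backoff and full jitter. The `TransientError` class and the `retry_with_backoff` helper are hypothetical names introduced for this example, and the sketch assumes the wrapped operation is idempotent, so that repeating it is safe.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or the attempts run out.

    Only TransientError is retried; any other exception propagates at once.
    The operation is assumed to be idempotent.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure to the caller
            # Exponential backoff capped at max_delay, scaled by random jitter.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)
```

Passing `sleep` and `rng` as parameters is a design choice that keeps the error path testable without real delays or randomness.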

Medium · Technical
List practical strategies to avoid GPU out-of-memory (OOM) during training and inference in deep learning: gradient checkpointing, mixed-precision, model sharding, dynamic batch sizing, activation offloading, and monitoring. For each strategy, explain typical failure modes, implementation complexity, and which metrics you'd monitor to detect memory pressure.
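
One of the strategies named in this question, dynamic batch sizing, can be sketched in a framework-agnostic way. The helper `process_with_adaptive_batches` and its parameters are hypothetical; the sketch assumes out-of-memory conditions surface as an exception type the caller supplies (the built-in `MemoryError` by default). A real GPU workload would pass its framework's OOM exception and would likely also need to release cached memory before retrying.

```python
def process_with_adaptive_batches(items, run_batch, initial_batch_size,
                                  min_batch_size=1, oom_errors=(MemoryError,)):
    """Process items in batches, halving the batch size when run_batch hits OOM.

    run_batch(batch) must return a list of results for that batch.
    Returns (results, final_batch_size).
    """
    if min_batch_size < 1 or initial_batch_size < min_batch_size:
        raise ValueError("need initial_batch_size >= min_batch_size >= 1")
    results = []
    batch_size = initial_batch_size
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(run_batch(batch))
        except oom_errors:
            if batch_size == min_batch_size:
                raise  # cannot shrink further: let the caller decide what to do
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same position with a smaller batch
        start += len(batch)
    return results, batch_size
```

The returned final batch size is one candidate metric to monitor: a value that keeps shrinking across runs suggests growing memory pressure.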
Hard · System Design
Design a fault-tolerant architecture for serving large generative models at 100k QPS that manages 1 PB of stored model artifacts. Include components such as model registry, shard placement, load balancing, prediction cache, circuit breakers, bulkheads, autoscaling, and cold-start strategies. Explain how you would handle partial outages so the service maintains safety and SLOs while avoiding unsafe outputs.
Hard · Technical
Design a fault-injection and chaos-testing plan for a model serving cluster to validate recovery from node failures, network partitions, storage errors, and corrupted model artifacts. Include an automated test harness, safety caps (blast-radius limits), canary strategies, criteria for success/failure, and how to safely roll out chaos tests toward production.
Hard · Technical
Design and implement (pseudocode is fine) a high-throughput, low-latency mechanism to maintain per-model inference counters across threads without heavy locking. Consider techniques like per-thread/sharded counters, atomic primitives, and periodic aggregation. Provide trade-off analysis regarding memory overhead, eventual accuracy, and how to handle counter overflow.
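
The sharded-counter technique mentioned in this question can be sketched as follows. This is one possible shape, not a reference answer: the `ShardedCounter` class and its methods are hypothetical names, and each shard is guarded by its own lock rather than by atomic primitives. In CPython the global interpreter lock limits real parallelism, so the sketch illustrates the structure rather than the throughput gain. Python integers do not overflow; a fixed-width implementation would additionally need a wraparound or saturation policy.

```python
import threading


class ShardedCounter:
    """Per-model counters split across shards to reduce lock contention."""

    def __init__(self, num_shards=16):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self._shards = [{} for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, model_id, amount=1):
        # Pick a shard from the calling thread's id, so concurrent writers
        # usually touch different locks.
        idx = threading.get_ident() % len(self._shards)
        with self._locks[idx]:
            shard = self._shards[idx]
            shard[model_id] = shard.get(model_id, 0) + amount

    def snapshot(self):
        """Aggregate all shards into one dict of totals.

        Shards are read one at a time, so the result is eventually accurate
        rather than a single consistent point-in-time view.
        """
        totals = {}
        for lock, shard in zip(self._locks, self._shards):
            with lock:
                items = list(shard.items())
            for model_id, count in items:
                totals[model_id] = totals.get(model_id, 0) + count
        return totals
```

The trade-off is memory for contention: each shard holds its own dictionary of model ids, and readers pay the aggregation cost only when `snapshot()` is called, for example from a periodic reporting task.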
Medium · Technical
Design a comprehensive test plan for error paths of a data preprocessing pipeline that feeds a production model. Include unit tests, integration tests, fuzz tests, contract/schema tests, and test cases for corrupted files, schema drift, partial records, and transient I/O failures. Explain how to automate these tests while keeping CI runtime reasonable.
