Error Handling and Code Quality Questions

Focuses on writing production-quality code and scripts that are defensive, maintainable, and fail gracefully. Covers anticipating and handling failures such as exceptions, missing files, network errors, and non-zero process exit codes; using language-specific constructs for error control, for example try/except blocks in Python or set -e patterns in shell scripts; validating inputs; producing clear error messages and logs; and avoiding common pitfalls that lead to silent failures. Also includes code quality best practices such as readable naming and code structure, using standard libraries instead of reinventing functionality, writing testable code and unit tests, and designing for maintainability and observability.

Hard · System Design

Design an inference service that degrades gracefully: when the primary heavy model fails, the service should fall back to a lightweight model or cached responses. Define components, failure detection, routing logic, and consistency concerns (e.g., different model outputs). Include how you'd test the fallback behavior in staging and production.
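
As a starting point for the routing logic, here is a minimal Python sketch. The function name `predict_with_fallback`, the string-keyed requests, the in-memory dict cache, and the choice to try the cache before the lightweight model are all assumptions made for illustration; a real service would also need timeouts, a circuit breaker, and cache expiry.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("inference")


def predict_with_fallback(
    request_key: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    cache: Dict[str, str],
) -> Tuple[str, str]:
    """Return (prediction, source) so callers can see which path answered.

    All names here are hypothetical; `primary` and `fallback` stand in for
    the heavy and lightweight models.
    """
    try:
        result = primary(request_key)
        cache[request_key] = result  # refresh the cache on every success
        return result, "primary"
    except Exception:
        # Log with context and a traceback instead of failing silently.
        logger.exception("primary model failed for key=%s", request_key)

    if request_key in cache:
        return cache[request_key], "cache"

    # Last resort: if the lightweight model also fails, let the error
    # propagate so the caller can return an explicit error response.
    return fallback(request_key), "fallback"
```

Returning the source alongside the prediction is one way to surface the consistency concern: downstream consumers and metrics can distinguish answers from the heavy model, the cache, and the lightweight model.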

Medium · Technical

You have an existing training loop that currently swallows all exceptions and just prints 'error' before continuing. Refactor the pattern to: (a) properly log the error with context, (b) checkpoint model state before risky operations, (c) decide whether to abort training or skip a single batch, and (d) ensure the process exits with a non-zero exit code on unrecoverable errors. Describe the concrete changes you'd make and why.
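
One possible shape for the refactored loop is sketched below in Python. `RecoverableBatchError`, `run_training`, `train_step`, `save_checkpoint`, and the `checkpoint_every` and `max_skipped` defaults are hypothetical names and values chosen for this example, not part of any framework.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("train")


class RecoverableBatchError(Exception):
    """Hypothetical marker for failures where skipping one batch is safe."""


def run_training(
    batches: Iterable[object],
    train_step: Callable[[object], None],
    save_checkpoint: Callable[[int], None],
    checkpoint_every: int = 100,
    max_skipped: int = 3,
) -> int:
    """Run the loop and return a process exit code (0 = success)."""
    skipped = 0
    for step, batch in enumerate(batches):
        if step % checkpoint_every == 0:
            save_checkpoint(step)  # (b) persist state before the risky step
        try:
            train_step(batch)
        except RecoverableBatchError:
            skipped += 1
            # (a) log with context; logger.exception adds the traceback
            logger.exception("skipping batch at step=%d (skipped=%d)", step, skipped)
            if skipped > max_skipped:
                # (c) too many bad batches suggests a systemic problem
                logger.error("more than %d batches skipped; aborting", max_skipped)
                return 1
        except Exception:
            logger.exception("unrecoverable error at step=%d", step)
            return 1  # (d) the caller turns this into a non-zero exit
    return 0
```

A script entry point would then call `sys.exit(run_training(...))` so that schedulers and CI see the failure. Catching a narrow exception type for the skip path, and treating everything else as fatal, keeps unexpected bugs from being silently skipped.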

Hard · Technical

Discuss strategies to make long-running batch training jobs crash-consistent and idempotent: how to checkpoint intermediate state, ensure atomic artifact writes, and avoid duplicate downstream side-effects (e.g., metrics or DB writes) when jobs are retried.
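
Two of these ideas, atomic artifact writes and retry-safe side effects, can be sketched in Python as follows. The helper names `atomic_write_json` and `run_once`, the JSON payload, and the marker-file layout are assumptions for illustration; the sketch relies on `os.replace` swapping the file in a single step when the temporary file lives on the same filesystem as the destination.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename into place.

    Readers see either the old file or the complete new one, never a
    partially written artifact.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave stray temp files behind
        raise


def run_once(marker_dir: Path, run_id: str, side_effect: Callable[[], None]) -> bool:
    """Run `side_effect` only if this run_id has not completed before.

    Returns True if the side effect ran, False if it was skipped.
    """
    marker = marker_dir / f"{run_id}.done"
    if marker.exists():
        return False
    side_effect()
    atomic_write_json(marker, {"run_id": run_id})
    return True
```

A crash between the side effect and the marker write would still cause a repeat on retry, so the side effect itself should ideally be idempotent as well, for example a database upsert keyed on the run ID.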

Hard · System Design

Design a canary rollout strategy for a new model version that includes automated failure detection and rollback. Specify the health metrics you monitor, thresholds for rollback, experiment groups (percentage traffic), and how you'd handle false positives during transient infra problems.
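
The rollback decision itself might look like the Python sketch below. The `WindowStats` type, the 2-percentage-point delta, and the three-consecutive-windows rule are illustrative assumptions, not recommended production values. Comparing the canary against the baseline, rather than against a fixed threshold, is one way to avoid rolling back when an infrastructure problem affects both groups equally.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WindowStats:
    """Error rates observed in one monitoring window (hypothetical type)."""

    canary_error_rate: float
    baseline_error_rate: float


def should_rollback(
    windows: Sequence[WindowStats],
    max_delta: float = 0.02,
    consecutive_required: int = 3,
) -> bool:
    """Roll back only after several consecutive unhealthy windows.

    A window is unhealthy when the canary's error rate exceeds the
    baseline's by more than `max_delta`. Requiring a streak filters out
    short transient spikes.
    """
    streak = 0
    for w in windows:
        unhealthy = (w.canary_error_rate - w.baseline_error_rate) > max_delta
        streak = streak + 1 if unhealthy else 0
        if streak >= consecutive_required:
            return True
    return False
```

The same structure extends to latency percentiles or prediction-distribution drift by adding fields to the window type and further conditions to the health check.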

Medium · Technical

Explain the trade-offs between failing fast and attempting to continue execution (best-effort) in ML pipelines. Provide examples of both approaches in the contexts of (a) serving predictions, and (b) nightly model retraining. When is each strategy preferable?
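
To make the contrast concrete, here is a small Python sketch of both styles. The function names, the row and feature shapes, and the default-value policy are assumptions for illustration only.

```python
import logging
from typing import Callable, Mapping, Sequence

logger = logging.getLogger("pipeline")


def validate_training_rows(rows: Sequence[Mapping], required: Sequence[str]) -> None:
    """Fail fast: stop a retraining job on the first malformed row.

    Training on bad data wastes compute and can ship a bad model, so an
    early, loud failure is usually cheaper than continuing.
    """
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            raise ValueError(f"row {i} is missing required fields: {missing}")


def serve_prediction(
    model: Callable[[Mapping], float],
    features: Mapping,
    default: float,
) -> float:
    """Best effort: keep serving by returning a default when the model fails.

    The failure is still logged with a traceback so it is not silent.
    """
    try:
        return model(features)
    except Exception:
        logger.exception("prediction failed; returning default=%s", default)
        return default
```

Best-effort serving is only safe when the default is an acceptable answer for the product; where a wrong prediction is worse than no prediction, the serving path should fail fast too.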
