Loss Functions, Behaviors & Selection Questions
Covers loss function design, evaluation, and selection in machine learning: common loss functions (MSE, cross-entropy, hinge, focal loss), how loss properties affect optimization and gradient flow, issues such as class imbalance, label noise, and calibration, and practical guidance for choosing the most appropriate loss for a given task and model.
Hard, Technical
79 practiced
Design a loss-level approach to enforce equalized odds across demographic groups in a classifier. Describe converting fairness constraints into Lagrangian-penalty terms, discuss optimization challenges from non-convexity, and propose monitoring, rollback, and stakeholder communication strategies for production deployments where fairness and accuracy trade-offs must be justified.
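One possible starting point for the Lagrangian-penalty part of this question is sketched below. It is a minimal illustration under stated assumptions, not a production recipe: two demographic groups encoded as 0 and 1, "soft" true-positive and false-positive rates computed as mean predicted probabilities so the penalty stays differentiable, and a slack `tolerance` on each gap. All function names (`soft_rate`, `equalized_odds_gaps`, `lagrangian`, `dual_ascent_step`) are hypothetical.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one example, with clamping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def soft_rate(probs, labels, groups, group, label):
    """Mean predicted probability among examples of `group` with true `label`.
    label=1 gives a soft TPR, label=0 gives a soft FPR (an assumed relaxation)."""
    sel = [p for p, y, g in zip(probs, labels, groups) if g == group and y == label]
    return sum(sel) / len(sel)

def equalized_odds_gaps(probs, labels, groups):
    """Absolute TPR and FPR gaps between group 0 and group 1."""
    tpr_gap = abs(soft_rate(probs, labels, groups, 0, 1)
                  - soft_rate(probs, labels, groups, 1, 1))
    fpr_gap = abs(soft_rate(probs, labels, groups, 0, 0)
                  - soft_rate(probs, labels, groups, 1, 0))
    return [tpr_gap, fpr_gap]

def lagrangian(probs, labels, groups, lambdas, tolerance=0.05):
    """Task loss plus multiplier-weighted constraint violations (gap - tolerance)."""
    task = sum(bce(p, y) for p, y in zip(probs, labels)) / len(probs)
    gaps = equalized_odds_gaps(probs, labels, groups)
    penalty = sum(lam * (gap - tolerance) for lam, gap in zip(lambdas, gaps))
    return task + penalty, gaps

def dual_ascent_step(lambdas, gaps, tolerance=0.05, lr=0.1):
    """Raise each multiplier while its constraint is violated; keep it >= 0."""
    return [max(0.0, lam + lr * (gap - tolerance)) for lam, gap in zip(lambdas, gaps)]

# Toy data: group 0 is scored more confidently than group 1.
probs  = [0.9, 0.8, 0.2, 0.1, 0.6, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

lambdas = [0.0, 0.0]
total, gaps = lagrangian(probs, labels, groups, lambdas)
lambdas = dual_ascent_step(lambdas, gaps)
```

In a real training loop the model parameters would take a descent step on the Lagrangian while the multipliers take the ascent step shown here; because the combined objective is non-convex in the model parameters, that alternation is not guaranteed to converge, which is one of the optimization challenges the question asks about.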
Easy, Technical
81 practiced
Explain categorical cross-entropy loss for multi-class classification. Show the mathematical formula for softmax followed by cross-entropy (negative log-likelihood), explain its probabilistic interpretation, and mention numerical-stability tricks such as log-sum-exp. Provide a short numeric example: logits = [2.0, 1.0, -1.0], target class 0, compute softmax probabilities and the loss value.
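The numeric example in this question can be checked with a short script. The sketch below uses only the Python standard library and applies the log-sum-exp trick (subtracting the maximum logit before exponentiating); the helper names `log_softmax` and `cross_entropy` are illustrative, not taken from any particular framework.

```python
import math

def log_softmax(logits):
    """Log-probabilities via log-sum-exp: subtract the max logit so exp() cannot overflow."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class."""
    return -log_softmax(logits)[target]

logits = [2.0, 1.0, -1.0]
probs = [math.exp(lp) for lp in log_softmax(logits)]
loss = cross_entropy(logits, 0)

print([round(p, 4) for p in probs])  # approximately [0.7054, 0.2595, 0.0351]
print(round(loss, 4))                # approximately 0.349
```

The loss equals log(e^2 + e^1 + e^-1) - 2, that is, the log-normalizer minus the target logit, which is why the stable form never needs to compute the probabilities explicitly.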
Medium, Technical
77 practiced
Derive the gradient of softmax plus cross-entropy with respect to logits and explain its behavior for very confident predictions (extreme logits). Discuss numerical issues that arise and practical mitigation techniques such as label smoothing, gradient clipping, log-sum-exp stabilization, and temperature scaling.
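The standard result for this derivation is that the gradient with respect to logit i is p_i - y_i, where p is the softmax output and y is the one-hot target. The sketch below checks that identity against central finite differences and then looks at an extreme-logit case; the function names are illustrative and only the standard library is used.

```python
import math

def softmax(logits):
    m = max(logits)                       # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ce_loss(logits, target):
    """Cross-entropy in log-sum-exp form: logsumexp(z) - z[target]."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]

def analytic_grad(logits, target):
    """dL/dz_i = p_i - y_i."""
    p = softmax(logits)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def numeric_grad(logits, target, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        dn = list(logits)
        up[i] += eps
        dn[i] -= eps
        grads.append((ce_loss(up, target) - ce_loss(dn, target)) / (2 * eps))
    return grads

logits = [2.0, 1.0, -1.0]
max_err = max(abs(a - n) for a, n in
              zip(analytic_grad(logits, 0), numeric_grad(logits, 0)))

# Extreme logits: a confident, correct prediction yields a vanishing gradient,
# while a confident, wrong prediction yields entries with magnitude close to 1.
confident = [20.0, 0.0, 0.0]
grad_right = analytic_grad(confident, 0)
grad_wrong = analytic_grad(confident, 1)
```

Because each gradient entry is bounded in magnitude by 1, the numerical trouble with extreme logits tends to come from the forward pass (overflow in a naive `exp`, or `log(0)` when probabilities underflow) rather than from the gradient formula itself.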
Hard, Technical
82 practiced
You must optimize multiple conflicting objectives such as accuracy, latency, and fairness. Describe gradient-based approaches to find Pareto-optimal solutions, including weighted-sum, Pareto MTL, PCGrad, and GradNorm. Explain how each method handles conflicting gradients, implementation considerations at scale, and how to select a deployment point on the resulting Pareto frontier.
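Of the methods named here, PCGrad is the simplest to illustrate: when two task gradients have a negative dot product, each is projected onto the normal plane of the other before the gradients are summed. The sketch below is a simplified version on plain Python lists, assuming per-task gradients have already been flattened into vectors; the function name `pcgrad` and the toy gradients are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcgrad(task_grads, rng=None):
    """Project away pairwise conflicts, then sum the projected gradients.

    task_grads: list of equal-length gradient vectors, one per task.
    The other tasks are visited in random order, as an assumed simplification
    of the procedure described for PCGrad.
    """
    rng = rng or random.Random(0)
    projected = []
    for i, g in enumerate(task_grads):
        g_i = list(g)
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = task_grads[j]            # always project against the original g_j
            conflict = dot(g_i, g_j)
            norm_sq = dot(g_j, g_j)
            if conflict < 0.0 and norm_sq > 0.0:
                scale = conflict / norm_sq
                g_i = [a - scale * b for a, b in zip(g_i, g_j)]
        projected.append(g_i)
    return [sum(col) for col in zip(*projected)]

# Two conflicting task gradients (dot product is -1).
g_accuracy = [1.0, 0.0]
g_fairness = [-1.0, 1.0]
combined = pcgrad([g_accuracy, g_fairness])
print(combined)  # [0.5, 1.5]
```

A plain weighted sum of the same two gradients would give [0.0, 1.0], making no progress along the first coordinate; the projection keeps a positive component there. At scale, the main cost is that each task needs its own backward pass and the pairwise projections grow quadratically with the number of tasks.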
Hard, System Design
144 practiced
In large-scale distributed training with mixed precision and gradient accumulation, how would you implement per-example loss weighting efficiently so that weights are applied before gradient reduction? Outline a scalable approach that respects GPU memory constraints and ensures correctness under gradient all-reduce semantics.
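A common pitfall in this setting is normalizing the weighted loss per micro-batch, which changes the effective weight of each example once gradients are accumulated and averaged across ranks. The sketch below simulates the arithmetic in plain Python on a one-parameter squared-error model. It assumes the all-reduce averages gradients across ranks and that the sum of weights over the whole accumulation window can be obtained up front with a cheap scalar all-reduce; all names are hypothetical and no real distributed library is used.

```python
def grad_example(theta, x, y):
    """Gradient of (theta * x - y)^2 with respect to theta for one example."""
    return 2.0 * (theta * x - y) * x

def reference_grad(theta, data):
    """Single-device target: weighted mean of per-example gradients."""
    total_w = sum(w for _, _, w in data)
    return sum(w * grad_example(theta, x, y) for x, y, w in data) / total_w

def sharded_grad(theta, shards):
    """shards[rank][micro_batch] is a list of (x, y, weight) tuples."""
    world_size = len(shards)
    # Stand-in for a scalar sum all-reduce over every rank and micro-batch.
    global_w = sum(w for rank in shards for mb in rank for _, _, w in mb)
    local_grads = []
    for rank in shards:
        acc = 0.0                                   # gradient accumulation buffer
        for mb in rank:
            weighted = sum(w * grad_example(theta, x, y) for x, y, w in mb)
            # Scale by world_size so the later mean all-reduce cancels it.
            acc += weighted * world_size / global_w
        local_grads.append(acc)
    return sum(local_grads) / world_size            # stand-in for mean all-reduce

def naive_grad(theta, shards):
    """Per-micro-batch normalization, shown only to expose the mismatch."""
    per_rank = []
    for rank in shards:
        mb_grads = [sum(w * grad_example(theta, x, y) for x, y, w in mb)
                    / sum(w for _, _, w in mb) for mb in rank]
        per_rank.append(sum(mb_grads) / len(mb_grads))
    return sum(per_rank) / len(per_rank)

shards = [
    [[(1.0, 2.0, 0.5), (2.0, 1.0, 2.0)], [(0.5, 1.0, 1.0)]],    # rank 0
    [[(3.0, 2.0, 0.1)], [(1.5, 0.0, 3.0), (2.0, 2.0, 1.0)]],    # rank 1
]
flat = [ex for rank in shards for mb in rank for ex in mb]
theta = 0.5

ref = reference_grad(theta, flat)
correct = sharded_grad(theta, shards)
naive = naive_grad(theta, shards)
```

Here `correct` matches `ref` while `naive` does not. In a mixed-precision setup the same scaling factor would typically be folded into the loss before loss scaling is applied, so that only one scalar per micro-batch is needed and no per-example gradients are materialized.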