InterviewStack.io

Cloud Cost Optimization and Financial Operations Questions

Covers strategies and organizational practices for minimizing and managing cloud and infrastructure spend while balancing performance, reliability, and business priorities. Candidates should understand cloud cost drivers such as compute, storage, data transfer, and managed services; pricing models including on-demand pricing, reserved capacity commitments, savings plans, and interruptible or spot offerings; and engineering techniques that reduce spend, such as rightsizing, autoscaling, storage tiering, caching, and workload placement. The topic also includes financial operations practices for continuous cost management and governance: resource tagging and cost allocation, budgeting and forecasting, chargeback and showback models, anomaly detection and alerting, cost reporting and dashboards, and processes that gate changes affecting spend. Interviewees should be able to estimate recurring costs and total cost of ownership, identify and quantify optimization opportunities, weigh trade-offs between cost and business objectives, and describe the tools and metrics used to monitor and communicate cost to stakeholders.

Hard | Technical
47 practiced
You must decide between using a managed autoscaling inference service and building a Kubernetes-based inference platform with custom node pools. Build a cost model that includes: per-request or per-hour managed costs, estimated developer and ops time (FTE cost), expected downtime risk, and projected traffic growth. List decision thresholds (numeric or qualitative) that would favor each option.
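One possible starting point for the cost model is a break-even sketch. This is a minimal illustration under stated assumptions, not a definitive model: all function and parameter names are hypothetical, developer/ops FTE cost and expected downtime cost are folded into a single fixed monthly figure, and traffic is assumed to grow at a constant monthly rate.

```python
def monthly_cost(requests, price_per_1k, fixed_monthly):
    """Usage-based charge plus fixed costs (FTE time, expected downtime cost)."""
    return requests / 1000 * price_per_1k + fixed_monthly


def first_month_self_hosted_cheaper(initial_requests, monthly_growth,
                                    managed_per_1k, managed_fixed,
                                    self_per_1k, self_fixed,
                                    horizon_months=36):
    """Return the first month (0 = today) in which the self-hosted platform
    costs less than the managed service, or None within the horizon."""
    for month in range(horizon_months + 1):
        # Assumption: compound growth at a constant monthly rate.
        requests = initial_requests * (1 + monthly_growth) ** month
        managed = monthly_cost(requests, managed_per_1k, managed_fixed)
        self_hosted = monthly_cost(requests, self_per_1k, self_fixed)
        if self_hosted < managed:
            return month
    return None
```

With illustrative inputs of 10M requests per month growing 10% monthly, a managed price of $0.50 per 1,000 requests, and a self-hosted cost of $0.10 per 1,000 requests plus $20,000 per month in fixed costs, the break-even volume is 50M requests per month and the function returns 17. A decision threshold could then be phrased as "prefer the managed service unless break-even falls inside the planning horizon".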
Medium | Technical
59 practiced
Write a Python function (pseudo-code acceptable) named estimate_training_cost that takes: gpu_hours, cpu_hours, storage_gb_months, egress_gb, price_lookup (dict with per-unit prices), and an optional discount percentage. Return total estimated cost. Include handling for missing prices and apply discount multiplicatively to compute savings.
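A minimal sketch of one acceptable answer follows. The question does not specify the keys of price_lookup or how missing prices should be handled, so the key names below and the choice to raise an error (rather than silently treating a missing price as zero) are assumptions.

```python
def estimate_training_cost(gpu_hours, cpu_hours, storage_gb_months, egress_gb,
                           price_lookup, discount_pct=0.0):
    """Estimate total training cost from usage quantities and per-unit prices.

    Assumed price_lookup keys (hypothetical): "gpu_hour", "cpu_hour",
    "storage_gb_month", "egress_gb".
    """
    usage = {
        "gpu_hour": gpu_hours,
        "cpu_hour": cpu_hours,
        "storage_gb_month": storage_gb_months,
        "egress_gb": egress_gb,
    }
    # Only demand a price for resources that are actually used.
    missing = [key for key, qty in usage.items() if qty and key not in price_lookup]
    if missing:
        raise ValueError(f"missing prices for: {', '.join(missing)}")
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")

    subtotal = sum(qty * price_lookup[key] for key, qty in usage.items() if qty)
    # Discount applied multiplicatively: total = subtotal * (1 - d / 100).
    return subtotal * (1 - discount_pct / 100)
```

For example, 100 GPU hours at $2.00, 200 CPU hours at $0.10, 500 GB-months at $0.02, and 50 GB of egress at $0.09 give a subtotal of $234.50, or $211.05 after a 10% discount.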
Easy | Technical
67 practiced
Describe best practices for resource tagging and cost allocation for an ML org. What required tags would you enforce, how would you handle untagged resources, and how does tagging help with chargeback or showback models?
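As a rough illustration of how tag enforcement can feed a showback report, the sketch below checks resources against a required tag set and routes the cost of any non-compliant resource to an "unallocated" bucket. The tag names, the resource dictionary shape, and the "unallocated" policy are all hypothetical choices, not a prescribed standard.

```python
from collections import defaultdict

# Hypothetical required tag set for an ML organization.
REQUIRED_TAGS = ("team", "project", "environment", "cost_center")


def missing_tags(tags):
    """Return the required tags that are absent or empty on a resource."""
    return [name for name in REQUIRED_TAGS if not tags.get(name)]


def showback_by_team(resources):
    """Sum monthly cost per team; non-compliant resources go to 'unallocated'."""
    totals = defaultdict(float)
    for resource in resources:
        tags = resource["tags"]
        owner = "unallocated" if missing_tags(tags) else tags["team"]
        totals[owner] += resource["monthly_cost"]
    return dict(totals)
```

Reporting the "unallocated" total alongside team totals makes the size of the tagging gap visible, which is one way to motivate cleanup before moving from showback to chargeback.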
Medium | Technical
64 practiced
Describe a robust approach to using spot instances for distributed training. Include checkpointing frequency, job fragmentation (smaller tasks vs. a single large job), multi-pool bidding, and how to model expected completion time given historical preemption rates.
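For the completion-time part, one common simplification is to treat preemptions as arriving at a constant average rate and to use Young's approximation for the checkpoint interval. The sketch below is a first-order estimate under those assumptions, reasonable only when the interval is short relative to the mean time between preemptions. The function and parameter names are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost_h, preemptions_per_hour):
    """Young's approximation: interval = sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_h / preemptions_per_hour)


def expected_wall_time(work_h, interval_h, checkpoint_cost_h,
                       preemptions_per_hour, restart_h):
    """First-order estimate of wall-clock hours for work_h hours of compute.

    Overhead per hour of useful work:
      - checkpointing: checkpoint_cost_h / interval_h
      - preemptions: rate * (half an interval of lost work + restart time)
    """
    overhead = (checkpoint_cost_h / interval_h
                + preemptions_per_hour * (interval_h / 2 + restart_h))
    return work_h * (1 + overhead)
```

With a checkpoint cost of 0.02 hours and a historical rate of 0.25 preemptions per hour, the suggested interval is 0.4 hours. A 100-hour job with a 0.1-hour restart cost is then estimated at 112.5 wall-clock hours.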
Easy | Technical
45 practiced
How would you estimate the cost per 1,000 inferences for a deployed model? List the inputs you need (e.g., instance hourly price, average latency, concurrency) and provide a worked numeric example for a CPU-based service that takes 50 ms per inference and runs on a 4-vCPU instance costing $0.20/hour.
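The worked example can be sketched as a small calculation. The question does not state concurrency or utilization, so the sketch assumes one in-flight request per vCPU (concurrency of 4) and exposes utilization as a parameter. The function name is hypothetical.

```python
def cost_per_1k_inferences(hourly_price, latency_s, concurrency, utilization=1.0):
    """Cost in dollars per 1,000 inferences on a single instance.

    Throughput = concurrency / latency, scaled by the fraction of each hour
    the instance actually spends serving requests.
    """
    requests_per_hour = (concurrency / latency_s) * 3600 * utilization
    return hourly_price / requests_per_hour * 1000
```

At 50 ms per inference and a concurrency of 4, the instance serves 80 requests per second, or 288,000 per hour. At $0.20/hour and full utilization, that is about $0.00069 per 1,000 inferences. At 50% utilization the figure doubles to about $0.00139.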
