InterviewStack.io LogoInterviewStack.io

Cloud Platform Experience Questions

Personal account of hands on experience using public cloud providers and the concrete results delivered. Candidates should describe specific services and patterns they used for compute, storage, networking, managed databases, serverless and eventing, and explain their role in architecture decisions, deployments, automation and infrastructure as code practices, continuous integration and continuous delivery pipelines, container orchestration, scaling and performance tuning, monitoring and incident response, and cost management. Interviewees should quantify outcomes when possible with metrics such as latency reduction, cost savings, availability improvements or deployment frequency and note any formal training or certifications. This topic evaluates depth of practical experience, ownership, and the ability to operate and improve cloud systems in production.

MediumTechnical
0 practiced
Describe how you would implement a cloud-based feature store that supports both batch training and low-latency online inference. Include storage choices for offline and online stores, ingestion pipelines, serving API design, consistency/freshness model, partitioning/TTL strategy, and strategies to perform backfills and rollback incorrect features.
MediumTechnical
0 practiced
Describe an instrumentation strategy to measure client-perceived latency and throughput for a model endpoint across multiple regions. What should you measure at the client, gateway/load-balancer, application, and model levels? How would you sample traces, aggregate metrics for SLO reporting, and correlate logs, traces, and metrics during debugging?
HardSystem Design
0 practiced
Design a disaster recovery strategy for data pipelines and model serving across multiple regions to achieve an RTO of 30 minutes and an RPO of 1 hour. Cover backups, cross-region replication, automated failover, DNS and TLS considerations, data consistency, testing/playbook for DR drills, and how to limit production impact during DR testing.
EasyTechnical
0 practiced
How do you choose compute instance types for model training and inference in the cloud? Discuss trade-offs between CPU and GPU, GPU families (for example NVIDIA T4, V100, A100), memory, storage throughput, and networking. Explain a concrete example where you benchmarked training throughput and computed cost per epoch to choose the right instance type.
EasyTechnical
0 practiced
What is Infrastructure as Code (IaC) and why is it important for reproducible ML deployments? As a data scientist, describe how IaC helped your team (for example Terraform modules or CloudFormation stacks), and briefly compare Terraform to a provider-native template language. Give one concrete IaC pattern you used such as modules, remote state, or state locking.

Unlock Full Question Bank

Get access to hundreds of Cloud Platform Experience interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.