InterviewStack.io

Cloud Architecture Fundamentals Questions

Fundamental concepts and design patterns for cloud-based systems and services. Topics include core service categories such as compute, storage, networking, and databases; virtual machines and containers; serverless computing; managed services; and infrastructure as code. Understand deployment and service models, including infrastructure as a service, platform as a service, and software as a service. Evaluate architectural patterns, including monolithic, microservices, and serverless approaches, and how they influence scalability, availability, reliability, performance, security, and cost. For more senior roles, include distributed systems concepts, consistency and partitioning models, trade-off analysis, fault isolation, observability, and operational practices in cloud-native design.

Medium | Technical
Describe how you would architect a reproducible and auditable ML training pipeline using cloud-native services. Include choices for data versioning, code versioning, compute provisioning, experiment tracking, and where to store model artifacts. Explain trade-offs between managed services (e.g., SageMaker/Vertex AI) and DIY solutions.
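One ingredient of a reproducible and auditable pipeline is a deterministic fingerprint that ties a training run to its exact data, code, and configuration. The following is a minimal sketch of that idea, not a prescribed design: the function name `run_fingerprint`, the manifest fields, and the example values are all hypothetical, and a real pipeline would hash a dataset snapshot in object storage and read the commit ID from version control.

```python
import hashlib
import json


def run_fingerprint(data_bytes, code_version, config):
    """Return a hex digest that changes if data, code, or config changes.

    data_bytes:   raw bytes of the dataset snapshot (assumed small here)
    code_version: e.g. a version-control commit ID (hypothetical value below)
    config:       dict of hyperparameters; key order must not matter
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(data_bytes).digest())
    h.update(code_version.encode("utf-8"))
    # sort_keys makes the serialization independent of dict insertion order
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


def build_manifest(data_bytes, code_version, config):
    """Bundle the inputs and the fingerprint into an auditable record."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "code_version": code_version,
        "config": config,
        "fingerprint": run_fingerprint(data_bytes, code_version, config),
    }


if __name__ == "__main__":
    manifest = build_manifest(b"example rows", "abc1234", {"lr": 0.001, "epochs": 3})
    print(json.dumps(manifest, indent=2))
```

Storing this manifest next to the model artifact lets a reviewer check which inputs produced a given model, whether the surrounding platform is a managed service or a self-built one.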
Medium | Technical
You need to ensure your training jobs can recover from preemption when using spot or preemptible instances. Describe an architecture to support checkpointing, state recovery, and job resubmission with minimal wasted work. What storage choices and checkpoint frequency considerations matter for large models?
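The core mechanics behind this question (atomic checkpoint writes, resume from the last checkpoint, and a checkpoint interval chosen against the preemption rate) can be sketched in a few lines. This is a toy illustration under stated assumptions: a local JSON file stands in for durable object storage, the `total` counter stands in for model and optimizer state, and `preempt_at` is a hypothetical hook that simulates a spot reclaim. The interval helper uses Young's first-order approximation, which is a starting point rather than a tuned answer.

```python
import json
import math
import os
import tempfile


def save_checkpoint(path, state):
    """Write atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # A reader sees either the old or the new checkpoint, never a partial one.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def train(path, num_steps, checkpoint_every, preempt_at=None):
    """Toy training loop that resumes from the last checkpoint."""
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if preempt_at is not None and step == preempt_at:
            raise RuntimeError("simulated preemption")
        state["total"] += step          # stand-in for one optimizer step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


def young_interval(checkpoint_cost_s, mean_time_between_preemptions_s):
    """Young's approximation for the checkpoint interval, in seconds."""
    return math.sqrt(2 * checkpoint_cost_s * mean_time_between_preemptions_s)
```

Resubmitting the same job after a preemption simply calls `train` again with the same path; the work lost is at most the steps since the last checkpoint. For large models the checkpoint cost term grows with model size and storage bandwidth, which pushes the interval longer.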
Hard | System Design
Design an observability plan for ML systems in production. Specify which metrics, logs, traces, and business KPIs you would collect for: model serving, feature pipelines, and training jobs. Explain how you'd detect data drift, model performance degradation, and infrastructure resource saturation.
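For the data drift part of this question, one commonly used statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against a training-time baseline. The sketch below is one possible implementation in plain Python, assuming equal-width bins derived from the baseline range; the function name `psi`, the bin count, and the `eps` floor for empty bins are illustrative choices, and any alert threshold should be validated per feature rather than taken from a rule of thumb.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a new sample.

    Bins are equal-width over the baseline's range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        n = len(values)
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / n, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [v + 0.5 for v in baseline]
    print("no drift:", psi(baseline, baseline))
    print("shifted: ", psi(baseline, shifted))
```

A monitoring job could compute this per feature on a schedule and emit the value as a metric alongside serving latency and resource utilization, so drift alerts flow through the same alerting path as infrastructure alerts.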
Hard | System Design
You're asked to design a distributed training cluster for training a Transformer on a 10TB dataset using 128 GPUs across multiple nodes. Explain the differences between data-parallel and model-parallel approaches, recommend one or a hybrid approach, and outline networking and storage requirements to avoid bottlenecks.
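A back-of-envelope calculation helps frame the networking and storage requirements. The sketch below assumes pure data parallelism with a ring all-reduce, where each GPU sends roughly 2(N-1)/N times the gradient size per step, and it assumes the full dataset is read once per epoch. The model size (1 billion parameters), gradient precision (2 bytes), and one-hour epoch are hypothetical inputs chosen only to make the arithmetic concrete; the question itself fixes only the 128 GPUs and the 10 TB dataset.

```python
def ring_allreduce_bytes_per_gpu(param_count, bytes_per_param, num_gpus):
    """Bytes each GPU sends per step in a ring all-reduce of the gradients."""
    grad_bytes = param_count * bytes_per_param
    return 2 * (num_gpus - 1) / num_gpus * grad_bytes


def required_read_throughput(dataset_bytes, epoch_seconds):
    """Aggregate storage read rate (bytes/s) to stream one epoch in time."""
    return dataset_bytes / epoch_seconds


if __name__ == "__main__":
    # Hypothetical model: 1e9 parameters, 2-byte gradients, 128 GPUs.
    per_gpu = ring_allreduce_bytes_per_gpu(1_000_000_000, 2, 128)
    print(f"all-reduce traffic per GPU per step: {per_gpu / 1e9:.2f} GB")

    # 10 TB dataset, hypothetical target of one epoch per hour.
    rate = required_read_throughput(10e12, 3600)
    print(f"aggregate read throughput: {rate / 1e9:.2f} GB/s")
    print(f"per-GPU read throughput:   {rate / 128 / 1e6:.1f} MB/s")
```

If the per-step all-reduce volume divided by the available inter-node bandwidth approaches the compute time per step, communication becomes the bottleneck, which is one argument for adding tensor or pipeline parallelism within a node and keeping data parallelism across nodes.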
Medium | System Design
Create a high-level network design for a secure VPC hosting ML infrastructure that includes: private subnets for training, public subnets for inference gateways, a bastion host for admin access, and access to managed storage. Indicate how you would use security groups, NACLs, and VPC endpoints to limit exposure and minimize data egress.
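The address plan is a natural first step for this design. The sketch below uses Python's standard `ipaddress` module to carve a VPC range into non-overlapping public and private subnets. The VPC CIDR `10.0.0.0/16`, the `/20` subnet size, and the two-public, two-private split are assumptions for illustration only; security groups, NACLs, and VPC endpoints would be layered on top of a plan like this in an infrastructure-as-code tool.

```python
import ipaddress
from itertools import combinations


def plan_subnets(vpc_cidr, new_prefix, num_public, num_private):
    """Split a VPC range into equal subnets and label them by tier."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(vpc.subnets(new_prefix=new_prefix))
    needed = num_public + num_private
    if needed > len(subnets):
        raise ValueError("VPC range is too small for the requested subnets")
    return {
        # Public tier: inference gateways and the bastion host.
        "public": subnets[:num_public],
        # Private tier: training nodes with no direct internet route.
        "private": subnets[num_public:needed],
    }


def has_overlap(plan):
    """Return True if any two planned subnets overlap."""
    all_subnets = plan["public"] + plan["private"]
    return any(a.overlaps(b) for a, b in combinations(all_subnets, 2))


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", 20, num_public=2, num_private=2)
    for tier, nets in plan.items():
        for net in nets:
            print(f"{tier:8s} {net}  ({net.num_addresses} addresses)")
    print("overlap:", has_overlap(plan))
```

Leaving the remaining `/20` blocks unallocated keeps room for additional availability zones or an isolated subnet for interface endpoints later.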
