A personal account of hands-on experience with public cloud providers and the concrete results delivered. Candidates should describe the specific services and patterns they used for compute, storage, networking, managed databases, serverless, and eventing. They should also explain their role in architecture decisions, deployments, automation and infrastructure-as-code practices, continuous integration and continuous delivery pipelines, container orchestration, scaling and performance tuning, monitoring and incident response, and cost management. Interviewees should quantify outcomes where possible, using metrics such as latency reduction, cost savings, availability improvements, or deployment frequency, and note any formal training or certifications. This topic evaluates depth of practical experience, ownership, and the ability to operate and improve cloud systems in production.
Hard · Technical
42 practiced
Design an autoscaling strategy for a cluster hosting a mix of latency-sensitive human traffic and large batch processing jobs. Describe node pool composition (instance types, taints), pod-level autoscaling, queue-based autoscaling for batches, use of spot/preemptible instances, scheduling priorities/QoS, and explain cost versus performance trade-offs and preemption handling.
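As a starting point for the queue-based part of an answer, the sizing rule can be sketched in a few lines: scale batch workers so the current backlog drains within a target window, then clamp to the node pool's limits. This is a minimal sketch, not a prescribed solution. The function name `desired_batch_workers`, its parameters, and the default bounds are hypothetical and chosen only for illustration.

```python
import math


def desired_batch_workers(queue_depth, per_worker_rate, target_drain_seconds,
                          min_workers=0, max_workers=50):
    """Return how many batch workers are needed to drain the backlog in time.

    queue_depth          -- jobs currently waiting in the queue
    per_worker_rate      -- jobs one worker completes per second (assumed measured)
    target_drain_seconds -- how quickly the backlog should be cleared
    min_workers/max_workers -- hypothetical bounds, e.g. the spot pool's size limits
    """
    if per_worker_rate <= 0 or target_drain_seconds <= 0:
        raise ValueError("rate and drain target must be positive")
    # Jobs a single worker can finish inside the drain window.
    jobs_per_worker = per_worker_rate * target_drain_seconds
    needed = math.ceil(queue_depth / jobs_per_worker)
    # Clamp so an empty queue scales to the floor and a flood cannot exceed the cap.
    return max(min_workers, min(max_workers, needed))
```

A `min_workers` of 0 lets the batch pool scale to zero when the queue is empty, which is where most of the cost saving on spot/preemptible capacity comes from. The cap keeps batch work from starving the latency-sensitive pool.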
Medium · Technical
33 practiced
An API service is experiencing intermittent CPU spikes that increase P95 latency. Provide a step-by-step diagnostic and remediation plan: what metrics, traces and logs you would inspect, short-term mitigations (throttling, autoscaling), long-term fixes (code hotspots, DB tuning, caching), and how you'd validate improvements under load and prevent regressions in CI/CD.
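For the "prevent regressions in CI/CD" part, one possible approach is a load-test gate that compares P95 latency between a baseline run and a candidate build. The sketch below uses the nearest-rank percentile definition. The function names and the 10% tolerance are assumptions for illustration, not a standard.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty collection of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank: 1-based rank = ceil(0.95 * n), computed with integer math
    # to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_regressed(baseline_ms, candidate_ms, tolerance=0.10):
    """True if the candidate's P95 exceeds the baseline's P95 by more than tolerance."""
    return p95(candidate_ms) > p95(baseline_ms) * (1 + tolerance)
```

A pipeline step could fail the build when `latency_regressed(...)` returns `True`. In practice both sample sets should come from the same load profile and environment, otherwise the comparison mostly measures noise.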
Hard · System Design
38 practiced
You operate Kubernetes clusters across multiple regions. Design an approach for multi-cluster service discovery and failover using either service mesh federation or cluster federation. Include control-plane topology, east-west networking, mTLS/identity across clusters, latency implications, CI/CD for mesh/config rollout, and how to respect data locality and regulatory constraints.
Medium · Technical
41 practiced
Compare managed relational and NoSQL offerings across cloud providers for a customer-profile service with read-heavy, write-light traffic and a requirement for global read replicas. Evaluate options such as Amazon Aurora Global Database, Cloud Spanner, DynamoDB global tables, and Azure Cosmos DB. Discuss consistency models, operational complexity, and expected costs, and recommend when to choose each option.
Medium · System Design
39 practiced
You need to migrate a monolithic web application and a 5 TB Postgres database (2M monthly active users) from on-prem to AWS with minimal planned downtime. Outline a step-by-step migration plan covering discovery, schema and data migration tools (e.g., DMS, logical replication), cutover strategy (blue/green, shadow/read-only, rolling), testing and validation, rollback processes, and the metrics you would use to declare success.