InterviewStack.io

Cloud & Infrastructure Topics

Cloud platform services, infrastructure architecture, Infrastructure as Code, environment provisioning, and infrastructure operations. Covers cloud service selection, infrastructure provisioning patterns, container orchestration (Kubernetes), multi-cloud and hybrid architectures, infrastructure cost optimization, and cloud platform operations. For CI/CD pipeline and deployment automation, see DevOps & Release Engineering. For cloud security implementation, see Security Engineering & Operations. For data infrastructure design, see Data Engineering & Analytics Infrastructure.

Your SRE Background and Experience

Articulate your hands-on experience with systems administration, monitoring tools, automation scripts, and any incident response involvement. Be specific about technologies (e.g., Prometheus, Grafana, Kubernetes, Docker, Terraform) and concrete examples of what you've built or fixed.

40 questions

Technical Vision and Infrastructure Roadmap

This topic assesses a candidate's ability to define a multi-year technical vision for infrastructure, platform, and systems and to translate that vision into a practical execution roadmap. Core skills include evaluating technology choices and architecture evolution, planning migration and modernization paths, anticipating scalability and capacity needs, and balancing cost and performance against resilience and operational reliability. Candidates should demonstrate approaches to managing technical debt, sequencing investments across quarters and releases, estimating resources and timelines, establishing measurable infrastructure goals and key performance indicators, and implementing governance and standards. Discussion may also cover reliability and observability, security and compliance considerations, trade-offs between short-term stability and long-term rearchitecture, prioritization to enable business outcomes, and communicating technical trade-offs to both technical and non-technical stakeholders.

40 questions

Transport Layer Protocols

Comprehensive understanding of transport layer protocols, primarily Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), and related protocols used for diagnostics such as Internet Control Message Protocol (ICMP). Candidates should be able to explain TCP as a connection-oriented, reliable, ordered, and flow-controlled protocol, including the three-way handshake for connection establishment, the four-step connection teardown, retransmission and timeout behavior, and high-level congestion control and flow control mechanisms. Describe the TCP header structure and the key fields used for reliability and ordering. Explain UDP as a connectionless, best-effort, lower-latency protocol, its datagram model, simple header structure, and trade-offs for reliability and ordering. Give real-world use cases and justify protocol choice, for example reliable file transfer and web traffic over TCP versus low-latency streaming, real-time voice, and many DNS queries over UDP. Discuss port numbers and common service ports such as HTTP port 80, HTTPS port 443, DNS port 53, SSH port 22, and SMTP port 25, and how sockets and ports map to endpoints. Cover practical topics such as when an application using UDP may fall back to TCP (for example, DNS retrying over TCP after a truncated response), how fragmentation and packet loss affect each protocol, and the role of ICMP for network diagnostics and error reporting.
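To make the header discussion concrete, here is a minimal sketch that unpacks the fixed 20-byte TCP header and the 8-byte UDP header with Python's `struct` module. The function names and the sample SYN segment are illustrative assumptions, not part of any particular library; TCP options and checksum validation are deliberately left out.

```python
import struct

# Classic TCP control bits, lowest bit first.
TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST",
             0x08: "PSH", 0x10: "ACK", 0x20: "URG"}


def parse_tcp_header(data: bytes) -> dict:
    """Parse the fixed 20-byte part of a TCP header (options ignored)."""
    if len(data) < 20:
        raise ValueError("TCP header needs at least 20 bytes")
    (src, dst, seq, ack, off_flags,
     window, _checksum, _urgent) = struct.unpack("!HHIIHHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,              # sequence number: ordering and reliability
        "ack": ack,              # acknowledgment number: confirms receipt
        "header_len": (off_flags >> 12) * 4,  # data offset is in 32-bit words
        "flags": [name for bit, name in TCP_FLAGS.items() if off_flags & bit],
        "window": window,        # receive window: flow control
    }


def parse_udp_header(data: bytes) -> dict:
    """Parse the 8-byte UDP header: ports, length, checksum only."""
    if len(data) < 8:
        raise ValueError("UDP header needs at least 8 bytes")
    src, dst, length, _checksum = struct.unpack("!HHHH", data[:8])
    return {"src_port": src, "dst_port": dst, "length": length}


# Hypothetical SYN segment from an ephemeral port to HTTPS (port 443).
syn = struct.pack("!HHIIHHHH", 54321, 443, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(syn))
```

The contrast in size is the point: TCP carries sequence, acknowledgment, and window fields to support ordering, retransmission, and flow control, while UDP carries only ports, a length, and a checksum.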

40 questions

Multi Region and Multi Cloud Resilience

Designing systems that work across multiple geographic regions or cloud providers. This addresses the highest reliability requirements and provides protection against provider-level failures. At senior level, candidates should understand data replication across regions, latency implications, consistency trade-offs, and the cost of multi-region deployments; design routing policies that direct traffic to healthy regions; and address compliance requirements that may mandate geographic distribution.
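A routing policy that directs traffic to healthy regions can be sketched as a small selection function. This is an illustrative assumption about one simple policy (lowest latency among healthy regions), and the `Region` type and region names are hypothetical; real deployments usually delegate this to DNS or a global load balancer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    healthy: bool       # result of the most recent health check
    latency_ms: float   # measured latency from the client's vantage point


def pick_region(regions: list[Region]) -> Optional[str]:
    """Return the lowest-latency healthy region, or None if all are down."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda r: r.latency_ms).name


# Hypothetical fleet: the closest region is failing its health check.
fleet = [
    Region("us-east", healthy=False, latency_ms=20),
    Region("eu-west", healthy=True, latency_ms=90),
    Region("ap-south", healthy=True, latency_ms=150),
]
print(pick_region(fleet))
```

Note the trade-off this exposes: failing over from the nearest region raises latency for those users, which is part of the cost discussion for multi-region designs.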

44 questions

Large Scale Infrastructure Challenges

Awareness of engineering and operational challenges at massive scale, including global network optimization, multi-region failover and redundancy, integration of cloud and on-premises systems, security and compliance at scale, performance and latency for a global user base, cost optimization across large fleets, and maintaining reliability without exponential operational complexity. Candidates should demonstrate thinking about architecture patterns, trade-offs, monitoring and incident response at scale, and strategies for evolving platform capabilities as load and feature sets grow.

40 questions

Capacity Planning and Resource Optimization

Covers forecasting, provisioning, and operating compute, memory, storage, and network resources efficiently to meet demand and service level objectives. Key skills include monitoring resource utilization metrics such as central processing unit (CPU) usage, memory consumption, storage input/output (I/O), and network throughput; analyzing historical trends and workload patterns to predict future demand; and planning capacity additions, safety margins, and buffer sizing. Candidates should understand vertical versus horizontal scaling, autoscaling policy design and cooldowns, right-sizing instances or containers, workload placement and isolation, load balancing algorithms, and the use of spot or preemptible capacity for interruptible workloads. Practical topics include storage planning and archival strategies, database memory tuning and buffer sizing, batching and off-peak processing, model compression and inference optimization for machine learning workloads, alerts and dashboards, stress and validation testing of planned changes, and methods to measure whether capacity decisions meet both performance and cost objectives.
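As a rough illustration of trend-based forecasting with a safety margin, the sketch below fits a least-squares line to historical peak demand and converts the projected demand into an instance count. The function names, the 20% default margin, and the sample numbers are assumptions for illustration; real workloads often need seasonality-aware models rather than a straight line.

```python
import math


def linear_forecast(history: list[float], periods_ahead: int) -> float:
    """Extrapolate a least-squares trend line `periods_ahead` steps past the data."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)


def instances_needed(peak_demand: float, per_instance_capacity: float,
                     safety_margin: float = 0.2) -> int:
    """Instances required to serve peak demand plus a headroom buffer."""
    return math.ceil(peak_demand * (1 + safety_margin) / per_instance_capacity)


# Hypothetical monthly peak requests per second, growing by about 10 each month.
history = [100, 110, 120, 130]
projected = linear_forecast(history, periods_ahead=2)
print(projected, instances_needed(projected, per_instance_capacity=40))
```

The safety margin is applied to demand rather than to the instance count, so rounding up happens once, after the buffer is included.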

40 questions

Cost Optimization at Scale

Addresses cost-conscious design and operational practices for systems operating at large scale and high volume. Candidates should discuss measuring and improving unit economics such as cost per request or cost per customer, multi-tier storage strategies and lifecycle management, caching, batching and request consolidation to reduce resource use, data and model compression, optimizing network and input/output patterns, and minimizing egress and transfer charges. Senior discussions include product-level trade-offs, prioritization of cost reductions versus feature velocity, instrumentation and observability for ongoing cost measurement, automation and runbook approaches to enforce cost controls, and organizational practices to continuously identify, quantify, and implement savings without compromising critical service level objectives. The topic emphasizes measurement, benchmarking, risk assessment, and communicating expected savings and operational impacts to stakeholders.
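Unit economics reduce to simple arithmetic once the inputs are measured. The sketch below computes cost per request and the blended per-request cost when a cache absorbs part of the traffic. All names and figures are hypothetical and serve only to show the calculation.

```python
def cost_per_request(total_cost: float, requests: int) -> float:
    """Unit cost: total spend divided by requests served in the same period."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return total_cost / requests


def blended_cost_with_cache(hit_rate: float, cache_cost: float,
                            origin_cost: float) -> float:
    """Expected per-request cost when `hit_rate` of requests are served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_cost + (1.0 - hit_rate) * origin_cost


# Hypothetical figures: 1,200 currency units for 3 million requests.
print(cost_per_request(1200.0, 3_000_000))

# Hypothetical relative costs: a cache hit costs 1 unit, an origin call 10 units.
print(blended_cost_with_cache(hit_rate=0.8, cache_cost=1.0, origin_cost=10.0))
```

Tracking these numbers over time, rather than total spend alone, shows whether cost is growing faster or slower than the traffic it serves.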

40 questions

Networking Fundamentals and Troubleshooting

Comprehensive coverage of core computer networking principles and the practical diagnostic and operational skills required to design, operate, and troubleshoot production systems. Fundamental concepts include the Open Systems Interconnection model layers, the Transmission Control Protocol/Internet Protocol (TCP/IP) stack, the User Datagram Protocol, socket and port semantics, address notation and subnetting, Network Address Translation, Dynamic Host Configuration Protocol, and the Domain Name System resolution process. Infrastructure and architectural topics include switching and virtual local area networks, routing concepts and routing table behavior including Border Gateway Protocol basics, load balancing strategies and failure modes, firewalls and access control, virtual private network technologies, and container and service network communication patterns. Diagnostic and tooling skills cover connectivity testing and path analysis, process and socket inspection, packet capture and analysis, and common command-line tools and utilities used for network investigation. Performance and reliability topics include latency, bandwidth and throughput, packet loss, congestion and congestion control, connection pooling, timeout and retry strategies, and approaches to optimization. Observability, monitoring, and security practices include collecting and interpreting network metrics, logs, and traces, using packet capture tools for root cause analysis, and understanding how network issues surface in distributed applications. At senior levels, expect discussion of network performance tuning, capacity planning, load balancer behavior at scale, and design decisions that affect system reliability and security.
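Address notation and subnetting are easy to practice with Python's standard `ipaddress` module. The sketch below splits a network into smaller subnets and checks whether two hosts share a subnet; the helper names and the `10.0.0.0/24` example range are assumptions chosen for illustration.

```python
import ipaddress


def split_network(cidr: str, new_prefix: int) -> list:
    """Split a network into equal subnets with the given prefix length."""
    return list(ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix))


def same_subnet(host_a: str, host_b: str, cidr: str) -> bool:
    """True if both hosts fall inside the given network."""
    net = ipaddress.ip_network(cidr)
    return ipaddress.ip_address(host_a) in net and ipaddress.ip_address(host_b) in net


# A /24 split into /26 blocks yields four subnets of 64 addresses each.
for subnet in split_network("10.0.0.0/24", 26):
    print(subnet, subnet.num_addresses, subnet.broadcast_address)

# 10.0.0.70 sits in 10.0.0.64/26, so these two hosts need a router between them.
print(same_subnet("10.0.0.10", "10.0.0.70", "10.0.0.0/26"))
```

Being able to do this split by hand, and then verify it with a tool, is a common troubleshooting step when two hosts unexpectedly cannot reach each other directly.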

40 questions

Scalability and Systems Resource Management

Design and operational practices for managing compute and platform resources as systems scale. Covers autoscaling, resource pooling, orchestration, cost trade-offs between always-on and on-demand provisioning, and architectural choices that affect resource utilization and performance. Candidates should be prepared to discuss capacity planning for infrastructure, metrics and alerts for autoscaling, and cost versus performance decisions for high availability systems.
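A metric-driven autoscaling decision can be sketched with the proportional rule that the Kubernetes Horizontal Pod Autoscaler documentation describes: scale the replica count by the ratio of the observed metric to its target, then round up. The function below is a simplified illustration with hypothetical names; it adds min/max bounds but omits tolerance bands, stabilization windows, and cooldowns.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Proportional scaling: ceil(current * observed / target), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical service: 4 replicas at 90% average CPU against a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60,
                       min_replicas=1, max_replicas=10))
```

The `max_replicas` bound is where the cost versus performance decision becomes explicit: it caps spend during a spike at the price of running hotter than the target.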

40 questions