Apple Site Reliability Engineer (Mid-Level) Interview Preparation Guide 2026
Apple's SRE interview process for mid-level candidates is a structured seven-round evaluation of technical depth, system design ability, and cultural alignment. It consists of an initial recruiter screening, two technical phone screens covering Linux systems and networking, and a full-day virtual onsite with four rounds assessing systems internals, SRE practices and observability, coding and automation, and system design. Behavioral and Apple values assessments are integrated throughout. Based on recent interview data, the total timeline typically spans 4-8 weeks from application to offer.
Interview Rounds
Recruiter Screening
What to Expect
This combined round includes the recruiter's initial contact and follow-up screening. The recruiter verifies your background, confirms interest in the SRE role, and assesses basic alignment with position requirements. Discussions cover your experience with system reliability, operations, relevant technical skills, and why you're interested in Apple specifically. This round also serves as a logistics coordination point: confirming timeline, discussing team structure, clarifying role expectations, and scheduling subsequent phone screens. Upon successful completion, the recruiter provides interview guidelines and technical phone screen logistics.
Tips & Advice
Be specific about why this SRE role at Apple interests you, and let genuine enthusiasm for reliability engineering as a discipline show. Prepare a clear narrative about your background: specific systems you've worked on, operational challenges you've solved, and concrete impact (e.g., 'I reduced incident response time by 40% through automation'). Have 2-3 detailed project examples ready. Research what is publicly known about Apple's infrastructure and reliability requirements, and ask informed questions that reflect it: mention specific products, reliability standards, or publicly known infrastructure challenges. Be professional but conversational. Confirm scheduling details and clarify timezone requirements.
Focus Topics
Apple's Reliability Standards & Products
Demonstrate understanding of why reliability is paramount at Apple: device ecosystem across hardware and software, user expectations, brand reputation. Show you've thought about how you'd contribute to Apple's high reliability standards. Mention any personal experience with Apple products or services.
Specific Projects & Measurable Impact
Prepare 2-3 detailed stories of projects you owned or significantly contributed to. For each: What was the initial state? What was the problem? What did you do? What was the measurable outcome? How did you mentor others? Why are you proud of this work?
Career Narrative & SRE Background
Clearly articulate your progression through SRE or operations roles with concrete examples: types of systems managed, scale handled (users, requests/second, data volume), and measurable impact. Connect your experience to Apple's reliability requirements. Explain what drew you to SRE and why you want to work at Apple specifically.
Technical Skills & Tech Stack Proficiency
Highlight core SRE competencies: Linux/systems administration depth, monitoring and observability expertise, incident response experience, automation capabilities, and relevant programming languages. Mention specific tools you've used: Prometheus, Kubernetes, Python, Go, Terraform. Be prepared to discuss why you chose certain tools or approaches.
Technical Phone Screen 1: Linux Systems & Troubleshooting
What to Expect
This round tests systematic debugging methodology and deep Linux systems knowledge. The interviewer presents a complex system problem (such as SSH failing while console access still works, or services failing to start) and asks you to diagnose the root cause. You'll navigate the /proc filesystem, interpret system state, use diagnostic tools, and explain your reasoning at each step. The focus is on methodology and logical progression rather than immediately knowing the answer. Expect questions about process management, memory behavior, system calls, and performance analysis. Interviewers assess both technical depth and your approach to problem-solving under uncertainty.
Tips & Advice
Before the interview, ensure comfortable proficiency navigating a Linux system via SSH. Practice real troubleshooting on your own systems—set up problems deliberately and solve them. During the interview, ask clarifying questions about symptoms before diving into diagnosis. Walk through your systematic process: gather information, form hypotheses, test them iteratively, verify the fix. Use tools confidently: strace (system calls), lsof (open files/sockets), tcpdump (network packets), netstat/ss (connection state), vmstat (memory/CPU), iostat (disk I/O). Know /proc filesystem structure and what information each file contains. Think out loud so the interviewer understands your reasoning. If stuck, pivot and try a different angle—demonstrate flexibility. For mid-level candidates, interviewers expect methodical narrowing of problem space, not random command trials.
Focus Topics
/proc Filesystem Navigation & System State Inspection
Master the /proc filesystem: /proc/[pid]/ for process details (maps, fd, status), /proc/net/ for networking state, /proc/meminfo for memory status, /proc/stat for CPU metrics, /proc/loadavg for system load, /proc/interrupts for interrupt activity. Know what each file contains and how to interpret data for diagnosis.
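As a rough illustration of reading these files programmatically, the sketch below parses /proc/meminfo-style text into a dictionary. The function names and the sample values are hypothetical; on a real Linux host you would read /proc/meminfo itself rather than the embedded sample.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Values with a "kB" suffix are returned in kB; bare counts
    (e.g. HugePages_Total) are returned as-is.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Share of total memory the kernel estimates is usable without swapping."""
    return info["MemAvailable"] / info["MemTotal"]


# Hypothetical sample. On Linux, use: open("/proc/meminfo").read()
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    4096000 kB
SwapTotal:       2097152 kB
SwapFree:        2097152 kB
HugePages_Total:       0
"""

info = parse_meminfo(SAMPLE)
print(f"available: {available_fraction(info):.0%}")
```

MemAvailable is generally a better signal of memory pressure than MemFree, because it accounts for page cache that can be reclaimed.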
System Performance Analysis & Bottleneck Identification
Analyze system performance using tools: top/htop (real-time resource usage), vmstat (memory, CPU, and context switches), iostat (disk I/O patterns), load average interpretation, perf (performance profiling). Identify bottlenecks: CPU-bound vs I/O-bound, memory pressure, disk saturation. Understand implications for reliability.
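Tools such as top derive CPU utilization from counters like those in /proc/stat. The sketch below shows one way to compute a busy fraction from two samples of the aggregate "cpu" line; the helper names and the sample counters are hypothetical, and iowait is treated as idle time here, which is a simplifying assumption.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal.
    # Guest columns are already included in user/nice, so they are skipped.
    values = [int(v) for v in fields[1:9]]
    if len(values) < 4:
        raise ValueError("too few columns in cpu line")
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("samples must be taken in order, some time apart")
    return 1.0 - (idle1 - idle0) / delta_total


# Hypothetical samples taken a few seconds apart.
BEFORE = "cpu  1000 0 500 8000 500 0 0 0 0 0"
AFTER = "cpu  1600 0 700 8500 700 0 0 0 0 0"

print(f"busy: {busy_fraction(BEFORE, AFTER):.1%}")
```

A high busy fraction points toward a CPU-bound workload, while a low busy fraction combined with growing iowait suggests the bottleneck is I/O.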
Process Management & Process Lifecycle
Understand process creation (fork, exec), process states (running, sleeping, zombie), process hierarchy, and signals. Know how to inspect process state via /proc/[pid]/, interpret ps output, understand memory and CPU usage per process, diagnose zombie processes. Know PID 1 (init/systemd) role and process supervision.
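To make zombie diagnosis concrete, here is a minimal sketch that extracts the process state from /proc/[pid]/stat-style lines. The sample lines and function names are hypothetical. The one detail worth noting is that the command name sits in parentheses and may itself contain spaces or parentheses, so the parser splits on the last closing parenthesis.

```python
def parse_stat(line):
    """Return (pid, comm, state, ppid) from one /proc/[pid]/stat line."""
    lparen = line.index("(")
    rparen = line.rindex(")")  # comm may contain ')' itself, so take the last one
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def find_zombies(stat_lines):
    """Return (pid, comm, ppid) for every process in state 'Z'."""
    zombies = []
    for line in stat_lines:
        pid, comm, state, ppid = parse_stat(line)
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


# Hypothetical sample lines, truncated after the first few fields.
SAMPLE_LINES = [
    "1 (systemd) S 0 1 1 0 -1",
    "4242 (my (odd) proc) Z 4100 4242 4242 0 -1",
    "4100 (worker) S 1 4100 4100 0 -1",
]

for pid, comm, ppid in find_zombies(SAMPLE_LINES):
    print(f"zombie pid={pid} comm={comm!r} parent={ppid}")
```

The parent PID matters because a zombie is only reaped when its parent calls wait(); the fix is usually in the parent, not the zombie.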
Memory Management & Virtual Memory
Understand virtual address space, physical memory allocation, page tables, virtual-to-physical address translation, memory protection. Know Linux memory zones (DMA, DMA32, Normal, and HighMem on 32-bit systems), memory caching, swapping/paging mechanics. Interpret /proc/meminfo, understand memory pressure and OOM (Out of Memory) killer behavior. Know memory-related issues: memory leaks, excessive swapping, OOM scenarios.
Systematic Linux Troubleshooting Methodology
Master a structured approach to diagnosing system issues: (1) clearly define what's wrong, (2) gather system state (logs, processes, network, disk, memory), (3) form hypotheses about root cause, (4) test hypotheses iteratively, (5) validate the fix doesn't break anything else. Know key diagnostic tools: strace (trace system calls), lsof (open files/network), tcpdump/Wireshark (packet inspection), ss/netstat (connections), vmstat/iostat (performance), top/htop (resource usage).
Technical Phone Screen 2: Networking & Protocols
What to Expect
This round evaluates networking knowledge essential for distributed systems reliability. The interviewer conducts a deep dive into TCP/IP, DNS, HTTP/HTTPS, TLS, and load balancing. Expect questions like 'walk me through what happens when you access icloud.com' or 'explain TLS handshake and failure points.' You'll discuss protocol layers, network failure scenarios, debugging network issues, and how networking choices affect reliability. Unlike network engineers, SREs focus on reliability implications: how do network problems manifest in applications, how to detect them, how to mitigate them.
Tips & Advice
Review networking fundamentals with emphasis on practical implications for reliability. Understand the complete request path from client to server: DNS resolution, TCP connection establishment, TLS handshake, HTTP request/response. Know common networking failure modes and how they manifest: connection timeouts, DNS failures, packet loss, port exhaustion. Be comfortable with diagnostic tools: tcpdump/Wireshark (packet inspection), dig/nslookup (DNS), curl with verbose output, netstat/ss (connection state), mtr (route tracing). Understand load balancing strategies and their reliability tradeoffs. Discuss connection pooling, keep-alives, and retry strategies. For mid-level SREs, be able to think about how networking affects system reliability and give examples of network issues you've debugged. Practice explaining protocol behavior clearly.
Focus Topics
Load Balancing Strategies & Traffic Distribution
Understand load balancing algorithms: round-robin (fair distribution but ignores load), least connections (considers current connections), hash-based (consistent hashing for state affinity). Know Layer 4 (TCP) vs Layer 7 (application) load balancing tradeoffs. Understand health checking, failover mechanisms, sticky sessions. Know how load balancing choices affect reliability and performance.
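The hash-based option can be sketched as a small consistent hash ring. This is an illustrative sketch, not any particular load balancer's implementation: the class name HashRing, the use of SHA-256, and the choice of 100 virtual nodes per backend are all assumptions.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes to smooth the distribution."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring has no nodes")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


full = HashRing(["backend-a", "backend-b", "backend-c"])
reduced = HashRing(["backend-a", "backend-b"])  # backend-c removed

keys = [f"session-{i}" for i in range(500)]
moved = sum(1 for k in keys if full.node_for(k) != reduced.node_for(k))
print(f"{moved} of {len(keys)} keys moved after removing one backend")
```

The reliability-relevant property is that removing a backend only remaps the keys that lived on it; with a plain hash-modulo-N scheme, most keys would move and session affinity or cache locality would be lost.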
Network Troubleshooting & Diagnostic Tools
Master networking diagnostic tools: tcpdump/Wireshark for packet capture and analysis, dig/nslookup for DNS queries, curl with verbose output for HTTP debugging, netstat/ss for connection state inspection, traceroute/mtr for routing analysis, iperf for throughput testing. Know how to capture and interpret network traces.
DNS Resolution & Service Discovery Reliability
Understand DNS protocol (recursive vs authoritative queries), query types (A, AAAA, CNAME, MX, SRV), caching and TTL implications, DNS propagation timing. Know how DNS failures impact service availability and how they cascade. Understand common DNS issues: resolution timeouts, NXDOMAIN responses, cache inconsistencies, split-brain scenarios.
TCP/IP Fundamentals & Connection Reliability
Understand TCP three-way handshake, connection establishment, connection states (SYN-SENT, ESTABLISHED, TIME-WAIT), sequence numbers and acknowledgments, retransmission logic, congestion control (window sizing), and timeouts. Know UDP characteristics and when each is appropriate. Understand connection failure modes and diagnosis. Know about socket backlog and listen queue effects on reliability.
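When diagnosing connection problems, a quick tally of sockets per TCP state is often the first step. The sketch below counts states in ss-style output, assuming the state is the first column as in typical `ss -tan` output; the sample text and addresses are hypothetical.

```python
from collections import Counter


def count_states(ss_output):
    """Count sockets per TCP state from `ss -tan`-style text."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "State":
            continue  # skip blank lines and the header row
        counts[fields[0]] += 1
    return counts


# Hypothetical sample; on a host you might capture the output of `ss -tan`.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:443         0.0.0.0:*
ESTAB      0      0      10.0.0.5:443        10.0.0.9:51234
ESTAB      0      0      10.0.0.5:443        10.0.0.9:51240
TIME-WAIT  0      0      10.0.0.5:443        10.0.0.7:40022
"""

for state, n in count_states(SAMPLE).most_common():
    print(state, n)
```

A large TIME-WAIT count on a client can point to ephemeral port exhaustion, while many SYN-SENT sockets suggest the remote side is not answering the handshake.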
HTTPS/TLS Security & Connection Handling
Understand TLS handshake (ClientHello, ServerHello, key exchange, finished), certificate validation, mutual TLS (mTLS). Know cipher suites and their selection. Understand common TLS issues: certificate expiration, hostname mismatch, weak ciphers, TLS version incompatibility. Know how TLS impacts latency and performance. Understand TLS session resumption.
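Certificate expiration is one of the more preventable TLS failures. As a minimal sketch, the helper below converts a certificate's notAfter string into days remaining using the standard library's ssl.cert_time_to_seconds. The function name days_until_expiry and the example date are illustrative; in practice the notAfter value would come from the dictionary returned by SSLSocket.getpeercert().

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days until a certificate's notAfter time (negative if already expired).

    not_after uses the certificate time format, e.g. "Jan  5 09:34:43 2018 GMT".
    now is a Unix timestamp; it defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


# Illustrative value only; a real check would read cert["notAfter"].
NOT_AFTER = "Jan  5 09:34:43 2018 GMT"
remaining = days_until_expiry(NOT_AFTER)
print("expired" if remaining < 0 else f"{remaining:.0f} days left")
```

Alerting on a threshold such as 30 days remaining, rather than on expiry itself, leaves time to renew before users see handshake failures.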
Onsite Round 1: Systems Internals Deep Dive
What to Expect
This first onsite round (typically virtual for mid-level candidates) dives deep into Linux kernel concepts and complex system behavior. The interviewer presents multi-layered system problems requiring understanding of kernel internals, advanced memory management, process scheduling, and I/O subsystems. You may diagnose complex system hangs, optimize performance under resource constraints, or explain unusual system behavior. Interviewers repeatedly ask 'why' to test understanding of underlying mechanisms, not surface-level knowledge. Expect discussions of kernel tuning, performance implications of different configurations, and tradeoffs in system design.
Tips & Advice
This round goes significantly deeper than phone screens. Review Linux kernel architecture and internals thoroughly. Understand process scheduling algorithms, memory management mechanisms (paging, segmentation, virtual memory), and I/O subsystems in detail. Be prepared for 'why' questions: Why does the kernel make certain design decisions? What are the tradeoffs? Prepare to explain complex scenarios: what happens when system memory is exhausted, how the kernel handles I/O under extreme load, how process scheduling ensures fairness. Practice explaining technical concepts clearly with analogies or diagrams when helpful. For mid-level, interviewers expect understanding of tradeoffs and design principles, not just facts. Bring specific examples: kernel tuning you've performed, performance issues you've diagnosed and solved, reliability improvements from system configuration changes. Be ready to discuss how kernel behavior affects application reliability.
Focus Topics
System Performance Tuning & Kernel Parameters
Know kernel tuning parameters (sysctl): network buffers, TCP timeouts, memory swappiness, process scheduling. Understand performance profiling tools: perf for CPU profiling, flame graphs for visualization, kernel tracing (tracepoints, kprobes). Know when and how to apply tuning for specific workloads. Understand tradeoffs: latency vs throughput, memory usage vs performance.
I/O Subsystem & Storage Reliability
Understand I/O scheduler algorithms: the legacy single-queue schedulers (CFQ, or Completely Fair Queueing; deadline; noop), which were removed in Linux 5.0, and their multi-queue successors (mq-deadline, BFQ, Kyber, none). Understand disk buffering and writeback caches, fsync and O_DIRECT semantics, RAID reliability, filesystem journaling. Know how I/O errors are handled and reported. Understand implications for data reliability. Know performance characteristics of different I/O patterns.
Advanced Memory Management & Kernel Memory Subsystem
Understand page tables and virtual address translation, memory protection through page table entries, copy-on-write (CoW) optimization, memory reclamation and page eviction, swap mechanics and its performance implications. Know Linux memory pressure handling including kswapd (kernel swapper daemon) and OOM killer. Understand memory fragmentation and its effects. Know kernel memory accounting and cgroup memory limits.
Process Scheduling & CPU Management
Understand Linux process scheduler: run queues per CPU, scheduling algorithms (CFS, the Completely Fair Scheduler, for normal processes, superseded by EEVDF from Linux 6.6; real-time scheduling classes), context switching overhead, CPU affinity and NUMA considerations. Know how to interpret scheduler metrics (load average, context switches, runnable queue length). Understand scheduling classes and priority levels. Know how to diagnose CPU-bound system issues.
Linux Kernel Architecture & Core Subsystems
Understand kernel organization: process management subsystem, memory management (virtual memory, paging, segmentation), interrupt handling and exceptions, device drivers interface, filesystem abstraction. Know kernel space vs user space, system call interface, and how applications interact with kernel. Understand kernel protection mechanisms preventing user applications from directly accessing hardware.
Onsite Round 2: SRE Practices & Observability
What to Expect
This round evaluates your understanding of core SRE principles, operational practices, and observability architecture. The interviewer discusses monitoring strategy, defining and managing SLOs/SLIs/error budgets, incident response processes, automation priorities, and toil reduction. You'll answer questions like 'How do you measure if a system is reliable?', 'What would you monitor for a new service?', or 'Walk me through your incident response process.' This round includes significant behavioral assessment: collaboration during incidents, communication style, how you approach operational excellence, and your philosophy on reliability. For mid-level, emphasis is on end-to-end ownership: designing observable systems, establishing appropriate SLOs, and leading incident response.
Tips & Advice
Prepare concrete examples: monitoring you've designed and why you chose those metrics, SLOs you've established and how you justified them, incidents you've handled and lessons learned. Be ready to discuss tradeoffs: monitoring overhead vs observability value, alert sensitivity vs alert fatigue, SLO strictness vs development velocity. Understand SRE philosophy: reliability with velocity, using error budgets intelligently to make tradeoff decisions, automating toil. Know the four golden signals (latency, traffic, errors, saturation) and how to apply them. Be prepared to discuss specific observability tools (Prometheus, DataDog, Splunk, ELK) but focus on concepts over implementation details. Discuss automation examples: deployments you've automated, operational tasks you've eliminated, processes you've streamlined. For mid-level, interviewers want strategic thinking about operations: how to scale systems, systematically improve reliability, empower team members. Share examples of mentoring junior team members on SRE practices.
Focus Topics
Toil Identification & Automation Prioritization
Understand toil: repetitive, manual, unrewarding tasks that don't add long-term value. Know how to identify toil in your operations, quantify its impact (hours/week), and prioritize automation efforts. Understand common automation targets: deployments, autoscaling, backup/recovery, health checks. Know infrastructure-as-code and configuration management approaches. Understand ROI of automation: development cost vs time saved.
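The ROI comparison above reduces to simple arithmetic. The sketch below estimates how many weeks an automation effort takes to pay for itself; the function name and the example numbers are hypothetical, and it assumes toil hours and upkeep hours stay constant over time.

```python
import math


def break_even_weeks(build_hours, saved_hours_per_week, upkeep_hours_per_week=0.0):
    """Weeks until automation pays back its development cost.

    Returns math.inf when upkeep consumes all of the weekly savings.
    """
    net_saving = saved_hours_per_week - upkeep_hours_per_week
    if net_saving <= 0:
        return math.inf
    return build_hours / net_saving


# Hypothetical example: 40 hours to build, saves 5 h/week, costs 1 h/week to maintain.
weeks = break_even_weeks(40, 5, 1)
print(f"break-even after {weeks:.0f} weeks")
```

Ranking candidate automations by break-even time gives a defensible, if rough, ordering when you are asked how you prioritize toil reduction.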
Observability Tools & Metrics Collection Strategies
Understand industry-standard tools: Prometheus (time-series metrics), ELK/Splunk (logging and analysis), Jaeger/Zipkin (distributed tracing). Know push vs pull metrics collection models, time-series database concepts, query languages (PromQL). Understand performance implications of different observability approaches: collection overhead, storage requirements, query latency. Know cost-benefit tradeoffs of different observability solutions.
Incident Response & Postmortem Culture
Understand incident classification (severity levels), escalation procedures, incident communication, incident command structure. Know effective postmortem processes: document what happened, root cause analysis (not blame), identify systemic improvements, track action items. Understand blameless culture principles and psychological safety in incident reviews. Know how to prevent similar incidents through systemic fixes, not individual blame.
Monitoring, Alerting & Observability Architecture Design
Design comprehensive monitoring: identify key metrics (four golden signals: latency, traffic, errors, saturation), instrument systems appropriately, define meaningful alerts, establish alert routing and escalation. Understand tracing for distributed request paths. Understand logging for detailed investigation. Design for observability: avoid blind spots in monitoring, ensure metrics are actionable, prevent alert fatigue through intelligent alerting.
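To show how three of the four golden signals fall out of raw request data, the sketch below summarizes a window of request records. The record fields (status, latency_ms), the function names, and the nearest-rank percentile method are assumptions made for illustration. Saturation is omitted because it needs resource data (CPU, memory, queue depth) that request logs do not carry.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(requests, window_seconds):
    """Summarize traffic, errors, and latency for one time window."""
    if not requests:
        return {"traffic_rps": 0.0, "error_rate": 0.0,
                "latency_p50_ms": None, "latency_p99_ms": None}
    latencies = [r["latency_ms"] for r in requests]
    server_errors = sum(1 for r in requests if r["status"] >= 500)
    return {
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": server_errors / len(requests),
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p99_ms": percentile(latencies, 99),
    }


# Hypothetical one-minute window: ten requests, one of which failed.
SAMPLE = [{"status": 200, "latency_ms": 10 * i} for i in range(1, 10)]
SAMPLE.append({"status": 503, "latency_ms": 100})

print(golden_signals(SAMPLE, window_seconds=60))
```

Reporting a high percentile alongside the median matters because averages hide the slow tail that users actually notice.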
Service Level Objectives (SLOs), SLIs & Error Budgets
Understand SLO definition: specific, measurable objectives tied to business requirements (e.g., '99.9% availability monthly'). Distinguish between SLOs and SLIs (Service Level Indicators—actual measurements). Know error budget concept: if SLO is 99.9%, you have 0.1% error budget (failures allowed). Use error budgets for tradeoff decisions between reliability investment and feature development. Understand SLO implications on engineering priorities and resource allocation.
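The error budget arithmetic is worth being able to do on the spot. As a minimal sketch, the functions below convert an availability SLO into allowed downtime and report how much budget remains; the function names and the 30-day window are assumptions for illustration.

```python
def error_budget_minutes(slo, window_days=30):
    """Allowed downtime in minutes for an availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


for slo in (0.99, 0.999, 0.9999):
    print(f"{slo:.2%} over 30 days -> {error_budget_minutes(slo):.1f} min of downtime")
```

For 99.9% over 30 days the budget is about 43.2 minutes, so a single 20-minute outage consumes nearly half of it. That framing is what makes the budget useful for deciding between shipping features and investing in reliability.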
Onsite Round 3: Coding & Automation
What to Expect
This round combines algorithm problem-solving with SRE-relevant practical scenarios. Expect one standard coding problem (LeetCode Easy to Medium difficulty, often involving data structures like trees or graphs) and/or SRE-specific challenges like log parsing/aggregation, implementing a monitoring system, or automating operational tasks. The focus is on coding proficiency, debugging ability, and ability to write clean, maintainable code. Unlike software engineer interviews, emphasis is less on optimal algorithmic complexity and more on correctness, clarity, practical applicability, and production-readiness.
Tips & Advice
Review LeetCode focusing on tree and graph problems (BFS/DFS). Practice in Python or Go (common SRE languages at Apple). During the interview, clarify requirements before coding, talk through your approach, and write clean, readable code. Test your solution with examples including edge cases. For SRE-specific problems, think about real operational scenarios: handling incomplete data, network timeouts, rate limiting. Discuss tradeoffs: performance vs readability, quick-and-dirty vs production-ready code. For mid-level, write production-quality code and discuss testing, error handling, and monitoring of your own code. Know basic debugging: print statements, logging, understanding error messages. Be comfortable with standard library functions in your chosen language.
Focus Topics
Python/Go & SRE-Relevant Language Proficiency
Strong proficiency in your primary SRE language (likely Python or Go at Apple). Know the standard and widely used libraries for common tasks: in Python, json for data handling, subprocess for system interaction, file I/O, and either the standard library's urllib or the third-party requests package for HTTP. Understand language-specific idioms and best practices. Know performance characteristics and limitations of the language.
Debugging & Systematic Problem-Solving
Demonstrate systematic debugging: identify the problem clearly, isolate the cause, form hypotheses, test them iteratively, validate the fix. Be comfortable with print debugging, understanding error messages and stack traces. Know when to use debuggers vs other approaches. Understand common bugs: off-by-one errors, null pointer dereferences, resource leaks.
Algorithm Implementation & Data Structures Proficiency
Master common data structures (arrays, linked lists, binary trees, graphs, hash tables) and their operations. Implement basic algorithms (sorting, searching, BFS/DFS, tree traversal). Understand time and space complexity implications. Write implementations that are correct, clear, and reasonably efficient. Know when to use different data structures based on use case.
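Graph traversal questions often map onto operational problems. As one illustrative example, the sketch below uses BFS over a service dependency graph to find everything affected when one service fails. The graph, the service names, and the function name blast_radius are hypothetical.

```python
from collections import deque


def blast_radius(dependents, failed):
    """Return {service: hops} for every service transitively depending on `failed`.

    dependents maps a service to the services that call it directly.
    """
    hops = {failed: 0}
    queue = deque([failed])
    while queue:
        service = queue.popleft()
        for caller in dependents.get(service, ()):
            if caller not in hops:  # visit each service once, at its shortest distance
                hops[caller] = hops[service] + 1
                queue.append(caller)
    del hops[failed]  # the failed service is not part of its own blast radius
    return hops


# Hypothetical topology: both api and billing call db; web calls both.
DEPENDENTS = {
    "db": ["api", "billing"],
    "api": ["web"],
    "billing": ["web"],
    "web": [],
}

print(blast_radius(DEPENDENTS, "db"))
```

BFS runs in O(V + E) time and, unlike DFS, yields the shortest hop count to each affected service, which is a handy proxy for how directly it is impacted.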
Practical SRE Scenarios & Operational Scripting
Ability to solve real SRE problems: parsing and aggregating logs to extract metrics, implementing health checks, writing deployment scripts, automating data processing, rate limiting implementations. Know how to handle common issues: file handling errors, network timeouts, retries with backoff. Write scripts that handle partial failures gracefully.
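Retries with backoff come up often enough that it helps to have a pattern ready. The sketch below is one possible shape, using exponential backoff with full jitter. The function name retry, its defaults, and the choice of retryable exceptions are assumptions; sleep and rng are injectable so the behavior can be tested without waiting.

```python
import random
import time


def retry(func, attempts=5, base_delay=0.5, max_delay=30.0,
          retry_on=(ConnectionError, TimeoutError),
          sleep=time.sleep, rng=random.random):
    """Call func(), retrying transient failures with exponential backoff.

    The delay before retry k is a random value in [0, min(max_delay, base_delay * 2**k)),
    which spreads out clients that fail at the same moment.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)


# Hypothetical flaky operation: fails twice, then succeeds.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry(flaky, sleep=lambda seconds: None))
```

Only errors that are plausibly transient are retried; retrying on every exception would hide bugs and, for non-idempotent operations, could repeat side effects.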
Production Code Quality & Maintainability
Write code that is correct, readable, and maintainable: meaningful variable and function names, appropriate comments, error handling for failure cases, edge case consideration, input validation. Write code that others can understand and modify. Think about testing: how would this code be tested? Write code defensively against invalid inputs or unexpected conditions.
Onsite Round 4: System Design
What to Expect
This final onsite round evaluates your ability to design scalable, reliable distributed systems. You'll receive an open-ended design problem (e.g., 'Design a system like GitHub handling repositories, pull requests, and merging at scale' or 'Design a reliable task queue') and discuss the entire architecture. Cover system components, data flow, consistency models, failure handling, monitoring, deployment strategy, and tradeoffs. For mid-level SREs, the unique focus is operational and reliability aspects alongside scalability: How is this system deployed? How is it monitored? How does it recover from failures? What's the disaster recovery strategy? Unlike software engineers who focus on correctness and scalability, mid-level SREs emphasize operability.
Tips & Advice
Prepare by reviewing system design principles: scalability (horizontal vs vertical scaling tradeoffs), consistency models (strong vs eventual consistency), availability and partition tolerance (CAP theorem). Know common architectural patterns: microservices, database replication strategies, load balancing, caching layers, queue-based architectures. Practice structured approach: clarify requirements and constraints, sketch high-level architecture, discuss key components, address failure modes, consider operational aspects. For mid-level SREs, emphasize operational considerations: deployment strategy and rollback procedures, comprehensive monitoring and alerting, incident response and recovery procedures, graceful degradation under failures, limiting blast radius of failures. Discuss how the system would be deployed, monitored, recovered from disaster scenarios. Draw diagrams clearly and explain tradeoffs thoughtfully. Think about end-to-end ownership: a system you'd be responsible for supporting in production.
Focus Topics
Operational Complexity & Deployment Strategy
Think critically about operational burden: how many moving parts, complexity of running and updating the system, dependency management, configuration complexity. Design for operational simplicity where possible: fewer components, clearer dependencies, simpler deployment. Discuss deployment strategy: blue-green deployments, canary releases, rollback procedures, infrastructure-as-code. Discuss how you'd monitor deployments and quickly detect issues.
Data Storage, Consistency & Persistence
Choose appropriate database types (relational, NoSQL, time-series) for different data patterns. Understand consistency models (strong/immediate vs eventual consistency) and their tradeoffs. Discuss replication strategies (master-slave, multi-master), backup and recovery, disaster recovery procedures. Know transaction semantics and their reliability implications. Discuss data durability guarantees.
Scalable System Architecture & Core Components
Design principles for scalability: load balancing strategies, horizontal scaling of stateless services, database scaling (replication, sharding), caching layers (reducing load on databases), asynchronous processing via queues, CDN for static content. Know component interactions, data flow patterns, consistency tradeoffs (immediate vs eventual consistency). Discuss why you chose specific architectural patterns for your use case.
Reliability Through Redundancy & Failure Handling
Design for failures: redundancy (multiple instances, geographic distribution), circuit breakers (preventing cascading failures), retries with exponential backoff, bulkheads (isolating failure blast radius), graceful degradation (reduced functionality under partial failures). Identify critical paths and single points of failure. Discuss failure recovery strategies and system behavior under partial degradation. Know timeout and retry semantics.
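A circuit breaker is easier to discuss with a concrete shape in mind. The sketch below is a deliberately small, single-threaded version under stated assumptions: the class and exception names, the thresholds, and the injectable clock are all illustrative, and a production breaker would also need thread safety and metrics.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self._opened_at = None
        return result


# Demonstration with a fake clock so no real waiting is needed.
fake_now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: fake_now[0])


def failing_dependency():
    raise ConnectionError("dependency down")


for _ in range(2):
    try:
        breaker.call(failing_dependency)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "ok")
except CircuitOpenError as exc:
    print(exc)
```

Failing fast while the circuit is open protects the struggling dependency from extra load and keeps callers from tying up threads on requests that are likely to time out.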
Observability & Monitoring Architecture in System Design
Design systems with observability built in: identify instrumentation points, define key metrics (four golden signals: latency, traffic, errors, saturation), design health checks, plan for alert generation. Discuss distributed tracing across components for request path visibility. Design for operational visibility: structured logging, metrics aggregation, alerting and escalation. Discuss how you'd diagnose common failure modes in this system. Design runbooks for common operational tasks.
Frequently Asked Site Reliability Engineer (SRE) Interview Questions
Sample Answer
import numpy as np


def canary_significance(control, canary, method='permutation', n_iter=10000, alpha=0.05, seed=None):
    """
    control, canary: 1D numeric arrays of equal length (e.g., 30)
    method: 'permutation' or 'bootstrap'
    n_iter: number of resamples
    Returns: dict with p_value, reject (bool), observed_diff, ci_95
    (a (1 - alpha) interval, i.e. 95% at the default alpha),
    effect_size_cohens_d, method, n_iter
    """
    rng = np.random.default_rng(seed)
    control = np.asarray(control)
    canary = np.asarray(canary)
    assert control.shape == canary.shape, "Arrays must be same shape"
    obs_diff = canary.mean() - control.mean()  # one-sided: is canary worse (higher)?
    pooled = np.concatenate([control, canary])
    null_diffs = np.empty(n_iter)
    n = len(control)
    if method == 'permutation':
        for i in range(n_iter):
            rng.shuffle(pooled)  # in-place shuffle relabels the samples
            null_diffs[i] = pooled[n:].mean() - pooled[:n].mean()
    elif method == 'bootstrap':
        for i in range(n_iter):
            res_c = rng.choice(control, size=n, replace=True)
            res_k = rng.choice(canary, size=n, replace=True)
            null_diffs[i] = res_k.mean() - res_c.mean()
        # For bootstrap, to test null of no diff, center distribution:
        null_diffs -= null_diffs.mean()
    else:
        raise ValueError("method must be 'permutation' or 'bootstrap'")
    # one-sided p-value: fraction of null diffs >= observed diff
    p_value = np.mean(null_diffs >= obs_diff)
    reject = p_value < alpha
    # Bootstrap CI for the difference in means (using bootstrap resampling);
    # reuse the bootstrap idea regardless of method for the CI
    boot_diffs = np.empty(n_iter)
    for i in range(n_iter):
        res_c = rng.choice(control, size=n, replace=True)
        res_k = rng.choice(canary, size=n, replace=True)
        boot_diffs[i] = res_k.mean() - res_c.mean()
    ci_lower, ci_upper = np.percentile(boot_diffs, [100*alpha/2, 100*(1-alpha/2)])
    # Cohen's d for effect size (pooled SD)
    pooled_sd = np.sqrt(((control.var(ddof=1) + canary.var(ddof=1)) / 2))
    effect_size = (canary.mean() - control.mean()) / (pooled_sd + 1e-12)
    return {
        'p_value': float(p_value),
        'reject': bool(reject),
        'observed_diff': float(obs_diff),
        'ci_95': (float(ci_lower), float(ci_upper)),
        'effect_size_cohens_d': float(effect_size),
        'method': method,
        'n_iter': n_iter
    }
Sample Answer
#!/usr/bin/env bash
# Blue/green upstream switch for nginx: back up the live upstream config,
# swap in the new one, validate with `nginx -t`, and roll back on any failure.
set -euo pipefail

usage() {
    cat <<EOF
Usage: $0 <existing_upstream_conf> <new_upstream_conf>
Example: $0 /etc/nginx/conf.d/upstream.conf ./green_upstream.conf
EOF
    exit 2
}

# Validate args
if [ "${#}" -ne 2 ]; then usage; fi
EXISTING="${1}"
NEW="${2}"
if [ ! -f "$EXISTING" ]; then echo "ERROR: existing conf not found: $EXISTING" >&2; exit 1; fi
if [ ! -f "$NEW" ]; then echo "ERROR: new conf not found: $NEW" >&2; exit 1; fi
if ! command -v nginx >/dev/null 2>&1; then echo "ERROR: nginx not installed" >&2; exit 1; fi

TS=$(date +"%Y%m%dT%H%M%S")
BACKUP="${EXISTING}.bak.${TS}"
TMPDIR=$(mktemp -d)
TMP_NEW="${TMPDIR}/new.conf"

cleanup() {
    rm -rf "$TMPDIR" || true
}
trap cleanup EXIT

rollback() {
    echo "Rolling back to backup ${BACKUP}"
    if [ -f "$BACKUP" ]; then
        cp -- "$BACKUP" "$EXISTING"
        nginx -t && nginx -s reload
    else
        echo "No backup to rollback to" >&2
    fi
}
trap 'echo "Error encountered" >&2; rollback; exit 1' ERR

# Backup current config
cp -- "$EXISTING" "$BACKUP"
echo "Backed up $EXISTING -> $BACKUP"

# Stage the new config in the temp dir, then move it into place atomically.
# nginx -t validates the whole configuration tree, so the new file has to be
# in its final location before testing; the ERR trap restores the backup if
# validation fails. (mv -T is a GNU coreutils option.)
cp -- "$NEW" "$TMP_NEW"
cp -- "$TMP_NEW" "${EXISTING}.tmp"
mv -T -- "${EXISTING}.tmp" "$EXISTING"

# Test nginx config
if nginx -t; then
    # Reload nginx gracefully
    nginx -s reload
    echo "Switched upstream and reloaded nginx successfully."
else
    echo "nginx -t failed after applying new config" >&2
    false # trigger ERR trap -> rollback
fi

# Success: remove backups older than 7 days (optional)
find "$(dirname "$BACKUP")" -name "$(basename "$EXISTING").bak.*" -mtime +7 -delete || true
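The same backup, atomic swap, validate, and rollback sequence can also be sketched in Python, where it is easy to unit-test without a running nginx. This is a minimal sketch under assumptions, not part of the original answer: `switch_config` and its `validate` callback are hypothetical names, and in real use `validate` would presumably shell out to `nginx -t`.

```python
import os
import shutil
import tempfile
from typing import Callable


def switch_config(existing: str, new: str, validate: Callable[[], bool]) -> bool:
    """Swap `new` into place at `existing`; restore the backup if validation fails.

    Hypothetical helper for illustration. Returns True if the new config was
    kept, False if it was rolled back.
    """
    backup = existing + ".bak"
    shutil.copy2(existing, backup)  # keep a copy to roll back to

    # Stage in the same directory so os.replace is an atomic rename.
    target_dir = os.path.dirname(os.path.abspath(existing))
    fd, staged = tempfile.mkstemp(dir=target_dir)
    os.close(fd)
    try:
        shutil.copy2(new, staged)
        os.replace(staged, existing)
    except OSError:
        if os.path.exists(staged):
            os.remove(staged)
        raise

    if validate():
        return True

    # Validation failed: put the previous config back atomically.
    os.replace(backup, existing)
    return False
```

Passing the validator in as a callback keeps the swap logic testable: a test can supply `lambda: False` to exercise the rollback path, while production code would wrap the real syntax check and reload.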
Sample Answer
#!/usr/bin/env python3
"""Count ERROR entries per minute for one service over the newest 60 minutes of a JSON-lines log."""
import sys
import json
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW_MINUTES = 60


def parse_iso8601(s):
    # Accept e.g. "2025-03-12T14:05:23Z" or with offset
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"
    return datetime.fromisoformat(s)


def minute_bucket(dt):
    # Ensure timezone-aware (treat naive timestamps as UTC)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    return dt.replace(second=0, microsecond=0)


def purge_old(counts, newest_minute):
    cutoff = newest_minute - timedelta(minutes=WINDOW_MINUTES - 1)
    # Remove buckets that have fallen out of the window
    to_delete = [k for k in counts if k < cutoff]
    for k in to_delete:
        del counts[k]


def main(log_path, target_service):
    counts = defaultdict(int)
    newest_minute = None
    with open(log_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            if not isinstance(obj, dict):
                continue  # skip valid JSON that is not an object
            if obj.get("service") != target_service:
                continue
            if obj.get("level") != "ERROR":
                continue
            ts = obj.get("timestamp")
            if not ts or not isinstance(ts, str):
                continue
            try:
                dt = parse_iso8601(ts)
            except ValueError:
                continue  # skip unparseable timestamps
            mb = minute_bucket(dt)
            counts[mb] += 1
            if (newest_minute is None) or (mb > newest_minute):
                newest_minute = mb
            purge_old(counts, newest_minute)
    if newest_minute is None:
        # No matching entries; print last 60 minutes relative to now
        newest_minute = minute_bucket(datetime.now(timezone.utc))
    start = newest_minute - timedelta(minutes=WINDOW_MINUTES - 1)
    cur = start
    while cur <= newest_minute:
        print(cur.strftime("%Y-%m-%d %H:%M"), counts.get(cur, 0))
        cur += timedelta(minutes=1)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: script.py /path/to/logfile service_name", file=sys.stderr)
        sys.exit(2)
    main(sys.argv[1], sys.argv[2])
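A quick way to sanity-check the timestamp handling in the log-counting script is to confirm that the same instant written with different offsets lands in the same UTC minute bucket. The sketch below repeats the two helpers so it runs on its own; the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone


def parse_iso8601(s):
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat
    if s.endswith("Z"):
        s = s[:-1] + "+00:00"
    return datetime.fromisoformat(s)


def minute_bucket(dt):
    # Naive timestamps are assumed to be UTC, matching the script's behavior
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).replace(second=0, microsecond=0)


utc = minute_bucket(parse_iso8601("2025-03-12T14:05:23Z"))
offset = minute_bucket(parse_iso8601("2025-03-12T15:05:59+01:00"))
naive = minute_bucket(parse_iso8601("2025-03-12T14:05:00"))

# All three describe an instant inside 14:05 UTC, so the buckets should match.
print(utc == offset == naive)
```

Treating naive timestamps as UTC is a design choice worth stating out loud in an interview; if the logs were written in local time, the bucketing would be silently shifted.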
Recommended Additional Resources
- Designing Data-Intensive Applications by Martin Kleppmann - comprehensive guide to distributed systems design, consistency models, and reliability
- The Site Reliability Workbook by the Google SRE team - practical SRE principles, techniques, and case studies for building reliable systems
- Linux Performance by Brendan Gregg - essential reference for performance analysis, troubleshooting tools, and system optimization
- TCP/IP Illustrated Volume 1 by W. Richard Stevens - deep technical dive into network protocols and their behavior
- LeetCode - practice coding problems, focus on medium difficulty tree/graph problems and BFS/DFS algorithms
- GitHub SRE Interview Prep Guide (mxssl/sre-interview-prep-guide) - curated collection of SRE interview topics including Linux, networking, system design, and monitoring
- Prometheus documentation and PromQL query language reference - essential for understanding metrics collection and querying
- Linux kernel source code and documentation - deep understanding of kernel internals through primary sources
- Your own production incidents - review past incidents you've handled, post-mortems, and think about observability improvements that could have reduced response time
- Apple corporate website and product ecosystem - understand the company's products, reliability requirements, and commitment to quality
- Glassdoor and Blind reviews of Apple SRE interviews - learn from recent candidate experiences and common interview patterns
Search Results
- Top 15 Apple Reliability Engineer Job Interview Questions & Answers - Question #1. Can you describe your experience with reliability engineering, particularly in the context of hardware systems? · Question #2.
- Apple SRE Interview Experience (Offer) - Software Engineering - Blind - Total process took 6 months, 3 months to reply to initial application (with referral), 1 month after completing interviews to get offer, 7 rounds total.
- 2025 Apple Site Reliability Engineer interview question bank - A complete set of Apple Site Reliability Engineer interview questions. Contributed by recent candidates and vetted by current Apple Site ...
- Apple Reliability Engineer Interview Questions - NodeFlair - Apple Reliability Engineer interview questions and answers. Free interview details posted anonymously by Apple interview candidates.
- Site Reliability Engineer (SRE) Interview Preparation Guide - GitHub - A collection of questions to practice with for SRE interviews · SRE Interview Questions · Sysadmin Test Questions · Kubernetes job interview questions · DevOps ...
- Apple Site Reliability Engineer Interview: Process + Questions - Prepare thoughtful questions: “What is the biggest reliability challenge your team faces right now?” “How do you measure success for an SRE here ...
This interview preparation guide was generated using AI-powered research from the sources listed above. While we strive for accuracy, we recommend verifying critical information from official company sources.