InterviewStack.io

Performance Engineering and Cost Optimization Questions

Engineering practices and trade-offs for meeting performance objectives while controlling operational cost. Topics include setting latency and throughput targets and latency budgets; benchmarking, profiling, and tuning across the application, database, and infrastructure layers; memory, compute, serialization, and batching optimizations; asynchronous processing and workload shaping; capacity estimation and right-sizing of compute and storage to reduce cost; understanding cost drivers in cloud environments, including network egress and storage tiering; trade-offs between real-time and batch processing; and monitoring to detect and prevent performance regressions. Candidates should describe measurement-driven approaches to optimization and be able to justify trade-offs among cost, complexity, and user experience.

Medium | Technical
51 practiced
A shopping cart checkout currently processes payments synchronously and causes high latency for customers. Propose a migration plan to an asynchronous payment processing model that preserves trust (e.g., payment confirmations), minimizes data loss, and keeps chargeback/fraud risk acceptable. Discuss UX changes, compensating transactions, telemetry, and rollout plan with validation metrics.
Easy | Technical
53 practiced
Describe step-by-step how you would profile a CPU-bound vs an I/O-bound backend service written in Java or Python to find hotspots. Mention tools you would use (e.g., async-profiler, py-spy, perf, pprof, flamegraphs), what signals you expect from each tool, and how to attribute latency to code, blocking I/O, or external systems.
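Before reaching for a profiler, one quick way to tell whether a code path is CPU-bound or I/O-bound is to compare CPU time with wall-clock time. The sketch below is only an illustration using Python's standard `time` module; the helper name `cpu_fraction` is hypothetical, and the ratio is a rough signal rather than a replacement for py-spy or flamegraphs.

```python
import time


def cpu_fraction(fn, *args, **kwargs):
    """Return CPU time divided by wall-clock time for one call of fn.

    A value near 1.0 suggests the call is CPU-bound; a value near 0.0
    suggests it spends most of its time waiting (sleep, network, disk).
    Note: time.process_time() covers the whole process, so other busy
    threads will inflate the ratio.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def busy_work(n=200_000):
    # Pure computation: expected to show a high CPU fraction.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print("sleep (waiting):", round(cpu_fraction(time.sleep, 0.05), 2))
    print("busy loop (compute):", round(cpu_fraction(busy_work), 2))
```

A low ratio points toward tracing blocking calls and external dependencies; a high ratio points toward a sampling CPU profiler.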
Medium | Technical
44 practiced
You manage a fleet of backend services on Kubernetes and need autoscaling rules that balance latency SLOs and cost. Propose autoscaling strategies using CPU, request rate, custom metrics (queue depth), and predictive scaling. Explain stabilization windows, cooldowns, how to avoid oscillation, and trade-offs when scaling on throughput vs latency percentiles.
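As a starting point for reasoning about oscillation, the sketch below models the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current replicas × current metric / target metric), with a tolerance band) plus a simplified scale-down stabilization window. The class `ScaleDownStabilizer` is hypothetical, and the window is counted in samples rather than seconds, which is a simplification of the real controller.

```python
import math
from collections import deque


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count from the ratio of observed metric to target metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, do nothing; this damps small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


class ScaleDownStabilizer:
    """Keep the highest recent recommendation so scale-down is delayed.

    Scale-up stays immediate, because a new high recommendation is
    itself the maximum of the window.
    """

    def __init__(self, window_size):
        self.recent = deque(maxlen=window_size)

    def recommend(self, desired):
        self.recent.append(desired)
        return max(self.recent)


if __name__ == "__main__":
    stabilizer = ScaleDownStabilizer(window_size=3)
    replicas = 4
    # Hypothetical per-pod queue depth samples against a target of 60.
    for observed in [90, 40, 40, 40, 40]:
        raw = desired_replicas(replicas, observed, target_metric=60)
        replicas = stabilizer.recommend(raw)
        print(f"observed={observed} raw={raw} applied={replicas}")
```

The same function works for CPU utilization, request rate, or queue depth; scaling directly on latency percentiles is harder because latency does not fall in proportion to added replicas.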
Easy | Technical
43 practiced
Describe a simple method to right-size compute instances for a web service: given requests-per-second (RPS), average CPU and memory per request, desired SLO headroom, and instance types/price, estimate instance size and count. Explain trade-offs between fewer larger instances and many smaller instances with respect to cost, fault domain, and operational complexity.
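One possible shape for the estimate is sketched below. It assumes CPU demand is RPS × CPU-seconds per request, that memory demand scales with in-flight requests (RPS × average latency, by Little's law), and that headroom is expressed as a target utilization. All instance names, prices, and workload numbers are invented for illustration, and per-instance baseline memory is ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceType:
    name: str
    vcpus: float
    memory_gib: float
    hourly_price: float


def instances_needed(rps, cpu_sec_per_req, mem_gib_per_inflight_req,
                     avg_latency_sec, inst, target_utilization=0.6, min_count=2):
    """Instance count that keeps CPU and memory under target_utilization."""
    cpu_cores = rps * cpu_sec_per_req            # cores of steady CPU demand
    inflight = rps * avg_latency_sec             # Little's law: L = lambda * W
    mem_gib = inflight * mem_gib_per_inflight_req
    by_cpu = math.ceil(cpu_cores / (inst.vcpus * target_utilization))
    by_mem = math.ceil(mem_gib / (inst.memory_gib * target_utilization))
    # min_count keeps at least two instances so one failure is survivable.
    return max(by_cpu, by_mem, min_count)


if __name__ == "__main__":
    # Hypothetical catalogue; not real cloud prices.
    candidates = [
        InstanceType("small", vcpus=2, memory_gib=4, hourly_price=0.05),
        InstanceType("large", vcpus=8, memory_gib=16, hourly_price=0.20),
    ]
    for inst in candidates:
        n = instances_needed(rps=2000, cpu_sec_per_req=0.005,
                             mem_gib_per_inflight_req=0.01,
                             avg_latency_sec=0.1, inst=inst)
        lost = 100 / n  # share of capacity lost if one instance fails
        print(f"{inst.name}: {n} instances, "
              f"${n * inst.hourly_price:.2f}/hour, "
              f"{lost:.0f}% capacity lost per instance failure")
```

With these made-up numbers, the smaller type needs more instances but loses a smaller share of capacity per failure, which is the fault-domain side of the trade-off the question asks about.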
Easy | Technical
53 practiced
Describe a measurement-driven method to set latency budgets and SLOs for a backend API used by a mobile app. Outline steps to collect baseline metrics, segment users by region/device, map SLOs to business metrics (e.g., conversion), choose percentile targets (p50/p95/p99), and set error budgets and escalation policies.
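The baseline step can be sketched with two small helpers: a nearest-rank percentile over collected latency samples, and the fraction of an error budget still unspent for a given SLO target. Function names and the sample data are hypothetical; a production setup would more likely compute percentiles from histograms in the monitoring system.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_budget_remaining(slo_target, total_requests, bad_requests):
    """Fraction of the error budget left; negative means the budget is overspent."""
    allowed_bad = (1 - slo_target) * total_requests
    return 1 - bad_requests / allowed_bad


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds, grouped by region.
    latencies_ms = {
        "eu": [80, 95, 110, 120, 150, 170, 210, 260, 400, 900],
        "apac": [140, 160, 180, 220, 260, 310, 380, 520, 800, 1500],
    }
    for region, samples in latencies_ms.items():
        summary = {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
        print(region, summary)

    # 99.9% success SLO over one million requests with 250 failures.
    print("budget left:", error_budget_remaining(0.999, 1_000_000, 250))
```

Segmenting before computing percentiles matters because a global p95 can hide a region or device class that is consistently slow.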
