InterviewStack.io LogoInterviewStack.io

Performance Engineering and Cost Optimization Questions

Engineering practices and trade offs for meeting performance objectives while controlling operational cost. Topics include setting latency and throughput targets and latency budgets; benchmarking profiling and tuning across application database and infrastructure layers; memory compute serialization and batching optimizations; asynchronous processing and workload shaping; capacity estimation and right sizing for compute and storage to reduce cost; understanding cost drivers in cloud environments including network egress and storage tiering; trade offs between real time and batch processing; and monitoring to detect and prevent performance regressions. Candidates should describe measurement driven approaches to optimization and be able to justify trade offs between cost complexity and user experience.

HardSystem Design
0 practiced
Design a high-level architecture for a globally-distributed, user-facing API expected to handle 1M requests per second with a p99 latency target of 150ms and strong cost constraints. Discuss caching (edge vs regional), multi-region data placement, consistency choices, traffic routing, and a plan to minimize network egress costs.
MediumTechnical
0 practiced
You have a Python service that parses and transforms large JSON payloads (~100MB each). Propose and implement (or pseudocode) a streaming approach to process payloads with bounded memory (e.g., using ijson or iterparse). Explain how you'd measure improvements and trade-offs.
MediumTechnical
0 practiced
Given this SQL query that scans a large orders table, recommend concrete indexing, read-replica, or caching strategies to reduce p99 query latency. Include how you'd measure and validate the improvement.
Query:
SELECT user_id, SUM(amount)
FROM orders
WHERE created_at >= date_trunc('month', now())
GROUP BY user_id
ORDER BY SUM(amount) DESC
LIMIT 100
MediumTechnical
0 practiced
In Python, implement a streaming quantile estimator that supports add(value) and quantile(q) operations for latency values. Your implementation must use O(k) memory where k is configurable and much smaller than number of samples. Explain your algorithm choice (t-digest, hdrhistogram, reservoir sampling) and include a short implementation sketch.
HardTechnical
0 practiced
Tail latency is hurting 0.5% of users due to queuing and head-of-line blocking in a downstream service. Propose system-level changes to reduce p99: hedged requests, priority queues, admission control, timeouts, and client-side fallbacks. For each change, explain expected impact on latency, throughput, and cost.

Unlock Full Question Bank

Get access to hundreds of Performance Engineering and Cost Optimization interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.