InterviewStack.io

Performance Engineering and Cost Optimization Questions

Engineering practices and trade-offs for meeting performance objectives while controlling operational cost. Topics include setting latency and throughput targets and latency budgets; benchmarking, profiling, and tuning across the application, database, and infrastructure layers; memory, compute, serialization, and batching optimizations; asynchronous processing and workload shaping; capacity estimation and right-sizing for compute and storage to reduce cost; understanding cost drivers in cloud environments, including network egress and storage tiering; trade-offs between real-time and batch processing; and monitoring to detect and prevent performance regressions. Candidates should describe measurement-driven approaches to optimization and be able to justify trade-offs among cost, complexity, and user experience.

Medium, Technical
44 practiced
Implement a token-bucket rate limiter in Python using Redis as the central store. The limiter should support checking and consuming tokens atomically for N requests per second per tenant. Provide an API sketch and ensure the implementation handles concurrency across multiple workers and minimizes Redis operations.
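One possible shape for an answer, offered as a minimal sketch rather than a definitive implementation. It assumes a redis-py style client (an object whose `register_script` returns a callable taking `keys` and `args`) and Redis 5 or later, so that the script may read the server clock with `TIME` before writing. The class name `TokenBucketLimiter`, the `ratelimit:` key prefix, and the `tokens`/`ts` hash fields are hypothetical names chosen for illustration. The refill and consume steps run inside one Lua script, which Redis executes atomically, so concurrent workers cannot interleave and each check costs a single round trip.

```python
# Lua script executed atomically inside Redis: refill, then try to consume.
TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])   -- tokens added per second
local capacity = tonumber(ARGV[2])   -- maximum burst size
local cost     = tonumber(ARGV[3])   -- tokens requested by this call
local t        = redis.call('TIME')  -- server clock: {seconds, microseconds}
local now      = tonumber(t[1]) + tonumber(t[2]) / 1e6

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity  -- new bucket starts full
local ts     = tonumber(state[2]) or now

-- Refill in proportion to elapsed time, capped at capacity.
tokens = math.min(capacity, tokens + math.max(0, now - ts) * rate)

local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
-- Idle buckets expire once they would have refilled completely anyway.
redis.call('EXPIRE', key, math.ceil(capacity / rate) + 1)
return allowed
"""


class TokenBucketLimiter:
    """Per-tenant limiter; `client` is assumed to be a redis-py style client."""

    def __init__(self, client, rate, capacity):
        # register_script lets the client send the script by its SHA on later calls.
        self._script = client.register_script(TOKEN_BUCKET_LUA)
        self.rate = rate          # N requests per second
        self.capacity = capacity  # allowed burst

    def allow(self, tenant_id, cost=1):
        """Return True if `cost` tokens were consumed for this tenant."""
        key = f"ratelimit:{tenant_id}"
        result = self._script(keys=[key], args=[self.rate, self.capacity, cost])
        return result == 1
```

A worker would call `limiter.allow(tenant_id)` before handling a request and reject or queue the request when it returns `False`. Reading the clock on the Redis server rather than on each worker avoids clock skew between workers, at the price of requiring a Redis version that permits it.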
Hard, Technical
49 practiced
You are asked to reduce monthly cloud costs by 35% without materially degrading user experience. Describe a measurement-driven approach: how you would identify the largest cost contributors, design experiments (A/B tests, canaries) that trade latency for cost, define success metrics and rollback criteria, and communicate the plan to stakeholders.
Hard, System Design
42 practiced
Your analytics cluster is slowed down by a few heavy dashboard queries that starve other workloads. Design an isolation and resource management strategy covering workload queues, concurrency limits, materialized views, result caching, and query prioritization. Explain how each affects cost and latency for both heavy users and ad hoc analysts.
Easy, Technical
50 practiced
In Python, implement an iterator-based function that reads a large newline-delimited JSON events file (1 billion lines) and produces daily counts per user without loading the whole file into memory. State your assumptions about sort order and state size, and ensure memory usage is bounded (assume a few million active users per day).
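A minimal sketch of one approach, under assumptions the question leaves open: each event is a JSON object with a `user_id` field and an ISO-8601 `ts` field (both field names are hypothetical), and the file is sorted by timestamp. With sorted input only one day's counter is held at a time, so memory is bounded by the number of active users per day. Unsorted input would instead need an external sort or a spill-to-disk aggregation.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Tuple


def daily_user_counts(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (day, user_id, count) from time-sorted NDJSON lines.

    Assumes each event has "ts" (ISO-8601, e.g. "2024-01-01T12:00:00Z")
    and "user_id"; both names are illustrative, not from a real schema.
    """
    current_day = None
    counts: Counter = Counter()

    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        day = event["ts"][:10]  # "YYYY-MM-DD" prefix of the timestamp

        if day != current_day:
            # Day boundary: emit the finished day and release its state.
            if current_day is not None:
                for user, n in counts.items():
                    yield current_day, user, n
            current_day, counts = day, Counter()

        counts[event["user_id"]] += 1

    # Emit the final day once the input is exhausted.
    if current_day is not None:
        for user, n in counts.items():
            yield current_day, user, n


# Usage: a file object is itself a lazy iterator over lines.
# with open("events.ndjson", encoding="utf-8") as f:
#     for day, user, n in daily_user_counts(f):
#         print(day, user, n)
```

Because the function takes any iterable of strings, it can be fed a file object, a decompression stream, or a list in a test, and it never materializes more than one day of state.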
Easy, Technical
52 practiced
Explain what a latency budget is and how a data engineer uses latency budgets across stages in a data pipeline. Given an end-to-end SLA of 2 seconds for an API that depends on (1) event ingestion, (2) transformation pipeline, and (3) query/serve layer, propose per-stage latency budgets, describe enforcement mechanisms (timeouts, retries, SLIs/SLOs), and discuss trade-offs between strict budgets and fault tolerance.
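To make the idea concrete, here is a small sketch of one way to split and enforce the 2-second budget. The per-stage numbers (300 ms, 900 ms, 500 ms, leaving 300 ms of headroom for network hops and one retry) are illustrative assumptions, not recommended values, and the `Deadline` class and `BUDGET_MS` table are hypothetical names. Each stage's timeout is the smaller of its own budget and whatever remains of the end-to-end deadline, so an overrun upstream shrinks the time available downstream instead of silently breaking the SLA.

```python
import time

SLA_MS = 2000

# Illustrative split; the unallocated 300 ms is headroom for hops and retries.
BUDGET_MS = {"ingest": 300, "transform": 900, "serve": 500}


class Deadline:
    """Tracks time left in an end-to-end budget, starting at construction."""

    def __init__(self, total_ms, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._expires = clock() + total_ms / 1000

    def remaining_ms(self):
        """Milliseconds left before the end-to-end deadline (never negative)."""
        return max(0.0, (self._expires - self._clock()) * 1000)

    def timeout_for(self, stage):
        """Timeout to apply to `stage`: its budget, capped by the time left."""
        return min(BUDGET_MS[stage], self.remaining_ms())


# Usage: create one Deadline per request and pass it through the stages.
deadline = Deadline(SLA_MS)
ingest_timeout_ms = deadline.timeout_for("ingest")
```

A strict split like this fails fast, which protects the SLA but leaves little room for retries. Loosening a stage's budget improves fault tolerance at the cost of tail latency, which is the trade-off the question asks about.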
