InterviewStack.io

Performance Optimization and Latency Engineering Questions

Covers systematic approaches to measuring and improving system performance and latency at the architecture and code levels. Topics include profiling and tracing to find where time is actually spent, forming and testing hypotheses, optimizing critical paths, and validating improvements with measurable metrics. Candidates should be able to distinguish CPU-bound work from I/O-bound work, analyze latency-versus-throughput trade-offs, evaluate where caching and content delivery networks (CDNs) help or hurt, recognize database and network constraints, and propose strategies such as query optimization, asynchronous processing patterns, resource pooling, and load balancing. Also covers performance testing methodologies, reasoning about trade-offs and risks, and describing end-to-end optimization projects and their business impact.
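A quick first check for whether a code path is CPU-bound or I/O-bound, before reaching for a full profiler, is to compare CPU time with wall-clock time. The following is a minimal Python sketch under stated assumptions: the helper names (`cpu_wall_ratio`, `classify`, `busy_loop`, `simulated_io`) and the 0.5 threshold are hypothetical, and `time.sleep` stands in for real disk or network waits.

```python
import time
from typing import Callable, Tuple

def cpu_wall_ratio(fn: Callable[[], object]) -> Tuple[float, float, float]:
    """Run fn once and return (wall_s, cpu_s, cpu_s / wall_s).

    process_time() counts only this process's CPU time, so time spent
    blocked on sleeps, disk, or network appears in wall time but not CPU time.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return wall, cpu, (cpu / wall if wall > 0 else 0.0)

def classify(ratio: float, threshold: float = 0.5) -> str:
    # Heuristic only (threshold is an assumption): near 1.0 on one thread
    # suggests CPU-bound work; near 0 suggests time spent waiting.
    return "cpu-bound" if ratio >= threshold else "wait-bound"

def busy_loop(seconds: float = 0.1) -> int:
    # Pure computation: keeps one core busy until the deadline passes.
    deadline, n = time.perf_counter() + seconds, 0
    while time.perf_counter() < deadline:
        n += 1
    return n

def simulated_io(seconds: float = 0.1) -> None:
    # Hypothetical stand-in for a blocking network or disk call.
    time.sleep(seconds)
```

Multi-threaded code can push the ratio above 1.0, and the ratio says nothing about *which* function is slow; that is what `cProfile` or a tracing tool adds once the broad category is known.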

Hard · Technical
A transactional database shows write latency spikes due to fsync on commit. Propose storage-level and DB-level mitigations: disk selection (HDD vs SSD vs NVMe), RAID vs single device, filesystem mount options, group commit, commit frequency, WAL tuning, and quantify expected latency improvements and durability trade-offs.
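Group commit, one of the mitigations the question lists, lets many concurrent transactions share a single fsync. The following is a minimal Python sketch, not how any particular database implements it: `GroupCommitLog`, `batch_wait_s`, and `run_group_commit_demo` are hypothetical names, and the log is a plain file opened with `os.open`. Each `commit()` still returns only after the fsync covering its record completes, so durability is unchanged; the cost is up to `batch_wait_s` of extra latency per commit in exchange for roughly one fsync per batch instead of one per transaction.

```python
import os
import tempfile
import threading
import time

class GroupCommitLog:
    """Illustrative group-commit log: many commits share one fsync."""

    def __init__(self, path: str, batch_wait_s: float = 0.002) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._cond = threading.Condition()
        self._pending: list = []        # (payload, done_event) not yet durable
        self._batch_wait_s = batch_wait_s
        self._closed = False
        self.fsync_count = 0
        self._flusher = threading.Thread(target=self._run, daemon=True)
        self._flusher.start()

    def commit(self, payload: bytes) -> None:
        """Block until payload has been written and fsynced."""
        done = threading.Event()
        with self._cond:
            self._pending.append((payload, done))
            self._cond.notify()
        done.wait()

    def _run(self) -> None:
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:   # closed and fully drained
                    return
            # Short pause so concurrent committers can join this batch:
            # trades up to batch_wait_s of latency for fewer fsyncs.
            time.sleep(self._batch_wait_s)
            with self._cond:
                batch, self._pending = self._pending, []
            data = memoryview(b"".join(p for p, _ in batch))
            while len(data):            # os.write may write partially
                data = data[os.write(self._fd, data):]
            os.fsync(self._fd)          # one fsync makes the whole batch durable
            self.fsync_count += 1
            for _, done in batch:
                done.set()

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._flusher.join()
        os.close(self._fd)

def run_group_commit_demo(n_writers: int = 50):
    """Run concurrent commits; return (records_on_disk, fsync_count)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        log = GroupCommitLog(path)
        threads = [threading.Thread(target=log.commit, args=(b"rec\n",))
                   for _ in range(n_writers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        log.close()
        with open(path, "rb") as f:
            records = f.read().count(b"rec\n")
    return records, log.fsync_count
```

With an fsync latency of F and an average of B commits per batch, the fsync rate drops by about a factor of B, while per-commit latency becomes roughly the batch wait plus F. Reducing commit frequency instead (acknowledging before the fsync, as asynchronous-commit modes do) removes F from the commit path but can lose the most recently acknowledged transactions on a crash.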
Medium · Technical
Compare batch and stream processing in terms of latency, throughput, consistency, operational complexity, and cost for typical data engineering workloads. Give a concrete recommendation for calculating near-real-time fraud signals with a 1-minute detection window.
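For a 1-minute fraud window, a stream processor essentially keeps per-key state over a trailing window and re-evaluates a rule on every event. The sketch below shows that state in plain Python, assuming in-order event timestamps in seconds; `SlidingWindowCounter`, `velocity_flag`, and the threshold of 5 transactions per window are hypothetical illustrations, not a recommended fraud rule.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable

class SlidingWindowCounter:
    """Per-key event counts over a trailing time window (e.g. 60 s)."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._events: Dict[Hashable, Deque[float]] = defaultdict(deque)

    def record(self, key: Hashable, ts: float) -> int:
        """Add an event at time ts; return the key's count in (ts - window, ts].

        Assumes events for a key arrive in timestamp order; a production
        stream processor also needs watermarks and late-event handling.
        """
        q = self._events[key]
        q.append(ts)
        while q and q[0] <= ts - self.window_s:
            q.popleft()                 # expire events older than the window
        return len(q)

def velocity_flag(counter: SlidingWindowCounter, card_id: str, ts: float,
                  max_txns_per_window: int = 5) -> bool:
    # Hypothetical rule: flag a card with more than N transactions per window.
    return counter.record(card_id, ts) > max_txns_per_window
```

Memory grows with the number of events per key inside the window. Engines such as Flink or Spark Structured Streaming manage equivalent state with checkpointing and watermarks, which this sketch omits.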
Medium · Technical
In PySpark, you have large_events (~1B rows) and small_user_lookup (~5k rows). Write PySpark code to join them efficiently using a broadcast join, and explain when broadcasting is appropriate, the memory considerations (executor heap), and the risks, such as OOM errors, along with how to handle skew. State your assumptions about executor memory and shuffle behavior.
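In PySpark the join itself is typically one line, `large_events.join(broadcast(small_user_lookup), on="user_id", how="left")`, with `broadcast` imported from `pyspark.sql.functions` (the `user_id` join column is an assumption). To show what that hint changes, the sketch below models a broadcast hash join in plain Python: the small side is hashed once and shipped whole to every executor, and each partition of the large side probes that table locally, so the ~1B-row side is never shuffled. The function names and the `(key, value)` row layout are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Optional, Tuple

Row = Tuple[Hashable, object]

def build_broadcast_table(small_rows: Iterable[Row]) -> Dict[Hashable, List[object]]:
    """Hash the small side once; Spark ships an equivalent table to each executor."""
    table: Dict[Hashable, List[object]] = {}
    for key, attrs in small_rows:
        table.setdefault(key, []).append(attrs)
    return table

def join_partition(partition: Iterable[Row],
                   table: Dict[Hashable, List[object]]
                   ) -> Iterator[Tuple[Hashable, object, Optional[object]]]:
    """Left-outer probe of one large-side partition; no shuffle needed."""
    for key, event in partition:
        for attrs in table.get(key, [None]):   # None when the user is unknown
            yield key, event, attrs

def broadcast_join(partitions: Iterable[Iterable[Row]],
                   small_rows: Iterable[Row]) -> list:
    table = build_broadcast_table(small_rows)  # built once, reused per partition
    return [row for part in partitions for row in join_partition(part, table)]
```

The model also makes the memory risk visible: every executor holds the full hash table, so broadcasting suits a ~5k-row lookup but can cause OOM errors once the small side approaches executor heap limits. Spark applies the same strategy automatically for tables below `spark.sql.autoBroadcastJoinThreshold`.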
Hard · Technical
Monitoring shows a 50 ms increase in p99 latency for a daily ETL job compared with its baseline. Describe a systematic triage approach for finding the root cause across infrastructure (CPU, disk, network), query plan changes, data distribution and skew, upstream schema or data format changes, and dependency regressions, and explain how you would validate your hypothesis with tests and a rollback plan.
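One early triage step is to localize the regression: compare per-stage latency distributions between baseline and current runs and rank stages by their p99 shift before digging into plans, skew, or dependencies. This is a minimal sketch assuming per-stage durations in milliseconds are already collected; the stage names, `p99`, and `rank_stage_regressions` are hypothetical, and `p99` uses the nearest-rank definition.

```python
import math
from typing import Dict, List, Sequence, Tuple

def p99(values: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty sample."""
    if not values:
        raise ValueError("no samples")
    s = sorted(values)
    rank = max(1, math.ceil(0.99 * len(s)))
    return s[rank - 1]

def rank_stage_regressions(baseline: Dict[str, Sequence[float]],
                           current: Dict[str, Sequence[float]]
                           ) -> List[Tuple[str, float]]:
    """Return (stage, p99 delta in ms) for stages present in both, largest first."""
    deltas = [(stage, p99(current[stage]) - p99(baseline[stage]))
              for stage in baseline if stage in current]
    return sorted(deltas, key=lambda sd: sd[1], reverse=True)
```

A daily job yields few samples per day, so the baseline should span enough runs (or per-task timings within runs) for a p99 to be meaningful.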
Hard · Technical
Implement a streaming approximate quantile sketch in Python that supports update(value) and query(q), where query returns an approximate q-th quantile whose rank error is at most epsilon * n (n = number of values seen). You may implement a simplified Greenwald-Khanna or t-digest variant; focus on correctness, mergeability across nodes, and space complexity close to O(1/epsilon) (the Greenwald-Khanna worst-case bound is O((1/epsilon) log(epsilon * n))). Explain how to tune epsilon for accurate p95 estimation at high volume.
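The question permits a t-digest variant, so the following is a simplified merging t-digest in Python using the k1 scale function `k(q) = delta / (2*pi) * asin(2q - 1)`. Class, function, and parameter names (`MergingDigest`, `delta`, `buffer_size`, `build_merged_demo`) are hypothetical. Unlike Greenwald-Khanna, t-digest gives no hard epsilon * n rank-error guarantee; in exchange, merging across nodes is simple (concatenate centroids and recompress), and `delta` plays roughly the role of 1/epsilon, since a compressed digest holds at most about `delta` centroids.

```python
import math
import random
from typing import List, Tuple

class MergingDigest:
    """Simplified merging t-digest (k1 scale function). Illustrative only."""

    def __init__(self, delta: float = 100.0, buffer_size: int = 500) -> None:
        self.delta = delta                  # compression: at most ~delta centroids
        self.buffer_size = buffer_size
        self.centroids: List[Tuple[float, float]] = []   # (mean, weight), sorted
        self._unmerged: List[Tuple[float, float]] = []
        self.n = 0.0
        self.min, self.max = math.inf, -math.inf

    def _k(self, q: float) -> float:
        q = min(max(q, 0.0), 1.0)           # guard asin against float drift
        return self.delta / (2 * math.pi) * math.asin(2 * q - 1)

    def update(self, x: float, w: float = 1.0) -> None:
        self._unmerged.append((x, w))
        self.n += w
        self.min, self.max = min(self.min, x), max(self.max, x)
        if len(self._unmerged) >= self.buffer_size:
            self._compress()

    def merge(self, other: "MergingDigest") -> None:
        """Absorb another node's digest: concatenate centroids, then recompress."""
        self._unmerged.extend(other.centroids + other._unmerged)
        self.n += other.n
        self.min, self.max = min(self.min, other.min), max(self.max, other.max)
        self._compress()

    def _compress(self) -> None:
        if not self._unmerged:
            return
        items = sorted(self.centroids + self._unmerged)
        self._unmerged = []
        total = sum(w for _, w in items)
        out: List[Tuple[float, float]] = []
        q_left = 0.0
        cur_m, cur_w = items[0]
        for m, w in items[1:]:
            q_right = q_left + (cur_w + w) / total
            if self._k(q_right) - self._k(q_left) <= 1.0:
                cur_w += w                  # merge: weighted running mean
                cur_m += (m - cur_m) * w / cur_w
            else:
                out.append((cur_m, cur_w))
                q_left += cur_w / total
                cur_m, cur_w = m, w
        out.append((cur_m, cur_w))
        self.centroids = out

    def query(self, q: float) -> float:
        """Approximate q-quantile by interpolating between centroid centers."""
        self._compress()
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.n
        cum, prev_mean, prev_center = 0.0, self.min, 0.0
        for mean, w in self.centroids:
            center = cum + w / 2
            if target <= center:
                frac = (target - prev_center) / (center - prev_center)
                return prev_mean + frac * (mean - prev_mean)
            prev_mean, prev_center = mean, center
            cum += w
        frac = (target - prev_center) / (self.n - prev_center)
        return prev_mean + frac * (self.max - prev_mean)

def build_merged_demo(seed: int = 7) -> MergingDigest:
    """Split 0..9999 across two 'nodes', build a digest on each, merge them."""
    data = [float(x) for x in range(10_000)]
    random.Random(seed).shuffle(data)
    node_a, node_b = MergingDigest(), MergingDigest()
    for x in data[:5_000]:
        node_a.update(x)
    for x in data[5_000:]:
        node_b.update(x)
    node_a.merge(node_b)
    return node_a
```

Under a GK-style guarantee, rank error epsilon * n means the p95 answer lies between the (0.95 - epsilon) and (0.95 + epsilon) quantiles regardless of volume, so epsilon = 0.001 keeps the estimate within p94.9 to p95.1. With the t-digest, centroids are smallest near the tails, so increasing `delta` tightens p95 at the cost of more centroids per node and per merge.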
