InterviewStack.io

Performance Optimization and Latency Engineering Questions

Covers systematic approaches to measuring and improving system performance and latency at both the architecture and code levels. Topics include profiling and tracing to find where time is actually spent, forming and testing hypotheses, optimizing critical paths, and validating improvements with measurable metrics. Candidates should be able to distinguish CPU-bound work from I/O-bound work, analyze latency versus throughput trade-offs, evaluate where caching and content delivery networks (CDNs) help or hurt, recognize database and network constraints, and propose strategies such as query optimization, asynchronous processing patterns, resource pooling, and load balancing. Also includes performance testing methodologies, reasoning about trade-offs and risks, and describing end-to-end optimization projects and their business impact.

Hard · Technical
Design an end-to-end observability stack to support latency engineering across data pipelines. Describe what you would instrument (producers, brokers, processors, storage, query layer), which metrics and traces to collect, trace sampling strategy and context propagation, log enrichment, dashboards/alerts tied to SLOs, and how to run before/after experiments to validate optimizations.
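A useful building block for an answer here is context propagation: every hop in the pipeline must carry the same trace ID so spans can be stitched into one end-to-end trace. Below is a minimal, hypothetical sketch (the function names and the 10% sampling rate are illustrative, not from any specific library) of W3C `traceparent`-style propagation with head-based sampling, where the keep/drop decision is made once at the producer and honored downstream:

```python
import random
import uuid

SAMPLE_RATE = 0.1  # head-based sampling: decide once, at trace start


def start_trace():
    """Create a new trace context at the pipeline entry point (producer)."""
    return {
        "trace_id": uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "sampled": random.random() < SAMPLE_RATE,
    }


def inject(ctx, headers):
    """Propagate context across a hop (e.g. message headers on a broker)
    using a W3C traceparent-style field: version-traceid-spanid-flags."""
    flags = "01" if ctx["sampled"] else "00"
    headers["traceparent"] = f"00-{ctx['trace_id']}-{ctx['span_id']}-{flags}"
    return headers


def extract(headers):
    """Rebuild the context on the consumer/processor side, starting a
    child span that shares the original trace ID."""
    _version, trace_id, parent_span, flags = headers["traceparent"].split("-")
    return {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent_span,
        "sampled": flags == "01",
    }


# One hop: producer -> broker message -> processor
ctx = start_trace()
msg_headers = inject(ctx, {})
downstream = extract(msg_headers)
```

Because the sampling flag travels with the message, storage and query-layer spans for a sampled trace are always kept together, which is what makes before/after latency experiments comparable.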
Easy · Technical
Explain why mean latency can be misleading for production systems. Define p50, p95, and p99, and describe how you would calculate and present them for an hourly ETL job that processes variable batch sizes. Address outliers and reporting window choices.
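A good answer can be grounded with a tiny worked example. The sketch below (nearest-rank method; the sample durations are invented for illustration) shows how a single slow ETL run inflates the mean while p50 and p95 stay put:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest observed value such that at
    least p% of samples are <= it. Unlike the mean, it is not dragged
    around by a handful of extreme outliers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]


# Hourly ETL run durations in minutes: 98 normal runs, one slightly slow
# run, and one pathological outlier (e.g. an oversized batch).
runs = [1.0] * 98 + [1.2, 60.0]

mean = sum(runs) / len(runs)        # ~1.59 min, dominated by one run
p50 = percentile(runs, 50)          # 1.0 min
p95 = percentile(runs, 95)          # 1.0 min
p99 = percentile(runs, 99)          # 1.2 min
```

Reporting "mean 1.59 min" suggests every run slowed down; the percentiles show 99% of runs were fine and one batch needs investigation, which is the more actionable story.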
Easy · Technical
Explain the difference between latency and throughput in data systems. Give concrete examples (metric names and units), explain when each is the primary operational concern, and describe one situation in a data pipeline where optimizing throughput harms latency. Provide short examples from ingestion pipelines, batch jobs, and an online API.
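The classic "throughput harms latency" case is batching at an ingestion stage. The toy model below (the overhead and per-item costs are made-up numbers, chosen only to make the trade-off visible) shows how amortizing a fixed per-flush cost over bigger batches raises events/sec while making the first event in each batch wait longer:

```python
def batch_metrics(batch_size, per_item_s=0.001, per_batch_overhead_s=0.05):
    """Model a pipeline stage with a fixed per-flush overhead (commit,
    fsync, network round trip). Larger batches amortize the overhead,
    so throughput rises; but the first event in a batch sits in the
    buffer until the whole batch flushes, so latency rises too."""
    batch_time = per_batch_overhead_s + batch_size * per_item_s
    throughput = batch_size / batch_time   # events per second
    worst_case_latency = batch_time        # seconds the first event waits
    return throughput, worst_case_latency


tput_1, lat_1 = batch_metrics(1)          # one event per flush
tput_1000, lat_1000 = batch_metrics(1000)  # large batches
```

With these assumed costs, batch size 1000 delivers roughly 50x the throughput of batch size 1, but worst-case per-event latency grows from ~51 ms to ~1.05 s, which is exactly the trade-off an online API cannot accept but a nightly batch job happily makes.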
Medium · Technical
In PySpark you have large_events (~1B rows) and small_user_lookup (~5k rows). Write PySpark code to join them efficiently using broadcast, explain when broadcasting is appropriate, discuss memory considerations (executor heap), and cover risks such as OOM and skew handling. State your assumptions about executor memory and shuffle behavior.
Easy · Technical
Describe what resource pooling is and why connection pooling reduces latency for databases. List core pool parameters to tune (max_connections, min_idle, idle_timeout, connection_ttl), explain how to detect pool exhaustion, and give practical mitigations if pools are exhausted under load.
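The latency win comes from skipping the TCP and authentication handshake on every request by reusing already-open connections. A minimal, hypothetical pool sketch (this class and its parameter names are illustrative, not a real library API) that also shows the exhaustion behavior the question asks about:

```python
import queue


class ConnectionPool:
    """Toy connection pool: hands out idle connections, lazily opens new
    ones up to max_connections, and fails fast when exhausted."""

    def __init__(self, factory, max_connections=5, checkout_timeout=1.0):
        self._factory = factory                      # opens a new connection
        self._idle = queue.Queue(maxsize=max_connections)
        self._created = 0
        self._max = max_connections
        self._timeout = checkout_timeout

    def acquire(self):
        try:
            # Fast path: reuse an idle connection, no handshake cost.
            return self._idle.get_nowait()
        except queue.Empty:
            if self._created < self._max:
                self._created += 1
                return self._factory()               # slow path: open one
            # Pool exhausted: wait briefly for a release, then fail fast
            # so callers can shed load instead of queueing unboundedly.
            try:
                return self._idle.get(timeout=self._timeout)
            except queue.Empty:
                raise TimeoutError("connection pool exhausted")

    def release(self, conn):
        self._idle.put(conn)


pool = ConnectionPool(lambda: object(), max_connections=2, checkout_timeout=0.01)
c1 = pool.acquire()
c2 = pool.acquire()   # pool now at max; a third acquire() would time out
```

In practice, exhaustion shows up as rising checkout-wait time and timeout errors, which is why production pools expose metrics like in-use count and wait duration; mitigations include raising max_connections (within database limits), shortening query hold times, and adding timeouts plus load shedding at the caller.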
