InterviewStack.io

Performance Optimization and Latency Engineering Questions

Covers systematic approaches to measuring and improving system performance and latency at the architecture and code levels. Topics include profiling and tracing to find where time is actually spent, forming and testing hypotheses, optimizing critical paths, and validating improvements with measurable metrics. Candidates should be able to distinguish CPU-bound work from I/O-bound work, analyze latency versus throughput trade-offs, evaluate where caching and content delivery networks (CDNs) help or hurt, recognize database and network constraints, and propose strategies such as query optimization, asynchronous processing, resource pooling, and load balancing. Also includes performance testing methodologies, reasoning about trade-offs and risks, and describing end-to-end optimization projects and their business impact.

Medium | Technical
Design a load-testing plan to measure end-to-end latency from ingestion to data availability in a data warehouse for 1M messages/day with bursts to 10k/min. Include test data generation, ramp patterns, metrics to collect (ingest latency, processing latency, queue lag), validation criteria, and how to detect regressions over time.
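One possible starting point for the ramp pattern and the validation step, as a minimal sketch. The function names (`ramp_schedule`, `percentile`, `within_budget`) and the latency budget are hypothetical, chosen only for illustration; the baseline comes from 1M messages/day, which is roughly 694 messages/min.

```python
import math

# 1M messages/day spread evenly is about 694 messages per minute.
BASELINE_PER_MIN = round(1_000_000 / (24 * 60))
BURST_PER_MIN = 10_000


def ramp_schedule(baseline, burst, warmup_min, ramp_min, hold_min):
    """Return the target messages-per-minute for each minute of the test:
    a flat warm-up, a linear ramp up to the burst rate, then a hold."""
    schedule = [baseline] * warmup_min
    for step in range(1, ramp_min + 1):
        schedule.append(round(baseline + (burst - baseline) * step / ramp_min))
    schedule.extend([burst] * hold_min)
    return schedule


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 1]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def within_budget(latencies_s, p95_budget_s):
    """Validation criterion: the observed p95 must not exceed the budget."""
    return percentile(latencies_s, 0.95) <= p95_budget_s
```

Storing the p95 from every run alongside the build identifier and comparing it with the previous runs is one simple way to surface regressions over time.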
Hard | Technical
Implement a streaming approximate quantile sketch in Python that supports update(value) and query(q) returning an approximate q-th quantile with error <= epsilon. You may implement a simplified Greenwald-Khanna or t-digest variant; focus on correctness, mergeability across nodes, and space complexity O(1/epsilon). Explain how to tune epsilon for accurate p95 estimation at high volume.
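As a point of comparison, here is a minimal sketch of a mergeable quantile structure. Note that it is neither Greenwald-Khanna nor t-digest: it is a log-bucket histogram in the style of DDSketch, swapped in because it is short enough to verify by hand. It bounds the relative error of the returned value (not the rank error), accepts only positive values, and its space grows with the logarithm of the value range rather than O(1/epsilon). The class name `LogBucketSketch` and the `rel_err` parameter are hypothetical.

```python
import math
from collections import Counter


class LogBucketSketch:
    """Log-bucket quantile sketch for positive values (DDSketch-style)."""

    def __init__(self, rel_err=0.01):
        if not 0 < rel_err < 1:
            raise ValueError("rel_err must be in (0, 1)")
        self.rel_err = rel_err
        self.gamma = (1 + rel_err) / (1 - rel_err)
        self._log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def update(self, value):
        if value <= 0:
            raise ValueError("this sketch only supports positive values")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        idx = math.ceil(math.log(value) / self._log_gamma)
        self.buckets[idx] += 1
        self.count += 1

    def query(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        if not 0 <= q <= 1:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.count - 1)
        seen = 0
        for idx in sorted(self.buckets):
            seen += self.buckets[idx]
            if seen > rank:
                # Representative point of the bucket; it is within rel_err
                # of every value the bucket can contain.
                return 2 * self.gamma ** idx / (self.gamma + 1)

    def merge(self, other):
        """Fold another sketch (e.g. from another node) into this one."""
        if other.rel_err != self.rel_err:
            raise ValueError("sketches must share the same rel_err")
        self.buckets.update(other.buckets)  # Counter.update adds counts
        self.count += other.count
```

Because merging only adds bucket counts, sketches built on separate nodes combine without loss of accuracy, which is the property the question asks about; a smaller `rel_err` costs proportionally more buckets.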
Hard | Technical
Analyze the simplified Python Kafka consumer loop and identify performance/GC issues, then propose a rewritten approach. Code: `def process_messages(consumer): records = []; for msg in consumer: data = json.loads(msg.value); result = transform(data); records.append(result); if len(records) > 1000: store_bulk(records); records = []`. Explain GC hotspots and rewrite strategy to reduce CPU and GC pauses (e.g., streaming processing, reuse buffers, ujson, memoryview, batch sizes, async IO).
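One direction a rewrite could take, as a hedged sketch rather than a definitive answer. It assumes `consumer` is any iterable of objects with a `.value` attribute, that `transform` and `store_bulk` are supplied by the caller, and that `store_bulk` does not keep a reference to the list after it returns (because the buffer is reused). The `loads` parameter is a hypothetical hook for swapping in a faster JSON parser.

```python
import json


def process_messages(consumer, transform, store_bulk,
                     batch_size=1000, loads=json.loads):
    """Batch transformed records and flush them in bulk.

    Differences from the loop in the question:
    - the same list is reused via clear() instead of allocating a new
      one per batch, which reduces allocation churn;
    - the flush happens at exactly batch_size (the original used
      `> 1000`, so it flushed at 1001 records);
    - a trailing partial batch is flushed when the iterator ends.
    """
    records = []
    append = records.append  # bind once to avoid repeated attribute lookups
    for msg in consumer:
        append(transform(loads(msg.value)))
        if len(records) >= batch_size:
            store_bulk(records)
            records.clear()
    if records:
        store_bulk(records)
```

This does not address everything the question lists; the larger gains (a faster parser, asynchronous writes, tuned batch sizes) depend on the client library and the storage backend and would need to be confirmed by profiling.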
Easy | Technical
Implement a Python decorator named timeit that measures wall-clock time for a function call and records nested call timing correctly so totals for parent and child can be separated. Requirements: thread-safe, store results in a global registry keyed by function name, preserve original function return value, and keep overhead minimal.
Easy | Technical
Describe what resource pooling is and why connection pooling reduces latency for databases. List core pool parameters to tune (max_connections, min_idle, idle_timeout, connection_ttl), how to detect pool exhaustion, and practical mitigations if pools are exhausted under load.
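To make the idea concrete, here is a minimal sketch of a bounded pool with an acquire timeout and an exhaustion counter. The names `SimplePool`, `PoolExhausted`, `acquire_timeout`, and `exhausted_count` are hypothetical, and `connect` stands in for whatever callable opens a real database connection. It omits `min_idle`, idle timeouts, connection TTLs, and health checks, which a production pool would need.

```python
import queue
import threading
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no connection slot frees up within the timeout."""


class SimplePool:
    def __init__(self, connect, max_connections=10, acquire_timeout=1.0):
        self._connect = connect
        self._idle = queue.LifoQueue()  # reuse the most recently used first
        self._slots = threading.BoundedSemaphore(max_connections)
        self._acquire_timeout = acquire_timeout
        self._lock = threading.Lock()
        self.exhausted_count = 0  # a metric to export and alert on

    @contextmanager
    def connection(self):
        if not self._slots.acquire(timeout=self._acquire_timeout):
            with self._lock:
                self.exhausted_count += 1
            raise PoolExhausted("no free connection within timeout")
        try:
            try:
                conn = self._idle.get_nowait()  # reuse: skips setup cost
            except queue.Empty:
                conn = self._connect()  # pay the setup cost only when needed
            try:
                yield conn
            finally:
                # Simplification: a real pool would validate the connection
                # before returning it to the idle set.
                self._idle.put(conn)
        finally:
            self._slots.release()
```

Reusing an idle connection avoids repeating connection setup on every request, which is where the latency saving comes from; a rising `exhausted_count` or growing acquire wait time is a direct signal of pool exhaustion.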
