InterviewStack.io LogoInterviewStack.io

Data Pipeline Scalability and Performance Questions

Design data pipelines that meet throughput and latency targets at large scale. Topics include capacity planning, partitioning and sharding strategies, parallelism and concurrency, batching and windowing trade offs, network and I O bottlenecks, replication and load balancing, resource isolation, autoscaling patterns, and techniques for maintaining performance as data volume grows by orders of magnitude. Include approaches for benchmarking, backpressure management, cost versus performance trade offs, and strategies to avoid hot spots.

MediumTechnical
32 practiced
Implement an asyncio-based batching producer in Python. The API should provide 'async send(msg)' which enqueues messages and flushes them when batch size reaches N or when a timer T elapses. Provide 'await close()' which flushes remaining messages and stops background tasks. Show thread-safety considerations and how to apply backpressure to callers.
MediumTechnical
31 practiced
Create a migration plan to move an on-prem Kafka cluster to a managed cloud Kafka (e.g., MSK or Confluent Cloud). The plan should preserve ordering, minimize downtime, migrate schemas and ACLs, validate performance, and include rollback paths. Describe tools, steps (replication, dual-write, cutover), and validation checks.
HardTechnical
32 practiced
Design a benchmark to compare ingesting Parquet (columnar) versus Avro (row) into an analytical data warehouse. Specify dataset characteristics (schema, cardinality, nullability), ingestion metrics (MB/s, CPU), query metrics (p50/p95 query latency for typical analytic queries), compression codecs, schema-evolution scenarios, and overall cost measurement strategy.
MediumSystem Design
38 practiced
Design a Kafka partitioning strategy for a high-cardinality user-event topic with the following constraints: 1B messages/day, ~100 partitions in a cluster, and skew where 1% of users produce 50% of traffic. You must preserve per-user ordering while avoiding hot partitions. Describe your partitioning approach, operational steps to mitigate skew, and how to maintain ordering for heavy users.
HardSystem Design
37 practiced
Design a cross-region streaming replication solution that enables low-latency regional reads and supports global fault tolerance. Requirements: replication lag typically <5s, support regional read locality, and tolerate regional failures. Discuss active-passive vs active-active topologies, conflict resolution, metadata propagation, and tools such as MirrorMaker, Confluent Replicator, or custom replication.

Unlock Full Question Bank

Get access to hundreds of Data Pipeline Scalability and Performance interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.