InterviewStack.io

Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts. Candidates should demonstrate knowledge of messaging and ingestion layers, message brokers and commit-log systems, partitioning and consumer-group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques. Processing concerns include stream processing engines, state stores and stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, and stream-to-stream and stream-to-table joins and windowed aggregations. Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency versus throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag. Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.

Easy · Technical
Write a ksqlDB or Flink SQL statement to compute a 5-minute tumbling window count of events per user_id from Kafka topic 'clicks' (value is JSON with fields user_id and event_time). Include how you would set the event-time column and handle late arrivals up to 2 minutes.
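A minimal Flink sketch of one possible answer, using the Table API to declare event_time as the event-time column with a 2-minute watermark for late data; the broker address, startup mode, and the assumption that event_time arrives as a parseable timestamp string are illustrative, not part of the question.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TumblingClickCounts {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source table over the 'clicks' topic. The WATERMARK clause makes event_time
        // the event-time column and tolerates events arriving up to 2 minutes late.
        // event_time is assumed to be a timestamp string the JSON format can parse.
        tEnv.executeSql(
            "CREATE TABLE clicks (" +
            "  user_id STRING," +
            "  event_time TIMESTAMP(3)," +
            "  WATERMARK FOR event_time AS event_time - INTERVAL '2' MINUTE" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'clicks'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +   // placeholder
            "  'scan.startup.mode' = 'latest-offset'," +
            "  'format' = 'json'" +
            ")");

        // 5-minute tumbling window count of events per user_id.
        tEnv.executeSql(
            "SELECT user_id," +
            "       TUMBLE_START(event_time, INTERVAL '5' MINUTE) AS window_start," +
            "       COUNT(*) AS click_count " +
            "FROM clicks " +
            "GROUP BY user_id, TUMBLE(event_time, INTERVAL '5' MINUTE)").print();
    }
}
```

An equivalent ksqlDB answer would declare the stream with TIMESTAMP='event_time' and use WINDOW TUMBLING (SIZE 5 MINUTES, GRACE PERIOD 2 MINUTES) in the aggregation.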
Medium · Technical
Describe mechanisms to achieve exactly-once processing semantics end-to-end in a pipeline with Kafka producers, a Flink stream job, and an external sink (e.g., a transactional DB). Explain Kafka transactions, idempotent producers, Flink two-phase commit sink support, and when true end-to-end exactly-once is practical versus when idempotent or at-least-once with compensating logic is preferable.
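One concrete ingredient of an answer, sketched below with placeholder topic, broker, and transactional.id values: an idempotent, transactional Kafka producer whose writes only become visible to read_committed consumers when the transaction commits. A Flink job builds on the same mechanism by tying Kafka transactions to checkpoints (the two-phase-commit sink pattern); for sinks without transactional support, idempotent upserts keyed by event id plus at-least-once delivery are often the more practical choice.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProduce {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotent producer: broker-side sequence numbers drop duplicates caused by retries.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // A transactional.id enables atomic multi-partition writes and fences zombie instances.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-writer-1");    // placeholder

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("orders", "order-42", "created")); // placeholder data
                // Visible to isolation.level=read_committed consumers only after commit.
                producer.commitTransaction();
            } catch (Exception e) {
                // Fatal errors such as ProducerFencedException require closing the
                // producer instead; aborting here keeps the sketch simple.
                producer.abortTransaction();
            }
        }
    }
}
```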
Medium · Technical
You're given current throughput goals and hardware constraints. Recommend a partitioning and scaling plan for a Kafka topic expected to serve 100k messages/sec with a median message size of 500 bytes. Explain how to estimate the number of partitions, broker resources, and consumer parallelism, and propose a mitigation for potential hot partitions.
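A back-of-the-envelope sizing sketch for this scenario; the per-partition throughput figures are assumptions and would need to be replaced with rates measured on the actual hardware (for example with kafka-producer-perf-test and kafka-consumer-perf-test).

```java
public class PartitionSizing {
    public static void main(String[] args) {
        long msgsPerSec = 100_000;
        int avgMsgBytes = 500;
        double ingressMBps = msgsPerSec * avgMsgBytes / 1e6;   // ~50 MB/s before replication

        // Assumed sustainable per-partition rates on the target hardware (to be measured).
        double producerMBpsPerPartition = 10.0;
        double consumerMBpsPerPartition = 20.0;

        int forProduce = (int) Math.ceil(ingressMBps / producerMBpsPerPartition);
        int forConsume = (int) Math.ceil(ingressMBps / consumerMBpsPerPartition);
        // Take the larger requirement and add ~50% headroom for growth and broker failover.
        int partitions = (int) Math.ceil(Math.max(forProduce, forConsume) * 1.5);

        // Partition count also caps consumer parallelism: one partition per consumer at most.
        System.out.printf("ingress ~ %.0f MB/s, suggested partitions ~ %d%n",
                          ingressMBps, partitions);
    }
}
```

Broker bandwidth and disk sizing then scale with the replication factor, and hot partitions are usually mitigated by choosing a higher-cardinality partition key or salting very hot keys and re-aggregating downstream.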
Medium · Technical
Explain the impact of consumer rebalancing on stateful stream processors and describe strategies to minimize disruption (e.g., sticky assignment, cooperative rebalancing, avoiding unnecessary partition changes). How would you handle state migration to minimize downtime when scaling consumer count from 4 to 20?
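A consumer-configuration sketch relevant to that answer: cooperative (incremental) rebalancing plus static group membership limit how many partitions move and how often rebalances fire. The broker address, group id, instance id, and timeout below are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class RebalanceFriendlyConfig {
    // Settings that reduce rebalance disruption for a stateful consumer instance.
    static Properties consumerProps(String instanceId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "stateful-processor");        // placeholder
        // Cooperative sticky assignment: only the partitions that actually move are
        // revoked, so most instances keep their partitions and their local state warm.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                  "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        // Static membership: an instance that restarts and rejoins with the same
        // group.instance.id within the session timeout does not trigger a rebalance.
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, instanceId);
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
        return props;
    }
}
```

For the 4-to-20 scale-out itself, the usual levers for moving state without long downtime are standby replicas in Kafka Streams (num.standby.replicas) or, in Flink, taking a savepoint and restoring it at the new parallelism.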
Medium · Technical
List the key operational metrics and alerts you would set up to monitor a streaming ML pipeline (ingest -> processing -> serving). Include metrics for Kafka (lag, ISR), stream processors (checkpoint duration, state size), and model-serving endpoints (latency, errors). Propose SLOs and alert thresholds for a pipeline that must maintain 99.9% availability and process 95% of events within 200ms.
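A small sketch of one such metric source: computing total consumer-group lag with the Kafka AdminClient. The group id, broker address, and alert threshold are illustrative assumptions; in practice this number usually comes from an exporter (e.g. JMX or a lag exporter) feeding the alerting system rather than hand-rolled code.

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the consumer group (group id is a placeholder).
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("ml-pipeline")
                     .partitionsToOffsetAndMetadata().get();
            // Latest (log-end) offsets for the same partitions.
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(latestSpec).all().get();
            // Lag = log-end offset minus committed offset, summed over partitions.
            long totalLag = committed.entrySet().stream()
                .mapToLong(e -> latest.get(e.getKey()).offset() - e.getValue().offset())
                .sum();
            // Threshold here is purely illustrative, not an SLO recommendation.
            System.out.println("total lag = " + totalLag
                + (totalLag > 100_000 ? "  (ALERT)" : ""));
        }
    }
}
```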
