InterviewStack.io

Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts.

Candidates should demonstrate knowledge of messaging and ingestion layers, message brokers and commit-log systems, partitioning and consumer-group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques.

Processing concerns include stream processing engines, state stores and stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, and stream-to-stream and stream-to-table joins and aggregations over windows.

Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency-versus-throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag.

Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.

Hard · Technical
Walk through techniques to achieve end-to-end exactly-once semantics when consuming from Kafka, processing in Flink, and writing results to an external relational database that does not support distributed transactions. Discuss Flink's two-phase commit sink, idempotent upserts, deduplication patterns, and the performance trade-offs of each approach.
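
One way to ground the idempotent-upsert part of an answer: below is a minimal, non-authoritative sketch that writes keyed results with an upsert, so that replays after a Flink restart overwrite rather than duplicate rows. It assumes PostgreSQL (for ON CONFLICT), a hypothetical events_agg table with a unique key on (agg_key, window_end), and placeholder connection details; it deliberately stays outside Flink's sink API rather than reproducing TwoPhaseCommitSinkFunction.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Sketch of an idempotent sink: results are written with an upsert keyed by a
 * deterministic (agg_key, window_end) pair, so replaying the same result after
 * a failure overwrites the row instead of duplicating it. Table, columns, and
 * connection details are hypothetical.
 */
public class IdempotentUpsertSink {

    private static final String UPSERT_SQL =
        "INSERT INTO events_agg (agg_key, window_end, event_count) " +
        "VALUES (?, ?, ?) " +
        "ON CONFLICT (agg_key, window_end) DO UPDATE SET event_count = EXCLUDED.event_count";

    public static void writeResult(Connection conn, String aggKey,
                                   long windowEndMillis, long count) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(UPSERT_SQL)) {
            stmt.setString(1, aggKey);
            stmt.setLong(2, windowEndMillis);
            stmt.setLong(3, count);
            stmt.executeUpdate(); // re-running the same write is semantically a no-op
        }
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/analytics", "app", "secret")) {
            conn.setAutoCommit(true);
            writeResult(conn, "user:42", 1_700_000_000_000L, 17L);
        }
    }
}
```

Compared with a two-phase commit sink, idempotent upserts keep latency low and avoid holding transactions open across checkpoint intervals, but they only give effective exactly-once results for deterministic, keyed outputs; non-idempotent side effects still require deduplication or transactional coordination.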
Hard · Technical
You need to join two high-volume streams: user_actions (100k events/sec) and product_updates (10k events/sec), where product_updates can arrive late. Propose a join design that keeps state bounded and meets latency SLAs. Explain asymmetric windowing, retention policies, the use of compacted lookup topics, Bloom filters or approximate caches, and fallbacks for missing join data.
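
A framework-agnostic toy of the state-bounding idea an answer might sketch: retain the low-volume product_updates side for a long, compaction-like window, buffer unmatched user_actions only briefly, and evict both sides as the watermark advances. All types, retention values, and fallback behavior here are hypothetical illustrations, not a production join implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy illustration of an asymmetric join buffer: product_updates are retained
 * much longer than user_actions because they are low-volume and may arrive late,
 * while the high-volume user_actions side is evicted aggressively to keep state
 * bounded. Assumes actions arrive roughly in event-time order; a real job would
 * key this state per partition and drive eviction with timers.
 */
public class AsymmetricJoinBuffer {

    record UserAction(String productId, long eventTimeMillis, String payload) {}
    record ProductUpdate(String productId, long eventTimeMillis, String attributes) {}

    // Latest known product record per key, analogous to a compacted lookup topic.
    private final Map<String, ProductUpdate> productState = new HashMap<>();
    // Actions waiting for a product record that has not arrived yet.
    private final ArrayDeque<UserAction> pendingActions = new ArrayDeque<>();

    private static final long PRODUCT_RETENTION_MS = 24 * 60 * 60 * 1000L; // generous: low volume, can be late
    private static final long ACTION_RETENTION_MS  = 5 * 60 * 1000L;       // tight: 100k/sec would explode state

    public void onProductUpdate(ProductUpdate update) {
        productState.put(update.productId(), update);
    }

    /** Returns a joined result, or null if the action is buffered awaiting product data. */
    public String onUserAction(UserAction action) {
        ProductUpdate product = productState.get(action.productId());
        if (product != null) {
            return action.payload() + " | " + product.attributes();
        }
        pendingActions.addLast(action); // retried or emitted with a fallback when it expires
        return null;
    }

    /** Called as the watermark advances; evicts state older than each side's retention. */
    public void advanceWatermark(long watermarkMillis) {
        productState.values().removeIf(p -> p.eventTimeMillis() < watermarkMillis - PRODUCT_RETENTION_MS);
        while (!pendingActions.isEmpty()
                && pendingActions.peekFirst().eventTimeMillis() < watermarkMillis - ACTION_RETENTION_MS) {
            UserAction expired = pendingActions.pollFirst();
            // Fallback for missing join data: emit with defaults or route to a side output.
            System.out.println("Unjoined action expired: " + expired.payload());
        }
    }

    public static void main(String[] args) {
        AsymmetricJoinBuffer buffer = new AsymmetricJoinBuffer();
        buffer.onProductUpdate(new ProductUpdate("p1", 1_000L, "color=red"));
        System.out.println(buffer.onUserAction(new UserAction("p1", 1_200L, "click"))); // joins immediately
        System.out.println(buffer.onUserAction(new UserAction("p2", 1_300L, "view")));  // null: buffered
        buffer.advanceWatermark(10 * 60 * 1000L); // expires the unjoined action on p2
    }
}
```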
Medium · Technical
How would you detect concept drift in a streaming prediction pipeline? Propose concrete metrics to compute (for example, prediction distribution shifts, population stability index, calibration error), the aggregation windows and thresholds to use, and an alerting strategy. Also explain strategies to obtain labels when ground truth is delayed.
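
For the metrics portion, the population stability index is a common choice. Here is a small sketch of computing PSI over pre-binned prediction-score distributions from a reference window versus the current window; the bin fractions are made up, and the 0.10/0.25 warning and alert thresholds are conventional rules of thumb, not prescriptions.

```java
/**
 * Population Stability Index over pre-binned prediction distributions:
 * PSI = sum_i (actual_i - expected_i) * ln(actual_i / expected_i),
 * where "expected" comes from a reference window (e.g. training data or last
 * week) and "actual" from the current aggregation window.
 */
public class PopulationStabilityIndex {

    public static double psi(double[] expectedFractions, double[] actualFractions) {
        if (expectedFractions.length != actualFractions.length) {
            throw new IllegalArgumentException("Bin counts must match");
        }
        double psi = 0.0;
        for (int i = 0; i < expectedFractions.length; i++) {
            // Small floor avoids division by zero / log of zero for empty bins.
            double expected = Math.max(expectedFractions[i], 1e-6);
            double actual = Math.max(actualFractions[i], 1e-6);
            psi += (actual - expected) * Math.log(actual / expected);
        }
        return psi;
    }

    public static void main(String[] args) {
        double[] reference = {0.10, 0.20, 0.40, 0.20, 0.10}; // score bins from a reference window
        double[] current   = {0.05, 0.15, 0.35, 0.25, 0.20}; // same bins over the latest window
        double score = psi(reference, current);
        System.out.printf("PSI = %.4f -> %s%n", score,
                score > 0.25 ? "alert" : score > 0.10 ? "warn" : "ok");
    }
}
```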
Hard · Technical
Explain how watermarks are implemented conceptually in Apache Flink and in Spark Structured Streaming. Discuss periodic watermark advancement, punctuated watermarks, and how watermark strategies and delay thresholds affect correctness and completeness in the presence of out-of-order events.
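
To make the conceptual model concrete without depending on either framework's API, here is a sketch of a periodic, bounded-out-of-orderness watermark generator: track the maximum event timestamp seen and periodically emit that maximum minus an allowed-lateness bound. The 2-second bound and the timestamps in the example are arbitrary.

```java
/**
 * Conceptual sketch of a periodic, bounded-out-of-orderness watermark generator:
 * it tracks the maximum event timestamp observed and periodically emits
 * watermark = maxTimestamp - maxOutOfOrderness. This mirrors the idea behind
 * Flink's bounded-out-of-orderness strategy and Spark's watermark delay
 * threshold without using either API.
 */
public class BoundedOutOfOrdernessWatermark {

    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public BoundedOutOfOrdernessWatermark(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    /** Called for every event as it is processed. */
    public void onEvent(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
    }

    /**
     * Called on a timer (periodic advancement). Windows ending at or before the
     * returned value may be finalized; events with older timestamps are "late".
     */
    public long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // no events yet, hold the watermark back
        }
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessWatermark wm = new BoundedOutOfOrdernessWatermark(2_000); // tolerate 2s disorder
        wm.onEvent(10_000);
        wm.onEvent(9_500);   // out of order but within the allowed bound
        wm.onEvent(12_000);
        System.out.println(wm.currentWatermark()); // prints 10000
    }
}
```

A punctuated strategy differs only in when the watermark is emitted: instead of a timer, certain marker events trigger emission immediately. A larger delay bound improves completeness (fewer events counted as late) at the cost of higher end-to-end latency before windows close.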
Hard · Behavioral
Tell me about a time you disagreed with engineers about moving a model into real-time streaming. Use the STAR framework (Situation, Task, Action, Result) to explain the scenario, how you influenced the decision, how you balanced product value vs operational risk, and what you learned from the outcome.
