InterviewStack.io

Stream Processing and Event Streaming Questions

This topic covers designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts. Candidates should demonstrate knowledge of messaging and ingestion layers, message brokers and commit log systems, partitioning and consumer group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques. Processing concerns include stream processing engines, state stores, stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, and stream-to-stream and stream-to-table joins and aggregations over windows. Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency versus throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag. Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.

Hard | Technical
45 practiced
Write pseudocode (Java or Scala) for a custom watermark generator in Flink that adapts watermark delay dynamically based on observed event lateness distribution. The goal is to minimize lateness while keeping late-arriving events under 1% of total. Describe how you compute percentile-based delays and how you handle sudden shifts in lateness.
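
One way to approach the percentile-based delay, sketched in plain Python rather than against the Flink API so it can be run on its own. The class name `AdaptiveWatermark`, its parameters, and the choice to measure lateness as the gap behind the newest timestamp seen are all assumptions made for illustration. The two method names mirror the `onEvent` and `onPeriodicEmit` callbacks of Flink's `WatermarkGenerator` interface, where equivalent logic would live. A bounded sample window lets the delay follow a shift in the lateness distribution, and the emitted watermark is never allowed to move backwards when the delay grows.

```python
from collections import deque
import math


class AdaptiveWatermark:
    """Derives a watermark delay from a percentile of recently observed lateness."""

    def __init__(self, percentile=0.99, window=1000, min_delay_ms=0, max_delay_ms=60_000):
        self.percentile = percentile
        self.samples = deque(maxlen=window)  # most recent lateness samples, in ms
        self.min_delay_ms = min_delay_ms
        self.max_delay_ms = max_delay_ms
        self.max_event_ts = None
        self.watermark = None

    def on_event(self, event_ts_ms):
        # Lateness is how far this event trails the newest timestamp seen so far.
        if self.max_event_ts is None or event_ts_ms > self.max_event_ts:
            self.max_event_ts = event_ts_ms
        self.samples.append(self.max_event_ts - event_ts_ms)

    def current_delay_ms(self):
        if not self.samples:
            return self.min_delay_ms
        ordered = sorted(self.samples)
        # Nearest-rank percentile over the bounded sample window.
        rank = max(1, math.ceil(self.percentile * len(ordered)))
        delay = ordered[rank - 1]
        # Clamp so a burst of extreme lateness cannot stall the pipeline indefinitely.
        return min(self.max_delay_ms, max(self.min_delay_ms, delay))

    def on_periodic_emit(self):
        if self.max_event_ts is None:
            return None
        candidate = self.max_event_ts - self.current_delay_ms()
        # Watermarks must be monotonic, even when the computed delay increases.
        if self.watermark is None or candidate > self.watermark:
            self.watermark = candidate
        return self.watermark
```

Sorting the window on every emit is acceptable for a sketch; a production version would more likely keep a streaming quantile summary so the cost per emit stays small.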
Medium | Technical
45 practiced
Sketch how to implement a stream-to-stream join between orders and payments using Spark Structured Streaming (Python or Scala). The join window is 5 minutes; describe watermarking, state retention, configuration options, and include a short code outline showing how you'd set up the streaming query.
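
The state-retention reasoning behind such a join can be modelled in a few lines of plain Python. This is not Spark code: `OrderPaymentJoin`, its single shared watermark, and the millisecond timestamps are simplifying assumptions for illustration. In Spark Structured Streaming the corresponding pieces would be a `withWatermark` call on each input and a time-range condition in the join expression. The model shows which buffered rows can be dropped safely once the watermark advances.

```python
JOIN_WINDOW_MS = 5 * 60 * 1000  # a payment matches if it falls within 5 minutes after the order


class OrderPaymentJoin:
    def __init__(self, watermark_delay_ms):
        self.watermark_delay_ms = watermark_delay_ms
        self.max_event_ts = 0
        self.orders = {}    # order_id -> buffered order timestamps
        self.payments = {}  # order_id -> buffered payment timestamps

    def watermark(self):
        return self.max_event_ts - self.watermark_delay_ms

    def on_order(self, order_id, ts):
        if ts < self.watermark():
            return []  # later than the watermark allows: dropped, not buffered
        matches = [(order_id, ts, p) for p in self.payments.get(order_id, [])
                   if ts <= p <= ts + JOIN_WINDOW_MS]
        self.orders.setdefault(order_id, []).append(ts)
        self._advance(ts)
        return matches

    def on_payment(self, order_id, ts):
        if ts < self.watermark():
            return []
        matches = [(order_id, o, ts) for o in self.orders.get(order_id, [])
                   if o <= ts <= o + JOIN_WINDOW_MS]
        self.payments.setdefault(order_id, []).append(ts)
        self._advance(ts)
        return matches

    def _advance(self, ts):
        self.max_event_ts = max(self.max_event_ts, ts)
        wm = self.watermark()
        # Orders stay until no on-time payment can fall inside their 5-minute window.
        self.orders = self._evict(self.orders, lambda o: o + JOIN_WINDOW_MS >= wm)
        # Payments only match orders at or before them, so they expire at the watermark.
        self.payments = self._evict(self.payments, lambda p: p >= wm)

    @staticmethod
    def _evict(state, keep):
        kept = {key: [t for t in times if keep(t)] for key, times in state.items()}
        return {key: times for key, times in kept.items() if times}
```

The asymmetry in the two eviction rules is the point of the exercise: the order side must be retained for the watermark delay plus the join window, while the payment side only needs the watermark delay.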
Medium | Technical
45 practiced
List the critical operational metrics for a production streaming pipeline spanning brokers, producers, consumers, and processing jobs. For each component give example metric names (e.g., consumer lag, broker under-replicated-partitions), suggested alert thresholds, and a short alerting policy (severity and initial remediation steps).
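
As a small illustration of the consumer lag metric mentioned in the question, the sketch below computes per-partition lag as the log end offset minus the last committed offset and maps the total to a severity. The function names and both thresholds are hypothetical placeholders, not recommended values; real thresholds depend on throughput and on how much delay the pipeline can tolerate.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus last committed offset."""
    return {
        partition: max(0, end - committed_offsets.get(partition, 0))
        for partition, end in end_offsets.items()
    }


def lag_severity(total_lag, warn_at=10_000, page_at=100_000):
    # Thresholds are illustrative assumptions; tune them per pipeline.
    if total_lag >= page_at:
        return "critical"
    if total_lag >= warn_at:
        return "warning"
    return "ok"
```

Alerting on lag that keeps growing over several intervals, rather than on a single reading, tends to avoid paging on short bursts.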
Hard | Technical
39 practiced
Design a memory-efficient deduplication system for very high throughput streams using probabilistic data structures like Bloom filters or counting Bloom filters. Quantify the memory versus false-positive rate trade-offs, explain how to handle expirations and sliding windows, and discuss how to balance occasional false positives against correctness requirements.
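
The memory versus false-positive trade-off follows from the standard Bloom filter sizing formulas: for n expected keys and target false-positive rate p, the filter needs about m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. At p = 1% that is roughly 9.6 bits per key, or about 1.2 MB per million keys. The sketch below applies those formulas and approximates a sliding window by rotating two filter generations. The class names, the SHA-256 double-hashing scheme, and the two-generation layout are illustrative assumptions, not a prescribed design.

```python
import hashlib
import math


def bloom_parameters(expected_items, false_positive_rate):
    """Return (bits, hash_count) from the standard Bloom filter sizing formulas."""
    bits = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate):
        self.bits, self.hashes = bloom_parameters(expected_items, false_positive_rate)
        self.array = bytearray((self.bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all probe positions from two 64-bit values.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.bits for i in range(self.hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.array[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


class RotatingDeduplicator:
    """Approximates a sliding window with a current and a previous filter generation."""

    def __init__(self, items_per_generation, false_positive_rate, generation_ms):
        self._make = lambda: BloomFilter(items_per_generation, false_positive_rate)
        self.generation_ms = generation_ms
        self.current, self.previous = self._make(), self._make()
        self.generation_start = None

    def is_duplicate(self, event_id, now_ms):
        if self.generation_start is None:
            self.generation_start = now_ms
        if now_ms - self.generation_start >= self.generation_ms:
            # Expire the oldest generation wholesale; no per-key deletes are needed.
            self.previous, self.current = self.current, self._make()
            self.generation_start = now_ms
        seen = event_id in self.current or event_id in self.previous
        self.current.add(event_id)
        return seen
```

Because each lookup consults two filters, the effective false-positive rate is roughly double the per-filter rate, and a false positive here means a genuinely new event is discarded. Where that is unacceptable, a positive answer can be confirmed against an exact store before dropping the event.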
Hard | System Design
42 practiced
Design a real-time pricing engine using event sourcing and CQRS. Price changes are events that must produce materialized views for fast reads, support auditability, and allow replay. Choose an event store, explain compaction and snapshot strategies, how to build projections, and how to handle schema changes and projection rebuilds.
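
A minimal sketch of the projection and snapshot mechanics, assuming an in-memory list stands in for the event store. The names `PriceChanged`, `PriceProjection`, `snapshot`, `restore`, and `rebuild` are hypothetical. The idea shown is that a projection records the last sequence number it applied, so a rebuild from a snapshot plus the log yields the same view as a full replay, and re-delivered events are ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PriceChanged:
    sequence: int     # position in the append-only event log
    sku: str
    price_cents: int


@dataclass
class PriceProjection:
    prices: dict = field(default_factory=dict)
    last_sequence: int = 0

    def apply(self, event):
        # Skipping already-applied sequences keeps replays idempotent.
        if event.sequence <= self.last_sequence:
            return
        self.prices[event.sku] = event.price_cents
        self.last_sequence = event.sequence


def rebuild(log):
    """Full replay: fold every event into an empty projection."""
    projection = PriceProjection()
    for event in log:
        projection.apply(event)
    return projection


def snapshot(projection):
    """Capture the view together with the log position it reflects."""
    return {"prices": dict(projection.prices), "last_sequence": projection.last_sequence}


def restore(snap, log):
    """Start from a snapshot and apply only the events recorded after it."""
    projection = PriceProjection(dict(snap["prices"]), snap["last_sequence"])
    for event in log:
        projection.apply(event)
    return projection
```

Storing the sequence number inside the snapshot is what makes it safe to pass the whole log to `restore`; a real store would seek to that position instead of scanning from the start.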
