Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream native and event driven systems, trade offs between batch and streaming models, and event sourcing concepts. Candidates should demonstrate knowledge of messaging and ingestion layers, message brokers and commit log systems, partitioning and consumer group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques. Processing concerns include stream processing engines, state stores, stateful processing, checkpointing and fault recovery, processing guarantees such as at least once and exactly once semantics, idempotence, and time semantics including event time versus processing time, watermarks, windowing strategies, late and out of order event handling, and stream to stream and stream to table joins and aggregations over windows. Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency versus throughput trade offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag. Familiarity with concrete technologies and frameworks is expected when discussing designs and trade offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.

MediumTechnical

0 practiced

Write a SQL streaming query (using Flink SQL or Spark Structured Streaming SQL) that computes a 10-minute sliding-window average revenue per product with a 1-minute slide. Provide the SQL and explain how you'd configure handling of late events with an allowed-lateness of 5 minutes and how updates to previously emitted windows would be surfaced to BI dashboards.

EasyTechnical

0 practiced

Define event time, processing time, and ingestion time. Give a BI example where choosing the wrong time semantic (e.g., using processing time instead of event time) leads to an incorrect KPI, and explain how you would correct the metric implementation.

MediumSystem Design

0 practiced

Design a streaming pipeline to power an executive dashboard that must show near-real-time daily active users (DAU) with maximum staleness of 30 seconds. Requirements: 1) handle peak ingest of 1M events/sec, 2) support late events up to 10 minutes, 3) produce hourly and daily aggregates for historical queries. Describe ingestion, broker configuration, processing engine, state store, partitioning strategy, sink choices, and how results will be exposed to BI tools like Tableau.

MediumTechnical

0 practiced

A finance team requires strict correctness for revenue aggregates that are streamed into Snowflake via Kafka Connect. Describe strategies to ensure idempotent writes or exactly-once-like behavior to Snowflake. Include approaches for deduplication, upserts, transactional staging tables, and how to safely perform replays or backfills without double-counting.

MediumTechnical

0 practiced

You observe a sudden spike in consumer lag for a critical Kafka topic that feeds dashboards. As the BI analyst responsible for metrics, outline the operational troubleshooting steps you would take, including which metrics and tools you would check, immediate mitigations, and how you would communicate status and impact to stakeholders.

Unlock Full Question Bank

Get access to hundreds of Stream Processing and Event Streaming interview questions and detailed answers.

Join thousands of developers preparing for their dream job.