InterviewStack.io

Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event-sourcing concepts.

Candidates should demonstrate knowledge of messaging and ingestion layers: message brokers and commit-log systems, partitioning and consumer-group patterns, partition-key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques.

Processing concerns include stream processing engines, state stores, stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, late and out-of-order event handling, and stream-to-stream and stream-to-table joins and aggregations over windows.

Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency-versus-throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag.

Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.

Hard · Technical
Given a stream of financial trades with schema (trade_id STRING, symbol STRING, quantity INT, price DOUBLE, trade_time TIMESTAMP), design a streaming job to compute VWAP (volume-weighted average price) per symbol with sub-second latency and fault tolerance. Explain how to handle out-of-order/late trades, duplicates, and exactly-once writes to downstream storage.
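One way to reason about this question is a minimal in-memory sketch of per-symbol VWAP over 1-second event-time tumbling windows, with a watermark that lags the maximum observed event time (to tolerate out-of-order trades) and trade_id-based deduplication for at-least-once input. The class name, window size, and lateness bound below are illustrative assumptions, not part of the question; a real answer would map these ideas onto Flink or Kafka Streams operators with checkpointed state and transactional sinks.

```python
from collections import defaultdict

WINDOW_MS = 1000           # tumbling window size (assumed)
ALLOWED_LATENESS_MS = 500  # watermark lag behind max event time (assumed)

class VwapAggregator:
    def __init__(self):
        # (symbol, window_start) -> [sum(price * qty), sum(qty)]
        self.windows = defaultdict(lambda: [0.0, 0])
        self.seen_ids = set()    # dedup on trade_id for at-least-once input
        self.max_event_time = 0
        self.results = {}        # closed windows: (symbol, start) -> vwap

    def process(self, trade_id, symbol, quantity, price, trade_time_ms):
        if trade_id in self.seen_ids:
            return               # duplicate delivery: ignore
        self.seen_ids.add(trade_id)
        self.max_event_time = max(self.max_event_time, trade_time_ms)
        watermark = self.max_event_time - ALLOWED_LATENESS_MS
        window_start = trade_time_ms - trade_time_ms % WINDOW_MS
        if window_start + WINDOW_MS <= watermark:
            return               # too late: this window was already emitted
        acc = self.windows[(symbol, window_start)]
        acc[0] += price * quantity
        acc[1] += quantity
        self._close_windows(watermark)

    def _close_windows(self, watermark):
        # Emit every window whose end is at or behind the watermark.
        for key in [k for k in self.windows if k[1] + WINDOW_MS <= watermark]:
            notional, volume = self.windows.pop(key)
            self.results[key] = notional / volume
```

In production the `seen_ids` set and open windows would live in a checkpointed state store (e.g. RocksDB) so that a recovering task resumes without double-counting, and `results` would be written to downstream storage transactionally.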
Hard · Technical
Propose a strategy to test and validate end-to-end exactly-once processing for a pipeline that writes to PostgreSQL through an idempotent API. Include how you would simulate failures (task crash, network partition), replay inputs, and verify that no duplicates are written when producers may send duplicate messages.
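A test harness answering this question might look like the following sketch: replay a fixed input with injected producer duplicates and a simulated mid-run task crash, write through an idempotent upsert (a dict standing in for PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`, keyed on trade_id), and verify the sink holds exactly the distinct input records. The function names and the duplicate rate are assumptions for illustration.

```python
import random

def idempotent_upsert(sink, record):
    # Stand-in for an idempotent API / ON CONFLICT DO NOTHING write.
    sink.setdefault(record["trade_id"], record)

def run_pipeline(events, sink, crash_after=None):
    """Process events; raise to simulate a task crash after N writes."""
    for i, event in enumerate(events):
        if crash_after is not None and i == crash_after:
            raise RuntimeError("simulated task crash")
        idempotent_upsert(sink, event)

def with_duplicates(events, rate, seed=0):
    # Inject producer-retry duplicates at the given rate.
    rng = random.Random(seed)
    out = []
    for e in events:
        out.append(e)
        if rng.random() < rate:
            out.append(e)
    return out

events = [{"trade_id": f"t{i}", "price": float(i)} for i in range(100)]
feed = with_duplicates(events, rate=0.3)

sink = {}
try:
    run_pipeline(feed, sink, crash_after=40)  # first attempt crashes mid-run
except RuntimeError:
    pass
run_pipeline(feed, sink)                      # restart replays from offset 0

assert len(sink) == len(events)               # no duplicates written
```

The same pattern extends to network partitions (drop writes, then replay) and to property-based checks that compare the sink against a batch recomputation of the same input.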
Medium · Technical
Explain 'at-least-once' vs 'exactly-once' processing semantics in streaming systems. For which parts of an analytics pipeline does each semantic matter most? Describe practical techniques to get effectively exactly-once metrics using idempotent sinks, deduplication, or transactional writes with Kafka + Spark or Flink.
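The "transactional writes" technique this question mentions can be sketched in a few lines: commit the consumer offset in the same atomic step as the aggregated result, so a full replay after a crash is a no-op for already-applied offsets and metrics are never double-counted. This is a simplified model of what Kafka transactions or a sink-side `(offset, result)` transaction provide; the class and field names are assumptions.

```python
class TransactionalSink:
    def __init__(self):
        self.committed_offset = -1
        self.totals = {}                    # metric_key -> running value

    def commit(self, offset, updates):
        if offset <= self.committed_offset:
            return                          # already applied: replay is a no-op
        # Offset and updates are applied together, modeling one transaction.
        for key, delta in updates.items():
            self.totals[key] = self.totals.get(key, 0) + delta
        self.committed_offset = offset

def consume(records, sink, start=None):
    # Resume from the last committed offset unless a replay start is forced.
    start = sink.committed_offset + 1 if start is None else start
    for offset in range(start, len(records)):
        symbol, qty = records[offset]
        sink.commit(offset, {symbol: qty})

records = [("AAPL", 10), ("MSFT", 5), ("AAPL", 7)]
sink = TransactionalSink()
consume(records, sink)
consume(records, sink, start=0)             # at-least-once redelivery of everything
assert sink.totals == {"AAPL": 17, "MSFT": 5}
```

At-least-once delivery plus this kind of idempotent, offset-aware sink is what "effectively exactly-once" usually means in practice.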
Hard · Technical
Discuss the trade-offs between Kafka Streams and Apache Flink for complex event processing where application state grows to terabytes. Consider state checkpointing/restore times, operator scaling, latency, operational complexity, and vendor/managed support implications for analytics teams.
Hard · System Design
Design a storage strategy for large stateful operators where state grows into terabytes and must support point-in-time restores for regulatory compliance. Compare using RocksDB with remote incremental checkpointing vs an external persistent store (Cassandra/Scylla/HBase) for operator state, considering latency, restore time, operational complexity, and compliance needs.
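The point-in-time restore requirement can be illustrated with a toy model of incremental checkpointing: each checkpoint ships only the keys changed since the previous one, and a restore folds deltas forward up to the requested timestamp. This is a conceptual sketch of how RocksDB incremental checkpoints plus retained snapshots support compliance restores; the class and method names are invented for illustration, not any framework's API.

```python
class IncrementalCheckpointStore:
    def __init__(self):
        self.checkpoints = []        # ordered list of (ts, {key: value}) deltas

    def checkpoint(self, ts, dirty):
        # Persist only keys modified since the last checkpoint.
        self.checkpoints.append((ts, dict(dirty)))

    def restore(self, as_of_ts):
        # Rebuild state as of a timestamp by folding deltas forward.
        state = {}
        for ts, delta in self.checkpoints:
            if ts > as_of_ts:
                break
            state.update(delta)      # later deltas overwrite earlier keys
        return state

store = IncrementalCheckpointStore()
store.checkpoint(100, {"AAPL": 1, "MSFT": 2})
store.checkpoint(200, {"AAPL": 5})            # only the changed key is shipped
store.checkpoint(300, {"GOOG": 9})

assert store.restore(250) == {"AAPL": 5, "MSFT": 2}
```

The trade-off the question probes falls out of this model: delta chains keep checkpoints small but make restore time grow with chain length, which is why real systems periodically compact deltas into full snapshots.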
