InterviewStack.io

Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts.

Candidates should demonstrate knowledge of the messaging and ingestion layer: message brokers and commit-log systems, partitioning and consumer-group patterns, partition-key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques.

Processing concerns include stream processing engines, state stores and stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, and stream-to-stream and stream-to-table joins and aggregations over windows.

Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency-versus-throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag.

Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.
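To ground the time-semantics vocabulary above, here is a minimal, framework-free sketch of event-time tumbling windows with a watermark and bounded lateness. The window size, allowed lateness, and counting logic are illustrative assumptions, not any particular engine's API.

```python
from collections import defaultdict

WINDOW_MS = 60_000            # 1-minute tumbling windows (illustrative)
ALLOWED_LATENESS_MS = 5_000   # tolerate slightly out-of-order events

counts: dict[int, int] = defaultdict(int)  # window start -> event count
watermark = 0                              # highest event time seen so far

def on_event(event_time_ms: int) -> None:
    """Assign one event to its event-time window, dropping very late events."""
    global watermark
    watermark = max(watermark, event_time_ms)
    if event_time_ms < watermark - ALLOWED_LATENESS_MS:
        return  # too late for its window; a real pipeline might dead-letter it
    window_start = (event_time_ms // WINDOW_MS) * WINDOW_MS
    counts[window_start] += 1
```

Real engines such as Flink, Kafka Streams, and Spark Structured Streaming generalize exactly these pieces: a watermark policy, a window assigner, and managed state for the per-window aggregates.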

Medium · Technical
List the critical operational metrics for a production streaming pipeline spanning brokers, producers, consumers, and processing jobs. For each component give example metric names (e.g., consumer lag, broker under-replicated-partitions), suggested alert thresholds, and a short alerting policy (severity and initial remediation steps).
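As a starting point for the consumer-side metrics, here is a sketch that measures per-partition consumer lag inline with the kafka-python client. The broker address, topic, group id, and threshold are illustrative assumptions; in production, lag is usually derived from committed group offsets by a dedicated exporter (e.g., Burrow or a Prometheus lag exporter) rather than computed inside a consumer.

```python
from kafka import KafkaConsumer

# Illustrative names; swap in your own topic, group, and brokers.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="orders-enricher",
    enable_auto_commit=False,   # this probe should not move committed offsets
)
consumer.poll(timeout_ms=1000)  # join the group so partitions get assigned

LAG_ALERT_THRESHOLD = 10_000    # example threshold; tune per topic throughput

for tp in consumer.assignment():
    log_end = consumer.end_offsets([tp])[tp]  # latest offset on the broker
    position = consumer.position(tp)          # next offset this consumer reads
    lag = log_end - position
    if lag > LAG_ALERT_THRESHOLD:
        print(f"ALERT: {tp.topic}[{tp.partition}] lag={lag} exceeds threshold")
```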
Easy · Technical
Implement a memory-bounded deduplicator in Python for a stream of events where each event has a unique `event_id` and a timestamp. Provide the API: `def is_duplicate(event_id: str, ts: int) -> bool`. Deduplicate events within a sliding 1-hour window. Explain trade-offs and how your solution bounds memory usage.
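One possible answer, sketched in plain Python. It evicts by event time (so it assumes timestamps are roughly monotonic) and bounds memory at the number of distinct IDs seen per hour; a Bloom filter would give a hard memory cap at the cost of occasional false positives.

```python
from collections import deque

class SlidingWindowDeduper:
    """Tracks event IDs seen within the last `window_s` seconds."""

    def __init__(self, window_s: int = 3600):
        self.window_s = window_s
        self.seen: dict[str, int] = {}  # event_id -> first-seen timestamp
        self.expiry: deque = deque()    # (ts, event_id) in arrival order

    def is_duplicate(self, event_id: str, ts: int) -> bool:
        # Evict IDs whose window has passed, using the event clock itself;
        # this assumes timestamps arrive roughly in order.
        cutoff = ts - self.window_s
        while self.expiry and self.expiry[0][0] <= cutoff:
            _, old_id = self.expiry.popleft()
            self.seen.pop(old_id, None)
        if event_id in self.seen:
            return True                 # first occurrence wins; repeat is a dup
        self.seen[event_id] = ts
        self.expiry.append((ts, event_id))
        return False

_deduper = SlidingWindowDeduper()

def is_duplicate(event_id: str, ts: int) -> bool:
    """Module-level API matching the signature requested above."""
    return _deduper.is_duplicate(event_id, ts)
```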
Easy · Technical
You must pick a partition key for an orders topic that contains `order_id`, `user_id`, and `product_id`. Explain the criteria that should guide your choice of partition key with respect to per-entity ordering, fairness of load distribution, join locality, and hot keys. Give example decisions for use cases like per-user activity streams and inventory updates.
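For concreteness, here is how a key choice shows up with the kafka-python producer; the broker address, topic, and sample record are illustrative assumptions. Kafka's default partitioner hashes the serialized key, so every record with the same key lands on the same partition, which is what makes per-key ordering possible.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
)

order = {"order_id": "o-42", "user_id": "u-7", "product_id": "p-99"}

# key=user_id: per-user ordering and join locality for user activity streams.
# key=product_id would instead order inventory updates per product, at the
# risk of a hot partition for a very popular product.
producer.send("orders", key=order["user_id"], value=order)
producer.flush()
```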
Medium · System Design
Design an approach to enrich a high-rate orders stream with a slowly changing product catalog (updated every few hours). Compare two options: caching the catalog inside the stream job (stateful enrichment) versus joining against a compacted changelog Kafka topic (stream-table join). Discuss consistency, how catalog corrections propagate, and the memory implications of each.
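A sketch of the stream-table-join option with kafka-python: the compacted catalog topic is folded into an in-memory dict, and each order is enriched by lookup. Topic names, record formats, and the single-threaded loop are illustrative assumptions; a real engine would hold this state in a fault-tolerant store and restore it from the changelog on recovery.

```python
import json
from kafka import KafkaConsumer

catalog: dict = {}  # product_id -> latest catalog record (the "table" side)

catalog_consumer = KafkaConsumer(
    "product-catalog",               # compacted changelog topic (assumed name)
    bootstrap_servers="localhost:9092",
    group_id="catalog-cache",
    enable_auto_commit=False,        # no commits: rebuild from earliest on restart
    auto_offset_reset="earliest",
    key_deserializer=lambda k: k.decode(),
    value_deserializer=lambda v: json.loads(v) if v is not None else None,
)
orders_consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="order-enricher",
    value_deserializer=lambda v: json.loads(v),
)

while True:
    # Fold catalog updates into the table; a None value is a tombstone.
    for records in catalog_consumer.poll(timeout_ms=100).values():
        for rec in records:
            if rec.value is None:
                catalog.pop(rec.key, None)
            else:
                catalog[rec.key] = rec.value

    # Enrich orders by lookup; a miss means the catalog is not caught up yet.
    for records in orders_consumer.poll(timeout_ms=100).values():
        for rec in records:
            order = rec.value
            enriched = {**order, "product": catalog.get(order["product_id"])}
            print(enriched)  # stand-in for producing to an output topic
```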
Easy · Technical
Explain Kafka log compaction and time-based retention. When would you enable log compaction for a topic, what semantics does it provide for consumers, and what are the implications for storage, tombstones, and recovering the latest state from a compacted topic?
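To illustrate the last point, a sketch that rebuilds latest-value-per-key state by reading a compacted topic end to end with kafka-python, honoring tombstones; the topic name and JSON encoding are illustrative assumptions.

```python
import json
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
parts = [TopicPartition("user-profiles", p)            # assumed topic name
         for p in consumer.partitions_for_topic("user-profiles")]
consumer.assign(parts)
consumer.seek_to_beginning(*parts)
end_offsets = consumer.end_offsets(parts)  # snapshot of where "caught up" is

state: dict = {}
while any(consumer.position(tp) < end_offsets[tp] for tp in parts):
    for records in consumer.poll(timeout_ms=500).values():
        for rec in records:
            key = rec.key.decode()
            if rec.value is None:       # tombstone: the key was deleted
                state.pop(key, None)
            else:
                state[key] = json.loads(rec.value)

# `state` now holds the latest value per key. Compaction guarantees at least
# the most recent record per key is retained, and tombstones persist for
# delete.retention.ms so that slow readers can still observe deletes.
```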
