Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts. Candidates should demonstrate knowledge of messaging and ingestion layers, message brokers and commit-log systems, partitioning and consumer-group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques. Processing concerns include stream processing engines, state stores, stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, late and out-of-order event handling, stream-to-stream and stream-to-table joins, and windowed aggregations. Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency-versus-throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag. Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.
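
To make the watermark, windowing, and late-event vocabulary above concrete, here is a minimal framework-free sketch of event-time tumbling windows. It assumes events are hypothetical (event_time_seconds, key) tuples, a fixed WINDOW_SIZE, and a watermark that trails the largest event time seen by MAX_OUT_OF_ORDERNESS; this is roughly the idea behind a bounded-out-of-orderness watermark in Flink, but it is an illustration, not any engine's implementation. Windows fire once the watermark passes their end, and events for already-fired windows are collected as late (where a real job might use a side output).

```python
from collections import defaultdict

# Hypothetical parameters for illustration only.
WINDOW_SIZE = 10            # tumbling window width, in seconds of event time
MAX_OUT_OF_ORDERNESS = 5    # watermark = max event time seen - this bound


def tumbling_counts(events):
    """Count events per (window_start, key); return (fired, late)."""
    open_windows = defaultdict(int)  # (window_start, key) -> running count
    fired = []                       # (window_start, key, count) in firing order
    late = []                        # events that arrived after their window fired
    max_event_time = float("-inf")

    for event_time, key in events:
        window_start = event_time - event_time % WINDOW_SIZE
        watermark = max_event_time - MAX_OUT_OF_ORDERNESS
        # If the window already closed, the event is late: do not reopen it.
        if window_start + WINDOW_SIZE <= watermark:
            late.append((event_time, key))
            continue
        open_windows[(window_start, key)] += 1
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - MAX_OUT_OF_ORDERNESS
        # Fire every window whose end is at or before the new watermark.
        for wk in sorted(k for k in open_windows if k[0] + WINDOW_SIZE <= watermark):
            fired.append((wk[0], wk[1], open_windows.pop(wk)))

    # End of input acts like a final infinite watermark: flush what is left.
    for wk in sorted(open_windows):
        fired.append((wk[0], wk[1], open_windows[wk]))
    return fired, late
```

The out-of-orderness bound is the latency/completeness knob: a larger bound admits more stragglers but delays every result by that much.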

Hard · Technical

Design a robust dead-letter handling strategy for a streaming pipeline that processes heterogeneous JSON events. How do you detect poison messages, route them to a DLQ, provide observability, and allow safe manual/automated replay after fixes? Consider at-least-once semantics, performance impact, and data retention policies.
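
One piece of a possible answer, sketched under stated assumptions: separate transient failures (retried up to a limit) from deterministic poison messages (routed straight to the DLQ), and keep enough metadata (raw payload, source offset, error, attempt count) for observability and later replay. All names here (process_batch, handle, DeadLetter, MAX_ATTEMPTS, RETRIABLE) are hypothetical, and the in-memory dlq list stands in for a real dead-letter topic; no Kafka client API is implied.

```python
import json
from dataclasses import dataclass

# Hypothetical constants for illustration only.
MAX_ATTEMPTS = 3
RETRIABLE = (TimeoutError, ConnectionError)  # transient failures worth retrying


@dataclass
class DeadLetter:
    raw: str       # original payload, kept verbatim so it can be replayed after a fix
    offset: int    # source offset, for auditing and targeted replay
    error: str     # last error, surfaced to triage dashboards and alerts
    attempts: int  # how many times processing was attempted


def handle(event: dict) -> dict:
    # Example schema check for heterogeneous JSON: every event needs "type" and "id".
    if "type" not in event or "id" not in event:
        raise ValueError("missing required field 'type' or 'id'")
    return {"id": event["id"], "type": event["type"]}


def process_batch(records, handler=handle):
    """records: iterable of (offset, raw_json). Returns (outputs, dlq, metrics)."""
    outputs, dlq = [], []
    metrics = {"ok": 0, "dead_lettered": 0}
    for offset, raw in records:
        error, attempts = None, 0
        while attempts < MAX_ATTEMPTS:
            attempts += 1
            try:
                outputs.append(handler(json.loads(raw)))
                error = None
                break
            except RETRIABLE as exc:
                error = f"{type(exc).__name__}: {exc}"  # transient: try again
            except ValueError as exc:  # includes json.JSONDecodeError
                error = f"{type(exc).__name__}: {exc}"
                break  # deterministic poison message: retrying cannot help
        if error is None:
            metrics["ok"] += 1
        else:
            # Route aside instead of blocking the partition. In a real pipeline,
            # commit the source offset only after the DLQ write is acknowledged.
            dlq.append(DeadLetter(raw, offset, error, attempts))
            metrics["dead_lettered"] += 1
    return outputs, dlq, metrics
```

Replay after a fix then amounts to feeding the stored raw payloads back through process_batch, which is only safe if downstream sinks are idempotent, since at-least-once delivery may already have applied some of them.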

Medium · Technical

Explain causes and symptoms of backpressure in a streaming pipeline (e.g., a Flink job experiencing slow sinks). Describe detection strategies and operational mitigations (increase parallelism, tune producer/consumer configs, apply rate limiting, use buffers). Include concrete Kafka and Flink configuration knobs you would consider.
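
Of the mitigations listed, explicit rate limiting is the easiest to show in isolation. Below is a minimal token-bucket sketch, assuming a hypothetical TokenBucket class wrapped around a source or sink client; the clock is injectable so the behavior can be checked deterministically. Inside a Flink job, backpressure between operators is handled by the engine's own credit-based flow control, so a limiter like this belongs at the edges (for example, throttling calls to a slow external sink).

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: sustained `rate` tokens/s, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock        # injectable so tests can control time
        self.tokens = capacity    # start full, allowing an initial burst
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        # Caller should slow down: sleep, pause the source, or buffer.
        return False
```

Capacity bounds the burst a slow sink must absorb, while rate caps sustained throughput; the False path is where a pipeline chooses between blocking (propagating backpressure upstream) and buffering.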

Hard · System Design

Design a multi-tenant Flink deployment on Kubernetes that supports tenant isolation, fair resource allocation, and safe upgrade paths. Discuss architecture choices (cluster-per-tenant vs shared cluster), admission controls, scheduling (Kubernetes scheduler or custom extensions), and monitoring needed to detect noisy neighbors.
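
Two small building blocks of a shared-cluster answer can be sketched in plain Python: an admission check that rejects a job which would push its tenant past a quota, and a noisy-neighbor detector that flags tenants whose observed CPU exceeds their quota by some tolerance. JobRequest, admit, noisy_tenants, the quota dictionary shape, and the 1.2 tolerance are all hypothetical choices; in Kubernetes itself, per-namespace ResourceQuota objects enforce comparable caps at the API server level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    tenant: str
    task_slots: int
    memory_gb: float


def admit(request, running, quotas):
    """Admission control: reject a job that would push its tenant past its quota."""
    quota = quotas.get(request.tenant)
    if quota is None:
        return False, "unknown tenant"
    mine = [j for j in running if j.tenant == request.tenant]
    if sum(j.task_slots for j in mine) + request.task_slots > quota["task_slots"]:
        return False, "task slot quota exceeded"
    if sum(j.memory_gb for j in mine) + request.memory_gb > quota["memory_gb"]:
        return False, "memory quota exceeded"
    return True, "ok"


def noisy_tenants(cpu_used, cpu_quota, tolerance=1.2):
    """Tenants whose observed CPU exceeds tolerance x quota, worst offender first."""
    ratios = {t: cpu_used[t] / cpu_quota[t] for t in cpu_used if t in cpu_quota}
    return sorted((t for t, r in ratios.items() if r > tolerance), key=lambda t: -ratios[t])
```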

Hard · System Design

You discovered a bug in a processing job that affects computed aggregates for the past 3 days. Design a reprocessing architecture that allows you to recompute historical results and backfill dashboards without disrupting ongoing processing. Cover data replay strategies, idempotent sinks, versioned outputs, and safety controls.
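
A minimal sketch of the "versioned outputs plus idempotent sink" idea, assuming a hypothetical in-memory VersionedAggregateStore standing in for a real table. The backfill writes corrected aggregates under a new version with keyed overwrites (so replaying the same input twice yields the same state), dashboards keep reading the old version, and a single promote step switches readers once the new version has been validated; the old version stays around for rollback.

```python
from collections import defaultdict


class VersionedAggregateStore:
    """In-memory stand-in for a versioned, idempotent aggregate sink."""

    def __init__(self):
        self.rows = {}              # (version, window, key) -> value
        self.active_version = None  # the version dashboards read

    def upsert(self, version, window, key, value):
        # Keyed overwrite: replaying the same record any number of times is harmless.
        self.rows[(version, window, key)] = value

    def promote(self, version):
        # One-step switch of readers; earlier versions remain for rollback.
        self.active_version = version

    def read(self, window, key):
        return self.rows.get((self.active_version, window, key))


def recompute(events, version, store):
    """Recompute (window, key) sums from replayed events into `version`."""
    totals = defaultdict(int)
    for window, key, amount in events:
        totals[(window, key)] += amount
    for (window, key), value in totals.items():
        store.upsert(version, window, key, value)
```

Promotion is the natural place for safety controls, such as comparing row counts or totals between versions before switching.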

Hard · Technical

A Kafka cluster shows frequent leader elections and shrinking ISRs (in-sync replica sets), causing increased tail latency. Provide a root cause analysis checklist covering broker metrics, OS-level signs, network issues, disk saturation, replica fetcher problems, and relevant configuration knobs. Define immediate mitigations and longer-term fixes.
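
An early checklist step is finding out whether the ISR shrinkage concentrates on one broker. The sketch below assumes partition metadata has already been parsed into a hypothetical dictionary carrying the same fields `kafka-topics.sh --describe` reports (Leader, Replicas, Isr); the parsing itself is not shown. It lists under-replicated partitions, counts how often each broker is missing from an ISR, and flags partitions not led by their preferred leader (the first replica in the assignment).

```python
from collections import Counter


def isr_report(partitions):
    """Summarize ISR health from a partition-metadata snapshot.

    partitions: {(topic, partition): {"replicas": [...], "isr": [...], "leader": id}}
    Returns (under_replicated, lagging_brokers, non_preferred_leaders).
    """
    under_replicated, non_preferred = [], []
    lagging = Counter()  # broker id -> partitions where it is a replica but not in the ISR
    for tp, meta in sorted(partitions.items()):
        missing = set(meta["replicas"]) - set(meta["isr"])
        if missing:
            under_replicated.append(tp)
            lagging.update(missing)
        # The preferred leader is the first replica in the assignment list.
        if meta["leader"] != meta["replicas"][0]:
            non_preferred.append(tp)
    return under_replicated, lagging.most_common(), non_preferred
```

A single broker dominating lagging_brokers points the investigation at that host (disk, GC pauses, network, or its replica fetchers) rather than at cluster-wide configuration.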