
Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts.

Candidates should demonstrate knowledge of the messaging and ingestion layer: message brokers and commit-log systems, partitioning and consumer-group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques.

Processing concerns include stream processing engines, state stores and stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, and stream-to-stream and stream-to-table joins and aggregations over windows.

Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency versus throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag.

Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.

Hard · Technical
Describe an end-to-end exactly-once pipeline: producers -> Kafka -> Flink -> external sink (for example, PostgreSQL or S3). Explain how Kafka transactions, Flink checkpointing, and two-phase-commit sinks or idempotent writes combine to achieve end-to-end exactly-once semantics. Enumerate the failure modes and how your design handles each.
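A strong answer usually sketches the wiring as well as the argument. Below is a minimal Flink (Java) skeleton of one such design, assuming the Flink 1.x Kafka and JDBC connectors are on the classpath; the broker address, topic, table, and SQL are illustrative. It enables exactly-once checkpointing, reads only committed Kafka records, and uses an idempotent upsert sink so checkpoint replays cannot double-write (a two-phase-commit/XA sink is the alternative when the target cannot be made idempotent):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExactlyOncePipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Barrier-based checkpoints are the backbone of exactly-once state.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Read committed records only, so aborted producer transactions are never seen.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")          // hypothetical broker address
                .setTopics("orders")                        // hypothetical topic
                .setGroupId("orders-pipeline")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setProperty("isolation.level", "read_committed")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-orders")
           // ... parsing / enrichment / aggregation would go here ...
           .addSink(JdbcSink.sink(
                // Idempotent upsert: replays after a failure overwrite, never duplicate.
                "INSERT INTO order_totals (order_id, total) VALUES (?, ?) "
                        + "ON CONFLICT (order_id) DO UPDATE SET total = EXCLUDED.total",
                (stmt, value) -> {
                    String[] parts = value.split(",");      // illustrative "orderId,total" payload
                    stmt.setString(1, parts[0]);
                    stmt.setLong(2, Long.parseLong(parts[1]));
                },
                JdbcExecutionOptions.builder().withBatchSize(500).build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:postgresql://db:5432/analytics")   // hypothetical DSN
                        .withDriverName("org.postgresql.Driver")
                        .build()));

        env.execute("exactly-once-orders");
    }
}
```

The design point worth calling out is that the upsert makes the sink tolerant of replays, which is what turns Flink's internal exactly-once state guarantees into end-to-end exactly-once results.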
Hard · System Design
Design a backfill and reprocessing approach to correct analytics after fixing a processing bug. You need to reprocess 30 days of historical events without affecting live traffic and without double-counting. Describe strategies such as dual-run, topic replay, versioned materialized views, and idempotent sinks, plus runbook steps to validate correctness.
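One way to make the "replay into a versioned view" strategy concrete is the hedged Java sketch below, using the plain Kafka consumer API. The backfill job runs under its own consumer group, seeks every partition to the offset closest to 30 days ago, and upserts into a v2 table so the live v1 output is untouched until validation passes; the broker, topic, group, and table names are assumptions:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.*;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;

public class Backfill {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");   // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-backfill-v2"); // separate group: no impact on live consumers
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        long start = Instant.now().minus(Duration.ofDays(30)).toEpochMilli();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = consumer.partitionsFor("events").stream()
                    .map(p -> new TopicPartition(p.topic(), p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);

            // Translate "30 days ago" into per-partition starting offsets.
            Map<TopicPartition, Long> timestamps = partitions.stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> start));
            consumer.offsetsForTimes(timestamps).forEach((tp, offset) -> {
                if (offset != null) consumer.seek(tp, offset.offset());
            });

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) break;                 // simplistic end condition for a sketch
                for (ConsumerRecord<String, String> rec : records) {
                    // Idempotent upsert keyed by (event_id, pipeline_version) into a v2 table;
                    // reruns cannot double-count, and the live v1 view stays untouched until the
                    // runbook's validation step swaps the serving pointer over to v2.
                    upsertIntoVersionedView(rec.key(), rec.value(), "v2");
                }
            }
        }
    }

    static void upsertIntoVersionedView(String key, String value, String version) {
        // JDBC upsert omitted for brevity; see the idempotent-sink sketch below.
    }
}
```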
Medium · Technical
Provide pseudocode (Python or Java) and a design description for an idempotent upsert sink to PostgreSQL from a streaming job. The sink must handle retries, out-of-order attempts, and partial failures without producing duplicates. Explain transactional patterns, unique keys, and performance considerations.
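A hedged Java/JDBC version of the pseudocode this question asks for, built on PostgreSQL's INSERT ... ON CONFLICT with a monotonically increasing version column so retries and out-of-order attempts can never regress committed state; the table, columns, and connection string are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class IdempotentUpsertSink {

    // Only overwrite when the incoming version is newer, so a late or retried
    // attempt carrying stale data is silently ignored rather than applied.
    private static final String UPSERT =
            "INSERT INTO user_metrics (user_id, metric_value, event_version) "
          + "VALUES (?, ?, ?) "
          + "ON CONFLICT (user_id) DO UPDATE SET "
          + "  metric_value = EXCLUDED.metric_value, "
          + "  event_version = EXCLUDED.event_version "
          + "WHERE user_metrics.event_version < EXCLUDED.event_version";

    public record MetricEvent(String userId, long value, long version) {}

    public static void writeBatch(Connection conn, List<MetricEvent> batch) throws SQLException {
        conn.setAutoCommit(false);                       // one transaction per batch/checkpoint
        try (PreparedStatement stmt = conn.prepareStatement(UPSERT)) {
            for (MetricEvent e : batch) {
                stmt.setString(1, e.userId());
                stmt.setLong(2, e.value());
                stmt.setLong(3, e.version());
                stmt.addBatch();                         // batching keeps round-trips low
            }
            stmt.executeBatch();
            conn.commit();                               // partial failure -> rollback, safe to replay
        } catch (SQLException ex) {
            conn.rollback();
            throw ex;                                    // let the streaming job retry the whole batch
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:postgresql://db:5432/analytics")) { // hypothetical DSN
            writeBatch(conn, List.of(new MetricEvent("u42", 17, 3)));
        }
    }
}
```

Performance-wise, each batch is written in a single transaction (ideally aligned with the streaming engine's checkpoint), so a partial failure rolls back cleanly and the whole batch can be replayed without producing duplicates.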
Hard · System Design
Design partitioning and shuffle strategies for joining a high-cardinality user stream with a high-rate event stream in Flink while avoiding repartitioning hotspots. Discuss composite keys, salted hashing, pre-aggregation, fan-out trade-offs, and when to use an external sharded KV store versus keyed state.
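The salted pre-aggregation part of an answer can be sketched in a few lines of Flink (Java). Events are first keyed by userId plus a random salt so one hot user fans out across up to 16 subtasks, partially reduced per window, then re-keyed by userId alone for the final merge (and, for example, the join against the user stream). The POJO, bucket count, and window size are assumptions, and watermark assignment is omitted:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SaltedPreAggregation {

    /** Simple POJO: one partially aggregated count per (user, salt) bucket. */
    public static class UserCount {
        public String userId;
        public String saltedKey;
        public long count;
        public UserCount() {}
        public UserCount(String userId, String saltedKey, long count) {
            this.userId = userId; this.saltedKey = saltedKey; this.count = count;
        }
    }

    /** Stage 1 key: userId plus a random salt, so one hot user spreads over many subtasks. */
    public static class AddSalt implements MapFunction<UserCount, UserCount> {
        private final int buckets;
        public AddSalt(int buckets) { this.buckets = buckets; }
        @Override
        public UserCount map(UserCount e) {
            int salt = ThreadLocalRandom.current().nextInt(buckets);
            return new UserCount(e.userId, e.userId + "#" + salt, e.count);
        }
    }

    public static DataStream<UserCount> countsPerUser(DataStream<UserCount> events) {
        DataStream<UserCount> partial = events
                .map(new AddSalt(16))                              // 16 buckets: assumption, tune to skew
                .keyBy(e -> e.saltedKey)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .reduce((a, b) -> new UserCount(a.userId, a.saltedKey, a.count + b.count));

        // Stage 2: at most 16 partial records per user per window, so re-keying by userId
        // alone is cheap and no longer a hotspot; this is also the key you would join on.
        return partial
                .keyBy(e -> e.userId)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .reduce((a, b) -> new UserCount(a.userId, a.userId, a.count + b.count));
    }
}
```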
Hard · Technical
Design a high-throughput exactly-once-compatible sink to S3 that writes Parquet or Avro files from a streaming job. Address small-file problems, idempotence, checkpoint coordination with the processing engine, partitioning schemes for downstream analytics, and post-write compaction strategies. Explain how you ensure downstream consumers read consistent snapshots.
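A hedged Flink (Java) sketch of the S3 side using the FileSink in bulk (Parquet) mode: files roll only on checkpoint, so they become visible to downstream readers exactly when the producing checkpoint commits, and date/hour bucketing gives analytics engines a partition-pruning scheme. The Avro schema and bucket path are assumptions, the Parquet writer factory class name varies across Flink versions, and small-file compaction would run as a separate downstream job:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

public class ParquetS3Sink {

    // Assumed record shape; in practice the schema would come from a schema registry.
    static final Schema SCHEMA = SchemaBuilder.record("Event").fields()
            .requiredString("eventId")
            .requiredString("userId")
            .requiredLong("ts")
            .endRecord();

    public static void attach(DataStream<GenericRecord> events) {
        FileSink<GenericRecord> sink = FileSink
                // Bulk format: whole Parquet files are written per checkpoint, never appended to.
                .forBulkFormat(new Path("s3://analytics-bucket/events/"),        // hypothetical bucket
                        AvroParquetWriters.forGenericRecord(SCHEMA))
                // dt=.../hour=... buckets so downstream engines can prune partitions by date and hour.
                .withBucketAssigner(new DateTimeBucketAssigner<>("'dt='yyyy-MM-dd'/hour='HH"))
                // Roll only on checkpoint: in-progress files are committed atomically with the
                // checkpoint, which is what keeps replays after a failure invisible downstream.
                .withRollingPolicy(OnCheckpointRollingPolicy.build())
                .build();

        events.sinkTo(sink);
    }
}
```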
