InterviewStack.io

Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event-sourcing concepts.

Candidates should demonstrate knowledge of messaging and ingestion layers: message brokers and commit-log systems, partitioning and consumer-group patterns, partition-key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques.

Processing concerns include stream processing engines, state stores and stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, late and out-of-order event handling, and stream-to-stream and stream-to-table joins and aggregations over windows.

Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency-versus-throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as consumer lag.

Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs: for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.
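The time-semantics vocabulary above (event time versus processing time, watermarks, allowed lateness) recurs throughout the questions below. As a reference point, here is a minimal, engine-free sketch of an event-time tumbling-window counter; the window width, watermark bound, and lateness budget are illustrative values, not defaults from any framework:

```python
from collections import defaultdict

WINDOW_SIZE = 60          # tumbling-window width in seconds (illustrative)
OUT_OF_ORDERNESS = 10     # watermark trails max observed event time by this much
ALLOWED_LATENESS = 30     # a window stays open this long after the watermark passes it

class EventTimeWindowCounter:
    """Counts events per tumbling event-time window with a heuristic watermark."""

    def __init__(self):
        self.windows = defaultdict(int)    # window start -> event count
        self.max_event_time = float("-inf")
        self.late_events = []              # side output: too late to count

    def watermark(self):
        # Heuristic watermark: max observed event time minus a fixed bound.
        return self.max_event_time - OUT_OF_ORDERNESS

    def process(self, event_time):
        self.max_event_time = max(self.max_event_time, event_time)
        window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
        if self.watermark() >= window_start + WINDOW_SIZE + ALLOWED_LATENESS:
            # The window was already finalized; divert the event rather than
            # silently corrupting an already-emitted result.
            self.late_events.append(event_time)
        else:
            self.windows[window_start] += 1

    def emit_closed(self):
        """Finalize every window the watermark (plus lateness) has passed."""
        closed = [w for w in self.windows
                  if w + WINDOW_SIZE + ALLOWED_LATENESS <= self.watermark()]
        return {w: self.windows.pop(w) for w in sorted(closed)}

counter = EventTimeWindowCounter()
for t in [5, 12, 61, 58, 130, 7, 190]:    # out-of-order event-time stamps
    counter.process(t)
print(counter.emit_closed())   # {0: 3, 60: 2}; the event at t=7 went to late_events
```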

Hard · Technical
Implement a deduplication approach for streaming events where event ids may collide and some events lack ids. Design for high throughput with bounded memory: describe the use of Bloom filters or approximate distinct counters, a windowed dedupe with a TTL, handling of false positives, and the business trade-off between occasionally dropping a genuinely new event (a false dedupe) and letting duplicates into analytics.
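
A sketch of one plausible answer shape, assuming the TTL, filter sizing, and hashing scheme below rather than taking them from any particular system: rotate one Bloom filter per time bucket so dedupe state expires with bounded memory, and derive a surrogate key for id-less events from their content:

```python
import hashlib
import time

class BloomFilter:
    """Tiny Bloom filter; k index positions carved out of one SHA-256 digest."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            chunk = digest[4 * i:4 * i + 4]          # 4 bytes per hash function
            yield int.from_bytes(chunk, "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

class WindowedDeduper:
    """Dedupes within a TTL by rotating one Bloom filter per time bucket;
    expired buckets are dropped, which is what bounds the memory."""

    def __init__(self, ttl_seconds=3600, buckets=4):
        self.bucket_len = ttl_seconds / buckets
        self.depth = buckets
        self.filters = {}                            # bucket index -> BloomFilter

    def seen(self, event_id, payload=None, now=None):
        now = time.time() if now is None else now
        current = int(now // self.bucket_len)
        live = range(current - self.depth + 1, current + 1)
        # Events without an id get a content-derived surrogate key.
        key = event_id or hashlib.sha256(repr(payload).encode()).hexdigest()
        for b in list(self.filters):                 # evict buckets past the TTL
            if b not in live:
                del self.filters[b]
        if any(b in self.filters and self.filters[b].might_contain(key)
               for b in live):
            return True                              # duplicate, or a false positive
        self.filters.setdefault(current, BloomFilter()).add(key)
        return False
```

The Bloom filter's false-positive rate is the "false dedupe" knob: a positive drops a genuinely new event, so revenue-grade pipelines usually prefer exact keyed state with a TTL, while high-volume clickstream analytics can often tolerate a fraction of a percent.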
Hard · System Design
Your BI dashboards require historical drill-down to raw events for audit, but storing raw events at full scale is expensive. Propose a hybrid architecture that supports auditability (fast retrieval for sampled events), compact long-term storage, and efficient reprocessing for corrections (schema changes or bug fixes). Include data lifecycle, indexing, cold storage, and retrieval patterns.
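
One hedged sketch of the ingest path such an architecture might use: every raw event lands in compact, date-partitioned cold storage (so a correction can reprocess one partition at a time), while a small deterministic sample also lands in a hot, indexed store for fast audit retrieval. The sample rate, key layout, and the cold_store/hot_index clients are all hypothetical:

```python
import hashlib
import json
from datetime import datetime, timezone

SAMPLE_RATE = 1 / 100   # fraction of raw events kept hot for audit (assumed)

def is_sampled(event_id: str) -> bool:
    """Deterministic sampling: the same event is always in or out, so an
    auditor can know in advance which ids are on the fast path."""
    h = int(hashlib.sha256(event_id.encode()).hexdigest(), 16)
    return (h % 10_000) < SAMPLE_RATE * 10_000

def route(event: dict, cold_store, hot_index):
    """cold_store and hot_index are hypothetical storage clients."""
    ts = datetime.fromtimestamp(event["event_time"], tz=timezone.utc)
    # Cold path: compressed, date/hour-partitioned, so a schema-fix backfill
    # can reprocess a bounded slice instead of the whole history.
    cold_key = f"raw/dt={ts:%Y-%m-%d}/hour={ts:%H}/{event['event_id']}.json"
    cold_store.put(cold_key, json.dumps(event))
    # Hot path: only the sample, indexed by event_id for audit drill-down.
    if is_sampled(event["event_id"]):
        hot_index.put(event["event_id"], event)
```

Retrieval then tries the hot index first and falls back to a partition-pruned scan of cold storage using the event's date, keeping the expensive path rare.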
Medium · Technical
A finance team requires strict correctness for revenue aggregates that are streamed into Snowflake via Kafka Connect. Describe strategies to ensure idempotent writes or exactly-once-like behavior to Snowflake. Include approaches for deduplication, upserts, transactional staging tables, and how to safely perform replays or backfills without double-counting.
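
One widely used pattern for this, sketched under assumed table and column names (the MERGE and QUALIFY syntax itself is standard Snowflake SQL): land each batch in a staging table, collapse in-batch duplicates, then MERGE into the target keyed on event id so that replays and backfills update in place instead of double-counting:

```python
# Idempotent load step, e.g. run after Kafka Connect lands a micro-batch in
# revenue_staging. Table and column names are hypothetical.
MERGE_SQL = """
MERGE INTO revenue_facts AS t
USING (
    -- Collapse in-batch duplicates: keep the latest row per event_id.
    SELECT *
    FROM revenue_staging
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY event_id ORDER BY ingested_at DESC) = 1
) AS s
ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET
    t.amount = s.amount, t.updated_at = s.ingested_at
WHEN NOT MATCHED THEN INSERT (event_id, amount, updated_at)
    VALUES (s.event_id, s.amount, s.ingested_at)
"""

def load_batch(cursor):
    """cursor: a snowflake.connector cursor; connection setup omitted.
    Re-running this after a replay cannot double-count, because matched
    event_ids update in place instead of inserting a second row."""
    cursor.execute("BEGIN")
    cursor.execute(MERGE_SQL)
    cursor.execute("DELETE FROM revenue_staging")  # DML, so it stays transactional
    cursor.execute("COMMIT")
```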
Hard · Technical
Design a streaming join between two high-cardinality streams (orders and clicks) to compute real-time attribution. Each stream is partitioned differently. Explain partitioning strategies (co-partitioning, repartitioning), repartitioning costs, buffering and state-size considerations, windowing and join semantics, and approaches to handling unmatched late events.
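
To make the join mechanics concrete, a minimal single-process sketch of a symmetric, windowed stream-stream join: each side buffers keyed state, every arrival probes the opposite buffer, and entries older than the window are evicted. A real engine would first repartition both streams so a given join key lands on the same task; the window bound and key scheme here are assumptions:

```python
from collections import defaultdict, deque

JOIN_WINDOW = 300   # seconds an order waits for a click, and vice versa (assumed)

class WindowedStreamJoin:
    """Symmetric hash join over a time window, keyed by join key.
    State per side is bounded by (event rate x JOIN_WINDOW)."""

    def __init__(self):
        self.left = defaultdict(deque)    # key -> deque of (event_time, event)
        self.right = defaultdict(deque)

    def _evict(self, buffer, now):
        for key in list(buffer):
            q = buffer[key]
            while q and q[0][0] < now - JOIN_WINDOW:
                q.popleft()               # unmatched event fell out of the window
            if not q:
                del buffer[key]

    def on_left(self, key, event_time, event):
        return self._process(key, event_time, event, self.left, self.right)

    def on_right(self, key, event_time, event):
        return self._process(key, event_time, event, self.right, self.left)

    def _process(self, key, event_time, event, own, other):
        self._evict(own, event_time)
        self._evict(other, event_time)
        own[key].append((event_time, event))
        # Emit one joined pair per buffered opposite-side event (inner join).
        return [(event, o) for _, o in other.get(key, ())]

join = WindowedStreamJoin()
join.on_left("order-42", 100.0, {"type": "order"})          # buffered, no match yet
print(join.on_right("order-42", 150.0, {"type": "click"}))  # one joined pair
```

An event arriving after its counterpart has been evicted simply never matches; an outer-join variant would emit evicted unmatched entries to a side output rather than dropping them silently.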
Hard · Technical
Describe a real or hypothetical incident where stream processing produced incorrect analytics (for example duplicate counts, missing events, or late-arriving events causing wrong totals). As the BI analyst, outline root cause analysis steps, remediation actions including backfills, stakeholder communication, and changes you would implement to prevent recurrence.
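
Post-incident prevention for this kind of failure usually includes an automated reconciliation check. A hypothetical sketch (the streaming_store and batch_store query clients are invented for illustration): periodically recompute a closed window's totals from the raw or batch copy and alert when the published streaming totals drift beyond tolerance:

```python
def reconcile(window_start, window_end, streaming_store, batch_store,
              tolerance=0.001):
    """streaming_store / batch_store are hypothetical query clients.
    Returns (metric, streamed, recomputed) rows whose relative drift
    exceeds `tolerance`; a non-empty result should page the on-call."""
    streamed = streaming_store.totals(window_start, window_end)
    recomputed = batch_store.recompute_totals(window_start, window_end)
    drift = []
    for metric, expected in recomputed.items():
        got = streamed.get(metric, 0.0)
        denom = max(abs(expected), 1e-9)   # guard against zero totals
        if abs(got - expected) / denom > tolerance:
            drift.append((metric, got, expected))
    return drift
```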
