InterviewStack.io

Stream Processing and Event Streaming Questions

Designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts. Candidates should demonstrate knowledge of messaging and ingestion layers, message brokers and commit-log systems, partitioning and consumer-group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques. Processing concerns include stream processing engines, state stores and stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, idempotence, and time semantics: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, stream-to-stream and stream-to-table joins, and aggregations over windows. Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency versus throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream (consumer) lag. Candidates are expected to be familiar with concrete technologies and frameworks when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.
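To make the time-semantics vocabulary above concrete, here is a minimal, engine-agnostic sketch of event-time tumbling windows driven by a bounded-out-of-orderness watermark, with late events collected separately instead of being counted. The class and parameter names (`TumblingWindowCounter`, `max_out_of_orderness`) are hypothetical; engines such as Flink or Spark Structured Streaming implement the same idea with fault-tolerant state and their own watermark and lateness settings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    event_time: int  # when the event happened, in seconds
    value: int


class TumblingWindowCounter:
    """Per-key sums over event-time tumbling windows [start, start + size).

    Watermark = highest event time seen - max_out_of_orderness. A window is
    emitted once the watermark reaches its end; events that belong to an
    already-emitted window are collected in `late` instead of being counted.
    """

    def __init__(self, size: int, max_out_of_orderness: int):
        self.size = size
        self.max_ooo = max_out_of_orderness
        self.max_event_time = None
        self.open = defaultdict(int)  # (key, window_start) -> running sum
        self.emitted = []             # (key, window_start, sum) in firing order
        self.late = []

    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_ooo

    def process(self, e: Event) -> None:
        start = e.event_time - e.event_time % self.size
        wm = self.watermark()
        if wm is not None and start + self.size <= wm:
            self.late.append(e)  # its window has already fired
            return
        self.open[(e.key, start)] += e.value
        if self.max_event_time is None or e.event_time > self.max_event_time:
            self.max_event_time = e.event_time
        self._fire()

    def _fire(self) -> None:
        wm = self.watermark()
        ready = sorted(k for k in self.open if k[1] + self.size <= wm)
        for key, start in ready:
            self.emitted.append((key, start, self.open.pop((key, start))))


w = TumblingWindowCounter(size=10, max_out_of_orderness=5)
for t in [1, 8, 12, 3, 16, 4]:  # 3 is out of order but on time; 4 is late
    w.process(Event("a", t, 1))
print(w.emitted)  # [('a', 0, 3)]
print(w.late)     # [Event(key='a', event_time=4, value=1)]
```

The event at time 4 arrives after the watermark has reached 11, past the end of window [0, 10), so it is routed to `late` rather than silently changing a result that was already emitted.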

Hard · Technical
Design an automated fault-injection test suite to validate end-to-end exactly-once processing for a streaming pipeline. Include scenarios such as broker failure, incomplete transactions, a consumer crash during checkpointing, network partitions, and checkpoint failures in the object store. Describe the test harness, the assertions that prove there is no duplication or loss, and how to integrate the suite into CI/CD.
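The core of such a harness is an assertion that compares what was produced with what the sink durably committed. The sketch below assumes every input event carries a unique ID and that the sink's output can be read back in full once the fault scenario finishes (for a Kafka output topic, that means reading with `isolation.level=read_committed` so records from aborted transactions are excluded). The function and report names are hypothetical; injecting the faults themselves (killing brokers, dropping network links, failing checkpoint uploads) is left to the surrounding harness.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class ExactlyOnceReport:
    missing: Set[str] = field(default_factory=set)            # produced, never committed (loss)
    duplicated: Dict[str, int] = field(default_factory=dict)  # committed more than once
    unexpected: Set[str] = field(default_factory=set)         # committed, never produced

    @property
    def ok(self) -> bool:
        return not (self.missing or self.duplicated or self.unexpected)


def check_exactly_once(produced_ids: Iterable[str], committed_ids: Iterable[str]) -> ExactlyOnceReport:
    """Compare input event IDs with the IDs read back from the sink."""
    expected = set(produced_ids)
    counts = Counter(committed_ids)
    return ExactlyOnceReport(
        missing=expected - set(counts),
        duplicated={i: c for i, c in counts.items() if c > 1},
        unexpected=set(counts) - expected,
    )


# Example: a replay after a simulated crash wrote evt-2 twice, and evt-4 was lost.
report = check_exactly_once(
    produced_ids=[f"evt-{i}" for i in range(5)],
    committed_ids=["evt-0", "evt-1", "evt-2", "evt-2", "evt-3"],
)
print(report.ok, report.missing, report.duplicated)  # False {'evt-4'} {'evt-2': 2}
```

In CI, each fault scenario would run the pipeline on a fixed input set, inject its fault, wait for the job to drain, and fail the build if `report.ok` is false.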
Easy · Technical
Define consumer lag in a streaming system and explain how it is measured (e.g., the partition's log-end offset on the broker minus the consumer group's committed offset). Which metrics and thresholds would you monitor for production alerts, and which automated or manual remediation steps would you configure when lag grows unexpectedly?
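Per-partition lag is the log-end offset minus the group's committed offset; Kafka's `kafka-consumer-groups.sh --describe` reports these as LOG-END-OFFSET, CURRENT-OFFSET and LAG. Below is a minimal sketch of the computation plus two common alert rules: an absolute per-partition threshold and sustained growth of total lag. The function names, thresholds and sampling cadence are assumptions for illustration; in production the offsets would come from the broker's admin API or a metrics exporter.

```python
from typing import Dict, List

Offsets = Dict[int, int]  # partition -> offset (or lag)


def partition_lag(log_end: Offsets, committed: Offsets) -> Offsets:
    """Lag per partition = log-end offset - committed offset.
    Assumption: a partition with no committed offset counts from offset 0."""
    return {p: max(0, end - committed.get(p, 0)) for p, end in log_end.items()}


def lag_alerts(history: List[Offsets], max_lag: int, growth_samples: int) -> List[str]:
    """history holds lag snapshots ordered oldest -> newest."""
    alerts = []
    for p, lag in sorted(history[-1].items()):
        if lag > max_lag:
            alerts.append(f"partition {p}: lag {lag} > {max_lag}")
    totals = [sum(s.values()) for s in history[-growth_samples:]]
    # Sustained growth: total lag strictly increased across the last N samples.
    if len(totals) == growth_samples and all(b > a for a, b in zip(totals, totals[1:])):
        alerts.append(f"total lag grew for {growth_samples} consecutive samples")
    return alerts


snapshots = [
    partition_lag({0: 100, 1: 50}, {0: 90, 1: 50}),   # {0: 10, 1: 0}
    partition_lag({0: 130, 1: 60}, {0: 110, 1: 55}),  # {0: 20, 1: 5}
    partition_lag({0: 170, 1: 65}, {0: 130, 1: 60}),  # {0: 40, 1: 5}
]
for alert in lag_alerts(snapshots, max_lag=30, growth_samples=3):
    print(alert)
```

The growth rule catches a consumer that is steadily falling behind even while its absolute lag is still below the threshold.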
Medium · Technical
Compare the checkpointing and fault-recovery mechanisms of Apache Flink and Spark Structured Streaming. Explain how Flink's incremental checkpointing and RocksDB state backend differ from Spark's micro-batch execution and write-ahead-log (WAL) approach, and discuss the trade-offs in checkpoint frequency, recovery time, and choice of storage backend (e.g., S3 vs. HDFS).
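The checkpoint-frequency trade-off can be made concrete with a deliberately simple, engine-agnostic cost model: shorter intervals upload state more often, while longer intervals leave more input to replay after a failure. Incremental checkpoints (as with Flink's RocksDB backend) upload only state files created since the previous checkpoint, modelled here as a hypothetical `changed_fraction` of total state. The model ignores compaction, asynchronous snapshot overlap, failure-detection time and Spark's per-micro-batch commits, so its numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CheckpointModel:
    """Toy model; every parameter is an assumed input, not a measured value."""
    interval_s: float        # time between checkpoints
    input_rate_eps: float    # events/s arriving during normal operation
    state_bytes: float       # total operator state size
    changed_fraction: float  # share of state uploaded per incremental checkpoint
    restore_bps: float       # bytes/s when downloading state on recovery
    max_rate_eps: float      # events/s the job can process when catching up

    def upload_bytes_per_hour(self, incremental: bool) -> float:
        per_checkpoint = self.state_bytes * (self.changed_fraction if incremental else 1.0)
        return per_checkpoint * 3600 / self.interval_s

    def expected_recovery_s(self) -> float:
        """Restore the full state, then drain the backlog: on average half an
        interval of input (failures assumed uniform within an interval) plus
        everything that arrived during the restore. New input keeps arriving
        while catching up, so only the surplus capacity drains the backlog."""
        if self.max_rate_eps <= self.input_rate_eps:
            raise ValueError("job can never catch up")
        restore_s = self.state_bytes / self.restore_bps
        backlog = self.input_rate_eps * (self.interval_s / 2 + restore_s)
        return restore_s + backlog / (self.max_rate_eps - self.input_rate_eps)


for interval in (10, 60, 300):
    m = CheckpointModel(interval_s=interval, input_rate_eps=10_000, state_bytes=10e9,
                        changed_fraction=0.05, restore_bps=200e6, max_rate_eps=50_000)
    print(f"interval={interval}s  incremental upload/h={m.upload_bytes_per_hour(True) / 1e9:.0f} GB"
          f"  recovery~{m.expected_recovery_s():.0f}s")
```

Shortening the interval raises upload volume roughly in proportion, while recovery time falls only down to the fixed restore cost, which is why state size and backend download bandwidth often dominate recovery more than checkpoint frequency does.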
Hard · System Design
Design an architecture that propagates backpressure from a slow downstream sink back through a distributed streaming pipeline to the producers. Include mechanisms such as reactive-streams backpressure, bounded queues with rejection, token-based rate limiting, and circuit breakers. Explain how to avoid cascading failures and ensure graceful degradation.
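Three of the mechanisms listed above (a bounded queue with rejection, rate limiting with tokens, and a circuit breaker) can be combined into one admission gate, so that a slow sink turns into explicit rejections the caller can back off on instead of unbounded memory growth. The sketch is single-threaded and uses an injectable clock for testability; the class names and string return codes are hypothetical, and reactive-streams demand signalling (the consumer requesting N items) is not shown.

```python
import queue
import time
from typing import Callable


class TokenBucket:
    """Admits `rate` items per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Opens after `threshold` consecutive sink failures. Once `cooldown` seconds
    have passed, calls are allowed again (half-open): a success closes the
    breaker, another failure re-opens it immediately."""

    def __init__(self, threshold: int, cooldown: float, clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        return self.opened_at is None or self.clock() - self.opened_at >= self.cooldown

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


class BackpressureGate:
    """Admission control in front of a bounded buffer drained by a slow sink."""

    def __init__(self, maxsize: int, bucket: TokenBucket, breaker: CircuitBreaker):
        self.buffer = queue.Queue(maxsize=maxsize)
        self.bucket, self.breaker = bucket, breaker

    def offer(self, item) -> str:
        if not self.breaker.allow():
            return "rejected: circuit open"  # sink is unhealthy; shed load early
        if not self.bucket.try_acquire():
            return "rejected: rate limited"  # caller should slow down
        try:
            self.buffer.put_nowait(item)     # never block: reject when full
        except queue.Full:
            return "rejected: buffer full"
        return "accepted"
```

The sink worker that drains `buffer` would call `breaker.record(...)` after each write, so repeated sink failures stop new work at the front of the pipeline instead of letting it pile up in every upstream stage.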
Medium · Technical
You use Avro with Confluent Schema Registry. Explain the backward, forward, and full compatibility modes. For each mode, state whether each of these example changes would be allowed: adding a field with a default, removing a field, and renaming a field. Propose a rollout strategy for schema changes across producers and consumers in a large organization.
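The three modes can be approximated with Avro's schema-resolution behaviour at the field level: a reader skips extra writer fields, but every field it expects must either be present in the writer schema or carry a default in the reader schema. The sketch reduces a schema to a hypothetical `{field name: has default}` map and ignores types, aliases and nested records, so it only illustrates the three example changes; the registry itself applies the complete Avro rules.

```python
from typing import Dict

# Simplified schema: field name -> whether the field declares a default.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """Field-level Avro resolution: extra writer fields are skipped, and every
    reader field must be written or have a default in the reader schema."""
    return all(name in writer or has_default for name, has_default in reader.items())


def compatibility(new: Schema, old: Schema) -> Dict[str, bool]:
    backward = can_read(reader=new, writer=old)  # new consumers read old data
    forward = can_read(reader=old, writer=new)   # old consumers read new data
    return {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}


old = {"id": False, "name": False}
changes = {
    "add field with default": {"id": False, "name": False, "email": True},
    "remove field": {"id": False},
    "rename name -> full_name": {"id": False, "full_name": False},
}
for change, new in changes.items():
    print(f"{change:26} {compatibility(new, old)}")
```

Removing `name` breaks FORWARD only because the old schema gives it no default; removing a field that has a default passes both directions. A plain rename fails both, since each side expects a required field the other never writes. For rollout, BACKWARD implies upgrading consumers before producers, FORWARD allows producers to upgrade first, and FULL allows either order.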
