Stream Processing and Event Streaming Questions

This topic covers designing and operating systems that ingest, process, and serve continuous event streams with low latency and high throughput. Core areas include architecture patterns for stream-native and event-driven systems, trade-offs between batch and streaming models, and event sourcing concepts.

Candidates should demonstrate knowledge of messaging and ingestion layers: message brokers and commit log systems, partitioning and consumer group patterns, partition key selection, ordering guarantees, retention and compaction strategies, and deduplication techniques.

Processing concerns include stream processing engines, state stores, stateful processing, checkpointing and fault recovery, processing guarantees such as at-least-once and exactly-once semantics, and idempotence. Time semantics matter as well: event time versus processing time, watermarks, windowing strategies, handling of late and out-of-order events, and stream-to-stream and stream-to-table joins and aggregations over windows.

Performance and operational topics cover partitioning and scaling strategies, backpressure and flow control, latency versus throughput trade-offs, resource isolation, monitoring and alerting, testing strategies for streaming pipelines, schema evolution and compatibility, idempotent sinks, persistent storage choices for state and checkpoints, and operational metrics such as stream lag.

Familiarity with concrete technologies and frameworks is expected when discussing designs and trade-offs, for example Apache Kafka, Kafka Streams, Apache Flink, Spark Structured Streaming, Amazon Kinesis, and common serialization formats such as Avro, Protocol Buffers, and JSON.
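To make the time-semantics topics above concrete, here is a minimal sketch of event-time tumbling windows driven by a watermark. It is plain Python, not the API of any engine; the class name `TumblingWindowCounter`, the integer timestamps, and the rule "watermark = highest event time seen minus allowed lateness" are all assumptions chosen for illustration.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time tumbling window (illustrative only)."""

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window start -> running count
        self.emitted = []                     # (window start, final count)
        self.dropped_late = 0

    @property
    def watermark(self):
        # Assumed heuristic: trail the highest event time seen so far.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time):
        start = event_time - event_time % self.window_size
        wm = self.watermark
        # The event's window has already been closed: treat it as too late.
        if wm is not None and start + self.window_size <= wm:
            self.dropped_late += 1
            return
        self.open_windows[start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        self._close_ready_windows()

    def _close_ready_windows(self):
        wm = self.watermark
        # sorted() copies the keys, so popping inside the loop is safe.
        for start in sorted(self.open_windows):
            if start + self.window_size <= wm:
                self.emitted.append((start, self.open_windows.pop(start)))


counter = TumblingWindowCounter(window_size=10, allowed_lateness=5)
for t in [1, 4, 12, 3, 17, 2, 26]:  # 3 is out of order but in time; 2 is too late
    counter.process(t)
```

In this run the out-of-order event at time 3 still lands in window [0, 10) because the watermark has not passed the window end, while the event at time 2 arrives after that window was emitted and is counted as dropped. Real engines offer alternatives to dropping, such as side outputs or window updates.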

Hard | System Design

Design a secure streaming pipeline that encrypts data at rest and in transit, masks or tokenizes PII in-flight, restricts access via topic ACLs, maintains audit logs for access and schema changes, and integrates with a KMS for key rotation. Detail choices for TLS, encryption at broker and storage, key management, schema enforcement, and how to operationalize audits for compliance.
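One piece of this question, tokenizing PII in-flight with rotatable keys, can be sketched as follows. This is a hedged illustration only: the `KEYS` dictionary stands in for keys that would be fetched from a KMS, and the field names, key versions, and `tok:<version>:<digest>` format are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key ring. In a real pipeline these would come from a KMS,
# never from source code.
KEYS = {"v1": b"demo-key-one", "v2": b"demo-key-two"}
ACTIVE_KEY_VERSION = "v2"
PII_FIELDS = ("email", "phone")  # assumed field names


def tokenize(value, key_version=ACTIVE_KEY_VERSION):
    """Return a deterministic, one-way token tagged with its key version."""
    digest = hmac.new(
        KEYS[key_version], value.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"tok:{key_version}:{digest}"


def mask_event(event):
    """Return a copy of the event with PII fields replaced by tokens."""
    masked = dict(event)
    for field in PII_FIELDS:
        if masked.get(field) is not None:
            masked[field] = tokenize(str(masked[field]))
    return masked


masked = mask_event({"user_id": 1, "email": "a@example.com", "phone": None})
```

Keyed hashing keeps tokens deterministic, so downstream joins on the tokenized field still work, but it is not reversible. If the design needs detokenization, a vault-backed token service or format-preserving encryption is the usual alternative. Embedding the key version in the token is one way to keep old tokens interpretable after rotation.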

Medium | Technical

Explain how you would configure checkpointing, state backend, and savepoints in Apache Flink for a mission-critical stateful pipeline. Cover choices between RocksDB and heap-based state, checkpoint intervals, incremental checkpoints, aligned vs unaligned checkpoints, and the operational procedure for operator upgrades and restoring from savepoints.

Hard | System Design

Design a hybrid architecture that blends streaming for freshness with batch processing for completeness. Explain how to route data to both paths, deduplicate across batch and stream, reconcile results, orchestrate backfills with live processing, and provide a consistent serving layer (materialized views) for analytics.
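A minimal sketch of the deduplication and serving-layer merge this question asks about, under stated assumptions: every event carries a unique `event_id`, the views are hourly counts, and the batch path is authoritative up to a cutoff hour. The function names and event shape are hypothetical.

```python
from collections import Counter


def count_by_hour(events):
    """Deduplicate by event_id, then count events per hour."""
    unique = {e["event_id"]: e for e in events}  # a repeated id overwrites itself
    return dict(Counter(e["hour"] for e in unique.values()))


def serve(batch_view, stream_view, batch_cutoff_hour):
    """Prefer batch results up to the cutoff, stream results after it."""
    merged = {h: c for h, c in batch_view.items() if h <= batch_cutoff_hour}
    merged.update(
        {h: c for h, c in stream_view.items() if h > batch_cutoff_hour}
    )
    return merged


batch_events = [
    {"event_id": "a", "hour": 0},
    {"event_id": "b", "hour": 1},
    {"event_id": "c", "hour": 1},
]
# The stream path missed event "c" and delivered "d" twice.
stream_events = [
    {"event_id": "b", "hour": 1},
    {"event_id": "d", "hour": 2},
    {"event_id": "d", "hour": 2},
]

view = serve(count_by_hour(batch_events), count_by_hour(stream_events), 1)
```

Hour 1 is served from the complete batch count and hour 2 from the fresh stream count. Advancing the cutoff after each batch run or backfill is what reconciles the two paths. A strong answer would also cover how the cutoff is published atomically to the serving layer.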

Medium | Technical

Design a monitoring and alerting plan for a production streaming platform. Include key metrics (consumer lag per partition, processing latency percentiles, checkpoint durations, state size, GC pause duration), threshold-based alerts, dashboards, and ideas for automated remediation or runbooks. How would you correlate infrastructure metrics to business impact?
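Two of the metrics named above, consumer lag per partition and latency percentiles, can be sketched with simple threshold alerts. This is an illustration only: the offsets are passed in as plain dictionaries rather than read from a broker, and the thresholds (`max_lag`, `max_p99_ms`) are arbitrary assumed values.

```python
def partition_lag(end_offsets, committed_offsets):
    """Lag per partition = log end offset minus last committed offset."""
    return {
        p: max(end - committed_offsets.get(p, 0), 0)
        for p, end in end_offsets.items()
    }


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, -((-pct * len(ordered)) // 100))  # integer ceiling
    return ordered[rank - 1]


def evaluate_alerts(lag, latencies_ms, max_lag=10_000, max_p99_ms=500):
    """Return human-readable alerts for breached thresholds."""
    alerts = []
    for partition, value in sorted(lag.items()):
        if value > max_lag:
            alerts.append(f"lag partition={partition} lag={value}")
    p99 = percentile(latencies_ms, 99)
    if p99 > max_p99_ms:
        alerts.append(f"latency p99={p99}ms")
    return alerts


lag = partition_lag({0: 100, 1: 50_000}, {0: 90, 1: 20_000})
alerts = evaluate_alerts(lag, [10, 20, 900])
```

Alerting on lag per partition rather than on the total helps surface a single hot or stuck partition. In practice, alerting on the rate of lag growth is often less noisy than a fixed threshold.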

Hard | System Design

Design an approach to integrate a streaming system with a legacy, non-idempotent REST API so that the client experiences minimal duplicate side effects and high availability. Evaluate patterns such as transactional outbox + CDC, local buffering with dedupe keys, compensating transactions, and distributed locks. Describe monitoring and failure handling for each pattern.
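The "local buffering with dedupe keys" pattern can be sketched as a thin wrapper that records a result per dedupe key before the event is acknowledged. This is a minimal sketch under assumptions: the in-memory dictionary stands in for a durable store, and `DedupingCaller`, `event_id`, and `fake_legacy_api` are hypothetical names.

```python
class DedupingCaller:
    """Calls a non-idempotent API at most once per dedupe key (in-memory)."""

    def __init__(self, call_api):
        self._call_api = call_api
        self._results = {}  # dedupe key -> result; stand-in for a durable store

    def handle(self, event):
        key = event["event_id"]
        if key in self._results:
            # Redelivered event: return the recorded result, skip the call.
            return self._results[key]
        result = self._call_api(event["payload"])
        self._results[key] = result
        return result


calls = []


def fake_legacy_api(payload):
    """Hypothetical stand-in for the legacy REST endpoint."""
    calls.append(payload)
    return f"charged:{payload}"


caller = DedupingCaller(fake_legacy_api)
first = caller.handle({"event_id": "e1", "payload": "order-42"})
replay = caller.handle({"event_id": "e1", "payload": "order-42"})
```

The sketch still leaves a window: a crash after the API call but before the result is recorded produces a duplicate on retry. Closing or compensating for that window is where the other patterns in the question (outbox + CDC, compensating transactions) come in.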
