
Batch and Stream Processing Questions

Covers the design and implementation of data processing using batch, stream, or hybrid approaches. Candidates should be able to explain when to choose batch versus streaming based on latency, throughput, cost, data volume, and business requirements, and compare architectural patterns such as Lambda and Kappa. Core streaming concepts include event time versus processing time; windowing strategies such as tumbling, sliding, and session windows; watermarks and late arrivals; event ordering and handling of out-of-order data; stateful versus stateless processing; state management and checkpointing; and delivery semantics, including exactly-once and at-least-once. Also includes knowledge of streaming and batch engines and runtimes, connector patterns for sources and sinks, partitioning and scaling strategies, backpressure and flow control, idempotency and deduplication techniques, testing and replayability, monitoring and alerting, and integration with storage layers such as data lakes and data warehouses. Interview focus is on reasoning about correctness, latency, cost, and operational complexity, and on concrete architecture and tooling choices.
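
As a concrete illustration of one of these concepts, here is a minimal Python sketch of idempotent, deduplicating writes layered on top of at-least-once delivery; the event shape and the in-memory sink are assumptions made purely for the example.

```python
# Minimal sketch: effectively-once results on top of at-least-once delivery
# by keying writes on a stable event ID. The event shape (event_id, value)
# and the dict-based sink are assumptions for illustration only.

class IdempotentSink:
    def __init__(self):
        self._rows = {}  # event_id -> value (stands in for the target store)

    def upsert(self, event_id: str, value: dict) -> bool:
        """Apply a write; redelivered events overwrite with the same data,
        so duplicates do not change the final state."""
        already_seen = event_id in self._rows
        self._rows[event_id] = value
        return not already_seen  # True only on first delivery

sink = IdempotentSink()
events = [
    {"event_id": "e1", "value": {"amount": 10}},
    {"event_id": "e2", "value": {"amount": 5}},
    {"event_id": "e1", "value": {"amount": 10}},  # duplicate redelivery
]
applied = sum(sink.upsert(e["event_id"], e["value"]) for e in events)
print(applied)  # 2 -- the duplicate was absorbed idempotently
```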

Medium · Technical
Explain strategies for testing and replayability of streaming pipelines. Include local unit testing, integration tests, controlled replay of historical events for production jobs, and how to verify stateful operators after replay.
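
For orientation, a minimal Python sketch of the replay-and-assert testing pattern this question targets; the CountPerKey operator, the replay helper, and the event format are hypothetical, chosen only to show the shape of such a test.

```python
# Sketch: unit-test a stateful operator by replaying a recorded event
# sequence deterministically and asserting on its state afterwards.
from collections import defaultdict

class CountPerKey:
    """A tiny stateful operator: counts events per key."""
    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, event: dict) -> None:
        self.counts[event["key"]] += 1

def replay(operator, events):
    """Replay a fixed, ordered event sequence through the operator."""
    for event in events:
        operator.process(event)
    return operator

def test_state_after_replay():
    recorded = [{"key": "a"}, {"key": "b"}, {"key": "a"}]
    op = replay(CountPerKey(), recorded)
    # Verify the operator state matches what the historical input implies.
    assert dict(op.counts) == {"a": 2, "b": 1}

test_state_after_replay()
```
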
Easy · Technical
Define event time and processing time in streaming systems. Provide an example where event time and processing time differ (e.g., a mobile client goes offline and then reconnects), explain the practical consequences for aggregations and joins, and describe how you would design the pipeline to use event time correctly.
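
A small Python sketch of the difference, assuming illustrative timestamps and a one-minute tumbling window: the same two events land in one bucket under event time but in two buckets under processing time.

```python
# Contrast event-time and processing-time aggregation for a mobile client
# that buffers an event offline and uploads it later. Timestamps and the
# window size are illustrative assumptions.
from collections import defaultdict

WINDOW = 60  # one-minute tumbling windows, in seconds

events = [
    # (event_time, processing_time) in epoch seconds
    (100, 105),  # delivered promptly
    (110, 300),  # created offline at t=110, uploaded at t=300
]

def bucket(ts: int) -> int:
    return ts // WINDOW * WINDOW

by_event_time = defaultdict(int)
by_processing_time = defaultdict(int)
for event_time, processing_time in events:
    by_event_time[bucket(event_time)] += 1
    by_processing_time[bucket(processing_time)] += 1

print(dict(by_event_time))       # {60: 2} -- both events belong to the same minute
print(dict(by_processing_time))  # {60: 1, 300: 1} -- the late upload is misplaced
```
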
Hard · Behavioral
Tell me about a time when you, as a solutions architect, had to reconcile competing priorities between sales (who promised 'real-time analytics') and engineering (who argued for batch due to cost). Describe your approach, how you evaluated trade-offs, and the outcome.
Easy · Technical
Explain what a watermark is in stream processing, how it's used to handle late-arriving data, and describe at least two strategies for dealing with late events (e.g., side outputs, retraction/updates, allowed-lateness). Provide an example policy for a source with occasional 10-minute delays.
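
One possible policy, sketched in Python under assumed numbers (10-minute windows, a watermark lagging the maximum observed event time by 10 minutes, 5 minutes of allowed lateness, and a side output for anything later); the data model is invented for the example.

```python
# Sketch of a watermark policy with allowed lateness and a side output
# for events that arrive after the window has finally closed.
from collections import defaultdict

WINDOW = 600            # 10-minute tumbling windows (seconds)
MAX_OUT_OF_ORDER = 600  # watermark = max event time seen - 10 minutes
ALLOWED_LATENESS = 300  # keep window state 5 extra minutes past the watermark

windows = defaultdict(int)
late_side_output = []
max_event_time = 0

def on_event(event_time: int) -> None:
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - MAX_OUT_OF_ORDER
    window_start = event_time // WINDOW * WINDOW
    window_end = window_start + WINDOW
    if watermark <= window_end + ALLOWED_LATENESS:
        # Window still open or within allowed lateness: update the aggregate
        # (downstream, this would be emitted as a retraction/update).
        windows[window_start] += 1
    else:
        # Too late even with allowed lateness: route to a side output for
        # reconciliation or a periodic batch correction job.
        late_side_output.append(event_time)

for t in [100, 650, 2500, 120]:  # 120 arrives well after its window's watermark
    on_event(t)

print(dict(windows), late_side_output)  # {0: 1, 600: 1, 2400: 1} [120]
```
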
Easy · Technical
Explain Change Data Capture (CDC): what it is, common uses (e.g., streaming OLTP changes to analytic stores), and pros/cons of CDC vs batch extracts. What properties of the source DB affect CDC design?
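
A minimal Python sketch of applying an ordered CDC change stream to a downstream replica; the change-record shape (op, key, after) is an assumption for illustration, not any particular tool's format.

```python
# Apply a CDC change stream (insert/update/delete records) to a replica.
# Assumes changes arrive in commit order and carry the full "after" image.

replica = {}  # primary_key -> row

def apply_change(change: dict) -> None:
    """Apply one ordered change event to the replica table."""
    op, key = change["op"], change["key"]
    if op in ("insert", "update"):
        replica[key] = change["after"]
    elif op == "delete":
        replica.pop(key, None)

changes = [
    {"op": "insert", "key": 1, "after": {"id": 1, "status": "new"}},
    {"op": "update", "key": 1, "after": {"id": 1, "status": "shipped"}},
    {"op": "insert", "key": 2, "after": {"id": 2, "status": "new"}},
    {"op": "delete", "key": 2},
]
for change in changes:
    apply_change(change)

print(replica)  # {1: {'id': 1, 'status': 'shipped'}}
```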
