InterviewStack.io

Batch and Stream Processing Questions

Covers the design and implementation of data processing using batch, stream, or hybrid approaches. Candidates should be able to explain when to choose batch versus streaming based on latency, throughput, cost, data volume, and business requirements, and to compare architectural patterns such as Lambda and Kappa. Core stream concepts include event time versus processing time; windowing strategies such as tumbling, sliding, and session windows; watermarks and late arrivals; event ordering and out-of-order data handling; stateful versus stateless processing; state management and checkpointing; and delivery semantics, including exactly-once and at-least-once. Also covers streaming and batch engines and runtimes, connector patterns for sources and sinks, partitioning and scaling strategies, backpressure and flow control, idempotency and deduplication techniques, testing and replayability, monitoring and alerting, and integration with storage layers such as data lakes and data warehouses. The interview focus is on reasoning about correctness, latency, cost, and operational complexity, and on concrete architecture and tooling choices.
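To make the event-time concepts above concrete, here is a minimal sketch of tumbling windows driven by a watermark. It is not tied to any particular engine: the function name `tumbling_window_counts`, the `(event_time, key)` tuple shape, and the bounded out-of-orderness watermark rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per (window_start, key) using event time.

    events: iterable of (event_time, key) tuples in ARRIVAL order.
    The watermark trails the largest event time seen so far by
    max_out_of_orderness. A window is closed once its end is at or
    below the watermark; events for closed windows are reported as late.
    """
    open_windows = defaultdict(int)
    closed = {}
    late = []
    watermark = float("-inf")

    for event_time, key in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            # The window this event belongs to has already been emitted.
            late.append((event_time, key))
            continue

        open_windows[(window_start, key)] += 1
        watermark = max(watermark, event_time - max_out_of_orderness)

        # Emit every open window whose end the watermark has passed.
        ready = sorted(w for w in open_windows if w[0] + window_size <= watermark)
        for window in ready:
            closed[window] = open_windows.pop(window)

    # End of input: flush whatever is still open.
    closed.update(open_windows)
    return closed, late


if __name__ == "__main__":
    arrivals = [(1, "a"), (4, "a"), (12, "a"), (8, "a"), (25, "a"), (9, "a")]
    counts, late = tumbling_window_counts(arrivals, window_size=10, max_out_of_orderness=5)
    print(counts)  # {(0, 'a'): 3, (10, 'a'): 1, (20, 'a'): 1}
    print(late)    # [(9, 'a')]
```

In this run the event at time 8 arrives out of order but is still counted, because the watermark has not yet passed the end of its window. The event at time 9 arrives after the watermark has reached 20, so it is treated as a late arrival. A larger `max_out_of_orderness` accepts more late data at the cost of higher result latency.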

Medium | Technical
66 practiced
Compare connector patterns for sources and sinks in streaming architectures: push vs pull, connector-managed offsets vs broker-managed offsets, CDC connectors, file-based sinks, and direct DB writes. For each pattern, discuss throughput, latency, and consistency trade-offs.
Medium | Technical
77 practiced
How would you handle schema evolution in streaming data pipelines where Avro or Protobuf schemas change frequently? Outline a governance approach, tooling (schema registry), compatibility rules, and practical steps to roll out non-breaking and breaking changes.
Hard | Technical
123 practiced
Design an approach to provide end-to-end exactly-once semantics when one of the sinks is non-transactional (for example, a third-party REST API). Discuss idempotency keys, dedup stores, write buffering, and compensating transactions to achieve acceptable guarantees.
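As a starting point for the idempotency-key and dedup-store parts of this question, the sketch below derives a deterministic key from the record's source position and skips records that were already delivered. Everything here is hypothetical: `FakeRestApi` stands in for the third-party endpoint, `IdempotentSink` is an illustrative name, and the in-memory set would need to be a durable store in practice.

```python
import hashlib


class FakeRestApi:
    """Hypothetical stand-in for a non-transactional third-party endpoint."""

    def __init__(self):
        self.calls = []

    def post(self, idempotency_key, payload):
        self.calls.append((idempotency_key, payload))


class IdempotentSink:
    """Wraps a sink so that replayed records are not delivered twice."""

    def __init__(self, api, dedup_store=None):
        self.api = api
        # Assumption: an in-memory set; a real deployment would need a
        # durable store so the keys survive restarts.
        self.delivered = dedup_store if dedup_store is not None else set()

    @staticmethod
    def key_for(topic, partition, offset):
        # Deterministic: a replay of the same record yields the same key.
        raw = f"{topic}:{partition}:{offset}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def write(self, topic, partition, offset, payload):
        key = self.key_for(topic, partition, offset)
        if key in self.delivered:
            return False  # duplicate from a retry or replay, skipped
        self.api.post(key, payload)
        self.delivered.add(key)
        return True


if __name__ == "__main__":
    api = FakeRestApi()
    sink = IdempotentSink(api)
    sink.write("orders", 0, 42, {"order_id": 1})
    sink.write("orders", 0, 42, {"order_id": 1})  # replay after a restart
    print(len(api.calls))  # 1
```

This gives effectively-once delivery only under assumptions. A crash between `post` and `add` would still cause a resend, so the guarantee is stronger if the remote API itself honors the idempotency key. Where it does not, compensating transactions or reconciliation are the usual fallback.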
Easy | Technical
88 practiced
Describe backpressure and flow control in streaming systems. What are common techniques for handling backpressure from slow sinks (e.g., buffering, rate limiting, reactive pull)? What are the trade-offs in memory usage and latency?
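One of the techniques named in this question, bounded buffering, can be sketched with Python's standard `queue.Queue`. When the buffer is full, `put` blocks the producer, which is how backpressure propagates upstream. The function name `run_pipeline` and the simulated sink delay are assumptions for illustration only.

```python
import queue
import threading
import time


def run_pipeline(items, buffer_size, sink_delay=0.001):
    """Push items through a bounded buffer to a deliberately slow sink.

    Returns the delivered items and the largest buffer depth observed.
    """
    buf = queue.Queue(maxsize=buffer_size)
    delivered = []
    sentinel = object()  # signals the consumer to stop

    def consumer():
        while True:
            item = buf.get()
            if item is sentinel:
                break
            time.sleep(sink_delay)  # simulate a slow sink
            delivered.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    max_depth = 0
    for item in items:
        buf.put(item)  # blocks while the buffer is full: backpressure
        max_depth = max(max_depth, buf.qsize())

    buf.put(sentinel)
    worker.join()
    return delivered, max_depth


if __name__ == "__main__":
    out, depth = run_pipeline(range(50), buffer_size=4)
    print(len(out), depth <= 4)
```

The buffer size is the trade-off knob. A small buffer caps memory and keeps queued items fresh, but stalls the producer sooner. A large buffer absorbs bursts at the cost of more memory and higher end-to-end latency for items waiting in the queue.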
Hard | Behavioral
75 practiced
Tell me about a time when you, as a solutions architect, had to reconcile competing priorities between sales (who promised 'real-time analytics') and engineering (who argued for batch due to cost). Describe your approach, how you evaluated trade-offs, and the outcome.
