InterviewStack.io

Batch and Stream Processing Questions

Covers design and implementation of data processing using batch, stream, or hybrid approaches. Candidates should be able to explain when to choose batch versus streaming based on latency, throughput, cost, data volume, and business requirements, and compare architectural patterns such as lambda and kappa. Core stream concepts include event time versus processing time; windowing strategies such as tumbling, sliding, and session windows; watermarks and late arrivals; event ordering and out-of-order data handling; stateful versus stateless processing; state management and checkpointing; and delivery semantics, including exactly-once and at-least-once. Also includes knowledge of streaming and batch engines and runtimes, connector patterns for sources and sinks, partitioning and scaling strategies, backpressure and flow control, idempotency and deduplication techniques, testing and replayability, monitoring and alerting, and integration with storage layers such as data lakes and data warehouses. Interview focus is on reasoning about correctness, latency, cost, and operational complexity, and on concrete architecture and tooling choices.

Easy · Technical
70 practiced
Differentiate stateless and stateful stream processing with concrete examples (e.g., stateless: filtering, enrichment; stateful: sessionization, aggregations). Explain operational implications for scaling and failure recovery.
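As a starting point for an answer, the contrast can be sketched in a few lines of plain Python. This is a minimal illustration and not any engine's API: the event shape and the names `keep_purchases` and `RunningTotal` are hypothetical. The filter needs nothing but the current event, so it can be scaled out or restarted freely. The running total must be partitioned by key and snapshotted so it can be restored after a failure.

```python
from collections import defaultdict


def keep_purchases(event):
    """Stateless: the decision depends only on the event itself."""
    return event["type"] == "purchase"


class RunningTotal:
    """Stateful: per-user totals must persist across events and restarts."""

    def __init__(self):
        self.totals = defaultdict(float)

    def process(self, event):
        self.totals[event["user"]] += event["amount"]
        return event["user"], self.totals[event["user"]]

    def snapshot(self):
        # A checkpoint is a copy of the state that can be stored durably.
        return dict(self.totals)

    def restore(self, snap):
        # Recovery reloads the last checkpoint before resuming the stream.
        self.totals = defaultdict(float, snap)


# Hypothetical sample events, for illustration only.
events = [
    {"type": "purchase", "user": "a", "amount": 5.0},
    {"type": "view", "user": "a", "amount": 0.0},
    {"type": "purchase", "user": "b", "amount": 2.5},
    {"type": "purchase", "user": "a", "amount": 1.5},
]

op = RunningTotal()
outputs = [op.process(e) for e in events if keep_purchases(e)]

# Simulate failure recovery: a fresh operator restored from the checkpoint.
recovered = RunningTotal()
recovered.restore(op.snapshot())
```

The operational difference shows up in `snapshot` and `restore`: the stateless function has nothing to save, while the stateful operator loses correctness if its totals are not recovered.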
Easy · Technical
66 practiced
List and compare common windowing strategies used in stream processing (tumbling, sliding, session). For each, describe typical use cases, configuration considerations (size, gap), and how they affect state size and latency.
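The three strategies can be compared by looking at which windows an event lands in. The sketch below is a simplified illustration using integer timestamps. The function names are hypothetical, and the session rule (merge when the gap between consecutive events is strictly less than `gap`) is one common convention, assumed here rather than taken from a specific engine.

```python
def tumbling(ts, size):
    """Each event belongs to exactly one fixed, non-overlapping window."""
    start = ts - ts % size
    return [(start, start + size)]


def sliding(ts, size, slide):
    """Each event belongs to every window [start, start + size) that
    contains it, where window starts are multiples of `slide`."""
    windows = []
    start = ts - ts % slide  # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def sessions(timestamps, gap):
    """Group events into sessions; a new session begins when the time
    since the previous event is at least `gap`.
    Returns (first_event_ts, last_event_ts) per session."""
    merged = []
    for ts in sorted(timestamps):
        if merged and ts - merged[-1][1] < gap:
            merged[-1][1] = ts  # extend the current session
        else:
            merged.append([ts, ts])  # start a new session
    return [(first, last) for first, last in merged]
```

With `size=10` and `slide=5`, every event is held in two windows, which is why sliding windows roughly multiply state by `size / slide`. Session state is bounded by activity rather than by a fixed size, so a very long gap setting can keep sessions open, and state alive, for a long time.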
Hard · System Design
83 practiced
Design a cost-optimized streaming architecture on the cloud for IoT sensors at 10M devices with highly variable message rates. Explain choices for ingestion, message bus, processing (serverless vs cluster), storage tiering, and how you would trade latency vs cost for analytics and operational telemetry.
Medium · Technical
73 practiced
Given a source with network jitter and occasional outliers that can be up to 5 minutes late, explain a practical watermarking strategy and allowed-lateness configuration. How would you handle events that arrive later than allowed-lateness and still must be counted?
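One way to reason about this question is a bounded-out-of-orderness watermark combined with allowed lateness and a side output for anything later still. The sketch below is a simplified model and not a specific engine's API. The class name `WindowedCounter`, the parameters, and the sample timestamps (in seconds) are assumptions for illustration. Events routed to `late` are not dropped; a separate correction path, for example a periodic batch reconciliation, could still count them.

```python
class WindowedCounter:
    """Counts events per tumbling event-time window, with a watermark
    that trails the highest event time seen so far."""

    def __init__(self, window_size, max_out_of_orderness, allowed_lateness):
        self.window_size = window_size
        self.max_out_of_orderness = max_out_of_orderness
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.counts = {}  # window start -> count
        self.late = []    # side output for events past allowed lateness

    @property
    def watermark(self):
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_out_of_orderness

    def process(self, event_time):
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        start = event_time - event_time % self.window_size
        end = start + self.window_size
        # The window's state is discarded once the watermark passes
        # end + allowed_lateness; later events go to the side output.
        if self.watermark >= end + self.allowed_lateness:
            self.late.append(event_time)
            return "late"
        self.counts[start] = self.counts.get(start, 0) + 1
        return "counted"


# Assumed settings: 60 s windows, 30 s of jitter, 5 minutes of lateness.
counter = WindowedCounter(window_size=60, max_out_of_orderness=30,
                          allowed_lateness=300)
results = [counter.process(t) for t in [10, 50, 70, 400, 20, 65]]
```

In this run the event at 400 advances the watermark to 370. The event at 20 belongs to window [0, 60), whose state expired at 360, so it goes to the side output. The event at 65 belongs to window [60, 120), still open until 420, so it is counted as a late update.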
Easy · Technical
88 practiced
Describe backpressure and flow control in streaming systems. What are common techniques to handle backpressure from slow sinks (e.g., buffering, rate limiting, reactive pull)? What are trade-offs with memory usage and latency?
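The buffering technique can be illustrated with a bounded queue from the Python standard library: when the buffer is full, `put` blocks, which slows the producer to the sink's pace instead of letting memory grow without limit. This is a minimal sketch; `run_pipeline` and its parameters are hypothetical names, and a real streaming engine propagates backpressure across network boundaries rather than within a single process.

```python
import queue
import threading
import time


def run_pipeline(n_items, buffer_size, sink_delay):
    """Push n_items through a bounded buffer into a slow sink.
    Returns the delivered items and the largest buffer depth observed."""
    buf = queue.Queue(maxsize=buffer_size)
    delivered = []
    max_depth = 0

    def sink():
        while True:
            item = buf.get()
            if item is None:  # sentinel: producer is done
                return
            time.sleep(sink_delay)  # simulate a slow downstream write
            delivered.append(item)

    worker = threading.Thread(target=sink)
    worker.start()
    for i in range(n_items):
        buf.put(i)  # blocks while the buffer is full: this is the backpressure
        max_depth = max(max_depth, buf.qsize())
    buf.put(None)
    worker.join()
    return delivered, max_depth


delivered, max_depth = run_pipeline(n_items=20, buffer_size=4, sink_delay=0.001)
```

The trade-off is visible in `buffer_size`: a larger buffer absorbs bursts and keeps the producer busy, but uses more memory and lets items wait longer before reaching the sink. A smaller buffer caps memory and queueing latency, but stalls the producer sooner.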
