InterviewStack.io

Real Time and Batch Ingestion Questions

Focuses on choosing between batch ingestion and real-time streaming for moving data from sources to storage and downstream systems. Topics include latency and throughput requirements, cost and operational complexity, consistency and delivery semantics such as at-least-once and exactly-once, idempotency and deduplication strategies, schema evolution, connector and source considerations, backpressure and buffering, checkpointing and state management, and tooling choices for streaming and batch. Candidates should be able to design hybrid architectures that combine streaming for low-latency needs with batch pipelines for large backfills or heavy aggregations, and explain operational trade-offs such as monitoring, scaling, failure recovery, and debugging.

Easy · Technical
A dashboard shows a sudden spike in metrics at 2:00 AM. Outline an incident triage checklist for immediate investigation to determine whether the spike reflects real business activity or a pipeline issue. Include the queries and logs you would check, and quick mitigations that keep stakeholders from acting on bad data.
Medium · Technical
Discuss the major cost drivers for real-time ingestion (network egress, compute for stream processing, storage for hot data, connector licensing and operational overhead). For a small startup with limited budget, propose a low-cost architecture that still supports useful analytics.
Hard · Technical
You must reprocess 6 months of transformed event data (~100TB) because of a bug in the transformation logic that affected downstream BI metrics. Describe a safe, cost-conscious reprocessing plan: staging, parallelization, validation, minimizing user impact, and final swap to corrected datasets.
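One way to picture the staging, validation, and swap steps named in this question is the minimal sketch below. It assumes the data can be split into independent partitions (for example by day) and models storage as plain dictionaries. The names `reprocess`, `swap`, `transform`, and `validate` are hypothetical, not part of any particular tool.

```python
def reprocess(partitions, transform, validate):
    """Recompute every partition into a staging area, never touching live data.

    partitions: dict mapping a partition key (e.g. a date string) to raw records.
    transform:  the corrected per-record transformation (hypothetical).
    validate:   callable(raw_records, new_records) -> bool (hypothetical check).
    Returns (staged, failed): recomputed partitions and keys that failed validation.
    """
    staged, failed = {}, []
    for key in sorted(partitions):  # partitions are independent, so this loop could run in parallel
        new_records = [transform(r) for r in partitions[key]]
        if validate(partitions[key], new_records):
            staged[key] = new_records
        else:
            failed.append(key)
    return staged, failed


def swap(live, staged, failed):
    """Publish staged partitions only if every partition passed validation."""
    if failed:
        return live  # keep serving the old dataset until the failures are resolved
    return {**live, **staged}  # new dict, so readers never see a half-updated view
```

Keeping the swap all-or-nothing means dashboards see either the old numbers or the corrected ones, never a mixture of the two.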
Easy · Technical
Explain event-time vs processing-time and the role of watermarks in streaming systems. Why do these concepts matter for BI metrics when events can arrive late, and how do watermarks affect windowed aggregations?
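As a rough illustration of the concepts in this question, the sketch below counts events in tumbling event-time windows and uses a simple watermark to decide when a window is final. It is a toy model, not how any specific engine implements watermarks. The 60-second window, the 30-second lateness bound, and the function name `windowed_counts` are assumptions chosen for the example.

```python
from collections import defaultdict

WINDOW = 60            # tumbling window size in seconds (assumed)
ALLOWED_LATENESS = 30  # watermark trails the max event time seen by this much (assumed)


def windowed_counts(events):
    """Count events per event-time window, closing windows as the watermark advances.

    events: iterable of (event_time_seconds, payload) in arrival (processing-time) order.
    Returns (emitted, dropped): emitted is a list of (window_start, count) in the
    order windows were finalized; dropped lists event times that arrived after
    their window had already been closed.
    """
    open_windows = defaultdict(int)
    emitted, dropped = [], []
    watermark = float("-inf")
    for event_time, _payload in events:
        window_start = event_time - event_time % WINDOW
        if window_start + WINDOW <= watermark:
            dropped.append(event_time)  # too late: this window was already emitted
            continue
        open_windows[window_start] += 1
        watermark = max(watermark, event_time - ALLOWED_LATENESS)
        # Finalize every window that ends at or before the watermark.
        for start in sorted(w for w in open_windows if w + WINDOW <= watermark):
            emitted.append((start, open_windows.pop(start)))
    return emitted, dropped
```

In this model an out-of-order event is still counted if it arrives before the watermark passes the end of its window. After that point it is dropped, which is the kind of undercount a BI metric would show unless late data is handled separately.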
Medium · Technical
An upstream CDC connector started producing duplicate events for some transactions and the BI dashboards show inflated totals. Describe immediate mitigation steps to protect dashboards and a long-term plan to fix root causes, including verification and monitoring changes.
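A minimal sketch of one deduplication approach relevant to this scenario: keep only the first event for each unique key, and compute a simple duplicate ratio that could feed a monitoring alert. The field names `txn_id`, `op_seq`, and `amount` are hypothetical. A real CDC stream would need whatever identifier the connector actually guarantees to be unique per change, such as a log position.

```python
def dedup_key(event):
    """Hypothetical uniqueness key: transaction id plus per-transaction sequence number."""
    return (event["txn_id"], event["op_seq"])


def deduplicate(events):
    """Return events with repeats of the same key removed, preserving arrival order."""
    seen = set()
    unique = []
    for event in events:
        key = dedup_key(event)
        if key in seen:
            continue  # same change delivered again (at-least-once delivery)
        seen.add(key)
        unique.append(event)
    return unique


def duplicate_ratio(events):
    """Total events divided by distinct keys; 1.0 means no duplicates were seen."""
    distinct = len({dedup_key(e) for e in events})
    return len(events) / distinct if distinct else 1.0
```

Alerting when `duplicate_ratio` rises above 1.0 for a batch is one inexpensive way to catch this class of incident before it inflates dashboard totals.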
