InterviewStack.io

Real Time and Batch Ingestion Questions

This topic focuses on choosing between batch ingestion and real-time streaming for moving data from sources to storage and downstream systems. Topics include latency and throughput requirements, cost and operational complexity, consistency and delivery semantics such as at-least-once and exactly-once, idempotency and deduplication strategies, schema evolution, connector and source considerations, backpressure and buffering, checkpointing and state management, and tooling choices for streaming and batch. Candidates should be able to design hybrid architectures that combine streaming for low-latency needs with batch pipelines for large backfills or heavy aggregations, and to explain operational trade-offs such as monitoring, scaling, failure recovery, and debugging.
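
To make the link between delivery semantics and deduplication concrete, here is a minimal sketch of an idempotent sink under at-least-once delivery. It assumes each event carries a producer-assigned unique `event_id`; that field, the `IdempotentSink` class, and the sample events are all hypothetical and only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Toy sink that applies each event at most once, keyed by event_id."""

    seen_ids: set = field(default_factory=set)
    totals: dict = field(default_factory=dict)

    def write(self, event: dict) -> bool:
        """Apply the event; return False if it was a duplicate delivery."""
        event_id = event["event_id"]  # assumed: unique ID set by the producer
        if event_id in self.seen_ids:
            return False  # redelivery: skip so the total is not double-counted
        self.seen_ids.add(event_id)
        key = event["user_id"]
        self.totals[key] = self.totals.get(key, 0) + event["amount"]
        return True


# At-least-once delivery: the second message is a redelivery of the first.
deliveries = [
    {"event_id": "e1", "user_id": "u1", "amount": 5},
    {"event_id": "e1", "user_id": "u1", "amount": 5},
    {"event_id": "e2", "user_id": "u1", "amount": 3},
]

sink = IdempotentSink()
applied = [sink.write(e) for e in deliveries]
```

In a real pipeline the set of seen IDs would need to be bounded (for example with a time-to-live) and persisted atomically with the output, otherwise a crash between the two writes can still produce duplicates or losses.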

Medium · System Design
74 practiced
Design a low-cost near-real-time analytics pipeline for product events at 10k events/min that writes hourly partitioned aggregates to BigQuery. Choose between Pub/Sub+Dataflow vs Kafka+Connectors and justify your choice in terms of cost, ops overhead, and performance.
Hard · System Design
74 practiced
Design a multi-region ingestion system for global events where data residency laws require certain regions' data to remain within their region, dashboards need low cross-region latency, and global aggregates are eventually reconciled. Describe replication, partitioning, and failover strategies.
Medium · Technical
102 practiced
Write a Postgres SQL query to compute daily churn rate from an events table events(user_id, event_type, occurred_at) where churn is defined as no 'session_start' events in the last 30 days. Describe how you would make this incremental for near-real-time use.
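
The incremental part of this question can be sketched independently of SQL: instead of rescanning the events table, keep one piece of state per user (the timestamp of the latest `session_start`) and derive churn from that state. The Python below is one possible illustration, not a reference answer. It assumes churn rate means "users whose latest `session_start` is more than 30 days before the evaluation time, divided by all users ever seen", and that the evaluation time is at or after the newest ingested event. The `ChurnTracker` class and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=30)


class ChurnTracker:
    """Keeps per-user state so churn can be updated one event at a time."""

    def __init__(self):
        self.last_session = {}  # user_id -> datetime of latest session_start

    def ingest(self, user_id, event_type, occurred_at):
        """Update state from a single event; other event types are ignored."""
        if event_type != "session_start":
            return
        prev = self.last_session.get(user_id)
        # Taking the max makes late or out-of-order events harmless.
        if prev is None or occurred_at > prev:
            self.last_session[user_id] = occurred_at

    def churn_rate(self, as_of):
        """Share of known users with no session_start in the last 30 days."""
        if not self.last_session:
            return 0.0
        churned = sum(
            1 for ts in self.last_session.values() if as_of - ts > CHURN_WINDOW
        )
        return churned / len(self.last_session)


tracker = ChurnTracker()
tracker.ingest("a", "session_start", datetime(2024, 1, 1))
tracker.ingest("b", "session_start", datetime(2024, 2, 20))
tracker.ingest("b", "purchase", datetime(2024, 2, 21))
tracker.ingest("c", "session_start", datetime(2024, 1, 15))
rate = tracker.churn_rate(datetime(2024, 3, 1))  # a and c churned, b active
```

The same idea maps onto Postgres as a small per-user "last seen" table maintained by upserts, with the daily churn figure computed from that table rather than from the raw events.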
Hard · System Design
81 practiced
Design an end-to-end ingestion and aggregation pipeline that provides sub-minute KPI updates for 1M active users generating 100k events/sec. Include ingestion, stream processing, storage for hot and cold layers, idempotency strategies, schema evolution handling, monitoring, and a disaster recovery approach.
Easy · Behavioral
84 practiced
Tell me about a time you convinced stakeholders to accept eventual consistency for a dashboard metric to reduce cost or complexity. Use STAR: Situation, Task, Action, Result. Focus on your arguments, the data you presented, and the outcome.
