Real Time and Batch Ingestion Questions

Focuses on choosing between batch ingestion and real-time streaming for moving data from sources to storage and downstream systems. Topics include latency and throughput requirements, cost and operational complexity, consistency and delivery semantics such as at-least-once and exactly-once, idempotency and deduplication strategies, schema evolution, connector and source considerations, backpressure and buffering, checkpointing and state management, and tooling choices for streaming and batch. Candidates should be able to design hybrid architectures that combine streaming for low-latency needs with batch pipelines for large backfills or heavy aggregations, and explain operational trade-offs such as monitoring, scaling, failure recovery, and debugging.
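
As a concrete illustration of the idempotency and deduplication topic, the sketch below shows how a sink keyed by event ID makes at-least-once redelivery harmless. The `IdempotentSink` class, the `id` field, and the in-memory dictionary are assumptions for illustration; a real pipeline would more likely rely on a keyed upsert in the target store.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Hypothetical sink that keeps at most one row per event ID."""
    rows: dict = field(default_factory=dict)

    def write(self, event_id: str, payload: dict) -> bool:
        # Return True if the event was new, False if it was a duplicate.
        if event_id in self.rows:
            return False
        self.rows[event_id] = payload
        return True


def deliver_at_least_once(sink: IdempotentSink, events: list) -> int:
    """Apply a batch that may contain redelivered events; count new writes."""
    return sum(sink.write(e["id"], e) for e in events)


sink = IdempotentSink()
# Event "a" is delivered twice, as can happen after a consumer restart.
batch = [{"id": "a", "v": 1}, {"id": "b", "v": 2}, {"id": "a", "v": 1}]
applied = deliver_at_least_once(sink, batch)
```

Here the duplicate is skipped, so replaying the same batch again would leave the sink unchanged.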

Hard | System Design

74 practiced

Design a backpressure-handling strategy for a Kubernetes-deployed stream processor that consumes from Kafka and writes to a slow third-party API with variable latency. Explain the autoscaling, buffering, throttling, circuit breakers, persistent queues, and monitoring you would implement to prevent cascading failures.
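
Two of the building blocks named in this question can be sketched in a few lines: a circuit breaker around the slow API and a high/low watermark rule that decides when to stop pulling from Kafka. The class names, thresholds, and injectable clock below are assumptions for illustration, not a reference design; in a real consumer the `paused` flag would typically drive the client's pause and resume calls.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a probe after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls flow normally
        # Open: only allow a probe call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


class PauseController:
    """Hysteresis on buffer depth: pause at `high`, resume at `low`."""

    def __init__(self, high=800, low=200):
        self.high, self.low = high, low
        self.paused = False

    def update(self, buffered: int) -> bool:
        if not self.paused and buffered >= self.high:
            self.paused = True
        elif self.paused and buffered <= self.low:
            self.paused = False
        return self.paused
```

The gap between `high` and `low` avoids rapid pause/resume flapping when the buffer hovers near a single threshold.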

Hard | System Design

92 practiced

Design a hybrid pipeline that supports both online low-latency scoring and offline complex aggregations for retraining models. Explain how you would keep features consistent across online and offline stores with respect to freshness, lineage, and ordering. Provide strategies for reconciliation and versioning.
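
One common way to keep online and offline features consistent is to share a single, versioned feature definition and to reconcile the two stores periodically. The sketch below assumes a hypothetical `spend_7d` feature, a made-up version tag, and hard-coded sample events; it is meant only to show point-in-time computation and a reconciliation check.

```python
from datetime import datetime, timedelta

FEATURE_VERSION = "spend_7d:v1"  # hypothetical version tag stored with each value


def spend_7d(events, as_of):
    """Sum of amounts in the 7 days up to and including `as_of`.

    Events after `as_of` are excluded, so offline training rows
    never see data that was unavailable at scoring time.
    """
    start = as_of - timedelta(days=7)
    return sum(e["amount"] for e in events if start < e["ts"] <= as_of)


def reconcile(online_value, offline_value, tolerance=1e-9):
    """Return True when the online store matches the offline recomputation."""
    return abs(online_value - offline_value) <= tolerance


events = [
    {"ts": datetime(2024, 1, 2), "amount": 5.0},    # older than 7 days: excluded
    {"ts": datetime(2024, 1, 5), "amount": 10.0},
    {"ts": datetime(2024, 1, 10), "amount": 2.5},
    {"ts": datetime(2024, 1, 11), "amount": 100.0},  # after as_of: excluded
]
as_of = datetime(2024, 1, 10)
offline_value = spend_7d(events, as_of)

# Suppose the online store missed the last in-window event.
online_value = 10.0
in_sync = reconcile(online_value, offline_value)
```

A mismatch like this one would be a signal to replay the missing events or to overwrite the online value from the offline recomputation.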

Medium | Technical

138 practiced

Explain event-time processing and watermarking: how watermarks work, how to set allowed lateness, and the implications for correctness and latency of ML features computed with time windows when late events arrive.
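
The interaction between watermarks and allowed lateness can be made concrete with a toy tumbling-window counter. The sketch assumes integer timestamps, a watermark defined as the maximum event time seen minus a fixed out-of-order bound, and a simplified drop rule; it does not reproduce the exact semantics of any particular streaming engine.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time window, dropping events that are too late."""

    def __init__(self, size, max_out_of_order, allowed_lateness):
        self.size = size
        self.max_out_of_order = max_out_of_order
        self.allowed_lateness = allowed_lateness
        self.counts = defaultdict(int)  # window start -> event count
        self.max_ts = float("-inf")
        self.dropped = 0

    def watermark(self):
        # Heuristic: no event older than this is expected any more.
        return self.max_ts - self.max_out_of_order

    def add(self, ts) -> bool:
        self.max_ts = max(self.max_ts, ts)
        window_start = ts - ts % self.size
        window_end = window_start + self.size
        # Drop once the watermark has passed the window end plus the grace period.
        if self.watermark() >= window_end + self.allowed_lateness:
            self.dropped += 1
            return False
        self.counts[window_start] += 1
        return True


counter = TumblingWindowCounter(size=10, max_out_of_order=2, allowed_lateness=3)
# Event 8 arrives after the watermark passed 10 but inside the grace period;
# event 9 arrives after the grace period has expired and is dropped.
results = [counter.add(ts) for ts in [1, 4, 12, 8, 16, 9]]
```

A larger allowed lateness makes window results more complete but keeps window state open longer and delays when a feature value can be treated as final.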

Medium | Technical

78 practiced

How would you handle schema evolution when ingesting Avro records from a stream into Parquet files in a data lake? Describe the role of a schema registry, default values, nullable fields, and partitioning, as well as strategies to backfill older partitions after a schema change.
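
The role of default values can be illustrated with a simplified, pure-Python imitation of Avro-style schema resolution: fields added to the reader schema are filled from defaults, and fields the reader no longer knows are ignored. The `resolve` helper, the field list format, and the sample record are assumptions for illustration and are not the API of any Avro library.

```python
_MISSING = object()  # sentinel meaning "this field has no default"


def resolve(record: dict, reader_fields: list) -> dict:
    """Project a record written with an older schema onto the reader schema.

    `reader_fields` is a list of (name, default) pairs.
    """
    out = {}
    for name, default in reader_fields:
        if name in record:
            out[name] = record[name]
        elif default is not _MISSING:
            out[name] = default  # field added later: fill from its default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return out  # writer-only fields are silently dropped


# Reader schema after evolution: a nullable field and a defaulted field were added.
reader_fields = [("user_id", _MISSING), ("country", None), ("plan", "free")]
old_record = {"user_id": 7, "legacy_flag": True}
resolved = resolve(old_record, reader_fields)
```

Because every added field carries a default, records from older partitions can be read, or rewritten during a backfill, without failing.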

Medium | Technical

81 practiced

Describe how you would perform capacity planning for a streaming ingestion pipeline that has predictable seasonal spikes (up to 10x). Cover traffic forecasting, autoscaling policies, state store sizing, cost implications, and how to test the plan before the season.
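
A back-of-the-envelope calculation is often the starting point for this kind of plan. Every number below (baseline rate, headroom, per-replica throughput, key count, bytes per key) is a hypothetical input chosen for illustration; real values would come from load tests and historical traffic.

```python
import math


def replicas_needed(baseline_eps, spike_factor, headroom, per_replica_eps):
    """Replicas required to absorb the peak rate with a safety margin."""
    peak_eps = baseline_eps * spike_factor * headroom
    return math.ceil(peak_eps / per_replica_eps)


def state_size_gb(num_keys, bytes_per_key):
    """Rough total state-store size in gigabytes (1 GB = 1e9 bytes)."""
    return num_keys * bytes_per_key / 1e9


# Hypothetical inputs: 20k events/s baseline, 10x seasonal spike, 25% headroom,
# and 15k events/s sustained per replica as measured in a load test.
replicas = replicas_needed(20_000, 10, 1.25, 15_000)

# Hypothetical state: 50 million keys at about 200 bytes each.
state_gb = state_size_gb(50_000_000, 200)
```

With these inputs the peak target is 250,000 events/s, which rounds up to 17 replicas, and the keyed state totals about 10 GB before replication or compaction overhead.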
