Advanced Caching and Data Pipeline Design Questions
Covers distributed caching, cache coherency, and specialized stores (search engines, column stores, time-series databases); data pipeline architecture, including batch processing, stream processing, and ETL design; and the Lambda and Kappa architectures.
Medium · Technical
Implement a simple event deduplicator in Python for a high-throughput stream in which each event has an id and a timestamp. The deduplicator should drop duplicates within a sliding time window (e.g., 5 minutes) and be memory efficient. Describe the limitations of your in-memory design and how to scale it using external state (Redis, RocksDB) or streaming state backends.
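A minimal in-memory sketch follows, assuming timestamps are numeric seconds that arrive in non-decreasing order (late or out-of-order events are not handled). The class name `WindowDeduplicator` and its methods are hypothetical. A dict answers "have I seen this id?" in O(1), and a deque ordered by arrival lets expired ids be evicted from the front, so memory stays proportional to the number of distinct ids seen within one window.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Tuple


class WindowDeduplicator:
    """Drop events whose id was already accepted less than `window_seconds` ago.

    Sketch assumptions: timestamps are numeric seconds and arrive in
    non-decreasing order; late or out-of-order events are not handled.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._first_seen: Dict[Hashable, float] = {}         # id -> accepted timestamp
        self._order: Deque[Tuple[float, Hashable]] = deque()  # same ids, oldest first

    def _evict(self, now: float) -> None:
        # Forget ids accepted at or before `now - window`; they no longer count as duplicates.
        cutoff = now - self.window
        while self._order and self._order[0][0] <= cutoff:
            _, event_id = self._order.popleft()
            # Each id held in the dict has exactly one deque entry, so this delete is safe.
            del self._first_seen[event_id]

    def accept(self, event_id: Hashable, ts: float) -> bool:
        """Return True to emit the event, False if it duplicates one inside the window."""
        self._evict(ts)
        if event_id in self._first_seen:
            return False
        self._first_seen[event_id] = ts
        self._order.append((ts, event_id))
        return True

    def __len__(self) -> int:
        # Number of ids currently held in memory.
        return len(self._first_seen)
```

With `window_seconds=300`, `accept("evt-1", 0)` returns True, `accept("evt-1", 120)` returns False, and `accept("evt-1", 300)` returns True again because the first copy has aged out. The state lives in one process and is lost on restart, which is the limitation that motivates external state: one common Redis variant is `SET <id> 1 NX EX 300`, which succeeds only for the first sighting within the TTL.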
Hard · Technical
Discuss cache coherence models for distributed caches: invalidation-based, update-based, lease-based, and version-vector approaches. For each model, analyze scaling behavior, network overhead, stale-read windows, and suitability for read-heavy vs write-heavy systems.
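For the version-vector approach, the core operation is deciding whether one cached version supersedes another or whether the two are concurrent. The sketch below assumes each replica is identified by a string id; the function names `bump`, `merge`, and `compare` are hypothetical helpers, not a particular library's API.

```python
from typing import Dict

VersionVector = Dict[str, int]  # replica id -> number of writes seen from that replica


def bump(vv: VersionVector, replica: str) -> VersionVector:
    """Return a copy of vv recording one more local write at `replica`."""
    out = dict(vv)
    out[replica] = out.get(replica, 0) + 1
    return out


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise maximum: the vector that has seen both histories."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    replicas = a.keys() | b.keys()
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has: b supersedes a
    if b_le_a:
        return "after"       # a supersedes b
    return "concurrent"      # conflicting writes: neither may silently overwrite the other
```

A cache node receiving an update would replace its entry when `compare(local, incoming)` is `"before"`, ignore it when the result is `"after"` or `"equal"`, and hand `"concurrent"` cases to a conflict-resolution step. The vector grows with the number of replicas that accept writes, which is the metadata and network overhead to weigh against invalidation- or lease-based schemes.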
Medium · Technical
Implement node selection on a consistent hashing ring in Python to map keys to cache nodes. Support adding and removing nodes with minimal key remapping. Do not use external libraries; provide the core hashing and lookup logic, and explain how virtual nodes affect load balance.
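A minimal standard-library sketch follows. It assumes MD5 (via `hashlib`) as the position hash, chosen only for its even spread and not for security, and labels virtual nodes `"<node>#<i>"`; the class name, the label format, and the default of 100 virtual nodes per physical node are illustrative choices. A key is owned by the first virtual node clockwise from the key's hash, found with binary search over the sorted positions.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as an unsigned 64-bit ring position.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes               # virtual nodes per physical node
        self._ring: List[int] = []         # sorted virtual-node positions
        self._owner: Dict[int, str] = {}   # position -> physical node

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:         # astronomically rare collision: skip it
                continue
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._ring.pop(bisect.bisect_left(self._ring, pos))

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise KeyError("ring is empty")
        # First position strictly clockwise of the key, wrapping past the end to index 0.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

Adding a node only inserts new positions, so every key that moves lands on the new node, and removing it restores the previous mapping exactly. More virtual nodes per physical node smooth out the share of the ring each node owns, at the cost of a larger sorted list to search and update.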
Medium · Technical
Given three storage options—Elasticsearch (search engine), ClickHouse/BigQuery (column-store), and TimescaleDB/InfluxDB (time-series)—describe concrete criteria for choosing each for a feature that stores and queries telemetry metrics and logs. Include query patterns, aggregation needs, retention, cost, and caching implications.
Hard · Technical
Sketch an exactly-once sink for a stream-processing job that writes aggregated results to a SQL database. Explain how you would use transactional writes, idempotent upserts, or Kafka transactions to ensure no duplicate or missing updates in the face of retries and failures. Include the schema design and the unique keys used to achieve idempotency.
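One way to combine transactional writes with idempotent upserts is sketched below, assuming the input is partitioned (for example, Kafka partitions), each batch carries the highest input offset it covers, and each partition has a single writer. `sqlite3` stands in for the production SQL database, and the table and column names (`metric_agg`, `sink_progress`) are hypothetical. Aggregates are keyed by `(window_start, metric_key)` and upserted with overwrite semantics, while the committed offset is stored in the same transaction, so a replayed batch is either skipped or rewrites identical values.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS metric_agg (
    window_start INTEGER NOT NULL,   -- epoch seconds of the aggregation window
    metric_key   TEXT    NOT NULL,
    value        REAL    NOT NULL,
    PRIMARY KEY (window_start, metric_key)   -- idempotency key for upserts
);
CREATE TABLE IF NOT EXISTS sink_progress (
    source_partition INTEGER PRIMARY KEY,
    last_offset      INTEGER NOT NULL        -- highest input offset already applied
);
"""


def write_batch(conn: sqlite3.Connection, partition: int, batch_end_offset: int,
                rows: Iterable[Tuple[int, str, float]]) -> bool:
    """Atomically apply one batch of aggregates; return False if it was already applied."""
    with conn:  # commits on normal exit, rolls back if any statement raises
        row = conn.execute(
            "SELECT last_offset FROM sink_progress WHERE source_partition = ?",
            (partition,),
        ).fetchone()
        if row is not None and row[0] >= batch_end_offset:
            return False  # replay after a retry or crash: already committed, skip it
        conn.executemany(
            "INSERT INTO metric_agg (window_start, metric_key, value) VALUES (?, ?, ?) "
            "ON CONFLICT (window_start, metric_key) DO UPDATE SET value = excluded.value",
            rows,
        )
        conn.execute(
            "INSERT INTO sink_progress (source_partition, last_offset) VALUES (?, ?) "
            "ON CONFLICT (source_partition) DO UPDATE SET last_offset = excluded.last_offset",
            (partition, batch_end_offset),
        )
    return True
```

Overwriting with `excluded.value` is what keeps the upsert itself idempotent; an additive update such as `value = value + excluded.value` would double-count on replay and would then rely entirely on the offset check. Because the data and the offset commit together, a crash between them cannot leave one without the other.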