InterviewStack.io

Data Architecture and Pipelines Questions

Designing data storage, integration, and processing architectures. Topics include relational and NoSQL database design, indexing and query optimization, replication and sharding strategies, data warehousing and dimensional modeling, ETL and ELT patterns, batch and streaming ingestion, processing frameworks, feature stores, archival and retention strategies, and trade-offs between scale and latency in large data systems.

Medium | Technical
50 practiced
Discuss the trade-offs of adopting a lakehouse architecture (e.g., Delta Lake) versus maintaining separate data lake and data warehouse systems for a mid-sized analytics and ML platform. Cover ACID guarantees, schema enforcement, query performance, cost model, and engineering and operational complexity.
Easy | Technical
43 practiced
Explain schema-on-read versus schema-on-write. For feature engineering pipelines in a data lake used by data scientists, discuss the advantages and drawbacks of each approach and give examples of when one is preferable over the other.
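As a starting point for this question, the contrast can be sketched in a few lines. This is a minimal illustration, not a reference answer: the `SCHEMA` mapping, the field names, and both helper functions are hypothetical and use only the Python standard library.

```python
# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"user_id": int, "amount": float}

def write_validated(store: list, record: dict) -> None:
    """Schema-on-write: reject a record at ingest time if it violates SCHEMA."""
    for field, expected in SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"bad or missing field: {field}")
    store.append(record)

def read_with_schema(raw: list) -> list:
    """Schema-on-read: accept anything at ingest, coerce at query time.

    Rows that cannot be coerced are skipped here; a real pipeline might
    route them to a quarantine location instead.
    """
    rows = []
    for record in raw:
        try:
            rows.append({f: t(record[f]) for f, t in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            continue
    return rows
```

The write path fails fast and keeps the store clean, while the read path keeps ingestion cheap and pushes the cost of bad data onto every consumer.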
Hard | Technical
57 practiced
You must implement exactly-once semantics for a streaming aggregation pipeline that computes features and writes to an online store. Describe how you would achieve exactly-once with Apache Flink or Kafka Streams, and detail strategies for sinks that are not idempotent (e.g., external databases).
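One common strategy for a non-idempotent sink is to commit the write and the consumed offset in the same database transaction, so a replay after failure can be detected and skipped. The sketch below illustrates that idea only; it does not use the Flink or Kafka Streams APIs. It assumes an in-memory SQLite database as a stand-in for the external store, and the table names and the `apply_event` helper are hypothetical.

```python
import sqlite3

def make_sink() -> sqlite3.Connection:
    """Create an in-memory stand-in for the external database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature (key TEXT PRIMARY KEY, total REAL)")
    conn.execute(
        "CREATE TABLE progress (part INTEGER PRIMARY KEY, next_offset INTEGER)"
    )
    return conn

def apply_event(conn, part: int, offset: int, key: str, delta: float) -> bool:
    """Apply a non-idempotent increment at most once per (partition, offset).

    Returns False if the event was already applied (a replay).
    """
    with conn:  # single transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT next_offset FROM progress WHERE part = ?", (part,)
        ).fetchone()
        if row is not None and offset < row[0]:
            return False  # offset already covered by a committed transaction
        cur = conn.execute(
            "UPDATE feature SET total = total + ? WHERE key = ?", (delta, key)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO feature (key, total) VALUES (?, ?)", (key, delta)
            )
        # The offset advances atomically with the feature update.
        conn.execute(
            "INSERT OR REPLACE INTO progress (part, next_offset) VALUES (?, ?)",
            (part, offset + 1),
        )
    return True
```

Because the increment and the offset move together, a crash between processing and acknowledging the source results in a replay that the sink rejects rather than a double count. This assumes events within a partition arrive in offset order.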
Medium | Technical
52 practiced
Explain how Change Data Capture (CDC) combined with event sourcing can be used to reconstruct historical feature values at any point-in-time for model backtesting. Discuss storage formats, performance implications, and the trade-offs between storing raw events vs periodically materialized snapshots.
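The core of a point-in-time lookup over raw change events can be sketched as an "as of" search. This is a simplified illustration assuming the events for one entity are already sorted by timestamp and held in memory; the `value_as_of` name and the `(timestamp, value)` tuple layout are hypothetical.

```python
from bisect import bisect_right

def value_as_of(events, as_of):
    """Return the latest value whose timestamp is <= as_of, or None.

    events: list of (timestamp, value) tuples sorted by timestamp,
    e.g. the CDC change log for a single entity and feature.
    """
    timestamps = [ts for ts, _ in events]
    i = bisect_right(timestamps, as_of)  # count of events at or before as_of
    return events[i - 1][1] if i else None

# Example: three changes to one feature value over time.
history = [(10, 1.0), (20, 2.5), (30, 4.0)]
print(value_as_of(history, 25))
```

Using only events at or before the requested timestamp is what prevents label leakage in backtests. Periodic snapshots shorten the log that has to be scanned per lookup, at the cost of extra storage.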
Hard | Technical
54 practiced
A data lake has millions of small Parquet files created by many upstream jobs, causing expensive metadata operations and slow queries. Describe compaction strategies: batch compaction scheduling, target file sizing heuristics, safe in-place compaction vs atomic swap patterns, and metrics you would track to measure success without impacting ongoing reads and writes.
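The target file sizing part of this question can be illustrated with a small planner that groups undersized files into rewrite batches. This is a sketch under stated assumptions: the 128 MiB target and the 0.75 "small file" ratio are illustrative defaults, not recommendations from the source, and `plan_compaction` is a hypothetical helper that only plans the groups without rewriting or swapping any files.

```python
def plan_compaction(file_sizes, target_bytes=128 * 1024 * 1024, small_ratio=0.75):
    """Group small files into batches whose total size is <= target_bytes.

    file_sizes: dict mapping file path -> size in bytes.
    Files at or above small_ratio * target_bytes are left alone.
    Uses first-fit decreasing bin packing; returns a list of path lists,
    omitting batches with a single file since rewriting one file gains nothing.
    """
    small = sorted(
        (
            (path, size)
            for path, size in file_sizes.items()
            if size < small_ratio * target_bytes
        ),
        key=lambda item: item[1],
        reverse=True,
    )
    bins = []  # each bin is [total_size, [paths]]
    for path, size in small:
        for b in bins:
            if b[0] + size <= target_bytes:
                b[0] += size
                b[1].append(path)
                break
        else:
            bins.append([size, [path]])
    return [paths for _, paths in bins if len(paths) > 1]
```

Each returned batch would then be rewritten into one new file and published with an atomic swap (for example a table-format commit) so that concurrent readers never see a partial state.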
