InterviewStack.io

Data Architecture and Pipelines Questions

Designing data storage, integration, and processing architectures. Topics include relational and NoSQL database design, indexing and query optimization, replication and sharding strategies, data warehousing and dimensional modeling, ETL and ELT patterns, batch and streaming ingestion, processing frameworks, feature stores, archival and retention strategies, and trade-offs for scale and latency in large data systems.

Easy · Technical
Design a relational table schema for e-commerce orders optimized for both transactional correctness and downstream analytics. Example schema:
```sql
CREATE TABLE orders (
  order_id BIGINT PRIMARY KEY,
  user_id BIGINT,
  created_at TIMESTAMP,
  status VARCHAR(32),
  total_amount DECIMAL(12,2),
  currency CHAR(3),
  items JSONB,
  updated_at TIMESTAMP
);
```
Explain choices for primary key, partitioning strategy, recommended indexes, handling soft deletes, and how you would tune this for high insert rates and analytical queries.
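A minimal sketch of the soft-delete and indexing choices, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for the production database. Because `JSONB`, `TIMESTAMP`, and fixed-point `DECIMAL` are not native SQLite types, they are approximated as noted in the comments, and partitioning by `created_at` range is not shown because SQLite has no declarative partitioning. The `deleted_at` column, the index and view names, and the sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,  -- BIGINT in PostgreSQL; SQLite integers are 64-bit
    user_id      INTEGER NOT NULL,
    created_at   TEXT    NOT NULL,     -- ISO-8601 string standing in for TIMESTAMP
    status       TEXT    NOT NULL,
    total_amount NUMERIC NOT NULL,     -- SQLite has no fixed-point DECIMAL; keep DECIMAL(12,2) in production
    currency     TEXT    NOT NULL,
    items        TEXT,                 -- JSON text standing in for JSONB
    updated_at   TEXT    NOT NULL,
    deleted_at   TEXT                  -- hypothetical soft-delete marker; NULL means live
);
-- Partial index: covers only live rows, so it stays small as deleted rows accumulate.
CREATE INDEX idx_orders_live_user_created
    ON orders (user_id, created_at)
    WHERE deleted_at IS NULL;
-- Readers and downstream extracts use this view, so the soft-delete filter lives in one place.
CREATE VIEW live_orders AS
    SELECT * FROM orders WHERE deleted_at IS NULL;
"""

def soft_delete(conn: sqlite3.Connection, order_id: int, ts: str) -> None:
    # Mark the row instead of deleting it; bumping updated_at lets
    # updated_at-based incremental loads pick up the change.
    with conn:
        conn.execute(
            "UPDATE orders SET deleted_at = ?, updated_at = ? "
            "WHERE order_id = ? AND deleted_at IS NULL",
            (ts, ts, order_id),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany(
        "INSERT INTO orders (order_id, user_id, created_at, status, total_amount, "
        "currency, items, updated_at) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 42, "2024-01-01T10:00:00", "paid", "19.99", "USD",
             '[{"sku": "A1", "qty": 1}]', "2024-01-01T10:00:00"),
            (2, 42, "2024-01-02T11:00:00", "paid", "5.00", "USD",
             '[{"sku": "B2", "qty": 2}]', "2024-01-02T11:00:00"),
        ],
    )
soft_delete(conn, 2, "2024-01-03T09:00:00")
print(conn.execute(
    "SELECT order_id FROM live_orders WHERE user_id = 42 ORDER BY created_at"
).fetchall())  # [(1,)]
```

The deleted order still exists in `orders` for audits and analytics, while transactional reads through `live_orders` only touch live rows.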
Easy · Technical
Explain the differences between OLTP and OLAP workloads in terms of data models, query patterns, latency, throughput, consistency, storage formats, and typical resource profiles. For a SaaS product that needs both transactional integrity and analytics, describe infrastructure approaches (separate systems with ETL/CDC, HTAP, or hybrid replication) and trade-offs. Give examples of when you'd choose each approach.
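For the "separate systems with ETL/CDC" option, the core of the analytical side is an applier that turns change events into an up-to-date copy. A minimal sketch, assuming each event carries a full row image and a per-source, monotonically increasing log sequence number (`lsn`); the `ChangeEvent` shape and op codes are hypothetical, not any specific CDC tool's format.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional

@dataclass(frozen=True)
class ChangeEvent:
    op: str                         # "insert", "update" or "delete" (hypothetical op codes)
    key: int                        # primary key of the source row
    lsn: int                        # log sequence number; higher means later
    row: Optional[Dict[str, Any]]   # row image after the change; None for deletes

def apply_changes(replica: Dict[int, Dict[str, Any]],
                  applied_lsn: Dict[int, int],
                  events: Iterable[ChangeEvent]) -> None:
    """Apply CDC events to an analytical copy, tolerating redelivery and reordering.

    applied_lsn keeps the newest LSN applied per key (including deletes), so a
    replayed batch or an older event arriving late is skipped.
    """
    for ev in events:
        if applied_lsn.get(ev.key, -1) >= ev.lsn:
            continue  # duplicate or stale event
        if ev.op == "delete":
            replica.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            replica[ev.key] = dict(ev.row or {})
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
        applied_lsn[ev.key] = ev.lsn

replica, seen = {}, {}
batch = [
    ChangeEvent("insert", 1, 10, {"status": "pending"}),
    ChangeEvent("update", 1, 12, {"status": "shipped"}),
    ChangeEvent("update", 1, 11, {"status": "paid"}),  # arrives late; skipped
    ChangeEvent("insert", 2, 13, {"status": "pending"}),
    ChangeEvent("delete", 2, 14, None),
]
apply_changes(replica, seen, batch)
apply_changes(replica, seen, batch)  # at-least-once redelivery is a no-op
print(replica)  # {1: {'status': 'shipped'}}
```

In a real pipeline the replica would be a warehouse table and `applied_lsn` would be persisted together with it so restarts stay idempotent; the analytical copy lags the OLTP database by the replication delay, which is the consistency trade-off this approach accepts.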
Medium · Technical
You must merge two transactional systems into a unified reporting model but encounter conflicting keys and duplicate records for customers. Design a deduplication and reconciliation approach: include record linkage strategies, confidence scoring, golden record selection, auditability, and how to represent uncertain merges for analysts.
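A minimal sketch of the linkage and confidence-scoring step, assuming customers in both systems expose only name, email, and phone. It blocks candidate pairs on the last four phone digits, scores them with a weighted mix of fuzzy name similarity (`difflib.SequenceMatcher`) and exact normalized email/phone matches, and labels each pair `auto_merge` or `needs_review`. The weights, thresholds, blocking key, and system names are illustrative assumptions, and golden-record survivorship is not shown.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Customer:
    source: str  # originating system (hypothetical names "crm", "billing")
    id: str      # key in that system; may collide with keys from the other system
    name: str
    email: str
    phone: str

def norm_text(s: str) -> str:
    return " ".join(s.lower().split())   # lowercase, collapse whitespace

def norm_phone(p: str) -> str:
    return re.sub(r"\D", "", p)          # digits only

# Illustrative weights and thresholds; real values should be tuned on labeled pairs.
WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}
AUTO_MERGE, REVIEW = 0.9, 0.6

def score(a: Customer, b: Customer) -> float:
    """Weighted similarity in [0, 1]: fuzzy name plus exact normalized email/phone."""
    name_sim = SequenceMatcher(None, norm_text(a.name), norm_text(b.name)).ratio()
    email_eq = float(norm_text(a.email) == norm_text(b.email) != "")
    phone_eq = float(norm_phone(a.phone) == norm_phone(b.phone) != "")
    return (WEIGHTS["name"] * name_sim + WEIGHTS["email"] * email_eq
            + WEIGHTS["phone"] * phone_eq)

def link(left, right):
    """Score candidate pairs within blocks (last 4 phone digits) and label each match."""
    blocks = defaultdict(list)
    for r in right:
        blocks[norm_phone(r.phone)[-4:]].append(r)
    links = []
    for a in left:
        for b in blocks.get(norm_phone(a.phone)[-4:], []):
            s = score(a, b)
            if s < REVIEW:
                continue  # treated as a non-match
            links.append({
                "left": (a.source, a.id),   # (system, key) avoids cross-system key collisions
                "right": (b.source, b.id),
                "score": round(s, 3),       # kept for auditability
                "decision": "auto_merge" if s >= AUTO_MERGE else "needs_review",
            })
    return links

crm = [Customer("crm", "C1", "Jane Doe", "jane@example.com", "555-010-1234"),
       Customer("crm", "C2", "John Smith", "js@example.com", "555-010-9876")]
billing = [Customer("billing", "C1", "Jane  Doe", "JANE@example.com", "(555) 010-1234"),
           Customer("billing", "B7", "Jon Smith", "john.smith@example.org", "555 010 9876")]
for link_row in link(crm, billing):
    print(link_row)
```

Here both systems use the key `C1`, which is why links are recorded as `(system, key)` pairs. Keeping `needs_review` links in a separate table with their scores lets analysts filter on confidence instead of seeing uncertain merges silently applied; production systems usually combine several blocking keys so that a missing phone number does not hide a true match.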
Medium · Technical
You need to migrate on-prem ETL jobs to the cloud with minimal downtime and data loss. Outline a migration plan covering discovery, dual-write or dual-read phases, data validation checks, backfills, canary runs, cutover strategy, rollback criteria, and stakeholder communication. Highlight risk mitigation for each phase.
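For the data validation checks during dual-run and canary phases, one common approach is to compare row counts plus an order-independent checksum of the legacy and cloud outputs. A minimal sketch, assuming both outputs can be read as sequences of row tuples; `row_hash`, `table_fingerprint`, and `compare_outputs` are hypothetical helper names. Summing per-row hashes modulo 2^64 makes the fingerprint independent of row order, which matters because distributed engines rarely emit rows in a stable order.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

def row_hash(row: Sequence) -> int:
    # repr() of each field, joined by a unit separator, then SHA-256 truncated to 64 bits.
    # repr() distinguishes types (1 vs '1'), so normalize types on both sides first.
    payload = "\x1f".join(repr(v) for v in row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")

def table_fingerprint(rows: Iterable[Sequence]) -> Tuple[int, int]:
    """Return (row_count, order-independent checksum) for one table output."""
    count, total = 0, 0
    for row in rows:
        count += 1
        total = (total + row_hash(row)) % (1 << 64)
    return count, total

def compare_outputs(legacy: Iterable[Sequence], cloud: Iterable[Sequence]) -> dict:
    l_count, l_sum = table_fingerprint(legacy)
    c_count, c_sum = table_fingerprint(cloud)
    return {
        "count_match": l_count == c_count,
        "checksum_match": l_sum == c_sum,
        "legacy_rows": l_count,
        "cloud_rows": c_count,
    }

legacy = [(1, "2024-01-01", "19.99"), (2, "2024-01-02", "5.00")]
cloud_ok = list(reversed(legacy))                                   # same rows, different order
cloud_bad = [(1, "2024-01-01", "19.99"), (2, "2024-01-02", "5.01")]  # one value drifted
print(compare_outputs(legacy, cloud_ok))
print(compare_outputs(legacy, cloud_bad))
```

Run per partition (for example per load date), a mismatch pinpoints which backfill or canary slice to investigate and can feed directly into the rollback criteria.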
Medium · Technical
Design a dimensional model for retail sales analytics: identify facts and dimensions, choose grain for the sales fact, define primary/foreign keys, and propose how to implement Slowly Changing Dimensions (SCD), specifically Type 2. Sketch a high-level schema for sales_fact, product_dim, customer_dim, and describe how promotions should be modeled.
