InterviewStack.io

Data Warehouse and Dimensional Modeling Questions

Design and model scalable analytical data systems using dimensional modeling principles and data warehouse architecture patterns. Core concepts include fact and dimension tables, defining and enforcing grain, surrogate keys, degenerate and role-playing dimensions, conformed dimensions, and slowly changing dimensions (Type 1, Type 2, and Type 3). Understand schema choices and their trade-offs, such as star versus snowflake schemas and normalization versus denormalization, as well as the fact table types: transactional, periodic snapshot, and accumulating snapshot. Make design decisions that serve the expected query patterns and performance goals by considering partitioning, indexing, compression, columnar storage, and aggregation strategies. Be able to design schemas for different business domains, reason about data integration and consistency, and optimize for common analytical workloads and reporting requirements.
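Of the slowly changing dimension types listed above, Type 2 is the one that needs the most machinery. Type 1 overwrites the attribute in place, and Type 3 keeps only a "previous value" column. Type 2 instead expires the current row and inserts a new version under a new surrogate key.

The following is a minimal in-memory sketch of that behavior, not a production ETL step. It assumes a hypothetical customer dimension whose only tracked attribute is `city`, and it uses an exclusive `valid_to` date. It requires Python 3.9+.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerDimRow:
    surrogate_key: int        # warehouse-generated key that fact rows reference
    customer_id: str          # natural (business) key from the source system
    city: str                 # hypothetical tracked attribute
    valid_from: date
    valid_to: Optional[date]  # exclusive end date; None while the row is current
    is_current: bool


def apply_scd2(dim: list[CustomerDimRow], customer_id: str, city: str,
               as_of: date) -> list[CustomerDimRow]:
    """Apply one incoming source record to the dimension using SCD Type 2 rules."""
    current = next((r for r in dim
                    if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == city:
        return list(dim)  # tracked attribute unchanged: no new version needed
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    out = []
    for r in dim:
        if r is current:
            # Expire the old version rather than overwriting it, preserving history.
            out.append(replace(r, valid_to=as_of, is_current=False))
        else:
            out.append(r)
    # New version gets a fresh surrogate key, so old facts keep pointing at old rows.
    out.append(CustomerDimRow(next_key, customer_id, city, as_of, None, True))
    return out
```

For example, applying `("C1", "Austin")` on 2024-01-01 and then `("C1", "Denver")` on 2024-06-01 leaves two rows. Key 1 (Austin) is expired on 2024-06-01, and key 2 (Denver) is current. Facts loaded before June still join to the Austin version.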

Medium · Technical
You are modeling analytics for an online learning platform with three event streams: course_enrollment (one row per enrollment), daily_progress (percentage completed per user per day), and course_lifecycle (multiple milestones per course, from enrollment to certification). For each of the following reporting needs, specify which fact table type is appropriate and justify your choice: 1) Track each enrollment event. 2) Measure daily progress and retention. 3) Measure the time between enrollment and certification for each learner.
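Requirement 3 is the classic use case for an accumulating snapshot. It has one row per enrollment, and that row carries a date column per milestone. The ETL updates the row as each milestone arrives rather than inserting a new one.

A minimal sketch under those assumptions follows. The milestone names `first_lesson_on`, `completed_on`, and `certified_on` are hypothetical stand-ins for whatever course_lifecycle actually emits.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical milestone columns on the accumulating snapshot row.
MILESTONES = {"first_lesson_on", "completed_on", "certified_on"}


@dataclass
class EnrollmentLifecycleFact:
    """One row per enrollment; milestone dates are filled in as they occur."""
    enrollment_id: str
    enrolled_on: date
    first_lesson_on: Optional[date] = None
    completed_on: Optional[date] = None
    certified_on: Optional[date] = None

    def record_milestone(self, milestone: str, when: date) -> None:
        # Accumulating snapshots revisit and update the same row.
        if milestone not in MILESTONES:
            raise ValueError(f"unknown milestone: {milestone}")
        if getattr(self, milestone) is None:  # keep the first occurrence only
            setattr(self, milestone, when)

    def days_to_certification(self) -> Optional[int]:
        # Lag measure derived from two milestone columns; None until certified.
        if self.certified_on is None:
            return None
        return (self.certified_on - self.enrolled_on).days
```

An enrollment on 2024-03-01 that is certified on 2024-04-15 yields `days_to_certification() == 45`. Averaging that column across rows answers requirement 3 without joining event rows together.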
Hard · Technical
Your company has separate data warehouses for marketing and finance, each with a different version of the customer dimension. As a staff data analyst, propose a governance model to version, publish, and enforce a conformed customer dimension across both warehouses. Cover the data contract, canonical keys, ETL pipelines, validation tests, and rollout plan, and explain how to minimize disruption to downstream consumers.
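One concrete form the "validation tests" could take is a contract check that runs before a new version of the conformed dimension is published. It would reject a batch if required columns are missing or if the canonical key is null or duplicated.

The sketch below is illustrative only. `DataContract`, `validate`, and the column names are hypothetical and not part of any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    version: str                    # published contract version, e.g. "v2"
    required_columns: frozenset[str]
    key_column: str                 # hypothetical canonical customer key column


def validate(rows: list[dict], contract: DataContract) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    errors = []
    seen = set()
    for i, row in enumerate(rows):
        # Every row must carry every column the contract promises to consumers.
        missing = contract.required_columns.difference(row.keys())
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        # The canonical key must be present and unique across the batch.
        key = row.get(contract.key_column)
        if key is None:
            errors.append(f"row {i}: null {contract.key_column}")
        elif key in seen:
            errors.append(f"row {i}: duplicate {contract.key_column} {key!r}")
        else:
            seen.add(key)
    return errors
```

Both warehouses' pipelines could run the same check against the same contract version. A publish step would then proceed only when `validate` returns an empty list.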
Medium · Technical
Event producers may retry sending events, which causes duplicate rows in the fact table. Propose a deduplication strategy for ETL/ELT. Cover idempotent ingestion using a dedup key (e.g., event_id), payload hashing, windowed dedup logic, and deduplicating in a staging table versus at insert time. Explain how the strategy interacts with partitioning, backfills, and performance.
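A minimal in-memory sketch of windowed deduplication follows. It keys on `event_id` and hashes a canonical JSON form of the payload, so that a retry carrying a *different* payload under the same id can be flagged rather than silently dropped.

It assumes each event is a dict with `event_id`, `received_at`, and `payload` fields, all of which are hypothetical names. In a warehouse, the same logic would typically be expressed as a MERGE or a `ROW_NUMBER()` filter in a staging step.

```python
import hashlib
import json
from datetime import datetime, timedelta


def payload_hash(payload: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so equal payloads hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(events: list[dict], window: timedelta) -> tuple[list[dict], list[str]]:
    """Keep the first copy of each event_id; drop repeats arriving within `window`.

    Returns (kept_events, conflicts). Conflicts are event_ids whose retried
    payload differs from the first copy, which usually signals a producer bug.
    """
    first_seen: dict[str, tuple[datetime, str]] = {}
    kept: list[dict] = []
    conflicts: list[str] = []
    for ev in sorted(events, key=lambda e: e["received_at"]):
        h = payload_hash(ev["payload"])
        prior = first_seen.get(ev["event_id"])
        if prior is not None and ev["received_at"] - prior[0] <= window:
            if prior[1] != h:
                conflicts.append(ev["event_id"])
            continue  # duplicate inside the window: drop it
        # First sighting, or the previous sighting fell outside the window.
        # (A real job would also evict keys older than the window to bound state.)
        first_seen[ev["event_id"]] = (ev["received_at"], h)
        kept.append(ev)
    return kept, conflicts
```

The window has to cover the producers' maximum retry delay. A repeat arriving after the window is kept, which is the trade-off for bounded state. Backfills that replay the same `event_id`s stay idempotent only if dedup also checks ids already loaded into the target partitions.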
Hard · System Design
Design a product-dimension strategy for an online marketplace with tens of millions of SKUs and very sparse attributes that vary by category. Propose and compare these options: vertical partitioning (category-specific attribute tables), an entity-attribute-value (EAV) attribute store, a NoSQL attribute store, surrogate keys with late-binding attributes, and mini-dimensions. For each approach, discuss query patterns, join costs, and BI usability.
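EAV rows often score poorly on BI usability because every report has to pivot the (entity, attribute, value) triples back into columns. In SQL this usually means conditional aggregation or one self-join per attribute.

The following minimal sketch shows that reshaping step in plain Python. The attribute names are hypothetical, and all values are kept as strings, as a typical EAV value column would store them.

```python
from collections import defaultdict
from typing import Optional


def pivot_eav(rows: list[tuple[int, str, str]],
              attributes: list[str]) -> dict[int, dict[str, Optional[str]]]:
    """Turn (product_key, attribute, value) triples into one wide row per product.

    Only the requested attributes become columns; absent ones come back as None.
    Products with none of the requested attributes do not appear in the result.
    """
    wanted = set(attributes)
    wide: dict[int, dict[str, Optional[str]]] = defaultdict(
        lambda: {a: None for a in attributes})
    for product_key, attribute, value in rows:
        if attribute in wanted:
            wide[product_key][attribute] = value
    return dict(wide)
```

For example, given laptop rows with `screen_size` and `ram_gb` plus a shirt with `color`, asking for `["screen_size", "ram_gb"]` returns one dict per product with `None` in the gaps. Category-specific tables or mini-dimensions avoid this per-query pivot by materializing the columns ahead of time.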
Easy · Technical
Compare a star schema and a snowflake schema. Describe the physical differences and the query and maintenance trade-offs. Give an example scenario where snowflaking (normalizing) a dimension could be beneficial, and one where a denormalized star is preferred for performance.
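The physical difference can be made concrete with a hypothetical product dimension. In the snowflaked form, category attributes live once in their own table. In the star form, they are copied onto every product row.

The sketch below models both forms, plus the `flatten` join that a snowflake forces every query (or a view) to perform. All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:          # snowflake: category attributes stored exactly once
    category_key: int
    category_name: str
    department: str


@dataclass(frozen=True)
class ProductSnowflake:  # product row references its category by key
    product_key: int
    product_name: str
    category_key: int


@dataclass(frozen=True)
class ProductStar:       # star: category attributes repeated on every product
    product_key: int
    product_name: str
    category_name: str
    department: str


def flatten(products: list[ProductSnowflake],
            categories: list[Category]) -> list[ProductStar]:
    """Denormalize a snowflaked product dimension into a star-style dimension."""
    by_key = {c.category_key: c for c in categories}  # the extra join, as a lookup
    return [
        ProductStar(p.product_key, p.product_name,
                    by_key[p.category_key].category_name,
                    by_key[p.category_key].department)
        for p in products
    ]
```

In the snowflake, renaming a department touches one `Category` row, while in the star it touches every product in that department. Conversely, the star lets queries filter on `department` without the extra join.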
