Data Warehouse and Dimensional Modeling Questions

Design and model scalable analytical data systems using dimensional modeling principles and data warehouse architecture patterns. Core concepts include fact and dimension tables, defining and enforcing grain, surrogate keys, degenerate and role-playing dimensions, conformed dimensions, and slowly changing dimensions (Type 1, Type 2, and Type 3). Understand schema choices and their trade-offs, such as star versus snowflake schemas and normalization versus denormalization, as well as the fact table types: transactional, periodic snapshot, and accumulating snapshot. Apply design decisions to meet query patterns and performance goals by considering partitioning, indexing, compression, columnar storage, and aggregation strategies. Be able to design schemas for different business domains, reason about data integration and consistency, and optimize for common analytical workloads and reporting requirements.
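Slowly changing dimension handling is easiest to see with a small example. The sketch below shows Type 2 behavior on an in-memory list standing in for a customer dimension: a change to a tracked attribute expires the current row and appends a new version with a fresh surrogate key, so historical facts keep pointing at the version that was valid when they occurred. The table, its columns, the half-open validity interval, and the 9999-12-31 open-end sentinel are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, replace
from datetime import date

# Sentinel end date for the open (current) version; a common convention, assumed here.
HIGH_DATE = date(9999, 12, 31)


@dataclass(frozen=True)
class CustomerDimRow:
    surrogate_key: int   # warehouse-generated key; facts reference this, not customer_id
    customer_id: str     # natural/business key from the source system
    city: str            # attribute tracked with Type 2 history
    valid_from: date     # inclusive
    valid_to: date       # exclusive (half-open interval)
    is_current: bool


def apply_scd2(dim, customer_id, city, effective, next_key):
    """Apply one source change with Type 2 semantics; return the next unused surrogate key."""
    for i, row in enumerate(dim):
        if row.customer_id == customer_id and row.is_current:
            if row.city == city:
                return next_key  # tracked attribute unchanged: no new version
            # Expire the old version instead of overwriting it (overwriting would be Type 1).
            dim[i] = replace(row, valid_to=effective, is_current=False)
            break
    dim.append(CustomerDimRow(next_key, customer_id, city, effective, HIGH_DATE, True))
    return next_key + 1


# Usage: a customer moves from Austin to Denver.
dim = []
k = apply_scd2(dim, "C42", "Austin", date(2023, 1, 1), next_key=1)
k = apply_scd2(dim, "C42", "Denver", date(2024, 6, 1), next_key=k)
```

A Type 3 variant would instead keep a single row and add a `previous_city` column, which preserves only one prior value.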

Easy · Technical

Explain the concept of 'grain' when designing a fact table. Provide at least three examples of poorly chosen grain and the problems those choices cause (e.g., aggregation errors, duplicate counting). Describe how you would document and enforce grain in ETL pipelines so downstream analysts don't misinterpret measures.
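One way to enforce a declared grain in an ETL pipeline is a pre-load check that rejects batches containing null or duplicated grain keys; a duplicated key at an order-line grain silently double-counts additive measures such as amount. A minimal sketch, assuming rows arrive as Python dicts and a hypothetical `FACT_ORDER_LINE_GRAIN` declaration of one row per (order_id, line_number):

```python
from collections import Counter

# Hypothetical grain declaration: one row per order line. In practice the same
# declaration would also be written into the table's documentation / data contract.
FACT_ORDER_LINE_GRAIN = ("order_id", "line_number")


def grain_violations(rows, grain):
    """Return (rows with a null grain column, sorted list of duplicated grain keys)."""
    null_rows, keys = [], []
    for r in rows:
        key = tuple(r.get(c) for c in grain)
        if any(v is None for v in key):
            null_rows.append(r)
        else:
            keys.append(key)
    dupes = sorted(k for k, n in Counter(keys).items() if n > 1)
    return null_rows, dupes


def assert_grain(rows, grain):
    """Fail the load step instead of writing rows that break the grain."""
    null_rows, dupes = grain_violations(rows, grain)
    if null_rows or dupes:
        raise ValueError(
            f"grain {grain} violated: {len(null_rows)} null-key rows, duplicate keys {dupes}"
        )


batch = [
    {"order_id": 1, "line_number": 1, "amount": 10.0},
    {"order_id": 1, "line_number": 2, "amount": 5.0},
    {"order_id": 1, "line_number": 2, "amount": 5.0},  # replayed row: SUM(amount) would be inflated
]
nulls, dupes = grain_violations(batch, FACT_ORDER_LINE_GRAIN)
```

Inside the warehouse, the same check is typically a `GROUP BY` over the grain columns with `HAVING COUNT(*) > 1`, run as a post-load test.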

Hard · Technical

You operate a multi-tenant analytic warehouse where tenant A holds 90% of the rows and the remaining tenants are small. Query patterns are mainly tenant-scoped, with occasional cross-tenant analytics. Propose a partitioning and sharding strategy that avoids hotspots and provides tenant isolation. Compare tenant-id partitioning, hash sharding, and separate schemas/projects per tenant; discuss the impact on joins, backups, and resource isolation.
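One hybrid of the options above co-locates each small tenant on a single shard by hashing tenant_id, while salting the dominant tenant across several shards by entity id. A sketch of the routing logic only; `NUM_SHARDS`, the `HOT_TENANT_SPREAD` table, and the use of MD5 purely as a stable bucketing hash are assumptions:

```python
import hashlib

NUM_SHARDS = 16
# Hypothetical per-tenant spread: the dominant tenant is salted across 8 shards.
HOT_TENANT_SPREAD = {"tenant_a": 8}


def stable_hash(value: str) -> int:
    # MD5 is used only for deterministic bucketing (Python's built-in hash() is
    # randomized per process for str), not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def shards_for_tenant(tenant_id: str) -> list:
    """Shards a tenant-scoped query must scan."""
    base = stable_hash(tenant_id) % NUM_SHARDS
    spread = min(HOT_TENANT_SPREAD.get(tenant_id, 1), NUM_SHARDS)
    return [(base + i) % NUM_SHARDS for i in range(spread)]


def shard_for(tenant_id: str, entity_id: str) -> int:
    """Shard that stores a given row."""
    candidates = shards_for_tenant(tenant_id)
    if len(candidates) == 1:
        return candidates[0]  # small tenant: all of its rows are co-located
    # Hot tenant: salt by entity id so its rows spread evenly over its shards.
    return candidates[stable_hash(entity_id) % len(candidates)]
```

With this layout, small-tenant queries touch one shard and their joins stay local; hot-tenant queries fan out to that tenant's shards, and its joins stay local only if both sides are salted by the same entity id. Cross-tenant analytics become a scatter-gather over all shards.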

Medium · Technical

Design a date dimension schema suitable for both analytics and ML feature engineering. List attributes to include (e.g., date_key, date, year, fiscal flags, is_holiday, business_day, week_of_year, working_days_since, rolling flags), discuss cardinalities and data types, and describe how you would generate and maintain multiple calendars (regional holidays, fiscal calendars).
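A date dimension is usually pre-generated rather than derived at query time. The sketch below builds rows for one calendar in plain Python. The column set, the July fiscal-year start, the convention of naming a fiscal year for the calendar year in which it ends, and reading `working_days_since` as business days elapsed since the start of the generated range are all assumptions. Regional calendars could be handled by calling it once per holiday set and storing the results with a region key, or by keeping holidays in a separate table joined on date_key.

```python
from datetime import date, timedelta


def build_date_dim(start, end, holidays, fiscal_start_month=7):
    """Generate one row per day in [start, end] for a single holiday calendar."""
    rows = []
    business_days_before = 0  # business days seen earlier in the generated range
    d = start
    while d <= end:
        is_holiday = d in holidays
        business_day = d.weekday() < 5 and not is_holiday  # Mon-Fri and not a holiday
        iso_year, iso_week, _ = d.isocalendar()
        # Assumed convention: fiscal year is named for the calendar year in which it ends.
        rolls_over = fiscal_start_month > 1 and d.month >= fiscal_start_month
        rows.append({
            "date_key": d.year * 10000 + d.month * 100 + d.day,  # int YYYYMMDD surrogate
            "date": d,
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "iso_year": iso_year,
            "week_of_year": iso_week,
            "fiscal_year": d.year + 1 if rolls_over else d.year,
            "is_holiday": is_holiday,
            "business_day": business_day,
            "working_days_since": business_days_before,
        })
        if business_day:
            business_days_before += 1
        d += timedelta(days=1)
    return rows


us_holidays = {date(2024, 1, 1), date(2024, 7, 4)}  # illustrative subset only
dim = build_date_dim(date(2024, 1, 1), date(2024, 1, 8), us_holidays)
```

Small integer columns (year, month, day_of_week, week_of_year) fit in 2-byte integer types and booleans compress well in columnar storage, so even a table covering many decades stays tiny relative to the facts that join to it.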

Hard · Technical

Design an accumulating snapshot schema to monitor long-running workflows (e.g., loan applications) where each entity may have varying numbers of intermediate events. The schema should support SLA compliance queries and pipeline debugging. Explain how to store event timelines, choose indexes, and provide example queries to compute percentiles (P50/P95) of stage durations and detect stuck cases.
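For the SLA and stuck-case queries, here is a sketch over an accumulating snapshot with one row per application and one nullable timestamp column per milestone. The variable intermediate events would sit in a separate event-grain fact table keyed by the same application key. The milestone names, the nearest-rank percentile definition, and the 48-hour SLA are assumptions. In a SQL warehouse the percentile step would typically use an ordered-set aggregate such as `PERCENTILE_CONT` where the engine supports it.

```python
import math
from datetime import datetime, timedelta

# Hypothetical milestone columns of the snapshot, in workflow order.
MILESTONES = ["submitted_at", "reviewed_at", "approved_at", "funded_at"]


def percentile(values, p):
    """Nearest-rank percentile; interpolating definitions give slightly different values."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def stage_hours(rows, start_col, end_col):
    """Stage durations in hours, only for rows that have completed the stage."""
    return [
        (r[end_col] - r[start_col]).total_seconds() / 3600
        for r in rows
        if r.get(start_col) is not None and r.get(end_col) is not None
    ]


def stuck_cases(rows, now, sla):
    """Unfinished rows whose latest reached milestone is older than the SLA."""
    stuck = []
    for r in rows:
        reached = [m for m in MILESTONES if r.get(m) is not None]
        if not reached or reached[-1] == MILESTONES[-1]:
            continue  # not started, or already finished
        if now - r[reached[-1]] > sla:
            stuck.append((r["application_id"], reached[-1]))
    return stuck


t0 = datetime(2024, 3, 1, 9, 0)
apps = [
    {"application_id": 1, "submitted_at": t0, "reviewed_at": t0 + timedelta(hours=4),
     "approved_at": t0 + timedelta(hours=30), "funded_at": t0 + timedelta(hours=50)},
    {"application_id": 2, "submitted_at": t0, "reviewed_at": t0 + timedelta(hours=10),
     "approved_at": None, "funded_at": None},
    {"application_id": 3, "submitted_at": t0, "reviewed_at": t0 + timedelta(hours=2),
     "approved_at": t0 + timedelta(hours=40), "funded_at": None},
]
review = stage_hours(apps, "submitted_at", "reviewed_at")
p50, p95 = percentile(review, 50), percentile(review, 95)
stuck = stuck_cases(apps, now=t0 + timedelta(hours=72), sla=timedelta(hours=48))
```

Here application 2 is flagged as stuck after review (62 hours without progress), while application 3 is still within the SLA.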

Hard · System Design

Design a system to capture row-level provenance and lineage for every record flowing through ETL into the warehouse so data scientists can reproduce training datasets exactly. Describe metadata capture, storage design for lineage (e.g., graph DB or tables), APIs for tracing dependencies, snapshotting strategies for reproducibility, and how to make lineage queries performant.
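Row-level lineage can be stored as an edge table of (child_row_id, parent_row_id, run_id) and traced with a transitive walk. The in-memory sketch below stands in for that table. The SHA-256 content hash used as the row id, the class name `LineageStore`, and the run-id strings are illustrative assumptions. A pure content hash merges byte-identical records, so a real design might also fold in a source offset or ingestion id.

```python
import hashlib
import json
from collections import defaultdict, deque


def row_id(record):
    """Content hash of a record; canonical JSON makes it independent of key order."""
    payload = json.dumps(record, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()[:16]


class LineageStore:
    """In-memory stand-in for an edge table (child_row_id, parent_row_id, run_id)."""

    def __init__(self):
        self.parents = defaultdict(set)  # child row id -> {(parent row id, run id)}

    def record(self, child, parents, run_id):
        """Register the rows a job run read to produce `child`; return the child's id."""
        cid = row_id(child)
        for p in parents:
            self.parents[cid].add((row_id(p), run_id))
        return cid

    def upstream(self, cid):
        """All ancestor row ids, found by breadth-first traversal of the edges."""
        seen, queue = set(), deque([cid])
        while queue:
            current = queue.popleft()
            for parent_id, _run_id in self.parents.get(current, ()):
                if parent_id not in seen:
                    seen.add(parent_id)
                    queue.append(parent_id)
        return seen


store = LineageStore()
raw1 = {"src": "crm", "customer": "C1", "spend": 10}
raw2 = {"src": "crm", "customer": "C1", "spend": 15}
agg = {"customer": "C1", "total_spend": 25}
feature = {"customer": "C1", "spend_bucket": "high"}
agg_id = store.record(agg, [raw1, raw2], run_id="etl-run-001")
feature_id = store.record(feature, [agg], run_id="feature-run-001")
ancestors = store.upstream(feature_id)
```

In a relational store, indexing the edge table on child id serves upstream traces and an index on parent id serves downstream impact analysis; the breadth-first walk corresponds to a recursive CTE.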
