InterviewStack.io

Data Modeling and Schema Design Questions

Focuses on designing efficient, maintainable data schemas for transactional and analytical systems. Candidates should demonstrate an understanding of normalization principles and normal forms, when and why to denormalize for performance, and schema design patterns for different use cases. Expect dimensional modeling topics including fact and dimension tables, star and snowflake schemas, grain definition, slowly changing dimensions (SCDs), and strategies for handling historical data. The topic also covers trade-offs between online transaction processing (OLTP) and online analytical processing (OLAP) designs, query performance considerations, indexing and partitioning strategies, and the ability to evaluate and improve existing schemas to meet business requirements and scale.

Medium, Technical
A data warehouse team asks you whether to use surrogate integer keys or natural keys for dimension tables. Discuss pros and cons and your recommendation for large-scale analytics (hundreds of millions of rows).
Medium, Technical
Design a dimension table for 'product' to be used in a data warehouse where products can change category and price frequently. Explain how you would handle SCD for category and price, and what columns you'd include to support effective-dated queries.
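As a starting point for this question, the sketch below shows one common option, a Type 2 slowly changing dimension, using an in-memory SQLite database. The table name `dim_product`, the columns `valid_from`, `valid_to`, and `is_current`, the sample rows, and the helper `product_as_of` are all assumptions made for illustration; they are not part of the question. Because price changes frequently, a real design might instead keep price on the fact table or in a separate mini-dimension, so treat this as one possible answer rather than the expected one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical Type 2 dimension: each change to category or price adds a new row.
conn.execute("""
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
        product_id  TEXT NOT NULL,        -- natural (business) key
        category    TEXT NOT NULL,
        price       REAL NOT NULL,
        valid_from  TEXT NOT NULL,        -- inclusive, ISO 'YYYY-MM-DD'
        valid_to    TEXT NOT NULL,        -- exclusive; '9999-12-31' means open-ended
        is_current  INTEGER NOT NULL      -- 1 for the latest version, else 0
    )
""")

# Illustrative data: product P1 changes category and price on 2023-03-01.
conn.executemany(
    "INSERT INTO dim_product "
    "(product_id, category, price, valid_from, valid_to, is_current) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("P1", "Toys", 9.99, "2023-01-01", "2023-03-01", 0),
        ("P1", "Games", 12.49, "2023-03-01", "9999-12-31", 1),
    ],
)


def product_as_of(conn, product_id, as_of):
    """Return (category, price) for the version valid on the given ISO date."""
    return conn.execute(
        "SELECT category, price FROM dim_product "
        "WHERE product_id = ? AND valid_from <= ? AND ? < valid_to",
        (product_id, as_of, as_of),
    ).fetchone()


before_change = product_as_of(conn, "P1", "2023-02-15")
after_change = product_as_of(conn, "P1", "2023-03-01")
print(before_change, after_change)
```

The half-open interval (`valid_from` inclusive, `valid_to` exclusive) keeps versions from overlapping on the change date, so an effective-dated lookup matches at most one row per product.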
Hard, Technical
An analytical query scans a partitioned fact table but isn't benefiting from partition pruning. Given the query and partitioning scheme below, identify why pruning fails and propose fixes.
Partitioning: orders partitioned by RANGE(order_date), monthly
Query: SELECT product_id, SUM(amount) FROM orders WHERE order_date >= '2023-01-15' AND order_date < '2023-02-10' GROUP BY product_id;
Assume order_date is stored as a string in 'YYYY-MM-DD' format.
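To make the expected behavior concrete, the sketch below models what pruning should achieve for this query once the bounds are treated as real dates: only partitions whose range overlaps the predicate are scanned. It is a plain-Python illustration, not any engine's planner; the partition names (`orders_2023_01` and so on), the six-month window, and the helpers `next_month`, `monthly_partitions`, and `prune` are assumptions introduced here.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partitions(first, count):
    """Build (name, start, end) tuples for consecutive monthly partitions."""
    parts = []
    lo = first
    for _ in range(count):
        hi = next_month(lo)
        parts.append((f"orders_{lo:%Y_%m}", lo, hi))  # hypothetical naming
        lo = hi
    return parts


def prune(partitions, lo, hi):
    """Keep partitions whose [start, end) range overlaps the query range [lo, hi)."""
    return [name for name, start, end in partitions if start < hi and end > lo]


# Hypothetical layout: Nov 2022 through Apr 2023.
partitions = monthly_partitions(date(2022, 11, 1), 6)

# Parse the query's string bounds into dates before comparing to partition bounds.
lo = date.fromisoformat("2023-01-15")
hi = date.fromisoformat("2023-02-10")

selected = prune(partitions, lo, hi)
print(selected)  # ['orders_2023_01', 'orders_2023_02']
```

With typed bounds, two of the six partitions are selected. When the column is a string, whether a given engine can map the predicate onto date-based partition boundaries depends on that engine, which is the gap the question asks you to diagnose.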
Easy, Technical
What is database normalization intended to prevent? List three common anomalies that normalization addresses and give a short example of each.
Hard, System Design
You are asked to design a schema for a real-time analytics dashboard that needs near-real-time metrics (within seconds) and supports ad-hoc drilldowns. Outline a hybrid architecture and schema choices to meet low-latency ingestion and flexible querying.
