InterviewStack.io

Data Modeling and Schema Design Questions

Focuses on designing efficient, maintainable data schemas for transactional and analytical systems. Candidates should demonstrate an understanding of normalization principles and normal forms, when and why to denormalize for performance, and schema design patterns for different use cases. Expect dimensional modeling topics including fact and dimension tables, star and snowflake schemas, grain definition, slowly changing dimensions, and strategies for handling historical data. The topic also covers trade-offs between online transaction processing (OLTP) and online analytical processing (OLAP) designs, query performance considerations, indexing and partitioning strategies, and the ability to evaluate and improve existing schemas to meet business requirements and scale.

Easy · Technical
Explain database normalization and the first three normal forms (1NF, 2NF, 3NF). Use the following denormalized orders table as an example:
orders(order_id, order_date, customer_name, customer_address, product_id, product_name, quantity)
Show how you would transform this into 3NF, describe the functional dependencies you used, and explain why each step satisfies the normal forms.
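One possible 3NF decomposition can be sketched as follows — a minimal, runnable illustration using SQLite as a stand-in engine. The surrogate keys (customer_id) and sample data are assumptions for the example; they are not part of the question. The assumed functional dependencies are noted in the comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One possible 3NF decomposition of the denormalized orders table.
# Assumed functional dependencies:
#   order_id -> order_date, customer_id             (fixes 2NF partial deps)
#   customer_id -> customer_name, customer_address  (fixes 3NF transitive deps)
#   product_id -> product_name
#   (order_id, product_id) -> quantity
cur.executescript("""
CREATE TABLE customers (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    TEXT NOT NULL,
    customer_address TEXT NOT NULL
);
CREATE TABLE products (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE order_items (          -- one row per (order, product): the 1NF grain
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Joins reassemble the original denormalized row shape losslessly.
cur.execute("INSERT INTO customers VALUES (1, 'Ada', '1 Main St')")
cur.execute("INSERT INTO products VALUES (10, 'Widget')")
cur.execute("INSERT INTO orders VALUES (100, '2024-01-01', 1)")
cur.execute("INSERT INTO order_items VALUES (100, 10, 3)")
row = cur.execute("""
    SELECT o.order_id, o.order_date, c.customer_name, c.customer_address,
           p.product_id, p.product_name, oi.quantity
    FROM orders o
    JOIN customers c USING (customer_id)
    JOIN order_items oi USING (order_id)
    JOIN products p USING (product_id)
""").fetchone()
print(row)  # (100, '2024-01-01', 'Ada', '1 Main St', 10, 'Widget', 3)
```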
Easy · Technical
Explain the difference between star and snowflake schemas. Describe a scenario where snowflaking a dimension (normalizing it) is beneficial and another scenario where a denormalized star schema is preferable.
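The structural difference can be sketched with a toy product/category dimension — SQLite here stands in for a warehouse engine, and all table and column names are hypothetical. The star variant pays with redundancy and update anomalies; the snowflake variant pays with an extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- Star: category attributes live inline in the product dimension,
-- so queries reach them with a single join from the fact table.
CREATE TABLE dim_product_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT,   -- denormalized: repeated for every product
    category_mgr  TEXT
);

-- Snowflake: category is normalized out; changing a category manager
-- updates one row instead of every product in that category.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT,
    category_mgr  TEXT
);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);

CREATE TABLE fact_sales (
    product_key INTEGER,
    sale_date   TEXT,
    amount      REAL
);
INSERT INTO dim_category VALUES (1, 'Tools', 'Grace');
INSERT INTO dim_product_snow VALUES (10, 'Widget', 1);
INSERT INTO dim_product_star VALUES (10, 'Widget', 'Tools', 'Grace');
INSERT INTO fact_sales VALUES (10, '2024-01-01', 9.99);
""")

# Star: one join to reach category attributes.
star = cur.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product_star d USING (product_key)
    GROUP BY d.category_name
""").fetchall()

# Snowflake: two joins for the same answer.
snow = cur.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow d USING (product_key)
    JOIN dim_category c USING (category_key)
    GROUP BY c.category_name
""").fetchall()
print(star == snow)  # identical results; the trade-off is joins vs redundancy
```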
Medium · Technical
Your product events schema evolves frequently (fields added, types changed). Propose a schema evolution strategy for production pipelines that minimizes breaking changes for downstream consumers. Include schema versioning, compatibility rules, migration patterns, and testing approaches.
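One compatibility rule worth knowing cold is Avro-style backward compatibility: a reader on the new schema can still read old records if added fields carry defaults and no field changes type. The helper below is a hypothetical sketch of that check, with schemas modeled as plain dicts; it is not any registry's real API.

```python
def is_backward_compatible(old_schema, new_schema):
    """Sketch of the backward-compatibility rule: a reader using new_schema
    can read records written with old_schema. Schemas are modeled as
    {field_name: {"type": str, "default": ...}} dicts (hypothetical format)."""
    for name, spec in new_schema.items():
        if name not in old_schema:
            # New field: old records lack it, so a default is required.
            if "default" not in spec:
                return False
        elif old_schema[name]["type"] != spec["type"]:
            # In-place type changes break deserialization of old records.
            return False
    return True

v1 = {"user_id": {"type": "string"}, "ts": {"type": "long"}}
v2 = {**v1, "platform": {"type": "string", "default": "unknown"}}  # additive + default
v3 = {**v1, "platform": {"type": "string"}}                        # additive, no default

print(is_backward_compatible(v1, v2))  # True
print(is_backward_compatible(v1, v3))  # False
```

In production this check typically runs in CI and in a schema registry's compatibility gate before a new version is allowed to publish.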
Medium · System Design
Design a schema for high-frequency IoT telemetry: 10,000 sensors producing 500 readings/second each. Requirements: efficient ingestion (append-heavy), time-window aggregations (1min, 1hr), retention policy (30 days hot, 2 years cold), and ability to join with device metadata. Describe table layout, partitioning, indexing/clustering, compression, and trade-offs.
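The two key design levers — a rollup grain and a drop-friendly partition key — can be illustrated in a few lines of plain Python. The reading tuples and helper names below are hypothetical; real systems would push this into the storage engine.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw readings: (sensor_id, epoch_seconds, value).
readings = [
    (1, 1700000005, 20.0),
    (1, 1700000030, 22.0),
    (1, 1700000065, 30.0),  # falls in the next 1-minute bucket
    (2, 1700000010, 5.0),
]

def bucket_1min(ts):
    # Truncate to the start of the minute: the pre-aggregation grain.
    return ts - ts % 60

def partition_key(ts):
    # Daily partitions let the 30-day hot retention expire data by
    # dropping whole partitions instead of deleting rows.
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

# Append-only ingestion feeding an incremental 1-minute rollup.
agg = defaultdict(lambda: [0, 0.0])  # (sensor, bucket) -> [count, sum]
for sensor, ts, value in readings:
    key = (sensor, bucket_1min(ts))
    agg[key][0] += 1
    agg[key][1] += value

for (sensor, bucket), (n, total) in sorted(agg.items()):
    print(sensor, bucket, n, total / n)  # sensor, minute, count, mean
```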
Hard · Technical
Given a high-volume user_events table with columns:
event_id STRING, user_id STRING, event_type STRING, event_time TIMESTAMP, properties JSON, session_id STRING, platform STRING
Propose an optimized schema and partitioning/clustering keys for Redshift/Snowflake/BigQuery to support sessionization, funnel analysis, and daily retention. Provide example DDL (choose one engine) and explain how your choices affect query cost and performance.
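As a warm-up for the sessionization part, the gap-based logic that clustering on (user_id, event_time) is meant to serve can be sketched with window functions — SQLite here stands in for the warehouse engine, with event_time simplified to epoch seconds and all sample data invented. The 1800 s gap threshold is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE user_events (
    event_id   TEXT,
    user_id    TEXT,
    event_type TEXT,
    event_time INTEGER    -- epoch seconds; a warehouse would use TIMESTAMP
);
INSERT INTO user_events VALUES
    ('e1', 'u1', 'view',  1000),
    ('e2', 'u1', 'click', 1100),
    ('e3', 'u1', 'view',  4000),  -- gap > 1800 s: starts a new session
    ('e4', 'u2', 'view',  1000);
""")

# Gap-based sessionization: flag an event as a session start when it is the
# user's first event or follows a gap > 1800 s, then running-sum the flags.
rows = cur.execute("""
    WITH flagged AS (
        SELECT user_id, event_time,
               CASE WHEN LAG(event_time) OVER
                         (PARTITION BY user_id ORDER BY event_time) IS NULL
                    OR event_time - LAG(event_time) OVER
                         (PARTITION BY user_id ORDER BY event_time) > 1800
                    THEN 1 ELSE 0 END AS new_session
        FROM user_events
    )
    SELECT user_id, event_time,
           SUM(new_session) OVER
               (PARTITION BY user_id ORDER BY event_time) AS session_number
    FROM flagged
    ORDER BY user_id, event_time
""").fetchall()
print(rows)
# [('u1', 1000, 1), ('u1', 1100, 1), ('u1', 4000, 2), ('u2', 1000, 1)]
```

Because every window here partitions by user_id and orders by event_time, co-locating rows on those columns lets the engine sessionize with minimal shuffling.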
