InterviewStack.io

Data Modeling and Architecture Questions

Design and modeling principles for transactional and analytical data systems. Topics include entity-relationship modeling, normalization and denormalization trade-offs, dimensional modeling with fact and dimension tables in star and snowflake schemas, indexing strategies, partitioning and sharding, and schema design for performance and maintainability. Also covered are data pipelines and integration patterns, including extract-transform-load (ETL) and extract-load-transform (ELT) approaches, data warehouse and data lake concepts, ETL orchestration, and how sources feed reporting and business intelligence systems. Further considerations include data quality, governance, and the differences between online transaction processing (OLTP) and online analytical processing (OLAP) workloads.

Hard · System Design
Design an event-sourced data lakehouse where all source system changes are ingested as immutable events, and analytics tables are derived views (snapshots) over events. Discuss how you'd implement time-travel (point-in-time queries), handle schema evolution, snapshot frequency, and reconcile events with late-arriving corrections.
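One way to reason about the time-travel and late-correction parts of this question is to replay events with two clocks: when the change happened and when the lakehouse learned about it. The sketch below is only an in-memory illustration under assumed names (`Event`, `snapshot_as_of`, `event_time`, `ingested_at` are all hypothetical, not part of the question); a real lakehouse would do the same fold over table files rather than a Python list.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class Event:
    """Immutable change event (hypothetical shape, for illustration only)."""
    entity_id: str
    event_time: int                    # when the change happened in the source
    ingested_at: int                   # when the lakehouse received the event
    fields: Optional[Dict[str, Any]]   # changed attributes; None marks a delete


def snapshot_as_of(
    events: Iterable[Event],
    as_of: int,
    known_at: Optional[int] = None,
) -> Dict[str, Dict[str, Any]]:
    """Fold events into the entity state valid at `as_of`.

    If `known_at` is given, events ingested after it are ignored, which
    reproduces what a report would have shown before a late correction.
    """
    visible = [
        e for e in events
        if e.event_time <= as_of
        and (known_at is None or e.ingested_at <= known_at)
    ]
    state: Dict[str, Dict[str, Any]] = {}
    # Order by business time first; ingestion time breaks ties.
    for e in sorted(visible, key=lambda e: (e.event_time, e.ingested_at)):
        if e.fields is None:
            state.pop(e.entity_id, None)
        else:
            merged = dict(state.get(e.entity_id, {}))
            merged.update(e.fields)  # unknown or new fields are simply carried along
            state[e.entity_id] = merged
    return state


events = [
    Event("c1", event_time=10, ingested_at=11, fields={"tier": "free"}),
    Event("c1", event_time=20, ingested_at=21, fields={"tier": "pro"}),
    # Late-arriving correction: happened at t=15 but only arrived at t=30.
    Event("c1", event_time=15, ingested_at=30, fields={"tier": "trial"}),
]
```

Calling `snapshot_as_of(events, 17)` includes the late correction, while `snapshot_as_of(events, 17, known_at=25)` reproduces the earlier view without it. Materialized snapshots would then be a cached result of this fold at chosen intervals, with only later events replayed on top.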
Medium · Technical
You need to design a partitioning and compaction strategy for time-series sensor data arriving at 1 billion rows per day into a Delta Lake on cloud storage. Discuss partition key choices, file-size targets, small-files mitigation, and retention/archival. Explain how partitioning affects query pruning and write/read performance.
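A quick back-of-the-envelope calculation can help compare partition key choices before discussing compaction. The sketch below uses the 1 billion rows per day from the question; the bytes-per-row figure, the 256 MiB file target, and the count of 10,000 sensors are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math


def files_per_partition(rows_per_day, bytes_per_row, target_file_bytes, partitions_per_day):
    """Estimate how many target-sized files one partition holds after compaction."""
    bytes_per_partition = rows_per_day * bytes_per_row / partitions_per_day
    return math.ceil(bytes_per_partition / target_file_bytes)


def avg_file_bytes(rows_per_day, bytes_per_row, target_file_bytes, partitions_per_day):
    """Average file size in a partition, given the file count estimated above."""
    bytes_per_partition = rows_per_day * bytes_per_row / partitions_per_day
    n_files = files_per_partition(
        rows_per_day, bytes_per_row, target_file_bytes, partitions_per_day
    )
    return bytes_per_partition / n_files


ROWS_PER_DAY = 1_000_000_000        # from the question
BYTES_PER_ROW = 50                  # ASSUMPTION: compressed on-disk size per row
TARGET_FILE_BYTES = 256 * 1024**2   # ASSUMPTION: 256 MiB target file size
SENSORS = 10_000                    # ASSUMPTION: number of distinct sensors

# Partition by date only: one partition per day.
daily_files = files_per_partition(ROWS_PER_DAY, BYTES_PER_ROW, TARGET_FILE_BYTES, 1)

# Partition by date and hour: 24 partitions per day.
hourly_files = files_per_partition(ROWS_PER_DAY, BYTES_PER_ROW, TARGET_FILE_BYTES, 24)

# Partition by date and sensor_id: one partition per sensor per day.
per_sensor_size = avg_file_bytes(ROWS_PER_DAY, BYTES_PER_ROW, TARGET_FILE_BYTES, SENSORS)

print(f"daily partition:  ~{daily_files} files/day")
print(f"hourly partition: ~{hourly_files} files/hour")
print(f"date+sensor_id:   ~{per_sensor_size / 1024**2:.1f} MiB per file")
```

Under these assumptions, date or date-plus-hour partitions can be compacted to files near the target size, while partitioning by a high-cardinality key such as sensor ID leaves each file far below it, which is the small-files problem the question asks about.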
Medium · System Design
Design a star schema for an e-commerce analytics use case: model orders, order_items, customers, products, and a time dimension. Define the grain of the fact table, column choices for each dimension (surrogate keys vs natural keys), and which attributes you would denormalize into dimensions to simplify reporting.
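As a starting point for discussion, the following is a minimal sketch of one possible star schema, run against an in-memory SQLite database so it can be executed as-is. It assumes a fact grain of one row per order line item, treats the order itself as a degenerate dimension (`order_id` on the fact), and denormalizes product category into the product dimension. All table and column names are illustrative choices, not a prescribed answer.

```python
import sqlite3

DDL = """
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,   -- smart key such as 20240131
    full_date  TEXT NOT NULL,
    year       INTEGER NOT NULL,
    month      INTEGER NOT NULL
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY, -- surrogate key
    customer_id  TEXT NOT NULL,       -- natural key from the source system
    name         TEXT,
    country      TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,  -- surrogate key
    product_id  TEXT NOT NULL,        -- natural key from the source system
    name        TEXT,
    category    TEXT                  -- denormalized from a category table
);
-- Grain: one row per order line item.
CREATE TABLE fact_order_item (
    order_id     TEXT NOT NULL,       -- degenerate dimension
    line_number  INTEGER NOT NULL,
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity     INTEGER NOT NULL,
    unit_price   REAL NOT NULL,
    PRIMARY KEY (order_id, line_number)
);
"""


def build_schema(conn: sqlite3.Connection) -> None:
    """Create the dimension and fact tables."""
    conn.executescript(DDL)


def revenue_by_category(conn: sqlite3.Connection):
    """Typical star-schema query: join the fact to one dimension and aggregate."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.quantity * f.unit_price) AS revenue
        FROM fact_order_item f
        JOIN dim_product p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
```

Keeping `category` on `dim_product` means the report query needs a single join; a snowflake variant would move it to its own table at the cost of one more join.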
Medium · Technical
SQL task: Given an events table:
events(event_id STRING, user_id STRING, event_time TIMESTAMP)
Write a SQL query (ANSI or BigQuery-style) that computes the 6-month rolling monthly active users (MAU) per month (i.e., for each month M, count distinct users active in months M and the previous 5 months). Show output columns: month_start, mau_6mo.
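The rolling window in this question can be sketched as a self-join between the list of months and the distinct (user, month) pairs. The example below runs the query through Python's built-in `sqlite3` module so it is executable without a warehouse; the date functions are therefore SQLite's (`strftime`, `date(..., '-5 months')`), and a BigQuery version would likely use `DATE_TRUNC` and `DATE_SUB` instead. Note one assumption: months with no events at all do not appear in the output, because the month list is derived from the data.

```python
import sqlite3

ROLLING_MAU_SQL = """
WITH monthly AS (
    -- One row per user per month in which the user was active.
    SELECT DISTINCT
        user_id,
        strftime('%Y-%m-01', event_time) AS month_start
    FROM events
),
months AS (
    SELECT DISTINCT month_start FROM monthly
)
SELECT
    m.month_start,
    COUNT(DISTINCT a.user_id) AS mau_6mo
FROM months m
JOIN monthly a
  -- Month M and the previous 5 months (ISO date strings compare correctly).
  ON a.month_start BETWEEN date(m.month_start, '-5 months') AND m.month_start
GROUP BY m.month_start
ORDER BY m.month_start
"""


def rolling_mau(conn: sqlite3.Connection):
    """Return (month_start, mau_6mo) rows for the events table."""
    return conn.execute(ROLLING_MAU_SQL).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory events table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_id TEXT, user_id TEXT, event_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("e1", "u1", "2024-01-15 10:00:00"),
            ("e2", "u2", "2024-02-10 09:30:00"),
            ("e3", "u1", "2024-02-11 12:00:00"),
            ("e4", "u3", "2024-07-01 08:00:00"),
            ("e5", "u4", "2024-08-05 18:45:00"),
        ],
    )
    return conn


for row in rolling_mau(demo_connection()):
    print(row)
```

Deduplicating to (user, month) pairs before the join keeps the join input small; counting distinct users after the join is still required because the same user can be active in several months of one window.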
Medium · Technical
Write a SQL-based SCD Type 2 upsert pattern (ANSI SQL or PostgreSQL) to update a Customer dimension table. Given source rows with customer_id and new attributes, describe how you'd insert a new version for changed rows and expire the previous record (using effective_from/effective_to and current_flag). Provide the SQL logic or pseudocode and mention concurrency considerations.
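A common shape for this pattern is two statements in one transaction: first expire current rows whose attributes changed, then insert a new current row for every source customer that no longer has one. The sketch below runs that against SQLite via Python's `sqlite3` so it is executable; the staging table `stg_customer`, the attribute columns `name` and `city`, and the function names are assumptions for illustration. SQLite's `IS NOT` is used as a null-safe comparison, where PostgreSQL would use `IS DISTINCT FROM`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,  -- surrogate key
    customer_id    TEXT NOT NULL,        -- natural key
    name           TEXT,
    city           TEXT,
    effective_from TEXT NOT NULL,
    effective_to   TEXT,                 -- NULL while the row is current
    current_flag   INTEGER NOT NULL
);
-- Guard against two "current" rows per customer under concurrent loads.
CREATE UNIQUE INDEX ux_dim_customer_current
    ON dim_customer (customer_id) WHERE current_flag = 1;
CREATE TABLE stg_customer (customer_id TEXT PRIMARY KEY, name TEXT, city TEXT);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def apply_scd2(conn: sqlite3.Connection, load_ts: str) -> None:
    """Apply staged rows to dim_customer as SCD Type 2 in one transaction."""
    with conn:  # commits on success, rolls back on error
        # Step 1: expire current rows whose tracked attributes changed.
        conn.execute(
            """
            UPDATE dim_customer
            SET effective_to = ?, current_flag = 0
            WHERE current_flag = 1
              AND EXISTS (
                  SELECT 1 FROM stg_customer s
                  WHERE s.customer_id = dim_customer.customer_id
                    AND (s.name IS NOT dim_customer.name
                         OR s.city IS NOT dim_customer.city)
              )
            """,
            (load_ts,),
        )
        # Step 2: insert a current row for new customers and for those
        # whose previous version was just expired in step 1.
        conn.execute(
            """
            INSERT INTO dim_customer
                (customer_id, name, city, effective_from, effective_to, current_flag)
            SELECT s.customer_id, s.name, s.city, ?, NULL, 1
            FROM stg_customer s
            LEFT JOIN dim_customer d
              ON d.customer_id = s.customer_id AND d.current_flag = 1
            WHERE d.customer_id IS NULL
            """,
            (load_ts,),
        )
```

Unchanged customers match neither statement, so re-running the same load is a no-op. On the concurrency point, the partial unique index makes a second concurrent loader fail rather than create duplicate current rows; in PostgreSQL one would typically also serialize loaders, for example with a lock on the dimension or a stricter isolation level.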
