InterviewStack.io

Cloud Data Warehouse Design and Optimization Questions

Covers the design and optimization of analytical systems and data warehouses on cloud platforms. Topics include schema design patterns for analytics (star schema and snowflake schema), purposeful denormalization for query performance, column-oriented storage characteristics, distribution and sort key selection, partitioning and clustering strategies, incremental loading patterns, slowly changing dimensions, time-series data modeling, cost and performance trade-offs in cloud-managed warehouses, and platform-specific features that affect query performance and storage layout. Candidates should be able to discuss end-to-end design considerations for large-scale analytic workloads and the trade-offs between latency, cost, and maintainability.

Medium · Technical · 65 practiced
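Explain how table statistics (histograms, column cardinality, row counts) affect query planning in cloud data warehouses. Describe the commands or processes that keep them current on platforms like Redshift, Snowflake, or BigQuery (for example ANALYZE or COMPUTE STATISTICS where the platform exposes such a command, versus statistics the platform maintains automatically), and explain why up-to-date statistics are important.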
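As a starting point for an answer, here is a minimal sketch of how row counts and column cardinality feed a planner's estimates. It is a toy model, not any platform's actual optimizer: the `ColumnStats` structure, the uniform-distribution assumption, and the "broadcast the smaller side" rule are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Planner-visible statistics for one column (hypothetical structure)."""
    row_count: int        # rows in the table when statistics were collected
    distinct_values: int  # column cardinality (number of distinct values)


def estimate_equality_rows(stats: ColumnStats) -> float:
    """Estimate rows matching `col = constant`, assuming a uniform distribution."""
    if stats.distinct_values == 0:
        return 0.0
    return stats.row_count / stats.distinct_values


def pick_broadcast_side(left_rows: float, right_rows: float) -> str:
    """Toy join strategy: broadcast whichever input is estimated to be smaller."""
    return "left" if left_rows <= right_rows else "right"


# Statistics collected when the table had 1,000 rows; it has since grown to 50M.
stale = ColumnStats(row_count=1_000, distinct_values=100)
fresh = ColumnStats(row_count=50_000_000, distinct_values=100)
other_side_rows = 10_000.0  # estimated size of the other join input

stale_estimate = estimate_equality_rows(stale)
fresh_estimate = estimate_equality_rows(fresh)
stale_choice = pick_broadcast_side(stale_estimate, other_side_rows)
fresh_choice = pick_broadcast_side(fresh_estimate, other_side_rows)

print(stale_estimate, stale_choice)  # stale stats suggest broadcasting the left side
print(fresh_estimate, fresh_choice)  # fresh stats reverse that decision
```

With stale statistics the toy planner broadcasts what is in reality the much larger input, which is the kind of plan regression that refreshing statistics is meant to prevent.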
Easy · Technical · 72 practiced
You have an events table partitioned by event_date: events(event_id STRING, user_id STRING, event_type STRING, event_time TIMESTAMP, event_date DATE). Write a SQL query (specify dialect: BigQuery or Snowflake standard SQL) to compute daily active users for the last 7 days that minimizes scanned data by leveraging partition pruning and any available clustering.
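The pruning idea behind this question can be sketched in plain Python, modeling the table as one bucket of rows per `event_date`. This is an in-memory illustration rather than warehouse SQL; the `daily_active_users` helper, the sample data, and the choice to include today in the 7-day window are assumptions. In SQL terms it corresponds to filtering directly on the partition column `event_date` instead of on an expression derived from `event_time`.

```python
from datetime import date, timedelta


def daily_active_users(partitions, today, days=7):
    """Count distinct users per day, touching only partitions inside the window.

    partitions: dict mapping event_date -> list of (event_id, user_id) rows.
    Returns (dau_by_day, partitions_scanned).
    """
    start = today - timedelta(days=days - 1)  # window includes today (assumption)
    dau_by_day = {}
    partitions_scanned = 0
    for offset in range(days):
        day = start + timedelta(days=offset)
        rows = partitions.get(day)
        if rows is None:
            continue  # no partition for this date, nothing to read
        partitions_scanned += 1
        dau_by_day[day] = len({user_id for _, user_id in rows})
    return dau_by_day, partitions_scanned


today = date(2024, 1, 10)
partitions = {
    date(2024, 1, 1): [("e1", "u1"), ("e2", "u2")],                 # outside window
    date(2024, 1, 9): [("e3", "u1"), ("e4", "u2"), ("e5", "u1")],
    date(2024, 1, 10): [("e6", "u3")],
}

dau, scanned = daily_active_users(partitions, today)
print(dau, scanned)
```

Only the two partitions inside the window are read; the 1 January partition is never touched, which is what keeps scanned bytes proportional to the window rather than to the table.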
Medium · System Design · 71 practiced
Design a star schema for time-series metrics ingested at 100M events/day. The main queries filter by time range (hour/day), region, and metric type. Propose table schemas and explain a partitioning and clustering strategy for BigQuery or Snowflake that supports efficient range scans and aggregations.
Hard · Technical · 58 practiced
Implement an incremental merge pattern using Spark Structured Streaming and Delta Lake (pseudocode acceptable). Given schema: events(id STRING, user_id STRING, event_time TIMESTAMP, value DOUBLE). Describe how you deduplicate, handle late-arriving events, and maintain exactly-once semantics for upserts into a Delta table.
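The three concerns in this question (deduplication, late events, replay safety) can be illustrated with an in-memory sketch. It does not use Spark or Delta Lake APIs: `merge_batch`, the dict-based target, the `applied_batch_ids` set, and the simplified watermark (newest event time seen minus an allowed lateness) are all hypothetical stand-ins. In a real pipeline, logic of this shape would typically run inside a `foreachBatch` handler that issues a Delta `MERGE`, with the batch id recorded atomically alongside the write.

```python
from datetime import datetime, timedelta


def merge_batch(target, applied_batch_ids, batch_id, batch, max_lateness):
    """Upsert one micro-batch into `target` (dict keyed by id). Safe to replay."""
    if batch_id in applied_batch_ids:
        return 0  # batch already committed: a retry must not apply it twice

    # 1. Deduplicate within the batch: keep the newest row per id.
    latest = {}
    for row in batch:
        kept = latest.get(row["id"])
        if kept is None or row["event_time"] > kept["event_time"]:
            latest[row["id"]] = row

    # 2. Simplified watermark: newest event time seen so far minus allowed lateness.
    seen = [r["event_time"] for r in target.values()]
    seen += [r["event_time"] for r in latest.values()]
    written = 0
    if seen:
        watermark = max(seen) - max_lateness
        for row_id, row in latest.items():
            if row["event_time"] < watermark:
                continue  # too late: dropped (could be routed to a side table)
            current = target.get(row_id)
            # 3. Upsert: insert new ids, overwrite only with newer events.
            if current is None or row["event_time"] > current["event_time"]:
                target[row_id] = row
                written += 1

    applied_batch_ids.add(batch_id)
    return written


t0 = datetime(2024, 1, 1, 12, 0)
target, applied = {}, set()
batch = [
    {"id": "a", "user_id": "u1", "event_time": t0, "value": 1.0},
    {"id": "a", "user_id": "u1", "event_time": t0 + timedelta(minutes=1), "value": 2.0},
    {"id": "b", "user_id": "u2", "event_time": t0 - timedelta(hours=2), "value": 9.0},
]

first = merge_batch(target, applied, 0, batch, timedelta(hours=1))
replay = merge_batch(target, applied, 0, batch, timedelta(hours=1))
print(first, replay, sorted(target))
```

The first call writes one row (the newer duplicate of `a`) and drops the late row `b`; the replay of batch 0 is a no-op, which is the property that gives upserts their exactly-once effect under retries.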
Easy · Technical · 73 practiced
You are given an SLA that a critical dashboard must reflect data no older than 15 minutes. Describe architecture choices to meet this data freshness requirement in a cloud data warehouse: batch frequency, micro-batches, streaming ingestion, storage formats, compute provisioning, and monitoring considerations.
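For the monitoring part of this question, a freshness check might look like the sketch below. The `freshness_status` helper, the 80% warning threshold, and the idea of comparing against the newest loaded event timestamp are illustrative assumptions; only the 15-minute limit comes from the question.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)  # from the stated dashboard requirement


def freshness_status(latest_loaded_event_time, now, sla=FRESHNESS_SLA, warn_fraction=0.8):
    """Classify data lag against the SLA.

    latest_loaded_event_time: newest event timestamp visible in the warehouse table.
    Returns (status, lag) where status is "ok", "warning", or "breach".
    """
    lag = now - latest_loaded_event_time
    if lag > sla:
        return "breach", lag
    if lag > sla * warn_fraction:  # warn before the SLA is actually violated
        return "warning", lag
    return "ok", lag


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status_ok = freshness_status(now - timedelta(minutes=5), now)
status_warn = freshness_status(now - timedelta(minutes=13), now)
status_breach = freshness_status(now - timedelta(minutes=20), now)
print(status_ok[0], status_warn[0], status_breach[0])
```

Alerting at a fraction of the SLA leaves time to react (for example by scaling compute or shortening the micro-batch interval) before the dashboard actually goes stale.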
