💾

Database Engineering & Data Systems Topics

Database design patterns, optimization, scaling strategies, storage technologies, data warehousing, and operational database management. Covers database selection criteria, query optimization, replication strategies, distributed databases, backup and recovery, and performance tuning at the database layer. Distinct from Systems Architecture (which addresses service-level distribution) and Data Science (which addresses analytical approaches).

Cloud Data Warehouse Design and Optimization

Covers design and optimization of analytical systems and data warehouses on cloud platforms. Topics include schema design patterns for analytics such as star schema and snowflake schema, purposeful denormalization for query performance, column-oriented storage characteristics, distribution and sort key selection, partitioning and clustering strategies, incremental loading patterns, handling slowly changing dimensions, time series data modeling, cost and performance trade-offs in cloud-managed warehouses, and platform-specific features that affect query performance and storage layout. Candidates should be able to discuss end-to-end design considerations for large-scale analytic workloads and trade-offs between latency, cost, and maintainability.

40 questions

Complex Data Integration and Joins

Handling intricate join scenarios: multi-condition joins, conditional joins with complex logic, joining on date ranges or overlapping time periods, complex left joins with multiple filtering conditions, self-joins for hierarchical or relationship data, handling non-standard relationships between tables. Understanding implications of different join types on row counts, NULL values, and duplicate handling. Designing queries that correctly integrate data from multiple sources while maintaining data integrity and avoiding duplicate counting or missing data.

54 questions
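
The row-count risk in date-range joins described above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module; the tables `orders` and `price_tiers`, their columns, and the overlapping sample periods are all hypothetical, and the "keep the most recently started tier" rule is just one possible tie-break, not a prescribed one. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, amount REAL);
CREATE TABLE price_tiers (customer_id INTEGER, tier TEXT,
                          valid_from TEXT, valid_to TEXT);
INSERT INTO orders VALUES
  (1, 10, '2024-01-15', 100.0),
  (2, 10, '2024-03-01',  50.0),
  (3, 20, '2024-02-10',  75.0);
-- Customer 10's two tier periods overlap throughout March.
INSERT INTO price_tiers VALUES
  (10, 'silver', '2024-01-01', '2024-03-31'),
  (10, 'gold',   '2024-03-01', '2024-12-31');
""")

# Shared join: match each order to the tier(s) valid on its order date.
RANGE_JOIN = """
  FROM orders o
  LEFT JOIN price_tiers t
    ON t.customer_id = o.customer_id
   AND o.order_date BETWEEN t.valid_from AND t.valid_to
"""

# Naive range join: order 2 matches both tiers, so it is counted twice.
naive_rows, naive_total = conn.execute(
    "SELECT COUNT(*), SUM(o.amount)" + RANGE_JOIN).fetchone()

# Keep exactly one tier per order (assumed rule: most recently started tier).
deduped = conn.execute("""
WITH matched AS (
  SELECT o.order_id, o.amount, t.tier,
         ROW_NUMBER() OVER (PARTITION BY o.order_id
                            ORDER BY t.valid_from DESC) AS rn
""" + RANGE_JOIN + """
)
SELECT order_id, amount, tier FROM matched WHERE rn = 1 ORDER BY order_id
""").fetchall()

true_total = sum(amount for _, amount, _ in deduped)
print(naive_rows, naive_total, true_total)
```

The LEFT JOIN keeps order 3 even though its customer has no tier (the tier comes back as NULL), while the overlap inflates the naive total. Comparing row counts before and after a join is a cheap way to catch this kind of fan-out.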

Indexing Strategy and Selection

Covers index design principles and practical selection of indexes to accelerate queries while managing storage and write cost. Topics include index types such as B-tree, hash, bitmap, full-text, and functional indexes; single-column, composite, and covering indexes; clustered versus nonclustered index architectures; and partial or filtered indexes. Candidates should reason about index selectivity and cardinality and how statistics and histograms influence optimizer choices. Also assessed are index maintenance overhead, fragmentation, and rebuild strategies, and the trade-off between faster reads and slower inserts, updates, and deletes. Practical skills include reading execution plans to identify missing or inefficient indexes; proposing index consolidation or covering index designs; testing and benchmarking index changes; and understanding interactions between indexing, partitioning, and denormalization.

40 questions
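
As a rough illustration of composite and covering indexes, the sketch below asks SQLite (via Python's `sqlite3`) how it would run three queries against one composite index. The `events` table, the index name, and the `plan` helper are hypothetical, and the plan text is SQLite-specific; other engines report the same ideas with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER,
                     created_at TEXT, payload TEXT);
CREATE INDEX idx_events_user_time ON events (user_id, created_at);
""")

def plan(sql):
    """Hypothetical helper: return SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

# Every referenced column is in the index, so the table is never touched.
covered = plan("SELECT created_at FROM events WHERE user_id = 42")

# payload is not in the index, so each match needs a table lookup.
not_covered = plan("SELECT payload FROM events WHERE user_id = 42")

# Filtering only on the second index column cannot seek on the index.
no_prefix = plan("SELECT payload FROM events WHERE created_at = '2024-01-01'")

for label, text in [("covered", covered), ("not covered", not_covered),
                    ("no leading column", no_prefix)]:
    print(f"{label}: {text}")
```

The third query shows why column order matters in a composite index: without a predicate on the leading column, the planner falls back to scanning.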

Advanced SQL Window Functions

Mastery of Structured Query Language window functions and advanced aggregation techniques for analytical queries. Core function families include ranking functions such as ROW_NUMBER, RANK, DENSE_RANK, and NTILE; offset functions such as LAG and LEAD; value functions such as FIRST_VALUE, LAST_VALUE, and NTH_VALUE; and aggregate window expressions such as SUM OVER and AVG OVER. Candidates should understand the OVER clause with PARTITION BY and ORDER BY, frame specifications using ROWS BETWEEN and RANGE BETWEEN, tie handling, null behavior, and how frame definitions affect results. Common application patterns include top N per group, deduplication using row numbering, running totals and cumulative aggregates, moving averages, percent rank and distribution calculations, event sequencing and period-over-period comparisons, gap and island analysis, cohort and retention analysis, and trend and growth calculations. The topic also covers structuring complex queries with Common Table Expressions, including recursive Common Table Expressions, to break up multi-step analytical pipelines and to handle hierarchical or iterative problems, and choosing between window functions, GROUP BY, joins, and subqueries for correctness and readability. Performance and correctness considerations are essential, including join and sort costs, index usage, memory and sort spill behavior, execution planning and query optimization techniques, and trade-offs across different database dialects and large data volumes. Interview assessments typically ask candidates to write and explain queries that use these functions; to reason about frame semantics for edge cases such as ties, nulls, and partition boundaries; and to rewrite or optimize expensive queries.

40 questions
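
Tie handling and frame semantics are easier to see on a small example. The sketch below runs on SQLite 3.25 or later through Python's `sqlite3`; the `sales` table and its rows are made up for illustration. It contrasts the three ranking functions on a tie, then compares a running total under the default frame (which treats tied rows as peers) with an explicit ROWS frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 'ann', 300), ('east', 'bob', 300), ('east', 'cy', 100),
  ('west', 'dee', 200), ('west', 'eli', 150);
""")

# ann and bob tie at 300: ROW_NUMBER breaks the tie (by rep here),
# RANK leaves a gap after the tie, DENSE_RANK does not.
ranked = conn.execute("""
SELECT region, rep,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC, rep) AS rn,
       RANK()       OVER (PARTITION BY region ORDER BY amount DESC)      AS rnk,
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC)      AS drnk
FROM sales
ORDER BY region, rn
""").fetchall()

# With ORDER BY and no explicit frame, the default frame is
# RANGE UNBOUNDED PRECEDING .. CURRENT ROW, which includes all tied peers.
# An explicit ROWS frame accumulates one physical row at a time.
totals = conn.execute("""
SELECT region, rep,
       SUM(amount) OVER (PARTITION BY region ORDER BY amount DESC) AS range_total,
       SUM(amount) OVER (PARTITION BY region ORDER BY amount DESC, rep
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM sales
ORDER BY region, amount DESC, rep
""").fetchall()

for row in ranked:
    print(row)
for row in totals:
    print(row)
```

Filtering the first query on `rn <= N` in an outer query or CTE gives the usual top-N-per-group pattern; adding a deterministic tie-breaker to the ORDER BY keeps the result stable across runs.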

Database Fundamentals and Storage Engines

Core principles and components of data storage and persistence systems. This includes storage engine architectures and how they affect query processing and performance; transactions and their atomicity, consistency, isolation, and durability guarantees; concurrency control and isolation levels; indexing strategies and how indexes affect read and write amplification; physical versus logical storage and object, block, and file storage characteristics; caching layers and cache invalidation patterns; replication basics and how replication affects durability and read performance; backup and recovery techniques including snapshots and point-in-time recovery; trade-offs captured by consistency, availability, and partition tolerance reasoning; and compression, cost versus performance trade-offs, data retention, archival, and compliance concerns. Candidates should be able to reason about durability, persistence guarantees, operational recovery, and storage choices that affect latency, throughput, and cost.

52 questions

Data Warehouse and Dimensional Modeling

Design and model scalable analytical data systems using dimensional modeling principles and data warehouse architecture patterns. Core concepts include fact and dimension tables, defining and enforcing grain, surrogate keys, degenerate and role-playing dimensions, conformed dimensions, and handling slowly changing dimensions including Type One, Type Two, and Type Three. Understand schema choices and trade-offs such as star schema versus snowflake schema, normalization versus denormalization, and fact table types including transactional, periodic snapshot, and accumulating snapshot. Apply design decisions to meet query patterns and performance goals by considering partitioning, indexing, compression, columnar storage, and aggregation strategies. Be able to design schemas for different business domains, reason about data integration and consistency, and optimize for common analytical workloads and reporting requirements.

39 questions
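
One common way to handle a Type Two slowly changing dimension is to close the current row and open a new one whenever a tracked attribute changes. The sketch below shows that idea under stated assumptions: the `dim_customer` layout, the `apply_scd2` helper, the `9999-12-31` open-ended sentinel, and the half-open validity interval are all illustrative choices, not the only valid design. It uses Python's `sqlite3` for convenience; a warehouse would typically do this with a set-based MERGE or equivalent.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed sentinel meaning "still current"

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
  customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
  customer_id INTEGER,                            -- business key
  city TEXT, valid_from TEXT, valid_to TEXT, is_current INTEGER)
""")

def apply_scd2(customer_id, city, as_of):
    """Hypothetical helper: add a new version only when city changes."""
    current = conn.execute(
        "SELECT customer_sk, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1", (customer_id,)).fetchone()
    if current and current[1] == city:
        return  # attribute unchanged, nothing to version
    if current:
        # Close the existing version at the change date.
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_sk = ?", (as_of, current[0]))
    # Open the new current version.
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, ?, 1)", (customer_id, city, as_of, OPEN_END))

apply_scd2(1, "Oslo", "2024-01-01")
apply_scd2(1, "Oslo", "2024-02-01")    # unchanged: no new version
apply_scd2(1, "Bergen", "2024-03-01")  # changed: close old row, open new one

history = conn.execute(
    "SELECT customer_sk, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_sk").fetchall()

# Point-in-time lookup over the half-open interval [valid_from, valid_to).
city_in_feb = conn.execute(
    "SELECT city FROM dim_customer WHERE customer_id = 1 "
    "AND valid_from <= ? AND ? < valid_to",
    ("2024-02-15", "2024-02-15")).fetchone()[0]

print(history, city_in_feb)
```

Fact rows would reference `customer_sk` rather than `customer_id`, so each fact stays tied to the version of the customer that was current when it was recorded.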

CTEs & Subqueries

Common Table Expressions (CTEs) and subqueries in SQL, including syntax, recursive CTEs, usage patterns, performance implications, and techniques for writing clear, efficient queries. Covers when to use CTEs versus subqueries, refactoring patterns, and potential pitfalls.

40 questions
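
A recursive CTE is the usual tool for walking a hierarchy of unknown depth. The following is a minimal sketch using Python's `sqlite3`; the `employees` table and its rows are invented for the example. The anchor member selects the root, and the recursive member joins each employee to the manager row already produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'root', NULL), (2, 'a', 1), (3, 'b', 1), (4, 'c', 2);
""")

org_chart = conn.execute("""
WITH RECURSIVE chain(id, name, depth) AS (
  -- Anchor member: employees with no manager.
  SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
  UNION ALL
  -- Recursive member: direct reports of rows found so far.
  SELECT e.id, e.name, c.depth + 1
  FROM employees e
  JOIN chain c ON e.manager_id = c.id
)
SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(org_chart)
```

Recursion stops when the recursive member returns no new rows. With cyclic data (an employee who indirectly manages themselves) this query would not terminate on its own, so real hierarchies often need a depth limit or cycle check.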

Aggregation and Grouping

Covers SQL grouping and aggregation concepts used to summarize data across rows. Key skills include using GROUP BY with aggregate functions such as COUNT, SUM, AVG, MIN, and MAX, counting distinct values, and filtering grouped results with HAVING while understanding the difference between WHERE and HAVING. Candidates should demonstrate correct handling of NULL values in aggregates, grouping by expressions and multiple columns, and writing multi-level aggregations using ROLLUP, CUBE, and GROUPING SETS. Also important is knowing when to use subqueries or common table expressions for intermediate aggregation, the difference between aggregate functions and window functions, and how grouping interacts with joins and data types. Interview questions may test correctness of queries, edge cases, performance considerations such as appropriate indexes and query plans, and the ability to transform business questions like "who are the top customers" or "which categories have declining sales" into correct aggregated SQL statements.

40 questions
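
The NULL and WHERE-versus-HAVING points above can be checked with a small experiment. This sketch uses Python's `sqlite3` with a made-up `orders` table; the behavior shown (aggregates skipping NULLs, SUM over only NULLs returning NULL) matches standard SQL, though ROLLUP, CUBE, and GROUPING SETS are not shown because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, category TEXT, amount INTEGER);
INSERT INTO orders VALUES
  ('ann', 'books', 20), ('ann', 'books', NULL), ('ann', 'toys', 50),
  ('bob', 'books', 10), ('bob', 'toys', 5),
  ('cy',  'toys',  NULL);
""")

# COUNT(*) counts rows; COUNT(amount), SUM, and AVG skip NULL amounts.
summary = conn.execute("""
SELECT customer, COUNT(*), COUNT(amount), SUM(amount), AVG(amount)
FROM orders
GROUP BY customer
ORDER BY customer
""").fetchall()

# WHERE filters rows before grouping; HAVING filters groups afterwards.
big_book_buyers = conn.execute("""
SELECT customer, SUM(amount) AS total
FROM orders
WHERE category = 'books'
GROUP BY customer
HAVING SUM(amount) >= 15
ORDER BY customer
""").fetchall()

for row in summary:
    print(row)
print(big_book_buyers)
```

Customer `cy` has one row but no non-NULL amount, so COUNT(amount) is 0 and SUM and AVG are NULL rather than 0. Wrapping the aggregate in COALESCE is a common fix when a report needs zeros.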

Query Optimization and Execution Plans

Focuses on diagnosing slow queries and reducing execution cost through analysis of query execution plans and systematic query rewrites. Candidates should be able to read and interpret EXPLAIN output and execution plans, including identifying expensive operators such as sequential table scans, index scans, sorts, nested loop joins, hash joins, and merge joins, and explaining why those operators appear. Core skills include cost and cardinality estimation; understanding join order and predicate placement; predicate pushdown and selectivity reasoning; comparing EXISTS versus IN versus JOIN patterns; and identifying common anti-patterns such as N+1 queries. The topic covers profiling and benchmarking approaches using EXPLAIN ANALYZE and runtime statistics; comparing estimated and actual row counts; proposing and validating query rewrites and configuration or schema changes; and reasoning about trade-offs when using materialized views, caching, denormalization, or partitioning to improve performance. Candidates should present step-by-step approaches to diagnose problems, measure improvements, and assess impact on other workloads.

53 questions
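
The diagnose, change, and re-check loop described above can be sketched with SQLite's `EXPLAIN QUERY PLAN`, used here as a lightweight stand-in for EXPLAIN or EXPLAIN ANALYZE in other engines (SQLite reports plan steps, not timings or estimated versus actual row counts). The `logs` table, the index name, and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE logs (id INTEGER PRIMARY KEY, user_id INTEGER,
                   created_at TEXT, msg TEXT)
""")

QUERY = "SELECT * FROM logs WHERE user_id = 42 ORDER BY created_at"

def plan(sql):
    """Hypothetical helper: return SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

# Step 1: inspect the plan. Expect a full scan plus a separate sort step.
before = plan(QUERY)

# Step 2: add an index matching the filter column, then the sort column.
conn.execute("CREATE INDEX idx_logs_user_time ON logs (user_id, created_at)")

# Step 3: re-check. The index supplies both the lookup and the ordering.
after = plan(QUERY)

print("before:", before)
print("after: ", after)
```

On a real system the same loop would end with a timing comparison on representative data and a check that the extra index does not slow down write-heavy workloads on the same table.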
