Database Engineering & Data Systems Topics
Database design patterns, optimization, scaling strategies, storage technologies, data warehousing, and operational database management. Covers database selection criteria, query optimization, replication strategies, distributed databases, backup and recovery, and performance tuning at the database layer. Distinct from Systems Architecture (which addresses service-level distribution) and Data Science (which addresses analytical approaches).
Complex Data Integration and Joins
Handling intricate join scenarios: multi-condition joins, conditional joins with complex logic, joins on date ranges or overlapping time periods, complex left joins with multiple filtering conditions (for example, a right-table filter placed in WHERE instead of ON silently turns a LEFT JOIN into an INNER JOIN), self-joins for hierarchical or relationship data, and non-standard relationships between tables. Understanding how different join types affect row counts, NULL values, and duplicate handling. Designing queries that correctly integrate data from multiple sources while maintaining data integrity and avoiding double counting or missing rows.
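As a minimal sketch of a date-range self-join, assuming a hypothetical bookings table and using Python's built-in sqlite3 module as a runnable harness (SQLite dialect), the query below finds pairs of overlapping stays in the same room. The a.id < b.id condition is what keeps each conflict from being counted twice or matched with itself.

```python
import sqlite3

# Hypothetical schema: one row per room booking; dates are ISO-8601 strings,
# and each stay is treated as the half-open interval [check_in, check_out).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (id INTEGER PRIMARY KEY, room TEXT, check_in TEXT, check_out TEXT);
INSERT INTO bookings VALUES
  (1, 'A', '2024-01-01', '2024-01-05'),
  (2, 'A', '2024-01-04', '2024-01-07'),  -- overlaps booking 1
  (3, 'A', '2024-01-07', '2024-01-09'),  -- starts the day booking 2 ends: no overlap
  (4, 'B', '2024-01-02', '2024-01-03');
""")

# Self-join on a multi-condition predicate:
#  - same room,
#  - intervals overlap (each one starts before the other ends),
#  - a.id < b.id so each conflicting pair is reported once, not twice,
#    and no booking is paired with itself.
overlaps = conn.execute("""
SELECT a.id, b.id
FROM bookings AS a
JOIN bookings AS b
  ON  a.room = b.room
  AND a.check_in < b.check_out
  AND b.check_in < a.check_out
  AND a.id < b.id
ORDER BY a.id, b.id
""").fetchall()
print(overlaps)  # [(1, 2)]
```

Treating intervals as half-open means back-to-back stays are not flagged; with inclusive end dates the two overlap comparisons would use <= instead.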
SQL Fundamentals and Query Writing
Comprehensive query-writing skills from basic to intermediate level. Topics include SELECT and WHERE, inner and outer joins, grouping with GROUP BY and filtering groups with HAVING, common aggregate functions (COUNT, SUM, AVG, MIN, and MAX), ORDER BY and DISTINCT, subqueries and common table expressions, basic window functions such as ROW_NUMBER and RANK, union operations, and principles of readable, maintainable query composition. Also covers a basic awareness of how queries execute, common performance pitfalls, and how to write correct, efficient queries that combine and summarize relational data.
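A minimal sketch of the WHERE versus HAVING distinction, assuming a hypothetical orders table and running SQLite through Python's sqlite3 module: WHERE removes individual rows before grouping, while HAVING removes whole groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'alice', 'paid',     30.0),
  (2, 'alice', 'paid',     50.0),
  (3, 'bob',   'paid',     20.0),
  (4, 'bob',   'refunded', 90.0),
  (5, 'carol', 'paid',     70.0);
""")

rows = conn.execute("""
SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
FROM orders
WHERE status = 'paid'          -- row filter: refunded orders never reach the groups
GROUP BY customer
HAVING SUM(amount) >= 50       -- group filter: keep customers with at least 50 paid
ORDER BY total DESC, customer
""").fetchall()
print(rows)  # [('alice', 2, 80.0), ('carol', 1, 70.0)]
```

Bob is dropped even though his refunded order alone is 90: the refund is removed by WHERE before SUM runs, leaving 20, which then fails HAVING.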
Database Design and Query Optimization
Principles of database schema design and performance optimization, including relational versus non-relational trade-offs, normalization and denormalization, indexing strategies and index types, clustered and non-clustered indexes, query execution plans, common table expressions for readable complex queries, detecting missing or redundant indexes, sharding and partitioning strategies, and consistency versus availability trade-offs. Candidates should demonstrate knowledge of optimizing reads and writes, diagnosing slow queries, and selecting the appropriate database model for given scale and consistency requirements.
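A minimal sketch of reading an execution plan before and after adding an index, assuming a hypothetical orders table and a hypothetical helper named plan. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; other engines expose plans differently (PostgreSQL, for example, uses EXPLAIN and EXPLAIN ANALYZE), and the exact plan wording varies across SQLite versions, so the sketch only looks for a scan versus the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [(f"c{i % 100}", float(i)) for i in range(1000)])

QUERY = "SELECT id, amount FROM orders WHERE customer = ?"

def plan(sql, params=()):
    """Hypothetical helper: join SQLite's EXPLAIN QUERY PLAN detail strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

before = plan(QUERY, ("c42",))  # no index on customer yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(QUERY, ("c42",))   # planner can now search the index on customer
print(before)
print(after)
```

Typically the first plan reports a SCAN of orders and the second a SEARCH using idx_orders_customer; the query text itself never changes, only the access path.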
SQL Performance and Anti-Patterns
Recognition and remediation of common SQL performance anti-patterns and pitfalls, such as accidental Cartesian joins, N+1 query patterns, inefficient correlated subqueries, functions wrapped around indexed columns in WHERE clauses (which prevent index use), SELECT * usage, missing indexes, large unbounded sorts or aggregations, and poor join ordering. Covers diagnosing problems with execution plans and EXPLAIN ANALYZE, and rewriting queries for better performance and scalability.
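A minimal sketch of the N+1 pattern and its usual remedy, assuming hypothetical authors and books tables and a hypothetical query wrapper that counts statements as a stand-in for network round trips. Both versions produce the same counts; only the number of queries differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books   (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id), title TEXT);
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO books VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

queries_run = 0

def query(sql, params=()):
    """Hypothetical wrapper: run a query and count it as one round trip."""
    global queries_run
    queries_run += 1
    return conn.execute(sql, params).fetchall()

# Anti-pattern (N+1): one query for the parents, then one more per parent.
queries_run = 0
n_plus_one = {}
for author_id, name in query("SELECT id, name FROM authors ORDER BY id"):
    n_plus_one[name] = query(
        "SELECT COUNT(*) FROM books WHERE author_id = ?", (author_id,))[0][0]
n_plus_one_queries = queries_run

# Remedy: a single LEFT JOIN + GROUP BY. LEFT JOIN keeps authors with no books,
# and COUNT(b.id) ignores the NULL from the unmatched side, so they get 0, not 1.
queries_run = 0
joined = dict(query("""
SELECT a.name, COUNT(b.id)
FROM authors AS a
LEFT JOIN books AS b ON b.author_id = a.id
GROUP BY a.id, a.name
"""))
joined_queries = queries_run

print(n_plus_one_queries, joined_queries)  # 4 1
```

With N parents the loop issues N + 1 statements, which grows linearly with data size; the join stays at one statement regardless.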
Advanced SQL Window Functions
Mastery of SQL window functions and advanced aggregation techniques for analytical queries. Core function families include ranking functions such as ROW_NUMBER, RANK, DENSE_RANK, and NTILE; offset functions such as LAG and LEAD; value functions such as FIRST_VALUE, LAST_VALUE, and NTH_VALUE; and aggregate window expressions such as SUM OVER and AVG OVER. Candidates should understand the OVER clause with PARTITION BY and ORDER BY, frame specifications using ROWS BETWEEN and RANGE BETWEEN, tie handling, null behavior, and how frame definitions affect results. Common application patterns include top-N per group, deduplication using row numbering, running totals and cumulative aggregates, moving averages, percent rank and distribution calculations, event sequencing and period-over-period comparisons, gap and island analysis, cohort and retention analysis, and trend and growth calculations. The topic also covers structuring complex queries with Common Table Expressions (CTEs) to break multi-step analytical pipelines into readable stages, using recursive CTEs for hierarchical or iterative problems, and choosing between window functions, GROUP BY, joins, and subqueries for correctness and readability. Performance and correctness considerations are essential, including join and sort costs, index usage, memory use and sort spills, execution planning and query optimization techniques, and trade-offs across database dialects and large data volumes. Interview assessments typically ask candidates to write and explain queries that use these functions, reason about frame semantics in edge cases such as ties, nulls, and partition boundaries, and rewrite or optimize expensive queries.
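A minimal sketch of tie handling, top-N per group, and frame semantics, assuming hypothetical sales and daily tables and SQLite 3.25 or later (the first release with window functions; recent Python builds bundle a newer version). Note that the rank filter sits in an outer query, because window functions cannot appear in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 'ann', 100), ('east', 'bob', 100), ('east', 'cat', 80),
  ('west', 'dan',  90), ('west', 'eve',  70);
CREATE TABLE daily (day TEXT PRIMARY KEY, amount INTEGER);
INSERT INTO daily VALUES ('d1', 5), ('d2', 3), ('d3', 4);
""")

# Ties: ann and bob both have 100 in 'east'.
ranked = conn.execute("""
SELECT region, rep,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC, rep) AS rn,   -- unique
       RANK()       OVER (PARTITION BY region ORDER BY amount DESC)      AS rnk,  -- 1, 1, 3
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC)      AS drnk  -- 1, 1, 2
FROM sales
ORDER BY region, rn
""").fetchall()

# Top-1 per region, keeping ties: filter on the rank in an outer query.
top1 = conn.execute("""
WITH r AS (
  SELECT region, rep, RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
  FROM sales
)
SELECT region, rep FROM r WHERE rnk = 1 ORDER BY region, rep
""").fetchall()

# Frames: with ORDER BY and no explicit frame, the default frame ends at the
# current row, so LAST_VALUE returns the current value, not the partition's last.
framed = conn.execute("""
SELECT day, amount,
       SUM(amount) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running,
       LAG(amount) OVER (ORDER BY day) AS prev,
       LAST_VALUE(amount) OVER (ORDER BY day) AS last_default,
       LAST_VALUE(amount) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING) AS last_full
FROM daily
ORDER BY day
""").fetchall()

print(top1)    # [('east', 'ann'), ('east', 'bob'), ('west', 'dan')]
print(framed)  # [('d1', 5, 5, None, 5, 4), ('d2', 3, 8, 5, 3, 4), ('d3', 4, 12, 3, 4, 4)]
```

Swapping RANK for ROW_NUMBER in the top-1 query would return exactly one east rep, chosen by the tiebreaker column, which is the usual choice for deduplication rather than leaderboards.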
Complex Joins and Set Operations
Focuses on mastering joins and set operations for combining and transforming relational data across multiple tables. Candidates should understand all join types, including inner, left, right, full outer, cross, self, and nested joins, and know when to use each for correctness and performance. The topic also covers set operations such as UNION, INTERSECT, and EXCEPT; how joins differ from set operations; handling duplicates and NULL values correctly; choosing between joins, subqueries, and common table expressions for clarity and efficiency; and reasoning about join order and its performance implications on large tables. Interview questions may include multi-table join problems, complex business logic spanning four or more tables, and scenarios that reveal trade-offs between approaches.
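A minimal sketch of how set operations and joins treat duplicates and NULLs differently, assuming two hypothetical single-column tables a and b in SQLite. Set operations compare whole rows and treat two NULLs as duplicates, while a join's equality predicate never matches NULL to NULL and multiplies duplicate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2), (2), (NULL);
INSERT INTO b VALUES (2), (3), (NULL);
""")

def q(sql):
    return conn.execute(sql).fetchall()

# UNION ALL keeps every input row; UNION removes duplicate rows.
union_all = q("SELECT x FROM a UNION ALL SELECT x FROM b")           # 7 rows
union = q("SELECT x FROM a UNION SELECT x FROM b ORDER BY x")        # [(None,), (1,), (2,), (3,)]

# For INTERSECT and EXCEPT, the NULL in a and the NULL in b count as the same row.
intersect = q("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x")  # [(None,), (2,)]
except_ = q("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x")       # [(1,)]

# In a join, NULL = NULL evaluates to NULL (not true), so NULLs never match,
# and each of the two 2s in a pairs with the 2 in b.
joined = q("SELECT a.x FROM a JOIN b ON a.x = b.x")                  # [(2,), (2,)]
```

Rewriting an INTERSECT as an inner join is therefore not always equivalent: the join drops NULL keys and can return duplicates unless DISTINCT is added.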
Data Warehouse and Dimensional Modeling
Design and model scalable analytical data systems using dimensional modeling principles and data warehouse architecture patterns. Core concepts include fact and dimension tables, defining and enforcing grain, surrogate keys, degenerate and role-playing dimensions, conformed dimensions, and slowly changing dimensions (Type 1 overwrites the attribute, Type 2 adds a new versioned row, and Type 3 keeps the previous value in an extra column). Understand schema choices and trade-offs such as star versus snowflake schemas, normalization versus denormalization, and fact table types including transactional, periodic snapshot, and accumulating snapshot. Make design decisions that fit query patterns and performance goals by considering partitioning, indexing, compression, columnar storage, and aggregation strategies. Be able to design schemas for different business domains, reason about data integration and consistency, and optimize for common analytical workloads and reporting requirements.
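A minimal sketch of a Type 2 slowly changing dimension, assuming a hypothetical dim_customer table with a surrogate key, a natural key, and a validity interval, plus hypothetical helpers apply_scd2_change and city_as_of. Production pipelines usually do this in bulk with MERGE-style statements; the per-row version here only illustrates the expire-and-insert logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
  customer_sk INTEGER PRIMARY KEY,                  -- surrogate key: one per version
  customer_id TEXT NOT NULL,                        -- natural (business) key
  city        TEXT NOT NULL,
  valid_from  TEXT NOT NULL,
  valid_to    TEXT NOT NULL DEFAULT '9999-12-31',   -- sentinel for the open-ended current row
  is_current  INTEGER NOT NULL DEFAULT 1
);
INSERT INTO dim_customer (customer_id, city, valid_from) VALUES ('C1', 'Oslo', '2023-01-01');
""")

def apply_scd2_change(customer_id, new_city, change_date):
    """Expire the current version and insert a new one in a single transaction."""
    with conn:  # commits both statements, or rolls both back on error
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (change_date, customer_id))
        conn.execute(
            "INSERT INTO dim_customer (customer_id, city, valid_from) VALUES (?, ?, ?)",
            (customer_id, new_city, change_date))

def city_as_of(customer_id, as_of):
    # Half-open validity [valid_from, valid_to): exactly one version matches any date.
    return conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id = ? "
        "AND valid_from <= ? AND ? < valid_to",
        (customer_id, as_of, as_of)).fetchone()[0]

apply_scd2_change('C1', 'Bergen', '2024-06-01')
print(city_as_of('C1', '2024-05-31'), city_as_of('C1', '2024-06-01'))  # Oslo Bergen
```

Fact rows loaded before the change would typically reference the Oslo version's customer_sk, so historical reports keep the city that was true at the time.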
Relational Databases and SQL
Focuses on relational database fundamentals and practical SQL skills. Candidates should be able to write and reason about SELECT queries, JOINs, aggregations, grouping, filtering, common table expressions, and window functions. They should understand schema design trade-offs, including normalization and denormalization; indexing strategies and index types; query performance considerations and basic optimization techniques; how to read an execution plan; and transaction semantics, including isolation levels and ACID guarantees. Interviewers may test writing efficient queries, designing normalized schemas for given requirements, suggesting appropriate indexes, and explaining how to diagnose and improve slow queries.
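A minimal sketch of atomicity (the A in ACID), assuming a hypothetical accounts table with a CHECK constraint and a hypothetical transfer function. The credit is applied before the debit on purpose: when the debit violates the constraint, the rollback also undoes the credit. Isolation levels are not shown, since their behavior is engine-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0));
INSERT INTO accounts VALUES ('alice', 100), ('bob', 20);
""")

def transfer(src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commit on success; roll back and re-raise on an exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) was violated
        return False

ok_small = transfer('alice', 'bob', 30)   # succeeds
ok_large = transfer('bob', 'alice', 500)  # fails: bob would go negative
balances = dict(conn.execute("SELECT id, balance FROM accounts").fetchall())
print(ok_small, ok_large, balances)  # True False {'alice': 70, 'bob': 50}
```

Without the transaction, the failed transfer would have left alice credited with 500 that never left bob's account.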
CTEs & Subqueries
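Common Table Expressions (CTEs) and subqueries in SQL, including syntax, recursive CTEs, usage patterns, performance implications (for example, whether the engine inlines a CTE into the outer query or materializes it once), and techniques for writing clear, efficient queries. Covers when to use CTEs versus subqueries, refactoring patterns, and potential pitfalls such as recursive CTEs that never terminate.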
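A minimal sketch, assuming a hypothetical employees table in SQLite, of a recursive CTE that walks a reporting hierarchy, followed by the same aggregation written once as a derived-table subquery and once as a CTE. The recursive member joins back to the CTE itself and stops when no new rows are produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'ceo', NULL), (2, 'cto', 1), (3, 'dev1', 2), (4, 'dev2', 2), (5, 'cfo', 1);
""")

rows = conn.execute("""
WITH RECURSIVE chain(id, name, depth, path) AS (
  -- Anchor member: the root of the hierarchy (no manager).
  SELECT id, name, 0, name FROM employees WHERE manager_id IS NULL
  UNION ALL
  -- Recursive member: attach each employee to a manager row already found.
  SELECT e.id, e.name, c.depth + 1, c.path || ' > ' || e.name
  FROM employees AS e
  JOIN chain AS c ON e.manager_id = c.id
)
SELECT name, depth, path FROM chain ORDER BY path
""").fetchall()

# Managers with at least two direct reports: derived-table subquery...
subquery = conn.execute("""
SELECT m.name
FROM employees AS m
JOIN (SELECT manager_id, COUNT(*) AS n FROM employees
      WHERE manager_id IS NOT NULL GROUP BY manager_id) AS r
  ON r.manager_id = m.id
WHERE r.n >= 2
ORDER BY m.name
""").fetchall()

# ...and the same logic as a named CTE, which reads top to bottom.
cte = conn.execute("""
WITH reports AS (
  SELECT manager_id, COUNT(*) AS n FROM employees
  WHERE manager_id IS NOT NULL GROUP BY manager_id
)
SELECT m.name
FROM employees AS m
JOIN reports AS r ON r.manager_id = m.id
WHERE r.n >= 2
ORDER BY m.name
""").fetchall()

print(subquery == cte, cte)  # True [('ceo',), ('cto',)]
```

If the data contained a cycle (an employee who is, indirectly, their own manager), this recursive query would not terminate; a depth limit in the recursive member's WHERE clause is one common guard.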