Database Engineering & Data Systems Topics
Database design patterns, optimization, scaling strategies, storage technologies, data warehousing, and operational database management. Covers database selection criteria, query optimization, replication strategies, distributed databases, backup and recovery, and performance tuning at the database layer. Distinct from Systems Architecture (which addresses service-level distribution) and Data Science (which addresses analytical approaches).
Relational Databases and SQL
Focuses on relational database fundamentals and practical SQL skills. Candidates should be able to write and reason about SELECT queries, JOINs, aggregations, grouping, filtering, common table expressions, and window functions. They should understand schema design trade-offs, including normalization and denormalization; indexing strategies and index types; query performance considerations and basic optimization techniques; how to read an execution plan; and transaction semantics, including isolation levels and ACID guarantees. Interviewers may test writing efficient queries, designing normalized schemas for given requirements, suggesting appropriate indexes, and explaining how to diagnose and improve slow queries.
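As an illustration of the skills listed above, the following is a minimal sketch rather than a reference solution. It assumes Python's built-in sqlite3 module with a bundled SQLite recent enough to support window functions (3.25 or later). The orders table, its columns, and the index name idx_orders_customer are hypothetical. It shows a common table expression combined with a window function, and how an execution plan changes once a suitable index exists.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL,
            amount REAL NOT NULL
        );
        INSERT INTO orders (customer_id, amount) VALUES
            (1, 10.0), (1, 30.0), (2, 5.0), (2, 20.0), (2, 15.0);
    """)
    return conn


def top_order_per_customer(conn):
    """CTE plus window function: keep each customer's largest order."""
    sql = """
        WITH ranked AS (
            SELECT customer_id, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id ORDER BY amount DESC
                   ) AS rn
            FROM orders
        )
        SELECT customer_id, amount
        FROM ranked
        WHERE rn = 1
        ORDER BY customer_id
    """
    return conn.execute(sql).fetchall()


def plan_for_customer_lookup(conn):
    """Return the human-readable plan text for a lookup by customer_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (1,)
    ).fetchall()
    # The last column of each plan row holds the description.
    return " ".join(row[3] for row in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(top_order_per_customer(conn))
    print("before index:", plan_for_customer_lookup(conn))
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print("after index: ", plan_for_customer_lookup(conn))
```

Before the index is created the plan reports a full table scan; afterwards it reports a search that uses the index. Plan wording differs between database engines, so the exact text should not be relied on.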
SQL Server
The SQL Server relational database management system (RDBMS). Covers installation and configuration, T-SQL programming, indexing and query optimization, data modeling, data types, transaction handling, backup and recovery, replication, high availability (Always On), security, maintenance, and administration tasks specific to SQL Server.
Database Scalability and High Availability
Architectural approaches and operational practices for scaling databases and keeping them available. Topics include vertical versus horizontal scaling trade-offs; replication topologies, leader and follower roles, read replicas, and replica lag; read/write splitting and connection pooling; sharding and partitioning strategies, including range-based, hash-based, and consistent hashing approaches; handling hot partitions and data skew; federation patterns across multiple databases; cache layers and cache invalidation; rebalancing and resharding strategies; distributed concurrency control and transactional guarantees across shards; multi-region deployment strategies, cross-region failover, and disaster recovery; and monitoring, capacity planning, automation for failover and backups, and cost optimization at scale. Candidates should be able to pick scaling approaches based on read and write patterns and explain the operational complexity and trade-offs introduced by distributed data.
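Read/write splitting and replica lag can be made concrete with a small routing sketch. This is a simplified illustration, not a production router. The Replica class, the route function, the node names, and the one-second lag threshold are all hypothetical. Treating every statement that starts with SELECT as a read is an assumption that ignores cases such as SELECT ... FOR UPDATE.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A read replica and its most recently measured replication lag."""
    name: str
    lag_seconds: float


def route(statement, replicas, max_lag_seconds=1.0, leader="leader"):
    """Pick the node that should execute a statement.

    Writes always go to the leader. Reads go to the least-lagged replica
    whose lag is within the threshold, falling back to the leader when
    no replica is fresh enough.
    """
    # Simplifying assumption: anything starting with SELECT is a read.
    is_read = statement.lstrip().upper().startswith("SELECT")
    if not is_read:
        return leader
    fresh = [r for r in replicas if r.lag_seconds <= max_lag_seconds]
    if not fresh:
        return leader
    return min(fresh, key=lambda r: r.lag_seconds).name


if __name__ == "__main__":
    replicas = [Replica("replica-a", 0.2), Replica("replica-b", 5.0)]
    print(route("SELECT * FROM users", replicas))       # served by a replica
    print(route("UPDATE users SET name = 'x'", replicas))  # served by the leader
```

Falling back to the leader keeps reads fresh at the cost of extra leader load, which is one instance of the consistency versus capacity trade-off described above.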
Database Performance and Query Optimization
Evaluates the ability to identify and remediate database performance bottlenecks, including the N+1 query problem and expensive queries. Candidates should explain how to discover problematic queries through query plan inspection and profiling, and propose remedies such as appropriate indexing, query rewriting to use set-based operations or joins, request batching and eager loading, pagination strategies, caching and denormalization when appropriate, and the trade-offs of read replicas or sharding. Interviewers expect discussion of measurement, monitoring, and the operational costs and consistency trade-offs introduced by each optimization.
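The N+1 query problem and its set-based remedy can be sketched as follows. This is an illustrative example using Python's built-in sqlite3 module; the authors and books tables are hypothetical, and the query counter is maintained by hand purely to make the difference in round trips visible.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical authors and books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE books (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        INSERT INTO authors (id, name) VALUES
            (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
        INSERT INTO books (author_id, title) VALUES
            (1, 'Notes'), (1, 'Engines'), (2, 'Compilers'), (3, 'Discipline');
    """)
    return conn


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data fetched with a single set-based JOIN."""
    rows = conn.execute("""
        SELECT a.name, b.title
        FROM authors AS a
        LEFT JOIN books AS b ON b.author_id = a.id
        ORDER BY a.id, b.id
    """).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without books still appear
            result[name].append(title)
    return result, 1


if __name__ == "__main__":
    conn = make_db()
    print(titles_n_plus_one(conn))
    print(titles_single_join(conn))
```

Both functions return the same mapping, but the first issues one query per author on top of the initial query, so its cost grows with the number of authors, while the second always issues one.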
Large Scale Distributed Database Systems
Designing database systems that handle petabyte-scale data and are distributed globally across data centers. Covers handling eventual consistency, managing data sovereignty and compliance requirements, and making architectural decisions in complex multi-region setups.
Geospatial Data and Querying
Explores how to store, index, partition, and query location-based data at scale. Topics include spatial data types and spatial libraries, spatial indexing techniques and tree structures for range and nearest neighbor queries, geohash and tile-based partitioning, coordinate reference systems and projection issues, and distance calculation methods. Candidates should describe query patterns for common geospatial use cases such as nearest neighbor search, geofencing, route matching, and area aggregations, and explain trade-offs between accuracy, latency, and storage cost, as well as approaches to caching map tiles and handling moving entities.
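Two of the techniques named above, distance calculation and geohash-based partitioning, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the haversine formula treats the Earth as a sphere with a mean radius of 6371.0088 km, which trades some accuracy for simplicity, and the function names are hypothetical. The geohash encoder follows the standard public algorithm of interleaving longitude and latitude bits and emitting one base-32 character per five bits.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean radius; a spherical-Earth assumption
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geohash_encode(lat, lon, precision=9):
    """Encode a point as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(haversine_km(0.0, 0.0, 1.0, 0.0))
    print(geohash_encode(57.64911, 10.40744, precision=5))
```

Points that share a geohash prefix fall in the same cell, so a prefix can serve as a partition or cache key. Nearby points on opposite sides of a cell boundary can have different prefixes, which is why nearest neighbor queries typically also check neighboring cells.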
Data Model Design and Access Patterns
Discuss how to design data models around access patterns. Understand relational versus NoSQL trade-offs. Know when to denormalize, how to handle distributed transactions, and which strategies apply when scaling databases (sharding, partitioning). Discuss optimizing for reads versus writes.
NoSQL Databases Basics
Basic understanding of NoSQL databases (MongoDB, DynamoDB, Cassandra) and when to use them. Covers the document model, trade-offs compared with relational databases, and eventual consistency concepts.
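The contrast between the document model and a relational layout can be illustrated with plain Python data structures. This is a sketch under assumptions: the order document, its field names, and the flatten_order helper are hypothetical and are not tied to the API of any particular database.

```python
def flatten_order(doc):
    """Split one nested order document into normalized, relational-style rows."""
    customer_row = {
        "customer_id": doc["customer"]["id"],
        "name": doc["customer"]["name"],
    }
    order_row = {
        "order_id": doc["_id"],
        "customer_id": doc["customer"]["id"],
    }
    item_rows = [
        {"order_id": doc["_id"], "sku": item["sku"], "qty": item["qty"]}
        for item in doc["items"]
    ]
    return customer_row, order_row, item_rows


# Document model: everything needed to render one order lives in one record.
order_document = {
    "_id": "order-1",
    "customer": {"id": "cust-7", "name": "Ada"},
    "items": [
        {"sku": "A100", "qty": 2},
        {"sku": "B200", "qty": 1},
    ],
}

if __name__ == "__main__":
    customer, order, items = flatten_order(order_document)
    print(customer)
    print(order)
    print(items)
```

The document form answers "show this order" with a single read but duplicates the customer name in every order, so a rename touches many documents. The flattened form stores the name once but needs joins to reassemble the order.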
Data Partitioning and Sharding
Techniques and operational practices for horizontally partitioning data across multiple database instances or storage nodes to achieve scale, improve performance, and manage growth. Includes selection and design of partition and shard keys to evenly distribute load and avoid hotspots, with range-based, hash-based, and directory-based approaches and consistent hashing mechanisms. Covers handling uneven distribution and data skew, hotspot detection and mitigation, and the impact of partitioning on query patterns such as joins and cross-shard queries. Explains implications for transactions and consistency, including transactional boundaries that span partitions and approaches to distributed transactions and compensation. Describes resharding and online data migration strategies, rolling rebalances, and methods to minimize downtime and data movement. Emphasizes operational concerns including shard management, automation, monitoring and alerting, failure recovery, and performance tuning. Discusses trade-offs between simplicity, latency, throughput, and operational complexity, and highlights considerations for both transactional and analytical workloads, including routing, caching, and coordination patterns.
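Consistent hashing, mentioned above as a way to limit data movement during resharding, can be sketched as a hash ring with virtual nodes. This is a minimal illustration, not a production implementation. The class name, the shard names, the choice of 100 virtual nodes per shard, and the use of MD5 as the ring hash are all assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """A hash ring that maps keys to shards using virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []    # sorted list of (position, node)
        self._hashes = []  # positions only, kept parallel for bisect
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread keys evenly, not for security.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _rebuild(self):
        self._ring.sort()
        self._hashes = [position for position, _ in self._ring]

    def add_node(self, node):
        """Place vnodes virtual points for this node on the ring."""
        for i in range(self._vnodes):
            self._ring.append((self._hash(f"{node}#{i}"), node))
        self._rebuild()

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]
        self._rebuild()

    def get_node(self, key):
        """Return the owner of the first ring position clockwise of the key."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node("shard-d")
    moved = [k for k in keys if ring.get_node(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding a shard")
```

When a shard is added, only keys that now fall to the new shard change owner; keys never move between the existing shards. A plain hash-modulo scheme, by contrast, remaps most keys whenever the shard count changes.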