Structured Query Language Fundamentals and Aggregation Questions

This topic covers core Structured Query Language (SQL) fundamentals for analytical querying and reporting. Candidates should be able to write correct, readable, and maintainable SELECT queries with filtering using WHERE, sorting with ORDER BY, grouping with GROUP BY, and group filtering with HAVING. They should apply aggregate functions such as COUNT, COUNT(DISTINCT), SUM, AVG, MIN, and MAX, and understand how NULL values affect results, how empty result sets behave, and when to use different counting approaches. The scope includes date and time filtering, basic cohort segmentation, and common time-based comparisons used to compute metrics such as daily active users, average revenue per user, and period-over-period change. Candidates are expected to use basic joins and join predicates, including inner joins and left joins, write simple subqueries and conditional expressions, and apply common data transformation and cleansing patterns to prepare data for analysis. Finally, this topic assesses query readability and maintainability practices such as aliasing and formatting, plus awareness of elementary performance considerations, including index usage and avoiding unnecessary full table scans, for entry- to mid-level analytical tasks.
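As a quick illustration of the NULL and empty-set behavior mentioned above, here is a minimal sketch that runs standard SQL aggregates through Python's built-in sqlite3 module. The orders table and its rows are hypothetical and exist only for this example; the aggregate semantics shown are standard SQL, though it is worth confirming them on your own engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 20.0), (2, 10, None), (3, 11, 40.0), (4, None, 5.0)],
)

row = conn.execute(
    """
    SELECT COUNT(*)                AS all_rows,          -- counts every row
           COUNT(amount)           AS non_null_amounts,  -- skips NULL amounts
           COUNT(DISTINCT user_id) AS distinct_users,    -- skips NULL user_id
           AVG(amount)             AS avg_amount         -- averages non-NULL values only
    FROM orders
    """
).fetchone()

# An aggregate over zero matching rows still returns one row:
# COUNT gives 0, while SUM gives NULL (None in Python).
empty = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders WHERE amount > 1000"
).fetchone()

print(row)
print(empty)
```

Because SUM over an empty set is NULL rather than 0, reports often wrap it as COALESCE(SUM(amount), 0) when a zero is the intended display value.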

Hard | Technical

Explain the role of table and column statistics (ANALYZE, histograms, extended stats) in the query planner's ability to optimize aggregate queries. Describe how stale or missing stats can lead to bad plans and how you'd decide when to run ANALYZE or gather extended statistics in production.

Medium | Technical

Write a PostgreSQL query to compute Monthly Recurring Revenue (MRR) for the last complete calendar month. Use the subscriptions table below, which stores UTC timestamps. Your query should handle subscriptions that started before the month and end after it. Schema:

subscriptions(sub_id BIGINT, user_id BIGINT, start_ts TIMESTAMP WITH TIME ZONE, end_ts TIMESTAMP WITH TIME ZONE NULL, monthly_amount NUMERIC)

State any assumptions and how you determine the month boundaries in UTC.
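One possible shape of an answer is sketched below, using Python's built-in sqlite3 module so that it runs anywhere. It is not PostgreSQL, and it rests on assumptions that are not part of the question: a subscription counts toward MRR if it overlaps the month at any point, a NULL end_ts means the subscription is still active, amounts are not prorated, and timestamps are stored as fixed-width UTC text. The helper names last_complete_month and iso and the sample rows are hypothetical. In PostgreSQL, the month boundaries could instead be derived from date_trunc('month', now() AT TIME ZONE 'UTC').

```python
import sqlite3
from datetime import datetime, timezone


def last_complete_month(now):
    """Return the [start, end) UTC boundaries of the last complete calendar month."""
    now = now.astimezone(timezone.utc)
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


def iso(dt):
    """Fixed-width UTC text, so string comparison matches chronological order."""
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions (sub_id INTEGER, user_id INTEGER, "
    "start_ts TEXT, end_ts TEXT, monthly_amount REAL)"
)
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "2024-01-15T00:00:00Z", None, 10.0),                    # started earlier, still active
        (2, 11, "2024-03-10T00:00:00Z", "2024-03-20T00:00:00Z", 25.0),  # entirely inside the month
        (3, 12, "2024-04-02T00:00:00Z", None, 99.0),                    # starts after the month
        (4, 13, "2023-11-01T00:00:00Z", "2024-02-28T00:00:00Z", 50.0),  # ended before the month
    ],
)

# Fixed "now" so the example is reproducible; the month is [2024-03-01, 2024-04-01).
month_start, month_end = last_complete_month(datetime(2024, 4, 15, tzinfo=timezone.utc))

(mrr,) = conn.execute(
    """
    SELECT COALESCE(SUM(monthly_amount), 0)
    FROM subscriptions
    WHERE start_ts < :month_end                          -- began before the month closed
      AND (end_ts IS NULL OR end_ts > :month_start)      -- still running once the month opened
    """,
    {"month_start": iso(month_start), "month_end": iso(month_end)},
).fetchone()

print(mrr)
```

The half-open interval [month_start, month_end) avoids double counting at month boundaries. A different MRR definition, such as "active at the end of the month", would change only the WHERE clause.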

Medium | Technical

When should you prefer approximate distinct counting (e.g., HyperLogLog via approx_count_distinct) over exact COUNT(DISTINCT) in analytics SQL? Describe practical considerations and write example SQL for both exact and approximate distinct counts (choose BigQuery or Presto syntax). Explain trade-offs in accuracy and resource use.

Easy | Technical

Write a SQL query that returns user_id and last_purchase_ts, ordered with the most recent purchases first, while ensuring that users with a NULL last_purchase_ts appear at the end. Use PostgreSQL syntax and the table below:

users(user_id BIGINT, last_purchase_ts TIMESTAMP NULL)

Also explain how you would achieve the same NULL ordering in databases that don't support NULLS LAST syntax.
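In PostgreSQL the direct form is ORDER BY last_purchase_ts DESC NULLS LAST. For engines without that clause, a common workaround is to sort first on a flag that marks NULL rows. The sketch below demonstrates the workaround through Python's built-in sqlite3 module; the sample rows and the user_id tie-break are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, last_purchase_ts TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, "2024-03-01 10:00:00"),
        (2, None),
        (3, "2024-05-20 08:30:00"),
        (4, None),
        (5, "2023-12-31 23:59:59"),
    ],
)

rows = conn.execute(
    """
    SELECT user_id, last_purchase_ts
    FROM users
    ORDER BY CASE WHEN last_purchase_ts IS NULL THEN 1 ELSE 0 END,  -- NULL rows sort last
             last_purchase_ts DESC,                                 -- newest purchases first
             user_id                                                -- deterministic tie-break
    """
).fetchall()

for user_id, ts in rows:
    print(user_id, ts)
```

The CASE expression relies only on standard SQL, so the same ORDER BY should carry over to most engines that lack NULLS LAST.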

Hard | System Design

Design a scalable approach to compute daily unique users across multiple regions at petabyte data scale where exact distinct counting is infeasible. Describe components, the sketch algorithm (e.g., HLL), how to merge sketches from shards, error bounds, retention strategy, and how to handle late-arriving or duplicate events.
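To make the sketch-merging idea concrete, here is a minimal, unoptimized HyperLogLog sketch in plain Python. It is a teaching illustration under stated assumptions, not a production design: the class name, the choice of SHA-1 as the hash, and the precision p=12 are all choices made for this example. Real systems would normally use a library or the engine's built-in sketch functions. The point it shows is that per-shard sketches merge by taking the element-wise maximum of their registers, and that re-adding duplicate events leaves a sketch unchanged.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (assumed p >= 7 for the alpha formula)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an arbitrary choice here).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        """Combine two sketches of equal precision via element-wise max."""
        if self.p != other.p:
            raise ValueError("sketches must share the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Two hypothetical regional shards whose user sets overlap (10,000 distinct users overall).
shard_a, shard_b, combined = HyperLogLog(), HyperLogLog(), HyperLogLog()
for i in range(0, 6000):
    shard_a.add(f"user-{i}")
    combined.add(f"user-{i}")
for i in range(4000, 10000):
    shard_b.add(f"user-{i}")
    combined.add(f"user-{i}")

merged = shard_a.merge(shard_b)
print(round(merged.estimate()))
```

The typical relative standard error of HyperLogLog is roughly 1.04 / sqrt(m), which is about 1.6% for the 4,096 registers used here. Because merging is an element-wise max, storing one sketch per region per day allows late-arriving events to be folded in by updating that day's sketch and re-merging.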
