Aggregation Functions and GROUP BY Questions

This topic covers the fundamentals of aggregation in SQL: the aggregate functions COUNT, SUM, AVG, MIN, and MAX, and how to use them to compute row counts, totals, averages, minima, and maxima. It includes the GROUP BY clause for grouping rows by one or more dimensions such as customer, product, region, or time period, producing metrics like total revenue by month, average order value by product, or transaction count by date. It also covers the HAVING clause, which filters groups after aggregation, and how it differs from WHERE, which filters individual rows before aggregation. Related topics that come up in interviews and practical problems include grouping by multiple columns, grouping on expressions and truncated dates, using DISTINCT inside aggregates, handling NULL values (COUNT(*) counts every row, while COUNT(column) and the other aggregates ignore NULLs), ordering and limiting grouped results, using aggregates in subqueries or derived tables, and basic performance considerations when aggregating large datasets. Practice examples include calculating monthly revenue, finding customers with more than a threshold number of orders, and identifying top products by sales.
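A minimal sketch of the core ideas, using Python's built-in sqlite3 module so it runs without a database server. The orders table and its rows are hypothetical. Month truncation uses SQLite's strftime; in Postgres the equivalent grouping expression would be date_trunc('month', order_date). One amount is NULL to show how COUNT(*) and COUNT(column) diverge.

```python
import sqlite3

# Hypothetical sample data: orders(order_id, customer_id, order_date, amount).
# Order 3 has a NULL amount to show how aggregates treat NULLs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2024-01-05", 100.0),
        (2, 10, "2024-01-20", 50.0),
        (3, 11, "2024-01-22", None),
        (4, 11, "2024-02-03", 30.0),
        (5, 12, "2024-02-14", 70.0),
        (6, 10, "2024-02-28", 20.0),
    ],
)

# Monthly revenue: GROUP BY on an expression (month truncation).
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS month,
           COUNT(*)                    AS n_rows,       -- counts every row
           COUNT(amount)               AS n_amounts,    -- skips NULL amounts
           COUNT(DISTINCT customer_id) AS n_customers,
           SUM(amount)                 AS revenue       -- NULLs are ignored
    FROM orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()

# WHERE filters rows before grouping; HAVING filters groups after aggregation.
# Customers with at least 2 orders of 25 or more:
frequent = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS n_orders
    FROM orders
    WHERE amount >= 25        -- row filter (a NULL amount fails this test)
    GROUP BY customer_id
    HAVING COUNT(*) >= 2      -- group filter
    ORDER BY customer_id
    """
).fetchall()

for row in monthly:
    print(row)
print(frequent)
```

Moving `COUNT(*) >= 2` into WHERE would be an error, because the count does not exist until rows have been grouped.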

Medium | Technical

Explain GROUPING SETS, ROLLUP, and CUBE in SQL and give an example that computes revenue by region, by product, and by both region and product in a single query. Describe when using these constructs is preferable to running multiple separate GROUP BY queries in BI workflows.
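In Postgres, all three groupings can be requested in one statement with `GROUP BY GROUPING SETS ((region, product), (region), (product))`. `ROLLUP (region, product)` expands to the sets (region, product), (region), and (), and `CUBE (region, product)` expands to every combination, adding (product) to those. SQLite, used here because it ships with Python, does not support these constructs, so this sketch builds the same result as a UNION ALL of three GROUP BY queries, which is the multi-query pattern GROUPING SETS replaces (and, in engines that support it, can compute with fewer passes over the table). The sales table and values are hypothetical.

```python
import sqlite3

# Hypothetical sales rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "A", 100.0),
        ("East", "B", 50.0),
        ("West", "A", 30.0),
        ("West", "B", 20.0),
        ("West", "B", 5.0),
    ],
)

# Emulates GROUP BY GROUPING SETS ((region, product), (region), (product)).
# A NULL in region or product means "all values" for that dimension, as in
# GROUPING SETS output (Postgres offers GROUPING() to tell these NULLs apart
# from real NULLs in the data).
rows = conn.execute(
    """
    SELECT region, product, SUM(revenue) FROM sales GROUP BY region, product
    UNION ALL
    SELECT region, NULL, SUM(revenue) FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, product, SUM(revenue) FROM sales GROUP BY product
    ORDER BY 1, 2
    """
).fetchall()

for r in rows:
    print(r)
```

In a BI workflow, one grouped result like this can feed a pivot or a subtotal table directly, instead of issuing and reconciling three separate queries.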

Hard | Technical

A GROUP BY query on a large fact table performs a full table scan followed by a slow sort. Describe step by step how to use EXPLAIN or EXPLAIN ANALYZE to find the bottlenecks, and list concrete optimizations: rewriting the query, adding indexes, clustering, enabling parallelism, or pre-aggregating. Provide specific SQL examples for Postgres that illustrate improving GROUP BY performance.
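The question targets Postgres, where the usual first step is `EXPLAIN (ANALYZE, BUFFERS)` on the slow query, looking for a Seq Scan feeding a Sort, or a hash aggregate that spills to disk. As a runnable stand-in, this sketch uses SQLite's `EXPLAIN QUERY PLAN` on a small synthetic table to show the same before/after pattern: without an index, the plan scans the table and builds a temporary B-tree to group; after adding a covering index on (customer_id, amount), rows are read already ordered by the grouping key. The table, index name, and data are hypothetical, and exact plan wording varies across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Synthetic rows: 1,000 orders spread over 50 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 50, float(i % 7)) for i in range(1000)],
)

QUERY = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Before: full scan plus a temporary B-tree used to sort/group the rows.
before = plan(QUERY)

# Covering index: entries are ordered by customer_id and carry amount, so the
# engine can aggregate in index order without reading the table or sorting.
conn.execute("CREATE INDEX idx_orders_cust_amt ON orders (customer_id, amount)")
after = plan(QUERY)

print(before)
print(after)
```

The analogous Postgres change would be a B-tree index on (customer_id, amount), or on (customer_id) INCLUDE (amount), which can allow an Index Only Scan feeding a GroupAggregate; whether the planner chooses it depends on table statistics and how much of the table the visibility map marks all-visible.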

Medium | Technical

You need to attach customer lifetime metrics to a customers table for dashboards. Given orders(order_id, customer_id, amount) and customers(customer_id, name), write SQL using a CTE or derived table that computes lifetime_spend and order_count per customer and joins the result to customers. Discuss the pros and cons of a CTE vs. a subquery vs. a materialized view for dashboards with frequent reads.
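One possible answer, sketched with sqlite3 on hypothetical rows. Aggregating orders inside the CTE before the join keeps the join one-to-one per customer, avoiding the row fan-out you would get by joining first and grouping afterward. The LEFT JOIN with COALESCE keeps customers who have never ordered, reporting zeros instead of dropping them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0);
    """
)

rows = conn.execute(
    """
    WITH customer_totals AS (          -- one row per customer that has orders
        SELECT customer_id,
               SUM(amount) AS lifetime_spend,
               COUNT(*)    AS order_count
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.customer_id,
           c.name,
           COALESCE(t.lifetime_spend, 0) AS lifetime_spend,
           COALESCE(t.order_count, 0)    AS order_count
    FROM customers AS c
    LEFT JOIN customer_totals AS t ON t.customer_id = c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()

for r in rows:
    print(r)
```

For frequently read dashboards, the same SELECT could back a Postgres `CREATE MATERIALIZED VIEW`, refreshed on a schedule with `REFRESH MATERIALIZED VIEW CONCURRENTLY` (which requires a unique index on the view), trading some freshness for much cheaper reads.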

Hard | System Design

Design an architecture to support sub-minute dashboard refreshes for aggregated metrics (hourly revenue, daily active users, top products) for a high-traffic e-commerce site. Discuss use of materialized views, incremental materialization via CDC, streaming aggregations, cache layers, consistency trade-offs, and when to choose each component based on cost and SLA.
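The core of incremental materialization and streaming aggregation is updating running aggregates per event instead of re-running a GROUP BY over the whole table. This plain-Python sketch shows that idea only; the event shape (event_id, Unix timestamp, user_id, product, amount), the class name, and the in-memory state are hypothetical. A production system would keep this state in a stream processor or key-value store, and would typically replace the exact per-day user sets with an approximate distinct-count sketch such as HyperLogLog to bound memory. Deduplicating on event_id illustrates one way to stay correct when a CDC or messaging pipeline redelivers events.

```python
from collections import defaultdict
from datetime import datetime, timezone
import heapq


class IncrementalAggregates:
    """Hypothetical in-memory state updated per event instead of re-running GROUP BY."""

    def __init__(self):
        self.seen_ids = set()                      # event ids already applied
        self.hourly_revenue = defaultdict(float)   # UTC hour bucket -> revenue
        self.daily_users = defaultdict(set)        # UTC date -> distinct user ids
        self.product_revenue = defaultdict(float)  # product -> revenue

    def apply(self, event_id, ts, user_id, product, amount):
        # Skip redelivered events so at-least-once delivery does not double count.
        if event_id in self.seen_ids:
            return
        self.seen_ids.add(event_id)
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        hour = dt.replace(minute=0, second=0, microsecond=0)
        self.hourly_revenue[hour] += amount
        self.daily_users[dt.date()].add(user_id)
        self.product_revenue[product] += amount

    def dau(self, day):
        # Exact distinct count of users seen on this UTC date.
        return len(self.daily_users.get(day, ()))

    def top_products(self, k):
        # Top-k products by revenue without fully sorting all products.
        return heapq.nlargest(k, self.product_revenue.items(), key=lambda kv: kv[1])


# Usage with made-up events; e2 is delivered twice.
agg = IncrementalAggregates()
for event in [
    ("e1", 1_700_000_000, "u1", "A", 10.0),  # 2023-11-14 22:13:20 UTC
    ("e2", 1_700_000_100, "u2", "B", 25.0),
    ("e2", 1_700_000_100, "u2", "B", 25.0),  # duplicate, ignored
    ("e3", 1_700_003_700, "u1", "A", 5.0),   # 2023-11-14 23:15:00 UTC
]:
    agg.apply(*event)

print(dict(agg.hourly_revenue))
print(agg.top_products(1))
```

Reads then cost a dictionary lookup or a small top-k, which is what makes sub-minute refreshes cheap; the trade-off is that the state must be rebuilt or reconciled (for example, from a periodic batch GROUP BY) if events are lost.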

Hard | System Design

Design a star schema for a retail analytics warehouse focused on efficient aggregations for BI. Define the fact and dimension tables, the fact grain, surrogate key strategy, handling slowly changing dimensions (SCD Type 2), and explain how this design speeds up GROUP BY queries compared to a highly normalized OLTP schema.
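A minimal sketch of such a schema, assuming a fact grain of one row per order line (order identifiers are omitted for brevity). All table names, columns, and rows are hypothetical. The dim_product rows illustrate SCD Type 2: when product P1 changes category, a new row with a new surrogate key is added and the old row is closed out, so historical facts keep pointing at the version that was valid when they were recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions use integer surrogate keys; business keys stay as attributes.
    CREATE TABLE dim_date (
        date_key   INTEGER PRIMARY KEY,   -- e.g. 20240105
        full_date  TEXT,
        year_month TEXT
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY, -- surrogate key, one per version
        product_code TEXT,                -- natural/business key
        category     TEXT,
        valid_from   TEXT,
        valid_to     TEXT,                -- '9999-12-31' marks the open version
        is_current   INTEGER
    );
    -- Assumed grain: one row per order line.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date VALUES
        (20240105, '2024-01-05', '2024-01'),
        (20240210, '2024-02-10', '2024-02');

    -- SCD Type 2: P1 moved from 'Toys' to 'Games' on 2024-02-01.
    INSERT INTO dim_product VALUES
        (1, 'P1', 'Toys',  '2023-01-01', '2024-01-31', 0),
        (2, 'P1', 'Games', '2024-02-01', '9999-12-31', 1),
        (3, 'P2', 'Games', '2023-01-01', '9999-12-31', 1);

    -- Each fact references the product version current at sale time.
    INSERT INTO fact_sales VALUES
        (20240105, 1, 2, 20.0),
        (20240210, 2, 1, 15.0),
        (20240210, 3, 3, 45.0);
    """
)

# Typical BI query: one join per dimension, then GROUP BY dimension attributes.
rows = conn.execute(
    """
    SELECT d.year_month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year_month, p.category
    ORDER BY d.year_month, p.category
    """
).fetchall()

for r in rows:
    print(r)
```

The January sale of P1 still reports under Toys because its fact row references the old surrogate key. Because every grouping attribute sits one join away from a narrow fact table, GROUP BY queries avoid the long join chains a normalized OLTP schema would require.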
