Aggregation Functions and GROUP BY Questions

This topic covers the fundamentals of aggregation in SQL (Structured Query Language): the aggregate functions COUNT, SUM, AVG, MIN, and MAX, and how to use them to calculate row counts, totals, averages, minima, and maxima. It includes the GROUP BY clause, which groups rows by one or more dimensions such as customer, product, region, or time period to produce metrics like total revenue by month, average order value by product, or count of transactions by date. It also covers the HAVING clause, which filters groups after aggregation, and how it differs from WHERE, which filters rows before aggregation. Related topics commonly tested in interviews and practical problems include grouping by multiple columns, grouping on expressions and date truncation, using DISTINCT inside aggregates, handling NULL values, ordering and limiting grouped results, using aggregates in subqueries or derived tables, and basic performance considerations when aggregating large datasets. Practice examples include calculating monthly revenue, finding customers with more than a threshold number of orders, and identifying top products by sales.
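
The WHERE versus HAVING distinction and the NULL behavior of aggregates can be seen in a small runnable sketch. It uses Python's built-in sqlite3 module only so the SQL can be executed anywhere; the `orders` table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# Hypothetical orders table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "paid", 50.0),
        (1, "paid", 70.0),
        (1, "cancelled", 500.0),
        (2, "paid", 30.0),
        (3, "paid", 200.0),
        (3, "paid", None),  # NULL amount: skipped by SUM, counted by COUNT(*)
    ],
)

rows = conn.execute(
    """
    SELECT customer_id,
           COUNT(*)      AS order_rows,     -- every row that survived WHERE
           COUNT(amount) AS priced_orders,  -- ignores NULL amounts
           SUM(amount)   AS revenue
    FROM orders
    WHERE status = 'paid'       -- row filter, applied before grouping
    GROUP BY customer_id
    HAVING SUM(amount) > 100    -- group filter, applied after aggregation
    ORDER BY revenue DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The cancelled 500.0 order never reaches the aggregate because WHERE removes it first, while customer 2 is dropped by HAVING only after its group total has been computed.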

Medium · Technical

In BigQuery, explain why COUNT(DISTINCT user_id) can be expensive for high-cardinality user bases. Provide an example query using an approximate function (e.g., APPROX_COUNT_DISTINCT) and explain the accuracy/performance trade-offs.

Hard · Technical

Using GROUPING SETS, ROLLUP, or CUBE, write a SQL query that returns revenue broken down by (region, product), by region only, by product only, and the grand total in a single result set. Use a sales(region, product, amount) table. Explain how to identify which rows are totals versus detail rows in the result.
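
One possible shape for the result is sketched below. SQLite, used here only to keep the example runnable, does not implement GROUPING SETS, ROLLUP, or CUBE, so the sketch swaps in the equivalent UNION ALL of four GROUP BY queries and tags each row with an explicit `level` column. In an engine that supports it, `GROUP BY CUBE (region, product)` yields the same four groupings, and the `GROUPING()` function is the usual way to tell a subtotal row from a detail row whose column is genuinely NULL. The sample rows and the `level` labels are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "widget", 100.0),
        ("east", "gadget", 50.0),
        ("west", "widget", 25.0),
    ],
)

# UNION ALL emulation of GROUP BY CUBE (region, product).
# The 'level' column plays the role GROUPING() plays in engines with CUBE:
# it marks total rows explicitly instead of relying on NULL placeholders.
rows = conn.execute(
    """
    SELECT 'detail' AS level, region, product, SUM(amount) AS revenue
    FROM sales GROUP BY region, product
    UNION ALL
    SELECT 'region_total', region, NULL, SUM(amount)
    FROM sales GROUP BY region
    UNION ALL
    SELECT 'product_total', NULL, product, SUM(amount)
    FROM sales GROUP BY product
    UNION ALL
    SELECT 'grand_total', NULL, NULL, SUM(amount)
    FROM sales
    """
).fetchall()

for row in rows:
    print(row)
```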

Medium · Technical

Write a query to find all customers whose total lifetime revenue is greater than the company-wide average customer revenue. Use an aggregated subquery or derived table to compute the company average, then filter customers who exceed it. Return customer_id, customer_revenue, company_avg.
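
A minimal sketch of one way to structure this, assuming a hypothetical `orders(customer_id, amount)` table. It aggregates per customer first, averages those totals in a second derived table (written as CTEs here), and cross joins the single-row average back so it can be both compared and returned. SQLite is used only to make the SQL runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 100.0), (1, 200.0), (2, 50.0), (3, 400.0), (3, 150.0)],
)

rows = conn.execute(
    """
    WITH customer_totals AS (
        -- one row per customer: lifetime revenue
        SELECT customer_id, SUM(amount) AS customer_revenue
        FROM orders
        GROUP BY customer_id
    ),
    company AS (
        -- single row: average of the per-customer totals
        SELECT AVG(customer_revenue) AS company_avg
        FROM customer_totals
    )
    SELECT t.customer_id, t.customer_revenue, c.company_avg
    FROM customer_totals AS t
    CROSS JOIN company AS c
    WHERE t.customer_revenue > c.company_avg
    ORDER BY t.customer_revenue DESC
    """
).fetchall()

for row in rows:
    print(row)
```

The average is taken over per-customer totals, not over individual order rows; averaging `amount` directly would answer a different question (average order value). Customer 1 sits exactly at the average in this data and is excluded by the strict comparison.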

Hard · Technical

You must optimize a daily aggregation that computes daily revenue per product over a 500M-row sales table and is run hourly for dashboards. Outline a practical optimization plan (indexing, partitioning, clustering, materialized views, incremental aggregation strategies) for a cloud data warehouse. Be specific about steps and trade-offs.

Hard · Technical

Describe an algorithm and SQL pattern to maintain an incremental aggregate table daily_revenue_by_product(product_id, date, revenue) using upserts/merges when new raw events arrive and late events can appear. Include watermarking, idempotency, and backfill considerations.
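
One pattern that fits these requirements is "recompute the touched partitions and replace them", sketched below under several assumptions: a hypothetical `raw_events` table with a unique `event_id` and a monotonically increasing `ingested_at`, and SQLite's `INSERT OR REPLACE` standing in for the `MERGE` statement a cloud warehouse would typically offer. The helper names `ingest` and `refresh` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_events (
        event_id    TEXT PRIMARY KEY,  -- dedup key for replayed deliveries
        product_id  INTEGER,
        event_date  TEXT,              -- when the sale happened
        amount      REAL,
        ingested_at INTEGER            -- arrival order (assumed monotonic)
    );
    CREATE TABLE daily_revenue_by_product (
        product_id INTEGER,
        date       TEXT,
        revenue    REAL,
        PRIMARY KEY (product_id, date)
    );
    """
)

def ingest(events):
    # Duplicate event_ids are ignored, so redelivery cannot double count.
    conn.executemany(
        "INSERT OR IGNORE INTO raw_events VALUES (?, ?, ?, ?, ?)", events
    )

def refresh(watermark):
    """Recompute every (product, date) touched after `watermark`.

    Totals are replaced, never incremented, so rerunning with the same
    (or an older) watermark leaves the table unchanged.
    """
    conn.execute(
        """
        INSERT OR REPLACE INTO daily_revenue_by_product (product_id, date, revenue)
        SELECT e.product_id, e.event_date, SUM(e.amount)
        FROM raw_events AS e
        JOIN (
            SELECT DISTINCT product_id, event_date
            FROM raw_events
            WHERE ingested_at > ?
        ) AS touched
          ON touched.product_id = e.product_id
         AND touched.event_date = e.event_date
        GROUP BY e.product_id, e.event_date
        """,
        (watermark,),
    )
    # The new watermark is the highest arrival marker seen so far.
    return conn.execute(
        "SELECT COALESCE(MAX(ingested_at), ?) FROM raw_events", (watermark,)
    ).fetchone()[0]

ingest([("e1", 1, "2024-01-01", 10.0, 1), ("e2", 1, "2024-01-01", 5.0, 2)])
wm = refresh(0)

ingest([
    ("e3", 1, "2024-01-01", 7.0, 3),  # late event for an already aggregated day
    ("e2", 1, "2024-01-01", 5.0, 4),  # duplicate delivery, ignored
])
wm = refresh(wm)
refresh(0)  # full replay (backfill) produces the same totals

result = conn.execute(
    "SELECT product_id, date, revenue FROM daily_revenue_by_product"
).fetchall()
print(result, wm)
```

Keying the watermark on arrival order rather than event date is what lets a late event reopen an old day. A backfill is the same refresh with the watermark reset, at the cost of rescanning more raw data.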
