InterviewStack.io

Aggregation and Grouping Questions

Covers SQL grouping and aggregation concepts used to summarize data across rows. Key skills include using GROUP BY with aggregate functions such as COUNT, SUM, AVG, MIN, and MAX, counting distinct values, and filtering grouped results with HAVING while understanding the difference between WHERE and HAVING. Candidates should demonstrate correct handling of NULL values in aggregates, grouping by expressions and multiple columns, and writing multi-level aggregations using ROLLUP, CUBE, and GROUPING SETS. Also important is knowing when to use subqueries or common table expressions for intermediate aggregation, the difference between aggregate functions and window functions, and how grouping interacts with joins and data types. Interview questions may test query correctness, edge cases, performance considerations such as appropriate indexes and query plans, and the ability to translate business questions such as "who are the top customers?" or "which categories have declining sales?" into correct aggregated SQL statements.
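
As a small illustration of two points from the overview (WHERE versus HAVING, and how aggregates treat NULL), here is a minimal sketch. It runs the SQL through Python's built-in `sqlite3` module so it can be executed as-is; the `sales` table and its rows are made up for the example, and SQLite is only a stand-in for whichever engine an interview uses.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (customer TEXT, amount REAL, status TEXT);
INSERT INTO sales VALUES
  ('alice', 100.0, 'paid'),
  ('alice', NULL,  'paid'),
  ('alice', 50.0,  'refunded'),
  ('bob',   20.0,  'paid'),
  ('bob',   30.0,  'paid');
""")

query = """
SELECT customer,
       COUNT(*)      AS row_count,     -- counts every row, NULL amounts included
       COUNT(amount) AS amount_count,  -- skips rows where amount IS NULL
       SUM(amount)   AS total,         -- NULLs are ignored, not treated as 0
       AVG(amount)   AS average        -- divides by the non-NULL count only
FROM sales
WHERE status = 'paid'       -- row filter: applied before grouping
GROUP BY customer
HAVING SUM(amount) > 60     -- group filter: applied after aggregation
ORDER BY customer
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 2, 1, 100.0, 100.0)]
```

The refunded row is removed by WHERE before any group is formed, while bob's group (total 50) is formed and then dropped by HAVING.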

Medium · Technical
40 practiced
You must design an ETL strategy to compute aggregated daily metrics for a dashboard backed by a fact table with 500M rows per day. Discuss pre-aggregation strategies, whether to use streaming vs batch, partitioning schemes, and where to store pre-aggregates (materialized views, summary tables, OLAP cube). Outline trade-offs for latency vs storage.
Hard · Technical
30 practiced
You must produce a pivot-like report using GROUPING SETS across six dimensions (region, category, channel, device, week, campaign) that includes aggregations for each single dimension and the overall grand total. Provide the SQL pattern using GROUPING SETS and discuss the combinatorial explosion risk and techniques to limit output and optimize performance.
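
One way to sketch the pattern this question asks for is to generate the GROUPING SETS clause from the list of dimensions. The helper below only builds SQL text in PostgreSQL syntax and does not execute it; the table name `sales_fact` and the measure `revenue` are hypothetical placeholders, not part of the question.

```python
DIMENSIONS = ["region", "category", "channel", "device", "week", "campaign"]


def single_dimension_sets(dimensions):
    """One grouping set per dimension, plus () for the grand total."""
    return [f"({d})" for d in dimensions] + ["()"]


def grouping_sets_query(dimensions, table="sales_fact", measure="revenue"):
    """Render the GROUPING SETS pattern as SQL text (PostgreSQL syntax).

    `table` and `measure` are assumed names used only for illustration.
    """
    cols = ", ".join(dimensions)
    sets = ",\n    ".join(single_dimension_sets(dimensions))
    return (
        f"SELECT {cols},\n"
        # GROUPING() marks which columns are rolled up in each output row,
        # so subtotal NULLs can be told apart from NULLs in the data.
        f"       GROUPING({cols}) AS grouping_mask,\n"
        f"       SUM({measure}) AS total_{measure}\n"
        f"FROM {table}\n"
        f"GROUP BY GROUPING SETS (\n    {sets}\n)"
    )


def cube_set_count(dimensions):
    """CUBE expands to every subset of the dimensions: 2 ** n grouping sets."""
    return 2 ** len(dimensions)


sql = grouping_sets_query(DIMENSIONS)
print(sql)
print(len(single_dimension_sets(DIMENSIONS)), "explicit sets vs",
      cube_set_count(DIMENSIONS), "for CUBE")
```

Listing the seven sets explicitly avoids the 64 grouping sets that CUBE over the same six dimensions would produce, which is one of the simpler ways to limit output size.
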
Hard · System Design
52 practiced
Design a pre-aggregation and storage strategy to support a BI dashboard with sub-second response for queries like 'total revenue by product category by hour' on a dataset ingesting 100M events per day. Consider choice between OLAP engine (columnar), materialized hourly aggregates, incremental refresh, and cost trade-offs.
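
To make the "materialized hourly aggregates with incremental refresh" option concrete, here is a minimal sketch of one possible refresh step. It uses Python's `sqlite3` so it can run locally; the `events` and `hourly_revenue` tables, the `refresh` helper, and the watermark scheme are all assumptions made for illustration. In particular it assumes `event_id` increases monotonically, so late or out-of-order inserts would need separate handling, and a production system at 100M events per day would use a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    event_ts TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS'
    revenue  REAL NOT NULL
);
CREATE TABLE hourly_revenue (
    hour_start TEXT NOT NULL,
    category   TEXT NOT NULL,
    revenue    REAL NOT NULL,
    PRIMARY KEY (hour_start, category)
);
""")


def refresh(conn, watermark):
    """Fold events newer than `watermark` into hourly_revenue.

    Returns the new watermark (highest event_id processed so far).
    """
    with conn:  # one transaction per refresh
        new_mark = conn.execute(
            "SELECT COALESCE(MAX(event_id), ?) FROM events", (watermark,)
        ).fetchone()[0]
        conn.execute(
            """
            INSERT INTO hourly_revenue (hour_start, category, revenue)
            SELECT strftime('%Y-%m-%d %H:00:00', event_ts), category, SUM(revenue)
            FROM events
            WHERE event_id > ? AND event_id <= ?
            GROUP BY strftime('%Y-%m-%d %H:00:00', event_ts), category
            ON CONFLICT (hour_start, category)
            DO UPDATE SET revenue = hourly_revenue.revenue + excluded.revenue
            """,
            (watermark, new_mark),
        )
    return new_mark


conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "books", "2024-01-01 10:05:00", 10.0),
    (2, "books", "2024-01-01 10:45:00", 5.0),
    (3, "toys",  "2024-01-01 10:59:59", 7.0),
])
mark = refresh(conn, 0)

# A second batch arrives; only the new rows are aggregated.
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (4, "books", "2024-01-01 10:50:00", 1.0),
    (5, "books", "2024-01-01 11:00:00", 2.0),
])
mark = refresh(conn, mark)

summary = conn.execute(
    "SELECT hour_start, category, revenue FROM hourly_revenue "
    "ORDER BY hour_start, category"
).fetchall()
print(summary)
```

Dashboard queries would then read the small `hourly_revenue` table instead of scanning raw events, trading extra storage and refresh work for lower query latency.
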
Medium · Technical
30 practiced
You have a query that repeats the same aggregation subquery three times across a report. Discuss when to use a CTE (WITH), a temporary table, or a materialized view for that intermediate aggregation. Consider freshness requirements, performance, and maintainability in your answer.
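
A minimal sketch of two of the options in this question, run through Python's `sqlite3`: the same per-customer aggregation written once as a CTE and referenced three times, and then stored in a temporary table for reuse across statements. The `orders` table and its data are hypothetical. Engine behaviour differs: in PostgreSQL 12 and later, for example, a CTE referenced more than once is computed once by default, so the plan should be checked on the actual target database.

```python
import sqlite3

# Hypothetical data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0), (3, 60.0), (3, 15.0);
""")

# Option 1: a CTE names the aggregation once; the report references it
# three times within a single statement.
cte_query = """
WITH customer_totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT t.customer_id,
       t.total,
       (SELECT MAX(total) FROM customer_totals)           AS max_total,
       t.total / (SELECT SUM(total) FROM customer_totals) AS share
FROM customer_totals AS t
ORDER BY t.customer_id
"""
cte_rows = conn.execute(cte_query).fetchall()

# Option 2: a temporary table stores the result for the rest of the
# session, so several separate statements can reuse (and index) it.
conn.execute("""
CREATE TEMP TABLE customer_totals_tmp AS
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
""")
tmp_rows = conn.execute(
    "SELECT customer_id, total FROM customer_totals_tmp ORDER BY customer_id"
).fetchall()

print(cte_rows)
print(tmp_rows)
```

The CTE keeps the logic in one readable statement and always reflects current data; the temporary table is a snapshot, which suits multi-statement reports but goes stale if `orders` changes during the session.
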
Easy · Technical
40 practiced
You have an orders table with columns: order_id int, product_id int, quantity int, price numeric, order_date date. In SQL (ANSI/PostgreSQL), write a query to compute total revenue per product for the last 90 days where revenue = sum(quantity * price). The query should: 1) treat NULL quantity or price as 0, 2) return columns product_id and total_revenue, 3) order results by total_revenue descending, and 4) limit to top 100 products. Show expected output schema and explain assumptions.
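
One possible answer, sketched with Python's `sqlite3` so it can be run directly. Assumptions made here and not stated in the question: "last 90 days" means `order_date` on or after the date 90 days before a reference day; the reference day is passed in as a parameter so the example is reproducible; ties are broken by `product_id`; and the sample rows are invented. In PostgreSQL the date predicate could be written as `order_date >= CURRENT_DATE - INTERVAL '90 days'`, with the rest of the query unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id   INTEGER,
    product_id INTEGER,
    quantity   INTEGER,
    price      REAL,
    order_date TEXT      -- ISO 'YYYY-MM-DD' (SQLite has no DATE type)
);
INSERT INTO orders VALUES
  (1, 10, 2,    5.0,  '2024-06-01'),
  (2, 10, NULL, 5.0,  '2024-06-02'),   -- NULL quantity counts as 0
  (3, 20, 1,    99.0, '2024-06-03'),
  (4, 20, 3,    NULL, '2024-06-04'),   -- NULL price counts as 0
  (5, 30, 4,    2.5,  '2024-01-15'),   -- older than 90 days, excluded
  (6, 10, 1,    5.0,  '2024-05-20');
""")

query = """
SELECT product_id,
       SUM(COALESCE(quantity, 0) * COALESCE(price, 0)) AS total_revenue
FROM orders
WHERE order_date >= date(:today, '-90 days')
GROUP BY product_id
ORDER BY total_revenue DESC, product_id
LIMIT 100
"""
rows = conn.execute(query, {"today": "2024-06-30"}).fetchall()
print(rows)  # [(20, 99.0), (10, 15.0)]
```

The output schema is two columns: `product_id` (integer) and `total_revenue` (numeric). Products with no orders in the window do not appear at all; returning them with a revenue of 0 would need an outer join from a products table, which the question does not provide.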
