InterviewStack.io

Aggregation Functions and Group By Questions

Fundamentals of aggregation in SQL (Structured Query Language): aggregate functions such as COUNT, SUM, AVG, MIN, and MAX, and how to use them to calculate row counts, totals, averages, minima, and maxima. Includes the GROUP BY clause for grouping rows by one or more dimensions, such as customer, product, region, or time period, to produce metrics like total revenue by month, average order value by product, or count of transactions by date. Covers the HAVING clause, which filters groups after aggregation, and how it differs from WHERE, which filters rows before aggregation. Also addresses related topics commonly tested in interviews and practical problems: grouping by multiple columns, grouping on expressions and date truncation, using DISTINCT inside aggregates, handling NULL values (most aggregates ignore NULLs, while COUNT(*) counts every row), ordering and limiting grouped results, using aggregates in subqueries or derived tables, and basic performance considerations when aggregating large datasets. Practice examples include calculating monthly revenue, finding customers with more than a threshold number of orders, and identifying top products by sales.
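A minimal sketch of several of these ideas, run through Python's built-in sqlite3 module against a hypothetical orders table with made-up rows: grouping on a month expression, COUNT(*) versus COUNT(column) when NULLs are present, COUNT(DISTINCT ...), and WHERE versus HAVING. SQLite stands in for a production engine; date truncation differs by dialect, and using substr on ISO-8601 date strings assumes dates are stored as text.

```python
import sqlite3

# Hypothetical orders table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id TEXT,
    product     TEXT,
    order_date  TEXT,   -- ISO-8601 'YYYY-MM-DD'
    amount      REAL    -- may be NULL
);
INSERT INTO orders VALUES
    (1, 'alice', 'pen',  '2024-01-05', 10.0),
    (2, 'alice', 'book', '2024-01-20', 25.0),
    (3, 'bob',   'pen',  '2024-01-07', NULL),
    (4, 'alice', 'pen',  '2024-02-02', 12.0),
    (5, 'carol', 'book', '2024-02-15', 30.0),
    (6, 'bob',   'book', '2024-02-16', 20.0);
""")

# Monthly metrics grouped on an expression (month truncation via substr).
# SUM skips NULL amounts; COUNT(*) counts all rows, COUNT(amount) only non-NULL ones.
monthly = conn.execute("""
    SELECT substr(order_date, 1, 7)    AS month,
           SUM(amount)                 AS revenue,
           COUNT(*)                    AS orders,
           COUNT(amount)               AS priced_orders,
           COUNT(DISTINCT customer_id) AS customers
    FROM orders
    GROUP BY substr(order_date, 1, 7)
    ORDER BY month
""").fetchall()

# WHERE removes rows before grouping; HAVING filters the resulting groups.
repeat_customers = conn.execute("""
    SELECT customer_id, COUNT(*) AS n_orders
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY customer_id
    HAVING COUNT(*) >= 2
    ORDER BY customer_id
""").fetchall()

print(monthly)
print(repeat_customers)
```

Because WHERE runs first, bob's NULL-amount order is excluded before counting, leaving him with one order, so HAVING keeps only alice.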

Medium | Technical
You need to compute total revenue per customer per month on a multi-terabyte orders table in a cloud data warehouse. Describe practical optimizations (partitioning, clustering, materialized views, incremental pre-aggregation) and write an example SQL approach (for example, a partitioned orders table plus a pre-aggregated daily summary).
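A small-scale sketch of the pre-aggregation approach, assuming hypothetical tables named orders and daily_customer_revenue. It uses SQLite through Python's sqlite3 module, so an index stands in for warehouse partitioning and clustering; what carries over is the shape: refresh the summary one day at a time with an idempotent delete-then-insert, then answer the monthly question from the much smaller daily table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw orders. In a cloud warehouse this would be partitioned by order_date
-- (and perhaps clustered by customer_id); SQLite has no partitions, so an
-- index on order_date stands in for partition pruning.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL,
    order_date  TEXT NOT NULL,   -- 'YYYY-MM-DD'
    amount      REAL NOT NULL
);
CREATE INDEX idx_orders_date ON orders (order_date);

-- Pre-aggregated daily summary: one row per (customer, day).
CREATE TABLE daily_customer_revenue (
    customer_id TEXT NOT NULL,
    order_date  TEXT NOT NULL,
    revenue     REAL NOT NULL,
    order_count INTEGER NOT NULL,
    PRIMARY KEY (customer_id, order_date)
);
""")


def refresh_day(day):
    """Rebuild the summary rows for one day; delete-then-insert makes reruns idempotent."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM daily_customer_revenue WHERE order_date = ?", (day,))
        conn.execute(
            """
            INSERT INTO daily_customer_revenue (customer_id, order_date, revenue, order_count)
            SELECT customer_id, order_date, SUM(amount), COUNT(*)
            FROM orders
            WHERE order_date = ?
            GROUP BY customer_id, order_date
            """,
            (day,),
        )


def monthly_revenue():
    """Roll the daily table up to (customer, month) instead of rescanning raw orders."""
    return conn.execute(
        """
        SELECT customer_id, substr(order_date, 1, 7) AS month, SUM(revenue)
        FROM daily_customer_revenue
        GROUP BY customer_id, substr(order_date, 1, 7)
        ORDER BY customer_id, month
        """
    ).fetchall()


conn.executemany(
    "INSERT INTO orders (customer_id, order_date, amount) VALUES (?, ?, ?)",
    [("alice", "2024-03-01", 10.0), ("alice", "2024-03-01", 5.0),
     ("alice", "2024-03-15", 7.5), ("bob", "2024-03-15", 20.0),
     ("bob", "2024-04-01", 4.0)],
)
for day in ("2024-03-01", "2024-03-15", "2024-04-01"):
    refresh_day(day)
result = monthly_revenue()

refresh_day("2024-03-01")  # rerunning a day does not double-count
rerun = monthly_revenue()
print(result)
```

In a warehouse the daily refresh would typically touch only the newest partitions, and the monthly query then reads one row per customer per active day rather than every order.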
Hard | Technical
Write SQL or pseudocode to compute distinct users per (country, device) pair using scalable, memory-efficient techniques. Discuss the trade-offs of concatenating keys, and of exact COUNT(DISTINCT) versus HyperLogLog (HLL) sketches, and provide an example using BigQuery's APPROX_COUNT_DISTINCT or an HLL UDF.
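The sketch below is a minimal, illustrative HyperLogLog in pure Python (not BigQuery's HLL++ implementation) showing the properties that matter here: one fixed-size sketch per (country, device) key, approximate counts within a few percent, and sketches that merge into coarser groups. The precision p=12, the SHA-1-based hash, and the synthetic events are assumptions. In BigQuery itself, APPROX_COUNT_DISTINCT(user_id) with GROUP BY country, device returns the approximate count directly, and the HLL_COUNT functions expose mergeable sketches.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Minimal HyperLogLog sketch for illustration only."""

    def __init__(self, p=12):
        self.p = p                       # 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128

    def add(self, item):
        # Stable 64-bit hash (Python's built-in hash() is salted per process).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                     # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1 # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Sketches with equal p merge by register-wise max (union of the sets).
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range linear counting
        return raw


sketches = defaultdict(HyperLogLog)  # (country, device) -> sketch
exact = defaultdict(set)             # exact COUNT(DISTINCT) for comparison
for i in range(20000):
    country = ["US", "DE"][i % 2]
    device = ["ios", "web"][(i // 2) % 2]
    user_id = f"u{i % 5000}"
    key = (country, device)          # tuple key: no separator collisions
    sketches[key].add(user_id)
    exact[key].add(user_id)

estimates = {k: round(s.estimate()) for k, s in sketches.items()}
exact_counts = {k: len(v) for k, v in exact.items()}

total = HyperLogLog()                # roll up to distinct users overall
for s in sketches.values():
    total.merge(s)
print(estimates, exact_counts, round(total.estimate()))
```

A tuple key avoids the collision risk of naive string concatenation (for example, "a_b" + "_" + "c" and "a" + "_" + "b_c" both give "a_b_c"). Exact counting needs memory proportional to the distinct users per key, while each sketch here stays at 4096 small registers regardless of cardinality.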
Medium | Technical
Given a large event stream, design a streaming aggregation in Spark Structured Streaming that computes per-minute counts of events per event_type, handling late data with watermarks. Provide a code sketch and explain the exactly-once considerations.
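A plain-Python model (not Spark code) of the semantics such a query relies on: 1-minute tumbling event-time windows per event_type, a watermark equal to the maximum event time seen minus an allowed delay, dropping of events older than the watermark, and emission of a window only once the watermark passes its end, as in append output mode. In PySpark the query itself would be built from DataFrame.withWatermark and groupBy(window(...), "event_type").count(); the 60 s window, 120 s delay, and sample events here are assumptions. Exactly-once results in Spark come from replayable sources, checkpointed offsets and state, and idempotent or transactional sinks, which this model does not attempt to reproduce.

```python
from collections import defaultdict


class WindowedCounter:
    """Simplified model of watermarked tumbling-window counts (append mode)."""

    def __init__(self, window_s=60, delay_s=120):
        self.window_s = window_s          # tumbling window length (seconds)
        self.delay_s = delay_s            # allowed lateness behind max event time
        self.max_event_time = None
        self.watermark = float("-inf")
        self.state = defaultdict(int)     # (window_start, event_type) -> count
        self.dropped = 0

    def process_batch(self, events):
        """Consume one micro-batch of (event_time_s, event_type); return finalized windows."""
        for ts, event_type in events:
            if ts < self.watermark:
                # Older than the watermark computed after the previous batch: drop it.
                self.dropped += 1
                continue
            start = ts - ts % self.window_s
            self.state[(start, event_type)] += 1
            if self.max_event_time is None or ts > self.max_event_time:
                self.max_event_time = ts
        # The watermark advances between batches and never moves backwards.
        if self.max_event_time is not None:
            self.watermark = max(self.watermark, self.max_event_time - self.delay_s)
        # Emit (and evict from state) every window ending at or before the watermark.
        finished = sorted(k for k in self.state if k[0] + self.window_s <= self.watermark)
        return [(start, etype, self.state.pop((start, etype))) for start, etype in finished]


counter = WindowedCounter()
out1 = counter.process_batch([(0, "click"), (10, "view"), (59, "click"), (65, "click")])
out2 = counter.process_batch([(30, "click"), (200, "view")])  # t=30 is late but accepted
out3 = counter.process_batch([(50, "click"), (130, "view")])  # t=50 is now too late
print(out1, out2, out3, counter.dropped)
```

In the second batch the late click at t=30 is still accepted because the watermark is only -55; after that batch the watermark reaches 80, so the click at t=50 in the third batch is dropped and the windows starting at 0 are emitted once, with their final counts.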
Hard | Technical
Given raw events that include duplicates, demonstrate how to deduplicate them before aggregation using ROW_NUMBER() in SQL, or in Spark (note that dropDuplicates keeps an arbitrary row per key, so keeping the latest row needs a row_number() window with a deterministic ordering). Provide SQL that partitions by event_id, keeps the row with the latest ingestion_time, and then counts events per event_type.
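A minimal sketch of the ROW_NUMBER() approach, run through Python's sqlite3 module (SQLite 3.25 or later supports window functions); the raw_events rows are made up. In Spark the same logic is usually written with row_number() over Window.partitionBy("event_id").orderBy(col("ingestion_time").desc()), because dropDuplicates(["event_id"]) does not guarantee which duplicate survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (
    event_id       TEXT,
    event_type     TEXT,
    ingestion_time TEXT
);
INSERT INTO raw_events VALUES
    ('e1', 'click', '2024-05-01 10:00:00'),
    ('e1', 'click', '2024-05-01 10:05:00'),  -- duplicate delivery of e1
    ('e2', 'view',  '2024-05-01 10:01:00'),
    ('e3', 'click', '2024-05-01 10:02:00'),
    ('e3', 'view',  '2024-05-01 10:09:00'),  -- later re-delivery of e3 with a new type
    ('e4', 'view',  '2024-05-01 10:03:00');
""")

counts = conn.execute("""
    WITH ranked AS (
        SELECT event_id, event_type, ingestion_time,
               ROW_NUMBER() OVER (
                   PARTITION BY event_id
                   ORDER BY ingestion_time DESC  -- latest ingestion gets rn = 1
               ) AS rn
        FROM raw_events
    )
    SELECT event_type, COUNT(*) AS n_events
    FROM ranked
    WHERE rn = 1            -- keep exactly one row per event_id
    GROUP BY event_type
    ORDER BY event_type
""").fetchall()
print(counts)
```

Without deduplication the same data would yield three clicks and three views. If ingestion_time can tie within an event_id, add a further ORDER BY column so the surviving row is deterministic.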
Medium | Technical
Design a daily materialized view (or summary table) in Snowflake that stores revenue pre-aggregated by product and date to speed up BI queries. Include the DDL for the summary table, the refresh strategy, and how to handle backfills and reprocessing when upstream data is corrected.
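A scaled-down sketch of the summary-table pattern with a date-range rebuild for backfills, using Python's sqlite3 in place of Snowflake; the table names sales and daily_product_revenue are hypothetical. In Snowflake the equivalent DELETE plus INSERT (or a MERGE) would typically run inside a transaction on a schedule, for example from a task, and a backfill is the same statement over a wider date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id TEXT NOT NULL,
    sale_date  TEXT NOT NULL,   -- 'YYYY-MM-DD'
    amount     REAL NOT NULL
);
-- Summary table: one row per (sale_date, product_id).
CREATE TABLE daily_product_revenue (
    sale_date  TEXT NOT NULL,
    product_id TEXT NOT NULL,
    revenue    REAL NOT NULL,
    sale_count INTEGER NOT NULL,
    PRIMARY KEY (sale_date, product_id)
);
""")


def rebuild_range(start_date, end_date):
    """Recompute summary rows for [start_date, end_date] in one transaction.

    Delete-then-insert (rather than upsert) also removes summary rows whose
    source rows were deleted upstream, and rerunning it is idempotent.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM daily_product_revenue WHERE sale_date BETWEEN ? AND ?",
            (start_date, end_date),
        )
        conn.execute(
            """
            INSERT INTO daily_product_revenue (sale_date, product_id, revenue, sale_count)
            SELECT sale_date, product_id, SUM(amount), COUNT(*)
            FROM sales
            WHERE sale_date BETWEEN ? AND ?
            GROUP BY sale_date, product_id
            """,
            (start_date, end_date),
        )


def summary():
    return conn.execute(
        "SELECT sale_date, product_id, revenue FROM daily_product_revenue "
        "ORDER BY sale_date, product_id"
    ).fetchall()


# Initial load and build.
conn.executemany(
    "INSERT INTO sales (product_id, sale_date, amount) VALUES (?, ?, ?)",
    [("p1", "2024-06-01", 10.0), ("p1", "2024-06-01", 5.0),
     ("p2", "2024-06-01", 8.0), ("p1", "2024-06-02", 4.0)],
)
rebuild_range("2024-06-01", "2024-06-02")
before = summary()

# Upstream correction: one amount is fixed and one sale is voided.
with conn:
    conn.execute("UPDATE sales SET amount = 6.0 WHERE sale_id = 2")
    conn.execute("DELETE FROM sales WHERE sale_id = 3")
rebuild_range("2024-06-01", "2024-06-01")  # reprocess only the affected day
after = summary()
print(before, after)
```

The rebuild also drops the p2 row once its only sale is voided, which a plain upsert would leave behind.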
