InterviewStack.io

Aggregation Functions and Group By Questions

Fundamentals of aggregation in Structured Query Language (SQL): the aggregate functions COUNT, SUM, AVG, MIN, and MAX, and how to use them to calculate row counts, totals, averages, minima, and maxima. Includes the GROUP BY clause for grouping rows by one or more dimensions such as customer, product, region, or time period, to produce metrics like total revenue by month, average order value by product, or count of transactions by date. Covers the HAVING clause, which filters groups after aggregation, and how it differs from WHERE, which filters rows before aggregation. Also addresses related topics commonly tested in interviews and practical problems: grouping by multiple columns, grouping on expressions and date truncation, using DISTINCT inside aggregates, handling NULL values, ordering and limiting grouped results, using aggregates in subqueries or derived tables, and basic performance considerations when aggregating large datasets. Practice examples include calculating monthly revenue, finding customers with more than a threshold number of orders, and identifying top products by sales.
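The difference between WHERE and HAVING can be sketched with a small in-memory SQLite database driven from Python. The table layout, the `status` column, and the sample rows are hypothetical and exist only for illustration: WHERE removes cancelled orders before grouping, and HAVING then keeps only customers with more than one remaining order.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "total_amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, 50.0, "paid"),
        (2, 101, 75.0, "paid"),
        (3, 101, 20.0, "cancelled"),
        (4, 102, 30.0, "paid"),
        (5, 103, 40.0, "cancelled"),
    ],
)

query = """
SELECT customer_id,
       COUNT(*)          AS order_count,
       SUM(total_amount) AS revenue
FROM orders
WHERE status = 'paid'      -- row filter, applied before grouping
GROUP BY customer_id
HAVING COUNT(*) > 1        -- group filter, applied after aggregation
ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(101, 2, 125.0)]
```

Customer 102 has a single paid order and is removed by HAVING, while customer 103 never reaches the grouping step because WHERE already discarded its only row.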

Hard, Technical
51 practiced
Given raw events that include duplicates, demonstrate how to deduplicate events before aggregation using ROW_NUMBER() (SQL) or using Spark (dropDuplicates with a stable ordering). Provide SQL that partitions by event_id and keeps the row with the latest ingestion_time, then counts events per event_type.
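One possible shape for the ROW_NUMBER() approach, run here against in-memory SQLite from Python. The table name `raw_events` and the sample rows are assumptions made for the sketch, and it presumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# raw_events is a hypothetical table name; columns follow the question text.
conn.execute(
    "CREATE TABLE raw_events (event_id TEXT, event_type TEXT, ingestion_time TEXT)"
)
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("e1", "click", "2025-01-01 10:00:00"),
        ("e1", "click", "2025-01-01 10:05:00"),  # duplicate of e1
        ("e2", "click", "2025-01-01 10:01:00"),
        ("e3", "view", "2025-01-01 10:02:00"),
        ("e3", "view", "2025-01-01 10:02:30"),  # duplicate of e3
        ("e3", "view", "2025-01-01 10:03:00"),  # duplicate of e3
    ],
)

dedup_query = """
WITH ranked AS (
    SELECT event_id,
           event_type,
           ingestion_time,
           ROW_NUMBER() OVER (
               PARTITION BY event_id
               ORDER BY ingestion_time DESC   -- latest ingestion first
           ) AS rn
    FROM raw_events
)
SELECT event_type, COUNT(*) AS event_count
FROM ranked
WHERE rn = 1                -- keep one row per event_id
GROUP BY event_type
ORDER BY event_type
"""
counts = conn.execute(dedup_query).fetchall()
print(counts)  # [('click', 2), ('view', 1)]
```

If two copies of an event can share the same ingestion_time, adding a second ORDER BY key inside the window makes the choice of surviving row deterministic.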
Easy, Technical
86 practiced
Write an ANSI SQL query to produce total revenue per customer per month using the orders table (order_id, customer_id, total_amount, created_at). Use a portable method to group by month (e.g., EXTRACT, which is standard SQL, or date_trunc where the dialect supports it). Show expected output for January and February 2025 for a few sample rows.
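A minimal sketch of the per-customer, per-month grouping, executed in SQLite from Python. SQLite has neither date_trunc nor EXTRACT, so `strftime('%Y-%m', ...)` stands in for the month expression here; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "total_amount REAL, created_at TEXT)"
)
# Hypothetical sample rows covering January and February 2025.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, 50.0, "2025-01-05"),
        (2, 101, 75.0, "2025-01-20"),
        (3, 102, 30.0, "2025-01-30"),
        (4, 101, 20.0, "2025-02-03"),
        (5, 102, 60.0, "2025-02-14"),
        (6, 102, 40.0, "2025-02-28"),
    ],
)

monthly_query = """
SELECT customer_id,
       strftime('%Y-%m', created_at) AS order_month,  -- SQLite stand-in for date_trunc
       SUM(total_amount)             AS total_revenue
FROM orders
GROUP BY customer_id, strftime('%Y-%m', created_at)
ORDER BY order_month, customer_id
"""
monthly = conn.execute(monthly_query).fetchall()
for row in monthly:
    print(row)
# (101, '2025-01', 125.0)
# (102, '2025-01', 30.0)
# (101, '2025-02', 20.0)
# (102, '2025-02', 100.0)
```

In a dialect with date_trunc, the same grouping key would typically be written as `date_trunc('month', created_at)`; with EXTRACT, both the year and the month need to be part of the GROUP BY so that the same month in different years stays separate.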
Easy, Technical
56 practiced
You have an orders table with schema:
order_id bigint, customer_id bigint, total_amount numeric, created_at timestamp
Sample rows: (1, 101, 50.00, '2025-01-05'), (2, 101, 75.00, '2025-01-20'), (3, 102, NULL, '2025-01-30')
Write an ANSI SQL query that returns, for each customer_id: total_revenue (SUM(total_amount)), order_count (COUNT(*)), and avg_order_value (AVG(total_amount)). Include SQL and a short explanation of how NULL total_amount values affect SUM and AVG.
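The NULL behaviour can be checked with a short sketch in SQLite from Python. It loads the three sample rows above plus one extra hypothetical row (order 4, a NULL amount for customer 101) so that a group mixing NULL and non-NULL amounts is visible; the column is declared REAL here only to keep SQLite's output as floats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "total_amount REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, 50.0, "2025-01-05"),
        (2, 101, 75.0, "2025-01-20"),
        (3, 102, None, "2025-01-30"),
        (4, 101, None, "2025-01-25"),  # hypothetical extra row, not in the question
    ],
)

summary_query = """
SELECT customer_id,
       SUM(total_amount) AS total_revenue,    -- skips NULL amounts
       COUNT(*)          AS order_count,      -- counts every row, NULL or not
       AVG(total_amount) AS avg_order_value   -- averages non-NULL amounts only
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""
summary = conn.execute(summary_query).fetchall()
for row in summary:
    print(row)
# (101, 125.0, 3, 62.5)
# (102, None, 1, None)
```

For customer 101, AVG is 125 / 2 rather than 125 / 3 because the NULL amount is left out of both the sum and the divisor. For customer 102, every amount is NULL, so SUM and AVG both return NULL; wrapping them in COALESCE(..., 0) is one way to report zero instead, if that is the intended meaning.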
Easy, Technical
49 practiced
Write a SQL query that counts the number of returned orders by month. The orders table schema: order_id, customer_id, order_date, is_returned boolean. Show how to implement this using SUM(CASE WHEN is_returned THEN 1 ELSE 0 END) and explain why SUM on a boolean may be preferred to COUNT.
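A sketch of the conditional-sum pattern in SQLite from Python, with hypothetical sample rows. SQLite stores booleans as 0 and 1, and `strftime('%Y-%m', ...)` is used as the month expression. A `COUNT(is_returned)` column is included only to contrast the two results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "order_date TEXT, is_returned INTEGER)"
)
# Hypothetical rows: two returns in January, none in February.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2025-01-05", 1),
        (2, 101, "2025-01-20", 0),
        (3, 102, "2025-01-30", 1),
        (4, 103, "2025-02-02", 0),
        (5, 103, "2025-02-10", 0),
    ],
)

returns_query = """
SELECT strftime('%Y-%m', order_date)                   AS order_month,
       SUM(CASE WHEN is_returned THEN 1 ELSE 0 END)    AS returned_orders,
       COUNT(is_returned)                              AS non_null_flags
FROM orders
GROUP BY strftime('%Y-%m', order_date)
ORDER BY order_month
"""
returns = conn.execute(returns_query).fetchall()
for row in returns:
    print(row)
# ('2025-01', 2, 3)
# ('2025-02', 0, 2)
```

COUNT(is_returned) counts every non-NULL flag, true or false, so it reports 3 and 2 rather than the number of returns. The conditional SUM also keeps February in the output with a count of 0, whereas filtering with WHERE is_returned before grouping would drop that month entirely.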
Medium, Technical
87 practiced
Given a large events stream, design a streaming aggregation using Spark Structured Streaming to compute per-minute counts of events per event_type with late data handling (watermarks). Provide a code sketch and explain exactly-once considerations.
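The watermark idea can be illustrated without a cluster. The following is a plain-Python simulation, not Spark code: it assigns events to one-minute tumbling windows and drops events whose window already closed relative to a watermark trailing the maximum event time seen so far. The function name, the 120-second delay, and the sample events are all hypothetical, and the watermark advances per event here, whereas Spark advances it between micro-batches.

```python
from collections import defaultdict

WINDOW_SECONDS = 60            # one-minute tumbling windows
WATERMARK_DELAY_SECONDS = 120  # hypothetical allowed lateness


def aggregate_with_watermark(events):
    """Count events per (window_start, event_type), dropping data that is too late.

    events: iterable of (event_time_seconds, event_type) in arrival order.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = 0
    for event_time, event_type in events:
        # Watermark trails the largest event time observed before this event.
        watermark = max_event_time - WATERMARK_DELAY_SECONDS
        window_start = event_time - event_time % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        if window_end <= watermark:
            dropped.append((event_time, event_type))  # window already finalized
        else:
            counts[(window_start, event_type)] += 1
        max_event_time = max(max_event_time, event_time)
    return dict(counts), dropped


events = [
    (10, "click"),
    (70, "click"),
    (65, "view"),    # slightly out of order, still accepted
    (250, "click"),  # pushes the watermark to 130
    (50, "click"),   # window [0, 60) closed at watermark 130, so dropped
    (200, "view"),   # window [180, 240) is still open
]
counts, dropped = aggregate_with_watermark(events)
print(counts)
print(dropped)  # [(50, 'click')]
```

In Spark Structured Streaming, the corresponding building blocks are `withWatermark` on the event-time column and `window` in the groupBy. End-to-end exactly-once results generally depend on a replayable source, checkpointing, and an idempotent or transactional sink, so the sink choice belongs in the design discussion.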
