InterviewStack.io

Aggregation Functions and Group By Questions

Fundamentals of aggregation in SQL (Structured Query Language): the aggregate functions COUNT, SUM, AVG, MIN, and MAX, and how to use them to calculate row counts, totals, averages, minima, and maxima. Includes the GROUP BY clause for grouping rows by one or more dimensions such as customer, product, region, or time period, to produce metrics like total revenue by month, average order value by product, or count of transactions by date. Covers the HAVING clause, which filters groups after aggregation, and how it differs from WHERE, which filters rows before aggregation. Also addresses related topics commonly tested in interviews and practical problems: grouping by multiple columns, grouping on expressions and date truncation, using DISTINCT inside aggregates, handling NULL values, ordering and limiting grouped results, using aggregates in subqueries or derived tables, and basic performance considerations when aggregating large datasets. Practice examples include calculating monthly revenue, finding customers with more than a threshold number of orders, and identifying top products by sales.
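The WHERE versus HAVING distinction and the NULL behaviour of aggregates can be sketched as follows. This is a minimal illustration, not a reference answer: it assumes an in-memory SQLite database driven from Python, and the table contents are made up for the example. Other engines follow the same standard semantics for these aggregates, but details such as result types may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
# Hypothetical sample data; customer 1 has one order with a NULL amount.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 20.0), (3, 1, None),
     (4, 2, 5.0), (5, 2, 200.0), (6, 3, 50.0)],
)

# WHERE removes rows before grouping; HAVING removes groups after aggregation.
filtered = conn.execute("""
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10          -- row filter (NULL >= 10 is not true, so NULL rows drop out)
    GROUP BY customer_id
    HAVING COUNT(*) >= 2        -- group filter, evaluated on the aggregated result
    ORDER BY customer_id
""").fetchall()

# COUNT(*) counts rows; COUNT(amount), SUM and AVG skip NULL amounts.
null_stats = conn.execute("""
    SELECT customer_id, COUNT(*), COUNT(amount), SUM(amount), AVG(amount)
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(filtered)
print(null_stats)
```

Note that AVG divides by the number of non-NULL amounts, not by the number of rows, which is a common source of wrong answers in interviews.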

Medium · Technical
48 practiced
In BigQuery, explain why COUNT(DISTINCT user_id) can be expensive for high-cardinality user bases. Provide an example query using an approximate function (e.g., APPROX_COUNT_DISTINCT) and explain the accuracy/performance trade-offs.
Easy · Technical
52 practiced
Given the orders table:
orders(order_id, customer_id, amount numeric, created_at timestamp)
Write a PostgreSQL-compatible SQL query to calculate total revenue per customer for all orders in 2024. Return the columns customer_id and total_revenue, ordered by total_revenue DESC. Briefly explain how your query handles NULL amount values.
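One possible shape for an answer is sketched below. It is run against SQLite from Python so that it is self-contained, which is an assumption of this sketch: SQLite has no native timestamp type, so created_at is stored as ISO-8601 text, and the sample rows are invented. The SELECT statement itself uses only constructs that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In PostgreSQL created_at would be a timestamp column; ISO-8601 text is used
# here because it sorts chronologically in SQLite.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "amount NUMERIC, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100.0, "2024-01-15 09:00:00"),
        (2, 10, None,  "2024-03-02 12:30:00"),   # NULL amount, ignored by SUM
        (3, 20, 250.0, "2024-12-31 23:59:59"),
        (4, 20, 40.0,  "2023-12-31 23:59:59"),   # outside 2024
        (5, 30, None,  "2024-06-01 00:00:00"),   # customer with only NULL amounts
        (6, 10, 25.5,  "2025-01-01 00:00:00"),   # outside 2024
    ],
)

revenue_2024 = conn.execute("""
    SELECT customer_id,
           COALESCE(SUM(amount), 0) AS total_revenue  -- SUM is NULL if every amount is NULL
    FROM orders
    WHERE created_at >= '2024-01-01'                  -- half-open range keeps the filter
      AND created_at <  '2025-01-01'                  -- index-friendly and boundary-safe
    GROUP BY customer_id
    ORDER BY total_revenue DESC, customer_id
""").fetchall()

print(revenue_2024)
```

SUM skips NULL amounts, so a NULL order contributes nothing to the total. COALESCE only matters for a customer whose 2024 amounts are all NULL; whether such a customer should appear with 0 or be excluded is a choice worth stating in the interview.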
Hard · System Design
57 practiced
Design a daily ETL and reporting architecture that produces aggregated KPIs (daily active users, total revenue, avg session length) for Tableau and Power BI dashboards, given an event stream of 100M events/day. Cover ingestion, staging, incremental aggregation, failure handling, and how you expose data to BI tools for low-latency dashboards.
Medium · Technical
46 practiced
Write a SQL query that uses a CTE (or an equivalent derived table) to compute the top 10 customers by revenue, then joins back to customers(customer_id, name, signup_date) to show each customer's details alongside total_revenue. Explain why this structure can be advantageous for readability and performance.
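A sketch of the aggregate-then-join pattern follows, again assuming SQLite driven from Python with invented sample rows. The orders table and the top_n parameter (set to 2 here so the tiny dataset shows the cut-off; the question asks for 10) are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, signup_date TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2022-01-10"), (2, "Grace", "2023-05-01"), (3, "Linus", "2021-11-20")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, 70.0), (3, 2, 300.0), (4, 3, 20.0)],
)

top_n = 2  # hypothetical parameter; use 10 for the question as stated

top_customers = conn.execute("""
    WITH customer_revenue AS (
        -- Aggregate first, so the join below touches at most top_n rows.
        SELECT customer_id, SUM(amount) AS total_revenue
        FROM orders
        GROUP BY customer_id
        ORDER BY total_revenue DESC, customer_id   -- customer_id breaks ties deterministically
        LIMIT ?
    )
    SELECT c.customer_id, c.name, c.signup_date, r.total_revenue
    FROM customer_revenue AS r
    JOIN customers AS c ON c.customer_id = r.customer_id
    ORDER BY r.total_revenue DESC, c.customer_id
""", (top_n,)).fetchall()

print(top_customers)
```

Naming the aggregation step keeps the ranking logic separate from the presentation columns. Any performance benefit depends on the engine's planner, so it is worth checking with EXPLAIN rather than assuming it.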
Hard · Technical
51 practiced
Your report uses a correlated subquery that computes customer_total in the SELECT for each row and runs slowly on 100M rows. Explain why correlated subqueries can be slow and show a refactor that computes customer totals once (e.g., using a CTE or JOIN to an aggregated table). Provide before-and-after query sketches.
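The before-and-after refactor can be sketched as below, assuming SQLite driven from Python and a small invented orders table. The slow form conceptually re-runs the inner SUM once per outer row unless the optimizer decorrelates it; the refactor aggregates once per customer and joins the result back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5), (4, 2, None)],
)

# Before: the subquery references o.customer_id, so it is evaluated per outer row.
before = conn.execute("""
    SELECT o.order_id, o.customer_id, o.amount,
           (SELECT SUM(o2.amount)
            FROM orders AS o2
            WHERE o2.customer_id = o.customer_id) AS customer_total
    FROM orders AS o
    ORDER BY o.order_id
""").fetchall()

# After: one pass computes every customer's total, then a join attaches it.
# Caveat: an inner join drops rows whose customer_id is NULL; use LEFT JOIN
# if such rows exist and must be kept.
after = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(amount) AS customer_total
        FROM orders
        GROUP BY customer_id
    )
    SELECT o.order_id, o.customer_id, o.amount, t.customer_total
    FROM orders AS o
    JOIN customer_totals AS t ON t.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(before == after)
```

On engines that support window functions, SUM(amount) OVER (PARTITION BY customer_id) is another way to get the same per-row total without a self-join.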
