InterviewStack.io

Set Operations and Complex Aggregations Questions

Understanding UNION, UNION ALL, EXCEPT, INTERSECT operations and their performance implications. Complex GROUP BY queries, HAVING clauses, and multi-level aggregations.
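
The semantic differences between these operators are easiest to see on a tiny dataset. The sketch below assumes two hypothetical one-column tables, `a` and `b`, in an in-memory SQLite database. It shows that UNION ALL keeps duplicates while UNION, EXCEPT and INTERSECT return distinct rows. It also applies a GROUP BY with a HAVING filter over a UNION ALL. (SQLite does not support EXCEPT ALL or INTERSECT ALL, so only the distinct forms appear here.)

```python
import sqlite3

# In-memory SQLite database; tables a and b are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def rows(sql):
    """Run a query and return its single-column result as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))

union_all = rows("SELECT x FROM a UNION ALL SELECT x FROM b")  # keeps duplicates
union     = rows("SELECT x FROM a UNION SELECT x FROM b")      # removes duplicates
except_   = rows("SELECT x FROM a EXCEPT SELECT x FROM b")     # in a but not b, distinct
intersect = rows("SELECT x FROM a INTERSECT SELECT x FROM b")  # in both, distinct

# GROUP BY + HAVING over the UNION ALL result: values that occur more than once.
dupes = rows("""
    SELECT x FROM (SELECT x FROM a UNION ALL SELECT x FROM b)
    GROUP BY x HAVING COUNT(*) > 1
""")

print(union_all)  # [1, 2, 2, 2, 3, 3, 4]
print(union)      # [1, 2, 3, 4]
print(except_)    # [1]
print(intersect)  # [2, 3]
print(dupes)      # [2, 3]
```

The extra cost of UNION over UNION ALL comes from the duplicate-elimination step. An engine has to sort or hash every output row to find the duplicates.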

Hard · Technical
Technical-domain-specific: Compare how PostgreSQL, Snowflake, BigQuery, Spark and Redshift implement set operations (UNION/INTERSECT/EXCEPT) and distinct aggregation internally. For each engine, discuss typical execution strategies (sort vs hash), memory behavior, availability of EXCEPT ALL or INTERSECT ALL, and tuning knobs you would use to improve performance of large set operations.
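
As a reference point for the "sort vs hash" part of this question, the sketch below models the two generic duplicate-elimination strategies in plain Python. It also models the multiset semantics of INTERSECT ALL and EXCEPT ALL. This is not how any of the named engines is implemented; real engines add partitioning, spilling and vectorization, and they expose their own tuning knobs. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List

def hash_distinct(rows: Iterable[Hashable]) -> List[Hashable]:
    """Hash strategy: one pass; memory grows with the number of distinct keys."""
    seen = set()
    out = []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def sort_distinct(rows: Iterable) -> list:
    """Sort strategy: sort, then drop adjacent duplicates; output is ordered.
    Sorted runs can be spilled to disk, so memory use can be bounded."""
    out = []
    for r in sorted(rows):
        if not out or out[-1] != r:
            out.append(r)
    return out

def intersect_all(a: Iterable, b: Iterable) -> Counter:
    """INTERSECT ALL: each row appears min(count_in_a, count_in_b) times."""
    return Counter(a) & Counter(b)

def except_all(a: Iterable, b: Iterable) -> Counter:
    """EXCEPT ALL: each row appears max(count_in_a - count_in_b, 0) times."""
    return Counter(a) - Counter(b)

a = [1, 2, 2, 3, 3, 3]
b = [2, 3, 3, 4]
print(hash_distinct(a + b))                     # [1, 2, 3, 4]
print(sort_distinct([3, 1, 2, 3, 1]))           # [1, 2, 3]
print(sorted(intersect_all(a, b).elements()))   # [2, 3, 3]
print(sorted(except_all(a, b).elements()))      # [1, 2, 3]
```

The trade-off these functions illustrate is the one the question asks about. The hash table must hold every distinct key at once. The sort-based path can work on bounded memory at the cost of sorting.
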
Easy · Technical
Scenario: A product manager asks why replacing UNION ALL with UNION could reduce correctness issues but increase query latency, and insists on using UNION for safety. How would you explain the trade-off in non-technical terms and propose a safer, performant alternative implementation strategy?
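
One way to ground this discussion is to show that UNION only removes rows that are identical in every column. The sketch below assumes two hypothetical sources, `src_a` and `src_b`, that both ingested event 2 with different `loaded_at` values. Whole-row UNION keeps both copies. A UNION ALL followed by an explicit dedupe on the business key (`event_id`, assumed unique per event) keeps exactly one copy. It uses SQLite window functions, which require SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sources: event 2 arrives from both, with different load times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_a (event_id INTEGER, user_id INTEGER, loaded_at TEXT);
    CREATE TABLE src_b (event_id INTEGER, user_id INTEGER, loaded_at TEXT);
    INSERT INTO src_a VALUES (1, 10, '2024-01-01'), (2, 20, '2024-01-01');
    INSERT INTO src_b VALUES (2, 20, '2024-01-02'), (3, 30, '2024-01-02');
""")

# Whole-row UNION: event 2 survives twice because loaded_at differs.
union_count = conn.execute("""
    SELECT COUNT(*) FROM (SELECT * FROM src_a UNION SELECT * FROM src_b)
""").fetchone()[0]

# UNION ALL, then dedupe on the business key only, keeping the earliest copy.
keyed = conn.execute("""
    SELECT event_id, user_id, loaded_at FROM (
        SELECT *, ROW_NUMBER() OVER (
                   PARTITION BY event_id ORDER BY loaded_at) AS rn
        FROM (SELECT * FROM src_a UNION ALL SELECT * FROM src_b)
    )
    WHERE rn = 1
    ORDER BY event_id
""").fetchall()

print(union_count)  # 4
print(keyed)        # [(1, 10, '2024-01-01'), (2, 20, '2024-01-01'), (3, 30, '2024-01-02')]
```

With this approach, the dedupe rule is explicit and tied to a key that the business defines. It does not depend on every column matching by accident. Where the sources are disjoint by construction, plain UNION ALL needs no dedupe step at all.
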
Hard · System Design
System design (harder): Architect a multi-tenant data warehouse that must provide near-real-time unique-user counts per tenant and support ad-hoc multi-dimensional queries. Requirements: dedupe events ingested from multiple streaming sources, isolation per tenant, ability to backfill, and cost control. Sketch components, storage format (Delta, Iceberg, Parquet), ingestion strategy (stream vs micro-batch), and how set operations and aggregations fit into the design.
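
The dedupe and per-tenant counting parts of this design can be sketched as a micro-batch aggregator. The sketch makes several assumptions: a hypothetical `Event` record, an `event_id` that is unique within a tenant across all streaming sources, and state held in memory. It dedupes on `(tenant_id, event_id)`, so replays and backfills are idempotent. State is keyed by tenant for isolation. The counts use exact sets. For cost control, production systems often replace these sets with approximate distinct-count sketches such as HyperLogLog, and they bound the dedupe state with a time window.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

@dataclass(frozen=True)
class Event:
    tenant_id: str
    event_id: str  # assumed unique per tenant across all sources
    user_id: str

class UniqueUserCounter:
    """Hypothetical micro-batch aggregator with idempotent batch application."""

    def __init__(self) -> None:
        self.seen: Set[Tuple[str, str]] = set()            # (tenant_id, event_id)
        self.users: Dict[str, Set[str]] = defaultdict(set)  # tenant -> user_ids
        self.events: Dict[str, int] = defaultdict(int)      # tenant -> event count

    def apply_batch(self, batch: Iterable[Event]) -> None:
        for e in batch:
            key = (e.tenant_id, e.event_id)
            if key in self.seen:
                continue  # duplicate from another source, a replay, or a backfill
            self.seen.add(key)
            self.users[e.tenant_id].add(e.user_id)
            self.events[e.tenant_id] += 1

    def unique_users(self, tenant_id: str) -> int:
        return len(self.users.get(tenant_id, ()))

    def event_count(self, tenant_id: str) -> int:
        return self.events.get(tenant_id, 0)

counter = UniqueUserCounter()
batch1 = [Event("t1", "e1", "u1"), Event("t1", "e2", "u2"), Event("t2", "e3", "u1")]
batch2 = [Event("t1", "e2", "u2"), Event("t1", "e4", "u1")]  # e2 re-delivered
counter.apply_batch(batch1)
counter.apply_batch(batch2)
counter.apply_batch(batch1)  # full replay changes nothing
print(counter.unique_users("t1"), counter.event_count("t1"))  # 2 3
```
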
Medium · Technical
Technical task (SQL): Two sources have slightly different schemas: logs_v1(event_time TIMESTAMP, event_type TEXT) and logs_v2(ts TIMESTAMP, type TEXT, user_id INT). Write a single SQL statement that normalizes these sources and combines them with UNION ALL into a canonical view (event_time, event_type, user_id), handling missing columns and type casting. Assume ANSI SQL and explain your choices.
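
A possible shape for the canonical view is shown below, run through SQLite only so that the sketch is executable. The view name `logs_canonical` and the sample rows are hypothetical. The v1 branch supplies a typed `CAST(NULL AS INTEGER)` for the missing `user_id`. Output column names come from the first branch. SQLite has no native TIMESTAMP type, so the sketch leaves timestamps as stored. In an engine with real timestamp types, you would add explicit casts such as `CAST(ts AS TIMESTAMP)` where the source types differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logs_v1 (event_time TIMESTAMP, event_type TEXT);
    CREATE TABLE logs_v2 (ts TIMESTAMP, type TEXT, user_id INT);
    INSERT INTO logs_v1 VALUES ('2024-01-01 10:00:00', 'click');
    INSERT INTO logs_v2 VALUES ('2024-01-01 10:05:00', 'view', 42);

    -- Column names come from the first branch; v1 gets a typed NULL user_id
    -- so that both branches have the same number and types of columns.
    CREATE VIEW logs_canonical AS
    SELECT event_time, event_type, CAST(NULL AS INTEGER) AS user_id
    FROM logs_v1
    UNION ALL
    SELECT ts, type, CAST(user_id AS INTEGER)
    FROM logs_v2;
""")

rows = conn.execute(
    "SELECT event_time, event_type, user_id FROM logs_canonical ORDER BY event_time"
).fetchall()
print(rows)
# [('2024-01-01 10:00:00', 'click', None), ('2024-01-01 10:05:00', 'view', 42)]
```

UNION ALL fits here because rows from the two schema versions are distinct events. A UNION would add a duplicate-elimination pass that these inputs do not need.
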
Medium · Technical
Scenario troubleshooting: A nightly pipeline that aggregates event counts per user fails with an out-of-memory (OOM) error at the GROUP BY step on the compute cluster. The input volume has not changed. Describe a systematic approach to diagnosing the root cause, and list mitigation steps, including quick fixes (increasing resources), medium-term fixes (repartitioning, two-stage aggregation), and long-term fixes (pre-aggregation, reorganizing the data layout).
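
When the total volume is unchanged, one plausible cause is a shift in key distribution. For example, a few users may suddenly generate most events, so one GROUP BY task receives a group that is far too large. The "two-stage aggregation" mitigation is often implemented by salting. The sketch below simulates salting in a single process. The function name and the `num_salts` parameter are hypothetical. In a real cluster, each `(user_id, salt)` group in stage 1 could go to a different task.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable

def two_stage_count(user_ids: Iterable[str], num_salts: int, seed: int = 0) -> Dict[str, int]:
    """Salted two-stage aggregation (single-process simulation).

    Stage 1 groups by (user_id, salt), which splits a hot user across up to
    num_salts smaller groups. Stage 2 sums those partial counts per user_id.
    The salt values do not affect the final totals, only how work is split."""
    rng = random.Random(seed)
    # Stage 1: partial counts keyed by (user_id, random salt).
    partial: Counter = Counter()
    for uid in user_ids:
        partial[(uid, rng.randrange(num_salts))] += 1
    # Stage 2: merge partials; at most num_salts input rows per user_id.
    final: Dict[str, int] = defaultdict(int)
    for (uid, _salt), cnt in partial.items():
        final[uid] += cnt
    return dict(final)

events = ["hot_user"] * 1000 + ["a", "b", "b"]
print(two_stage_count(events, num_salts=8))  # {'hot_user': 1000, 'a': 1, 'b': 2}
```

The final result matches a plain one-stage count. The difference is that no stage-1 group has to hold all 1,000 rows for the hot user.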
