InterviewStack.io

Complex Joins and Set Operations Questions

Focuses on mastering joins and set operations for combining and transforming relational data across multiple tables. Candidates should understand all join types, including inner, left, right, full outer, cross, self, and nested joins, and know when to use each for correctness and performance. This topic also covers set operations such as UNION, INTERSECT, and EXCEPT; the differences between joins and set operations; handling duplicates and NULL values correctly; choosing between joins, subqueries, and common table expressions (CTEs) for clarity and efficiency; and reasoning about join order and its performance implications on large tables. Interview questions may include multi-table join problems, complex business logic spanning four or more tables, and scenarios that reveal trade-offs between approaches.
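A minimal sketch of the duplicate and NULL semantics mentioned above, run through Python's built-in sqlite3 module on two tiny hypothetical tables, a and b. It shows that UNION deduplicates while UNION ALL keeps every row, and that set operators treat two NULLs as equal when comparing rows, whereas a join predicate does not (NULL = NULL is not true). The SQL standard specifies the same set-operator behavior, but confirm it on your own engine.

```python
import sqlite3


def set_ops_demo():
    """Contrast set operators with an inner join on a tiny in-memory dataset."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE a (k INTEGER);
        CREATE TABLE b (k INTEGER);
        INSERT INTO a VALUES (1), (2), (2), (NULL);
        INSERT INTO b VALUES (2), (3), (NULL);
        """
    )

    def col(sql):
        # Return the first column of every result row as a list.
        return [row[0] for row in con.execute(sql)]

    results = {
        # UNION removes duplicates within and across both inputs.
        "union": col("SELECT k FROM a UNION SELECT k FROM b ORDER BY k"),
        # UNION ALL keeps every row (4 + 3).
        "union_all_count": col(
            "SELECT COUNT(*) FROM (SELECT k FROM a UNION ALL SELECT k FROM b)"
        )[0],
        # Set operators treat NULLs as equal when comparing rows...
        "intersect": col("SELECT k FROM a INTERSECT SELECT k FROM b ORDER BY k"),
        "except": col("SELECT k FROM a EXCEPT SELECT k FROM b ORDER BY k"),
        # ...but join predicates do not, so the NULL rows never match.
        # The duplicate 2 in table a also yields two joined rows.
        "inner_join": col("SELECT a.k FROM a JOIN b ON a.k = b.k ORDER BY a.k"),
    }
    con.close()
    return results
```

SQLite sorts NULL before any other value, so `set_ops_demo()["union"]` comes back as `[None, 1, 2, 3]`.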

Medium · Technical
You have two monthly tables, sales_jan and sales_feb, with identical schemas. You want (1) a single table of all sales and (2) a report flagging sale_ids that appear in both months. Show SQL that uses UNION ALL to combine the tables and then identifies the duplicates, and explain a scenario where JOINing the two monthly tables would be preferable.
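A minimal sketch of one possible answer, executed with Python's sqlite3 module. The amount column, the sample rows, and the src_month tag are assumptions for illustration. Tagging each branch of the UNION ALL with its month is what lets `GROUP BY sale_id HAVING COUNT(DISTINCT src_month) = 2` flag sale_ids present in both months; a plain count would also fire on a sale_id repeated within one month.

```python
import sqlite3

SETUP = """
CREATE TABLE sales_jan (sale_id INTEGER, amount REAL);
CREATE TABLE sales_feb (sale_id INTEGER, amount REAL);
INSERT INTO sales_jan VALUES (1, 10.0), (2, 20.0), (3, 30.0);
INSERT INTO sales_feb VALUES (3, 35.0), (4, 40.0);
"""

# (1) Stack both months, keeping every row and remembering its origin.
COMBINED = """
CREATE TABLE sales_all AS
SELECT sale_id, amount, 'jan' AS src_month FROM sales_jan
UNION ALL
SELECT sale_id, amount, 'feb' AS src_month FROM sales_feb
"""

# (2) A sale_id is flagged only if it occurs under both month tags.
IN_BOTH = """
SELECT sale_id
FROM sales_all
GROUP BY sale_id
HAVING COUNT(DISTINCT src_month) = 2
ORDER BY sale_id
"""

# JOIN alternative: both months' values land on the same row for comparison.
SIDE_BY_SIDE = """
SELECT j.sale_id, j.amount AS jan_amount, f.amount AS feb_amount
FROM sales_jan AS j
JOIN sales_feb AS f ON f.sale_id = j.sale_id
ORDER BY j.sale_id
"""


def run():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    con.execute(COMBINED)
    total = con.execute("SELECT COUNT(*) FROM sales_all").fetchone()[0]
    in_both = [row[0] for row in con.execute(IN_BOTH)]
    side_by_side = con.execute(SIDE_BY_SIDE).fetchall()
    con.close()
    return total, in_both, side_by_side
```

The JOIN version is the better fit when the report needs both months' attributes on one row, for example to compare a sale's January and February amounts, since the UNION ALL route would need a second pivot step to get there.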
Medium · Technical
You need a table showing monthly retention rate per signup cohort. Given signups(user_id, signup_date) and activity(activity_id, user_id, active_date), describe how to compute cohorts and retention using SQL window functions and joins in a way that avoids duplicating counts when users have many activities in a month.
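A minimal sketch, run with Python's sqlite3 module, assuming one signup row per user_id and ISO `YYYY-MM-DD` date strings; window functions need SQLite 3.25 or newer. The key step against double counting is `SELECT DISTINCT user_id, month` on activity before the join, so a user with many activities in a month contributes exactly one row. A window `COUNT(*) OVER (PARTITION BY cohort)` attaches the cohort size without a second join.

```python
import sqlite3

RETENTION_SQL = """
WITH cohorts AS (
    -- One row per user (assumes a single signup per user_id).
    SELECT user_id,
           strftime('%Y-%m', signup_date) AS cohort_month,
           CAST(strftime('%Y', signup_date) AS INTEGER) * 12
             + CAST(strftime('%m', signup_date) AS INTEGER) AS cohort_idx,
           COUNT(*) OVER (PARTITION BY strftime('%Y-%m', signup_date)) AS cohort_size
    FROM signups
),
user_months AS (
    -- Collapse many activities to one row per user per calendar month.
    SELECT DISTINCT user_id,
           CAST(strftime('%Y', active_date) AS INTEGER) * 12
             + CAST(strftime('%m', active_date) AS INTEGER) AS active_idx
    FROM activity
)
SELECT c.cohort_month,
       um.active_idx - c.cohort_idx AS month_offset,
       COUNT(*) AS active_users,
       c.cohort_size,
       ROUND(1.0 * COUNT(*) / c.cohort_size, 3) AS retention
FROM cohorts AS c
JOIN user_months AS um ON um.user_id = c.user_id
WHERE um.active_idx >= c.cohort_idx
GROUP BY c.cohort_month, c.cohort_size, um.active_idx - c.cohort_idx
ORDER BY c.cohort_month, month_offset
"""


def compute_retention(signups, activity):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE signups (user_id INTEGER, signup_date TEXT)")
    con.execute(
        "CREATE TABLE activity (activity_id INTEGER, user_id INTEGER, active_date TEXT)"
    )
    con.executemany("INSERT INTO signups VALUES (?, ?)", signups)
    con.executemany("INSERT INTO activity VALUES (?, ?, ?)", activity)
    rows = con.execute(RETENTION_SQL).fetchall()
    con.close()
    return rows
```

Because each cohort row is one user and each user_months row is one user-month, `COUNT(*)` per (cohort, offset) group equals the number of distinct active users.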
Medium · Technical
Given these tables and approximate sizes: customers (5M rows, highly selective predicates on email), events (200M rows), orders (20M rows), lookups (small). A query joins customers -> events -> orders -> lookups. Explain how join order affects performance, how the optimizer might choose join algorithms, and what practical steps you would take to influence join order if the plan is suboptimal.
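A minimal, SQLite-specific sketch of two practical steps: inspecting the plan with `EXPLAIN QUERY PLAN`, and pinning the join order when the optimizer picks a poor one. In SQLite, the left operand of a `CROSS JOIN` is always kept in the outer loop, so the query below starts from the selective email lookup and walks customers, then events, orders, and lookups. The schema, index names, and join keys (events.customer_id, orders.event_id, orders.lookup_id) are hypothetical, and the exact wording of plan rows varies between SQLite versions.

```python
import sqlite3

# Hypothetical schema mirroring the question; EXPLAIN needs no data rows.
SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE events (event_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, event_id INTEGER, lookup_id INTEGER);
CREATE TABLE lookups (lookup_id INTEGER PRIMARY KEY, label TEXT);
CREATE INDEX idx_customers_email ON customers(email);
CREATE INDEX idx_events_customer ON events(customer_id);
CREATE INDEX idx_orders_event ON orders(event_id);
"""

# SQLite keeps the left side of each CROSS JOIN in the outer loop, which
# pins the order customers -> events -> orders -> lookups.
FORCED_ORDER = """
SELECT c.customer_id, e.event_id, o.order_id, l.label
FROM customers AS c
CROSS JOIN events AS e
CROSS JOIN orders AS o
CROSS JOIN lookups AS l
WHERE c.email = ?
  AND e.customer_id = c.customer_id
  AND o.event_id = e.event_id
  AND l.lookup_id = o.lookup_id
"""


def plan_details(con, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql, params)]


def forced_order_plan():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    details = plan_details(con, FORCED_ORDER, ("someone@example.com",))
    con.close()
    return details


if __name__ == "__main__":
    for line in forced_order_plan():
        print(line)
```

Other engines expose different levers: PostgreSQL follows the written JOIN order when `join_collapse_limit` is set to 1, and SQL Server offers `OPTION (FORCE ORDER)`. Refreshing statistics and adding the right indexes usually come first, since a pinned order cannot adapt as data volumes shift.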
Hard · Technical
Produce a reconciliation matrix that shows whether each order exists in four systems (a, b, c, d). Each source has the schema (order_id, status). Design SQL that aggregates each source to one row per order_id, then joins the results and produces the columns present_in_a (boolean), status_a, present_in_b, status_b, ..., reconciliation_status (matched/mismatch/missing). Discuss scalability when each source has tens of millions of rows.
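A minimal sketch using Python's sqlite3 module. The table names src_a through src_d are hypothetical, and `MAX(status)` is only a placeholder tie-break for duplicate order_ids (a real feed would usually keep the latest status by timestamp). Instead of chaining FULL OUTER JOINs, it builds a spine of every order_id with UNION and LEFT JOINs each pre-aggregated source onto it, which also works on SQLite builds without FULL OUTER JOIN support.

```python
import sqlite3

SOURCES = ("a", "b", "c", "d")  # fixed whitelist, so f-string SQL is safe here


def build_reconciliation_sql(sources=SOURCES):
    """Build the matrix query for the given source suffixes."""
    # 1) Collapse each source to exactly one row per order_id.
    ctes = [
        f"agg_{s} AS (SELECT order_id, MAX(status) AS status "
        f"FROM src_{s} GROUP BY order_id)"
        for s in sources
    ]
    # 2) Spine of every order_id seen anywhere (UNION deduplicates).
    spine = " UNION ".join(f"SELECT order_id FROM agg_{s}" for s in sources)
    ctes.append(f"spine AS ({spine})")
    cols, joins = [], []
    for s in sources:
        cols.append(f"agg_{s}.order_id IS NOT NULL AS present_in_{s}")
        cols.append(f"agg_{s}.status AS status_{s}")
        joins.append(f"LEFT JOIN agg_{s} ON agg_{s}.order_id = spine.order_id")
    all_present = " AND ".join(f"agg_{s}.order_id IS NOT NULL" for s in sources)
    same_status = " AND ".join(
        f"agg_{sources[0]}.status = agg_{s}.status" for s in sources[1:]
    )
    # 3) Absent from any source -> missing; all statuses equal -> matched.
    cols.append(
        f"CASE WHEN NOT ({all_present}) THEN 'missing' "
        f"WHEN {same_status} THEN 'matched' ELSE 'mismatch' END "
        "AS reconciliation_status"
    )
    return (
        "WITH " + ", ".join(ctes)
        + " SELECT spine.order_id, " + ", ".join(cols)
        + " FROM spine " + " ".join(joins)
        + " ORDER BY spine.order_id"
    )


def demo():
    con = sqlite3.connect(":memory:")
    for s in SOURCES:
        con.execute(f"CREATE TABLE src_{s} (order_id INTEGER, status TEXT)")
    rows = {
        "a": [(1, "shipped"), (2, "paid"), (3, "paid")],
        "b": [(1, "shipped"), (2, "paid"), (3, "paid")],
        "c": [(1, "shipped"), (2, "refunded"), (3, "paid")],
        "d": [(1, "shipped"), (2, "paid"), (4, "paid")],
    }
    for s, data in rows.items():
        con.executemany(f"INSERT INTO src_{s} VALUES (?, ?)", data)
    result = con.execute(build_reconciliation_sql()).fetchall()
    con.close()
    return result
```

Aggregating before joining keeps every join one-to-one on order_id, so rows cannot multiply. At tens of millions of rows per source, these joins would typically run as hash or merge joins, and partitioning or bucketing every source by order_id lets each partition be reconciled independently.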
Hard · System Design
Design an incremental ETL that joins daily change data capture (CDC) files for customers into a reporting fact table without a full recompute. Discuss join strategies for MERGE/upsert, deduplication of changes, ensuring idempotency, ordering of deletes and inserts, handling late-arriving data, and whether to use CTEs or staging tables.
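A minimal sketch of one upsert pattern, using SQLite's `INSERT ... ON CONFLICT DO UPDATE` through Python's sqlite3 module (SQLite 3.25 or newer for the window function). The tables customer_report and cdc_staging and the change_seq column are hypothetical; change_seq stands in for a monotonically increasing CDC position such as a log sequence number. Each batch lands in a staging table so it can be inspected and replayed, a `ROW_NUMBER()` subquery keeps only the newest change per customer, deletes become soft-delete tombstones, and the `WHERE excluded.change_seq > ...` guard means replays are no-ops and late, older changes cannot overwrite newer state.

```python
import sqlite3

DDL = """
CREATE TABLE customer_report (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    is_deleted  INTEGER NOT NULL DEFAULT 0,
    change_seq  INTEGER NOT NULL
);
CREATE TABLE cdc_staging (
    customer_id INTEGER,
    name        TEXT,
    op          TEXT,    -- 'I', 'U' or 'D'
    change_seq  INTEGER
);
"""

APPLY_CHANGES = """
INSERT INTO customer_report (customer_id, name, is_deleted, change_seq)
SELECT customer_id, name, op = 'D', change_seq
FROM (
    -- Deduplicate: keep only the newest change per customer in this batch.
    SELECT customer_id, name, op, change_seq,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id ORDER BY change_seq DESC
           ) AS rn
    FROM cdc_staging
) AS latest
WHERE rn = 1  -- also avoids SQLite's INSERT ... SELECT ... ON CONFLICT parse ambiguity
ON CONFLICT (customer_id) DO UPDATE SET
    name       = excluded.name,
    is_deleted = excluded.is_deleted,
    change_seq = excluded.change_seq
-- Only strictly newer changes win, so replays and late, older rows are ignored.
WHERE excluded.change_seq > customer_report.change_seq
"""


def apply_batch(con, rows):
    """Load one CDC batch into staging and merge it, in a single transaction."""
    with con:
        con.execute("DELETE FROM cdc_staging")
        con.executemany("INSERT INTO cdc_staging VALUES (?, ?, ?, ?)", rows)
        con.execute(APPLY_CHANGES)


def snapshot(con):
    return con.execute(
        "SELECT customer_id, name, is_deleted, change_seq "
        "FROM customer_report ORDER BY customer_id"
    ).fetchall()
```

Ordering of deletes and inserts is handled by last-change-wins on change_seq rather than by statement order; keeping the tombstone row stops an older, late-arriving update from resurrecting a deleted customer.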
