InterviewStack.io

SQL Scenarios Questions

Advanced SQL query design and optimization scenarios, including complex joins, subqueries, window functions, common table expressions (CTEs), set operations, indexing strategies, explain plans, and performance considerations across relational databases.

Medium · Technical
Given events(id, user_id, event_ts timestamp, payload text, source text) containing duplicates and late arrivals, write a safe sequence of Postgres statements to deduplicate in place, keeping the earliest id per (user_id, source, date_trunc('minute', event_ts)). The job must be resumable, run in batches, and avoid long locks that impact reporting.
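A minimal sketch of the batched, resumable pattern this question is after, written against SQLite so it runs anywhere: strftime('%Y-%m-%d %H:%M', ...) stands in for Postgres's date_trunc('minute', ...), and the sample rows are invented for illustration. The key idea is that each small DELETE is committed on its own, so a killed job can simply be restarted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INT, event_ts TEXT,
                         payload TEXT, source TEXT);
    INSERT INTO events VALUES
      (1, 1, '2024-01-01 10:00:05', 'a', 'web'),
      (2, 1, '2024-01-01 10:00:40', 'a', 'web'),  -- same minute as id 1: duplicate
      (3, 1, '2024-01-01 10:01:10', 'b', 'web'),
      (4, 2, '2024-01-01 10:00:30', 'c', 'app'),
      (5, 2, '2024-01-01 10:00:50', 'c', 'app');  -- same minute as id 4: duplicate
""")

BATCH = 1  # tiny batch size to show the loop; use thousands in practice
while True:
    cur = conn.execute("""
        DELETE FROM events WHERE id IN (
            SELECT id FROM (
                SELECT id, ROW_NUMBER() OVER (
                    PARTITION BY user_id, source,
                                 strftime('%Y-%m-%d %H:%M', event_ts)
                    ORDER BY id) AS rn
                FROM events)
            WHERE rn > 1
            LIMIT ?)""", (BATCH,))
    conn.commit()          # each committed batch is durable, so the job is resumable
    if cur.rowcount == 0:  # nothing left to delete
        break

survivors = sorted(r[0] for r in conn.execute("SELECT id FROM events"))
print(survivors)  # earliest id per (user_id, source, minute) survives
```

Small, committed batches keep row locks short-lived, which is the point of the "avoid long locks" requirement; in Postgres you would additionally consider ordering the candidate ids so each batch touches a contiguous range.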
Hard · System Design
The customer master lives in Postgres and order history lives in Redshift. A report needs to join both systems. Explain three approaches: ETL that copies the Postgres data into Redshift, federated queries that join across systems at query time, and joining in the application layer. For each approach, discuss performance, consistency, operational complexity, and recommended use cases for BI reporting.
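The third option, joining in the application layer, can be sketched concretely. In this illustration two in-memory SQLite databases stand in for Postgres and Redshift, and all table and column names are hypothetical; the shape — pull the small dimension into memory, stream the large fact table, join by key in code — is the usual one.

```python
import sqlite3

pg = sqlite3.connect(":memory:")  # stand-in for Postgres (customer master)
rs = sqlite3.connect(":memory:")  # stand-in for Redshift (order history)

pg.executescript("""
    CREATE TABLE customers (customer_id INT, name TEXT);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
""")
rs.executescript("""
    CREATE TABLE orders (order_id INT, customer_id INT, amount REAL);
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Load the small side once, then aggregate the large side as it streams past.
names = dict(pg.execute("SELECT customer_id, name FROM customers"))
report = {}
for _order_id, cust_id, amount in rs.execute(
        "SELECT order_id, customer_id, amount FROM orders"):
    report[names[cust_id]] = report.get(names[cust_id], 0.0) + amount

print(report)  # revenue per customer name
```

This works well when the dimension fits in memory and consistency needs are loose; it becomes the wrong choice when the fact side cannot be pre-aggregated or the join key cardinality is high, which is exactly the trade-off the question asks you to discuss.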
Medium · Technical
You need sub-minute dashboards but base data is updated frequently. Propose a materialized view or summary table strategy to serve aggregated KPIs (daily active users, revenue by region) in PostgreSQL or Redshift. Discuss refresh approaches: full refresh, partitioned refresh, incremental refresh/merge, concurrency control, and the freshness vs cost trade-offs.
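A minimal sketch of the incremental refresh/merge approach, under stated assumptions: SQLite stands in for Postgres (both support INSERT ... ON CONFLICT ... DO UPDATE; Redshift would use a staging table plus MERGE instead), and the table names, the sale_id watermark, and the revenue_by_region rollup are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT,
                        sale_date TEXT, amount REAL);
    CREATE TABLE revenue_by_region (
        sale_date TEXT, region TEXT, revenue REAL,
        PRIMARY KEY (sale_date, region));
    CREATE TABLE refresh_state (last_sale_id INT);  -- the refresh watermark
    INSERT INTO refresh_state VALUES (0);
""")

def incremental_refresh(conn):
    """Merge base rows past the watermark into the rollup, then advance it."""
    (watermark,) = conn.execute(
        "SELECT last_sale_id FROM refresh_state").fetchone()
    conn.execute("""
        INSERT INTO revenue_by_region (sale_date, region, revenue)
        SELECT sale_date, region, SUM(amount)
        FROM sales WHERE sale_id > ?
        GROUP BY sale_date, region
        ON CONFLICT (sale_date, region)
        DO UPDATE SET revenue = revenue + excluded.revenue""", (watermark,))
    conn.execute("""
        UPDATE refresh_state
        SET last_sale_id = (SELECT COALESCE(MAX(sale_id), ?) FROM sales)""",
        (watermark,))
    conn.commit()

conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                 [(1, 'EU', '2024-01-01', 10.0), (2, 'EU', '2024-01-01', 5.0)])
incremental_refresh(conn)
conn.execute("INSERT INTO sales VALUES (3, 'EU', '2024-01-01', 2.0)")
incremental_refresh(conn)  # second pass merges only the new row
print(conn.execute("SELECT revenue FROM revenue_by_region").fetchone())
```

The freshness/cost trade-off shows up in how often you call the refresh: each run only scans rows past the watermark, so frequent runs stay cheap, but a monotonic id (or ingest timestamp) watermark is an assumption — late-arriving updates to already-merged rows need a different strategy, such as reprocessing recent partitions.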
Easy · Technical
Given a transactions table:
transactions(transaction_id int, user_id int, transaction_ts timestamp, amount numeric, source_id int)
Write a SQL query using window functions (Postgres) to identify duplicate business-key occurrences where the business key is (user_id, source_id, date_trunc('minute', transaction_ts)). For each duplicate group keep the earliest transaction_id and output the list of transaction_ids to delete. Explain how to delete duplicates safely in batches.
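One way the keep-earliest pattern can look, sketched in SQLite so it runs standalone: ROW_NUMBER() partitioned by the business key, ordered by transaction_id, with rows numbered above 1 listed for deletion. strftime('%Y-%m-%d %H:%M', ...) stands in for date_trunc('minute', ...), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (transaction_id INT, user_id INT,
                               transaction_ts TEXT, amount REAL, source_id INT);
    INSERT INTO transactions VALUES
      (101, 1, '2024-03-01 09:15:02', 20.0, 7),
      (102, 1, '2024-03-01 09:15:48', 20.0, 7),  -- same minute: duplicate
      (103, 1, '2024-03-01 09:16:01', 20.0, 7),  -- next minute: not a duplicate
      (104, 2, '2024-03-01 09:15:10',  5.0, 7);
""")

# rn = 1 marks the earliest transaction_id per business key; everything else
# is a duplicate and goes on the delete list.
to_delete = [row[0] for row in conn.execute("""
    SELECT transaction_id FROM (
        SELECT transaction_id,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, source_id,
                                strftime('%Y-%m-%d %H:%M', transaction_ts)
                   ORDER BY transaction_id) AS rn
        FROM transactions)
    WHERE rn > 1
    ORDER BY transaction_id""")]

print(to_delete)
```

For the safe-deletion half of the question, the idea is to feed this id list to DELETE statements in small, separately committed chunks rather than one large transaction, so locks stay short and a failed run can resume.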
Medium · Technical
You ran EXPLAIN ANALYZE on a slow query and saw a plan with a Seq Scan that reports actual rows much larger than the estimated rows and a Nested Loop join with very high actual time. Given the following simplified snippet from Postgres:
Nested Loop (cost=10000.00..50000.00 rows=100000) (actual time=500.1..1500.0 rows=100000)
  -> Seq Scan on orders (cost=0.00..10000.00 rows=100000) (actual time=0.5..500.0 rows=100000)
  -> Hash Join ...
Identify the likely bottlenecks, explain what the planner estimates mean versus actual, and describe 3 concrete actions (indexes, query rewrite, server settings) you would take to improve runtime.
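The "add an index, refresh statistics, re-check the plan" workflow can be illustrated end to end. This sketch uses SQLite's EXPLAIN QUERY PLAN (its output format differs from Postgres's EXPLAIN ANALYZE, but the before/after comparison is the same exercise), and the table, column, and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                                     customer_id INT, amount REAL)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, 1.0) for i in range(1000)])

query = "SELECT * FROM orders WHERE customer_id = 42"

# Before: no usable index, so the planner falls back to a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")  # refresh planner statistics, as you would in Postgres

# After: the planner switches to an index search on customer_id.
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

print(before)
print(after)
```

In Postgres the analogous checks are that estimated rows track actual rows after ANALYZE (bad estimates often come from stale statistics or a too-low default_statistics_target), and that the join node flips from a row-at-a-time Nested Loop to a Hash or Merge Join once the planner knows the inner side is large.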
