SQL Scenarios Questions
Advanced SQL query design and optimization scenarios, including complex joins, subqueries, window functions, common table expressions (CTEs), set operations, indexing strategies, explain plans, and performance considerations across relational databases.
Hard · Technical
You ran EXPLAIN (ANALYZE, BUFFERS) for a slow query and see this snippet:

Seq Scan on events  (cost=0.00..80000.00 rows=100000 width=8) (actual time=0.123..1200.456 rows=98000 loops=1)
Nested Loop  (cost=100.00..50000.00 rows=1000 width=64)
  ->  Index Scan using users_pkey on users ...

Describe how you'd analyze this plan, identify root causes for slowness, and propose concrete SQL/DB changes (indexes/statistics/query rewrite) to improve execution. Mention tools/queries to collect further evidence.
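A possible line of evidence-gathering for this scenario, sketched in SQL. Everything beyond the tables named in the snippet (the index name, the chosen column) is illustrative:

```sql
-- Refresh planner statistics, then re-run EXPLAIN (ANALYZE, BUFFERS)
-- to see whether the rows=100000 estimate vs. rows=98000 actual converges.
ANALYZE events;

-- Inspect per-column statistics the planner relies on for selectivity.
SELECT attname, n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename = 'events';

-- The Seq Scan returned ~98k of ~100k estimated rows, so the filter is not
-- selective and an index likely won't help there; if further evidence shows
-- a selective predicate, a targeted index is a candidate (column is hypothetical):
CREATE INDEX CONCURRENTLY idx_events_user_id ON events (user_id);
```

CREATE INDEX CONCURRENTLY avoids blocking writes on a live table, at the cost of a slower build.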
MediumTechnical
Explain how result/query caching in Snowflake or BigQuery works and how it can affect repeated feature queries during development. When will a query hit the cache, what are pitfalls (stale results, exact-text matching), and ways to leverage cache safely in training pipelines?
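A sketch of the cache behaviors the question is probing. The statements and the `events` table are illustrative; the exact-text-matching and non-determinism rules below reflect documented behavior of both engines, but verify against your platform's docs:

```sql
-- Result caches match on (roughly) exact query text, so these two
-- statements are typically NOT the same cache entry:
SELECT user_id, COUNT(*) FROM events GROUP BY user_id;
SELECT user_id, count(*) FROM events GROUP BY user_id;  -- different text

-- Non-deterministic functions disable result caching:
SELECT user_id, CURRENT_TIMESTAMP FROM events;  -- not served from cache

-- Snowflake: disable the result cache per session when validating freshness.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```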
HardTechnical
After upgrading PostgreSQL, a previously fast query regresses. Outline a systematic SQL-focused debugging plan: collect EXPLAIN ANALYZE before/after, inspect pg_stats, check planner GUCs (join_collapse_limit, from_collapse_limit, enable_*), run ANALYZE on involved tables, and collect pg_stat_statements. Provide the specific SQL commands you'd run to gather evidence and a short remediation plan.
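The evidence-gathering commands the question asks for might look like this (the `orders` table is a placeholder; substitute the tables involved in the regressed query):

```sql
-- Capture the plan on the upgraded cluster (and from the old one if possible).
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the regressed query

-- Refresh and inspect statistics on involved tables.
ANALYZE orders;
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders';

-- Check planner GUCs whose defaults or behavior may differ across versions.
SELECT name, setting FROM pg_settings
WHERE name IN ('join_collapse_limit', 'from_collapse_limit')
   OR name LIKE 'enable%';

-- Find the worst offenders since the upgrade (requires pg_stat_statements).
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```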
MediumTechnical
Given a PostgreSQL events table where duplicates exist (same user_id, event_type, and occurred_at, but different id):

events(id bigint PRIMARY KEY, user_id bigint, event_type text, occurred_at timestamp)

Write a SQL query that returns only the earliest id per (user_id, event_type, date_trunc('day', occurred_at)). Then explain how you would perform an in-place deduplication DELETE efficiently at scale.
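One possible answer sketch using DISTINCT ON (a window-function approach works equally well), assuming the schema in the question:

```sql
-- Earliest id per (user_id, event_type, day); ties broken by lowest id.
SELECT DISTINCT ON (user_id, event_type, date_trunc('day', occurred_at))
       id
FROM events
ORDER BY user_id, event_type, date_trunc('day', occurred_at),
         occurred_at, id;

-- In-place dedup via self-join: delete every row for which a strictly
-- earlier duplicate exists in the same (user_id, event_type, day) group.
DELETE FROM events e
USING events keep
WHERE keep.user_id = e.user_id
  AND keep.event_type = e.event_type
  AND date_trunc('day', keep.occurred_at) = date_trunc('day', e.occurred_at)
  AND (keep.occurred_at, keep.id) < (e.occurred_at, e.id);
```

At scale this single DELETE can hold a long transaction; batching by id range (or by ctid) and committing between batches keeps lock time and WAL volume bounded.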
MediumTechnical
Given raw_events(id bigint, user_id bigint, event_hash text, event_ts timestamp) with duplicates, return deduplicated rows keeping the earliest id per (user_id, event_hash) using window functions in PostgreSQL. Also describe how to efficiently delete duplicates in place in a large production table while minimizing locks and transaction size.
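A possible answer sketch for this one; the batch size is illustrative, and the DELETE would be re-run in a loop until it removes zero rows:

```sql
-- Deduplicated result set: keep the earliest id per (user_id, event_hash).
SELECT id, user_id, event_hash, event_ts
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id, event_hash
                            ORDER BY id) AS rn
  FROM raw_events
) t
WHERE rn = 1;

-- Batched in-place delete of everything after the earliest id, keeping
-- each transaction small to limit lock duration and WAL per commit.
DELETE FROM raw_events
WHERE id IN (
  SELECT id FROM (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY user_id, event_hash
                                  ORDER BY id) AS rn
    FROM raw_events
  ) d
  WHERE d.rn > 1
  LIMIT 10000
);
```

A supporting index on (user_id, event_hash, id) would make the window scan and the repeated delete passes far cheaper on a large table.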