SQL for Data Analysis Questions

This topic covers SQL as a tool for data analysis and reporting. It focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. It also includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization for analytic workloads, and when to use SQL versus the built-in reporting tools in analytics platforms.

Medium, Technical
Write an SQL query that detects duplicate payments: rows with the same `customer_id` and `amount` whose `payment_ts` values fall within a 2-minute window of each other. Given `payments(id, customer_id, amount, payment_ts)`, return `customer_id`, `payment_ts`, and `duplicate_count` for groups with a count greater than 1. Use ANSI SQL.
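
One possible reading of this question is sketched below: each payment anchors a 2-minute window, and the query counts payments by the same customer for the same amount that fall inside it. The sketch runs the SQL through Python's `sqlite3` module so it can be executed as is. SQLite is only a stand-in here, and the sample rows are made up. The `strftime('%s', ...)` call is SQLite-specific; in ANSI SQL the same condition would be written with interval arithmetic, for example `q.payment_ts BETWEEN p.payment_ts AND p.payment_ts + INTERVAL '2' MINUTE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "amount REAL, payment_ts TEXT)"
)
# Hypothetical sample data, invented for illustration.
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?)",
    [
        (1, 10, 50.0, "2024-05-01 12:00:00"),
        (2, 10, 50.0, "2024-05-01 12:01:30"),  # 90 s after id 1: duplicate
        (3, 10, 50.0, "2024-05-01 12:10:00"),  # outside every 2-minute window
        (4, 11, 20.0, "2024-05-01 09:00:00"),
        (5, 11, 25.0, "2024-05-01 09:00:30"),  # different amount: not a duplicate
    ],
)

# Self-join: for each payment p, count payments q by the same customer for the
# same amount that occur 0 to 120 seconds after p (p itself is included).
DUPLICATES_SQL = """
SELECT p.customer_id, p.payment_ts, COUNT(*) AS duplicate_count
FROM payments p
JOIN payments q
  ON q.customer_id = p.customer_id
 AND q.amount = p.amount
 AND CAST(strftime('%s', q.payment_ts) AS INTEGER)
   - CAST(strftime('%s', p.payment_ts) AS INTEGER) BETWEEN 0 AND 120
GROUP BY p.id, p.customer_id, p.payment_ts
HAVING COUNT(*) > 1
ORDER BY p.customer_id, p.payment_ts
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
print(duplicates)
```

Anchoring the window at each payment means a chain of three close payments is reported more than once. Whether that is acceptable, or whether fixed 2-minute buckets are wanted instead, is worth clarifying with the interviewer.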
Hard, Technical
You're given two large tables to join (`events` and `users`), and the join key is skewed: a few `user_id` values have millions of events. Describe SQL-level and execution-level strategies for handling skewed joins in analytic queries (e.g., salting, broadcast joins, filtering hot keys, pre-aggregation). Provide example SQL for the salting approach.
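
The salting idea from this question can be sketched as follows. Each event gets a salt derived from `event_id`, the `users` side is replicated once per salt value, and the join key becomes `(user_id, salt)`, so a hot `user_id` is spread across several join partitions. The sketch uses Python's `sqlite3` module only so it can be run locally. On a single-node SQLite database salting has no performance effect, so this shows the query shape and checks that the result matches the plain join. The `salts` helper table, the `country` column, the salt count, and the sample rows are assumptions made for illustration.

```python
import sqlite3

N_SALTS = 4  # assumed number of salt buckets; tune to the degree of skew

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, event_type TEXT);
CREATE TABLE salts (salt INTEGER PRIMARY KEY);  -- hypothetical helper table
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "US"), (2, "DE")])
# user_id 1 plays the "hot" key with 10 events; user_id 2 has 2 events.
sample_events = [(i, 1, "click") for i in range(1, 11)]
sample_events += [(11, 2, "view"), (12, 2, "click")]
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", sample_events)
conn.executemany("INSERT INTO salts VALUES (?)", [(s,) for s in range(N_SALTS)])

SALTED_SQL = """
WITH salted_events AS (
    -- Spread each user's events across N salt buckets.
    SELECT event_id, user_id, event_id % :n AS salt
    FROM events
),
salted_users AS (
    -- Replicate every user row once per salt value.
    SELECT u.user_id, u.country, s.salt
    FROM users u
    CROSS JOIN salts s
)
SELECT u.country, COUNT(*) AS event_count
FROM salted_events e
JOIN salted_users u
  ON u.user_id = e.user_id
 AND u.salt = e.salt
GROUP BY u.country
ORDER BY u.country
"""

PLAIN_SQL = """
SELECT u.country, COUNT(*) AS event_count
FROM events e
JOIN users u ON u.user_id = e.user_id
GROUP BY u.country
ORDER BY u.country
"""

salted_counts = conn.execute(SALTED_SQL, {"n": N_SALTS}).fetchall()
plain_counts = conn.execute(PLAIN_SQL).fetchall()
print(salted_counts, plain_counts)
```

Replicating every user row multiplies the small side by the salt count. A common refinement is to salt only the known hot keys and join the remaining keys normally.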
Medium, Technical
Given `transactions(id, user_id, amount, created_at TIMESTAMP)`, write SQL that returns the median transaction amount per user for users with at least 10 transactions. Use a window function-based approach in PostgreSQL (or explain a DB-specific function if necessary).
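
A window-function approach to this question might look like the sketch below: number each user's transactions by amount, then average the one or two middle rows. The SQL is run through Python's `sqlite3` module as a stand-in for PostgreSQL (this assumes a SQLite build with window-function support, version 3.25 or later), and the sample rows are invented. In PostgreSQL itself, the ordered-set aggregate `percentile_cont(0.5) WITHIN GROUP (ORDER BY amount)` is a shorter alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "amount REAL, created_at TEXT)"
)
# Hypothetical sample data:
#   user 1 has 10 transactions (amounts 1..10)
#   user 2 has 11 transactions (amounts 10, 20, ..., 110)
#   user 3 has only 3 transactions and should be filtered out
rows = [(1, float(a)) for a in range(1, 11)]
rows += [(2, float(a)) for a in range(10, 120, 10)]
rows += [(3, 5.0), (3, 6.0), (3, 7.0)]
conn.executemany(
    "INSERT INTO transactions (user_id, amount, created_at) "
    "VALUES (?, ?, '2024-01-01 00:00:00')",
    rows,
)

MEDIAN_SQL = """
WITH ranked AS (
    SELECT user_id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount) AS rn,
           COUNT(*)     OVER (PARTITION BY user_id)                 AS cnt
    FROM transactions
)
SELECT user_id, AVG(amount) AS median_amount
FROM ranked
WHERE cnt >= 10
  -- Integer division picks the middle row for odd counts and the two
  -- middle rows for even counts.
  AND rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY user_id
ORDER BY user_id
"""

medians = conn.execute(MEDIAN_SQL).fetchall()
print(medians)
```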
Easy, Technical
Given two tables:
users(user_id INT PRIMARY KEY, created_at TIMESTAMP, email VARCHAR)
events(event_id INT, user_id INT, event_time TIMESTAMP, event_type VARCHAR)
Write an SQL query that counts the unique users who generated any event in the last 30 days (relative to '2024-05-31'), grouped by signup month. Return `signup_month` (YYYY-MM) and `active_users` (a distinct count). Use PostgreSQL-compatible SQL.
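
A minimal sketch of this query follows, run through Python's `sqlite3` module so it can be executed locally. SQLite stands in for PostgreSQL, so the date functions differ: where the sketch uses `strftime('%Y-%m', u.created_at)` and `date('2024-05-31', '-30 days')`, a PostgreSQL version would use `to_char(u.created_at, 'YYYY-MM')` and `DATE '2024-05-31' - INTERVAL '30 days'`. The sample rows are invented, and treating the window as 2024-05-01 up to the end of 2024-05-31 is one interpretation of "last 30 days".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, created_at TEXT, email TEXT);
CREATE TABLE events (event_id INTEGER, user_id INTEGER, event_time TEXT, event_type TEXT);
""")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "2024-01-15 10:00:00", "a@example.com"),
        (2, "2024-01-20 10:00:00", "b@example.com"),
        (3, "2024-03-02 10:00:00", "c@example.com"),
        (4, "2024-04-10 10:00:00", "d@example.com"),
    ],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-05-10 08:00:00", "click"),
        (2, 1, "2024-05-10 09:00:00", "view"),   # same user: counted once
        (3, 2, "2024-04-15 08:00:00", "click"),  # before the 30-day window
        (4, 3, "2024-05-31 23:00:00", "view"),   # last day of the window
        (5, 4, "2024-06-02 08:00:00", "click"),  # after the reference date
    ],
)

ACTIVE_USERS_SQL = """
SELECT strftime('%Y-%m', u.created_at) AS signup_month,
       COUNT(DISTINCT e.user_id)       AS active_users
FROM events e
JOIN users u ON u.user_id = e.user_id
WHERE e.event_time >= date('2024-05-31', '-30 days')
  AND e.event_time <  date('2024-05-31', '+1 day')
GROUP BY signup_month
ORDER BY signup_month
"""

active_by_month = conn.execute(ACTIVE_USERS_SQL).fetchall()
print(active_by_month)
```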
Medium, Technical
You must choose between computing metrics on-demand in SQL (ad-hoc) vs pre-aggregating and storing them (materialized views or summary tables). Describe the trade-offs and decision criteria (latency, freshness, storage cost, compute cost, complexity) and give examples when each approach is preferable.
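
The freshness side of this trade-off can be illustrated with a small sketch. An ad-hoc query always reflects the raw table, while a summary table is only as fresh as its last refresh. The `orders` and `daily_revenue` tables, the `refresh_summary` helper, and the sample rows are hypothetical. SQLite (via Python's `sqlite3` module) has no materialized views, so a table that is dropped and rebuilt stands in for one here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
INSERT INTO orders (order_date, amount) VALUES
    ('2024-05-01', 10.0),
    ('2024-05-01', 15.0),
    ('2024-05-02', 7.0);
""")

# On-demand: scans the raw table on every call, always up to date.
AD_HOC_SQL = """
SELECT order_date, SUM(amount) AS revenue
FROM orders
GROUP BY order_date
ORDER BY order_date
"""

SUMMARY_SQL = "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"


def refresh_summary(connection):
    """Rebuild the pre-aggregated summary table from scratch."""
    connection.executescript("""
    DROP TABLE IF EXISTS daily_revenue;
    CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM orders
        GROUP BY order_date;
    """)


refresh_summary(conn)
# A new order arrives after the summary was built.
conn.execute("INSERT INTO orders (order_date, amount) VALUES ('2024-05-02', 3.0)")

fresh = conn.execute(AD_HOC_SQL).fetchall()    # sees the new order
stale = conn.execute(SUMMARY_SQL).fetchall()   # still shows the old total

refresh_summary(conn)
refreshed = conn.execute(SUMMARY_SQL).fetchall()
print(fresh, stale, refreshed)
```

Reads from the summary table are cheap but lag behind the source until the next refresh. The ad-hoc query pays the aggregation cost on every read in exchange for current results.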
