InterviewStack.io LogoInterviewStack.io

SQL for Data Analysis Questions

Using SQL as a tool for data analysis and reporting. Focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization considerations for analytic workloads, and when to use SQL versus built in reporting tools in analytics platforms.

MediumTechnical
74 practiced
Pivot / crosstab requirement: Produce monthly revenue per region for the last 6 months in a single row per region with columns: region, rev_YYYY_MM (one column per month). You can use conditional aggregation (CASE) or vendor-specific PIVOT. Given schema:
sql
orders(order_id, region varchar, order_date date, total_amount numeric)
Write the SQL (choose any dialect) and explain how you'd keep this crosstab efficient for dashboards.
MediumTechnical
68 practiced
Given a customers table with lifetime_value computed per customer, write a SQL query to identify the top 5% customers by lifetime_value. Return customer_id, lifetime_value, percentile_rank. Use window functions or analytic functions to compute percentile. Assume the table:
sql
customers(customer_id bigint, lifetime_value numeric)
Specify the SQL dialect you use.
MediumTechnical
63 practiced
NULLs and inconsistent values plague an ETL. Describe SQL strategies to detect and handle inconsistent status values, missing keys, and NULL timestamps before exposing data to dashboards. Provide example SQL snippets for validation checks and remediation (e.g., imputation or quarantining bad rows).
MediumTechnical
70 practiced
Cohort analysis: You have a users table and events table:
sql
users(user_id, signup_date date)
events(user_id, event_date date, event_type varchar)
Write a SQL query to compute 7-day retention for cohorts defined by signup week. Output should include cohort_week, cohort_size, day_0_retention, day_7_retention (percentage). Explain assumptions needed for this calculation.
MediumTechnical
56 practiced
Attribution problem: you have three tables:
sql
impressions(impression_id, user_id, ad_id, impression_time)
clicks(click_id, user_id, ad_id, click_time)
orders(order_id, user_id, order_time, amount)
Design a SQL query to attribute orders to the most recent click within 7 days prior to the order (last-touch). Return order_id, user_id, order_time, attributed_ad_id, days_since_click. Discuss how you would handle multiple clicks and missing clicks.

Unlock Full Question Bank

Get access to hundreds of SQL for Data Analysis interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.