InterviewStack.io LogoInterviewStack.io

SQL for Data Analysis Questions

Using SQL as a tool for data analysis and reporting. Focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization considerations for analytic workloads, and when to use SQL versus built in reporting tools in analytics platforms.

MediumTechnical
73 practiced
Feature engineering SQL task: For every user produce features used for modeling:
- count_events_7d: number of events in past 7 days- days_since_last_event- avg_event_value_30d (events.value column)
Table:
events(user_id INT, event_time TIMESTAMP, value NUMERIC)
Write a single SQL query (use CTEs/window functions) returning user_id and the three features computed as of '2024-10-01' (hard-coded date).
MediumTechnical
54 practiced
You are asked to compute the conversion rate for an A/B experiment and the 95% confidence interval using SQL. Tables:
experiment_assignments(user_id INT, variant CHAR) -- 'A' or 'B'
conversions(user_id INT, converted BOOLEAN)
Write SQL that outputs variant, users, conversions, conversion_rate, and approximate 95% CI using normal approximation. State assumptions and where they may break down.
HardTechnical
73 practiced
Compute Population Stability Index (PSI) between two samples (training and production) for a numeric feature using SQL. Tables:
train_sample(user_id, feature_val)
prod_sample(user_id, feature_val)
Return PSI value and a bucket-level breakdown. Explain thresholds for concern and limitations of PSI.
HardTechnical
56 practiced
Write a SQL query that finds connected user communities from an interactions table using recursive CTEs (graph connectivity). Table:
interactions(user_a INT, user_b INT)
Return community_id and user_id where community_id is the lowest user_id in the connected component. Assume an undirected graph.
MediumTechnical
63 practiced
Deduplication at scale: write SQL to find duplicate user records (same email) and keep the most recently updated profile. Table:
profiles(profile_id INT, user_id INT, email TEXT, updated_at TIMESTAMP)
Return the set of profile_ids to delete. Discuss transactional concerns for deleting in production.

Unlock Full Question Bank

Get access to hundreds of SQL for Data Analysis interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.