SQL for Data Analysis Questions
Using SQL as a tool for data analysis and reporting. Focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization considerations for analytic workloads, and when to use SQL versus built in reporting tools in analytics platforms.
MediumTechnical
73 practiced
Feature engineering SQL task: For every user produce features used for modeling:- count_events_7d: number of events in past 7 days- days_since_last_event- avg_event_value_30d (events.value column)Table:Write a single SQL query (use CTEs/window functions) returning user_id and the three features computed as of '2024-10-01' (hard-coded date).
events(user_id INT, event_time TIMESTAMP, value NUMERIC)MediumTechnical
54 practiced
You are asked to compute the conversion rate for an A/B experiment and the 95% confidence interval using SQL. Tables:Write SQL that outputs variant, users, conversions, conversion_rate, and approximate 95% CI using normal approximation. State assumptions and where they may break down.
experiment_assignments(user_id INT, variant CHAR) -- 'A' or 'B'
conversions(user_id INT, converted BOOLEAN)HardTechnical
73 practiced
Compute Population Stability Index (PSI) between two samples (training and production) for a numeric feature using SQL. Tables:Return PSI value and a bucket-level breakdown. Explain thresholds for concern and limitations of PSI.
train_sample(user_id, feature_val)
prod_sample(user_id, feature_val)HardTechnical
56 practiced
Write a SQL query that finds connected user communities from an interactions table using recursive CTEs (graph connectivity). Table:Return community_id and user_id where community_id is the lowest user_id in the connected component. Assume an undirected graph.
interactions(user_a INT, user_b INT)MediumTechnical
63 practiced
Deduplication at scale: write SQL to find duplicate user records (same email) and keep the most recently updated profile. Table:Return the set of profile_ids to delete. Discuss transactional concerns for deleting in production.
profiles(profile_id INT, user_id INT, email TEXT, updated_at TIMESTAMP)Unlock Full Question Bank
Get access to hundreds of SQL for Data Analysis interview questions and detailed answers.
Sign in to ContinueJoin thousands of developers preparing for their dream job.