InterviewStack.io LogoInterviewStack.io

SQL for Data Analysis Questions

Using SQL as a tool for data analysis and reporting. Focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization considerations for analytic workloads, and when to use SQL versus built in reporting tools in analytics platforms.

MediumTechnical
0 practiced
Compute a 7-day rolling average of daily revenue per user using SQL. Table:
transactions(user_id INT, txn_time TIMESTAMP, amount NUMERIC)
Return columns: user_id, date, daily_revenue, rolling_7d_revenue_avg. Handle days with zero revenue (i.e., user had no transactions that day) such that they count as zero in the average.
MediumTechnical
0 practiced
Union and append strategies: you receive daily parquet exports and must append them into a unified table for analysis. Describe SQL patterns to safely deduplicate and merge daily loads into a partitioned analytics table and provide a sample MERGE/UPSERT statement (generic SQL).
MediumTechnical
0 practiced
Create a SQL pattern to compute stratified sampling of users for offline model evaluation. Table:
users(user_id INT, country VARCHAR, signup_date DATE)
Return a sample of N users stratified by country proportionally. Provide SQL that works in Postgres-like systems and discuss randomness reproducibility.
MediumTechnical
0 practiced
Create a SQL query to flag experiments with suspicious metric patterns: low sample sizes, sudden drops in traffic, or missing conversion events. Describe metrics you would compute daily and the SQL logic to detect anomalies for automated alerts.
EasyTechnical
0 practiced
Write a SQL query that creates a labeled dataset split into train and test sets deterministically using a hash of user_id. Table:
users(user_id INT, signup_date DATE, country VARCHAR, label INT)
Return user_id and a column split with values 'train' or 'test' with 80/20 split using SQL functions only. Explain why deterministic split is important for ML pipelines.

Unlock Full Question Bank

Get access to hundreds of SQL for Data Analysis interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.