SQL for Data Analysis Questions
Using SQL as a tool for data analysis and reporting. Focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization considerations for analytic workloads, and when to use SQL versus built-in reporting tools in analytics platforms.
Given an analytics schema, propose column types and an indexing strategy optimized for the heavy aggregations and joins used in ML training. Schema:
events(event_id INT, user_id BIGINT, event_type VARCHAR, value FLOAT, event_time TIMESTAMP)
Explain your choices of data types, indexing, and compression, and when to use columnar vs. row storage.
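A minimal sketch of the row-store and indexing half of an answer, run against SQLite through Python's standard sqlite3 module as a stand-in for Postgres. The widened types (BIGINT event_id, DOUBLE PRECISION value) and the index name idx_events_user_time are illustrative assumptions, not a prescribed design; the plan check only confirms that a (user_id, event_time) composite index serves a per-user, time-windowed aggregation.

```python
import sqlite3

# Hypothetical DDL sketch. SQLite is a row store, so this illustrates only the
# indexing side of the question; type names follow Postgres conventions.
DDL = """
CREATE TABLE events (
    event_id   BIGINT PRIMARY KEY,     -- BIGINT: event volume can exceed 2^31
    user_id    BIGINT NOT NULL,        -- join key; keep it the same type as users.user_id
    event_type TEXT NOT NULL,          -- low cardinality: dictionary-encodes well in columnar stores
    value      DOUBLE PRECISION,       -- or NUMERIC(p, s) if exact sums are required
    event_time TIMESTAMP NOT NULL      -- in Postgres, prefer TIMESTAMPTZ or a UTC convention
);
-- Composite index: equality on user_id first, then a range on event_time.
CREATE INDEX idx_events_user_time ON events (user_id, event_time);
"""


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan for a per-user, time-windowed aggregation."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT event_type, COUNT(*), SUM(value) FROM events "
        "WHERE user_id = ? AND event_time >= ? GROUP BY event_type",
        (42, "2024-03-01 00:00:00"),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
print(query_plan(conn))
```

In a columnar warehouse the same access pattern is usually handled without B-tree indexes: partition by event date, cluster or sort by user_id (for example BigQuery clustering or Redshift sort keys), and rely on per-column compression such as dictionary encoding of event_type. Row storage with B-tree indexes suits point lookups and small per-user reads; columnar storage suits wide scans that aggregate a few columns over many rows.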
Analytic workflow question: when should an ML engineer prefer SQL for analysis over a Python/pandas notebook? Give examples of tasks best suited to SQL and tasks better handled in Python across the ML lifecycle.
Time-alignment problem: You have feature updates arriving at irregular times and labels with event timestamps. Write SQL to join each label to the most recent feature snapshot earlier than the label timestamp (last-known-value). Tables:
features(user_id INT, feature_time TIMESTAMP, feature_val NUMERIC)
labels(user_id INT, label_time TIMESTAMP, label INT)
Return the label rows with the matched feature_val. Use standard/Postgres SQL and explain the performance implications.
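One possible answer, sketched as a correlated scalar subquery that is valid in both Postgres and SQLite. It is run here against an in-memory SQLite database via Python's sqlite3 module; the index name idx_features_user_time and the sample rows are hypothetical.

```python
import sqlite3

# Last-known-value (as-of) join: for each label, take the latest feature
# snapshot strictly earlier than label_time for the same user.
ASOF_SQL = """
SELECT l.user_id,
       l.label_time,
       l.label,
       (SELECT f.feature_val
          FROM features AS f
         WHERE f.user_id = l.user_id
           AND f.feature_time < l.label_time      -- strictly earlier: no label leakage
         ORDER BY f.feature_time DESC
         LIMIT 1) AS feature_val                   -- NULL if no earlier snapshot exists
  FROM labels AS l
 ORDER BY l.user_id, l.label_time
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (user_id INT, feature_time TIMESTAMP, feature_val NUMERIC);
CREATE TABLE labels   (user_id INT, label_time TIMESTAMP, label INT);
-- Lets each lookup be an index seek instead of a scan of the user's features.
CREATE INDEX idx_features_user_time ON features (user_id, feature_time);
""")
conn.executemany("INSERT INTO features VALUES (?, ?, ?)", [
    (1, "2024-01-01 00:00:00", 10),
    (1, "2024-01-05 00:00:00", 50),
])
conn.executemany("INSERT INTO labels VALUES (?, ?, ?)", [
    (1, "2024-01-03 00:00:00", 0),
    (1, "2024-01-05 00:00:00", 1),  # equal timestamp: the 01-05 snapshot is not used
    (1, "2024-01-06 00:00:00", 1),
    (2, "2024-01-02 00:00:00", 0),  # user with no features: feature_val is NULL
])
for row in conn.execute(ASOF_SQL):
    print(row)
```

With the (user_id, feature_time) index, each label costs roughly one index seek plus a single-row read; without it, every label rescans that user's feature history. In Postgres, `LEFT JOIN LATERAL (... ORDER BY feature_time DESC LIMIT 1) ON true` expresses the same lookup and can return several feature columns at once. For very large batch backfills, a window-function approach that merges both tables in time order per user and carries the last feature value forward can scan each table only once.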
You have a table training_instances representing labeled examples used to train a model:
training_instances
- instance_id INT PK
- user_id INT
- label INT -- 0 or 1
- created_at TIMESTAMP
Write a SQL query (Postgres-compatible) that returns each user_id and their average label value over the last 30 days (relative to now), only for users with at least 5 instances in that window. Explain any choices about timezone or null handling.
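A sketch of the aggregation, run on SQLite through Python's sqlite3 module as a stand-in for Postgres. To keep it reproducible, the "now" reference is passed in as a hypothetical :ref parameter instead of calling now(); the sample rows are invented.

```python
import sqlite3

# Average label per user over the 30 days before :ref, keeping only users
# with at least 5 instances in that window.
AVG_LABEL_SQL = """
SELECT user_id,
       AVG(label) AS avg_label            -- AVG skips NULL labels
  FROM training_instances
 WHERE created_at >= datetime(:ref, '-30 days')
   AND created_at <  :ref
 GROUP BY user_id
HAVING COUNT(*) >= 5                      -- counts instances, including NULL-label rows
 ORDER BY user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE training_instances (
    instance_id INTEGER PRIMARY KEY, user_id INT, label INT, created_at TIMESTAMP)""")
sample = [
    # user 1: five rows in the window, labels 1,1,0,1,0
    (1, 1, "2024-03-05 00:00:00"), (1, 1, "2024-03-10 00:00:00"),
    (1, 0, "2024-03-15 00:00:00"), (1, 1, "2024-03-20 00:00:00"),
    (1, 0, "2024-03-25 00:00:00"),
    # user 2: only four rows in the window (two more are older): excluded
    (2, 1, "2024-03-10 00:00:00"), (2, 1, "2024-03-11 00:00:00"),
    (2, 0, "2024-03-12 00:00:00"), (2, 1, "2024-03-13 00:00:00"),
    (2, 1, "2024-02-01 00:00:00"), (2, 0, "2024-02-02 00:00:00"),
    # user 3: five rows, one with a NULL label, so AVG is over four values
    (3, 1, "2024-03-02 00:00:00"), (3, 1, "2024-03-03 00:00:00"),
    (3, 1, "2024-03-04 00:00:00"), (3, 0, "2024-03-05 00:00:00"),
    (3, None, "2024-03-06 00:00:00"),
]
conn.executemany(
    "INSERT INTO training_instances (user_id, label, created_at) VALUES (?, ?, ?)",
    sample,
)
for user_id, avg_label in conn.execute(AVG_LABEL_SQL, {"ref": "2024-03-31 00:00:00"}):
    print(user_id, avg_label)
```

In Postgres the window becomes `created_at >= now() - interval '30 days'`. Because created_at is a TIMESTAMP without time zone, comparing it with now() (a timestamptz) depends on the session TimeZone setting; if the column holds UTC wall-clock times, compare against `(now() AT TIME ZONE 'UTC') - interval '30 days'`, or store TIMESTAMPTZ instead. COUNT(*) counts instances while AVG(label) skips NULL labels; use `HAVING COUNT(label) >= 5` if the threshold should apply to labeled rows only. Postgres returns numeric from AVG over an INT column, so cast with `label::float8` if a float is wanted.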
Write a BigQuery-specific SQL query to compute per-user feature vectors by pivoting event counts over event_type values (assume up to 10 known types). Table:
events(user_id INT, event_type STRING, event_time TIMESTAMP)
Return user_id and columns cnt_type_a, cnt_type_b, ... for the last 30 days. Use BigQuery functions or standard SQL constructs.
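A portable conditional-aggregation sketch, generated from a fixed list of event types and run on SQLite via Python's sqlite3 module for illustration. KNOWN_TYPES, its values, the pivot_sql helper, and the :ref reference-time parameter are hypothetical; the generated SQL itself uses no BigQuery-specific function.

```python
import sqlite3

KNOWN_TYPES = ["type_a", "type_b", "type_c"]  # hypothetical; up to 10 in the question


def pivot_sql(types: list[str]) -> str:
    """Build a per-user pivot query with one count column per event type."""
    for t in types:
        # Type names are spliced into the SQL text, so allow plain identifiers only.
        if not t.isidentifier():
            raise ValueError(f"unsafe event type: {t!r}")
    cols = ",\n       ".join(
        f"SUM(CASE WHEN event_type = '{t}' THEN 1 ELSE 0 END) AS cnt_{t}"
        for t in types
    )
    return f"""
SELECT user_id,
       {cols}
  FROM events
 WHERE event_time >= datetime(:ref, '-30 days')
   AND event_time <  :ref
 GROUP BY user_id
 ORDER BY user_id
"""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INT, event_type TEXT, event_time TIMESTAMP)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "type_a", "2024-03-10 00:00:00"),
    (1, "type_a", "2024-03-11 00:00:00"),
    (1, "type_b", "2024-03-12 00:00:00"),
    (1, "type_c", "2024-02-01 00:00:00"),  # older than 30 days: ignored
    (2, "type_c", "2024-03-20 00:00:00"),
])
for row in conn.execute(pivot_sql(KNOWN_TYPES), {"ref": "2024-03-31 00:00:00"}):
    print(row)
```

In BigQuery, each `SUM(CASE ...)` column can be written as `COUNTIF(event_type = 'type_a') AS cnt_type_a`, the window filter as `event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)`, and the PIVOT operator is an alternative to listing the columns by hand. Types with no events in the window come back as 0, and users with no events in the window do not appear at all.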