SQL for Data Analysis Questions

Using SQL as a tool for data analysis and reporting. Covers writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Also includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization for analytic workloads, and when to use SQL versus the built-in reporting tools of analytics platforms.

Medium | Technical

Given an analytics schema, propose column types and an indexing strategy optimized for the heavy aggregations and joins used in ML training. Schema:

events(event_id INT, user_id BIGINT, event_type VARCHAR, value FLOAT, event_time TIMESTAMP)

Explain choices for data types, indexing, compression, and when to use columnar vs row storage.
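To make the columnar-vs-row and compression trade-offs concrete, here is a toy sketch in Python. The sample rows and the `to_columns` / `sum_value_by_user` helpers are hypothetical illustrations, not any engine's API. It stores each column as a typed array, dictionary-encodes the low-cardinality `event_type` column into 1-byte codes (assuming fewer than 256 distinct types), and shows that a per-user aggregation reads only two of the five columns, which is the main reason columnar formats suit scan-heavy training-data extraction.

```python
from array import array

# Hypothetical sample rows: (event_id, user_id, event_type, value, event_time as epoch seconds)
rows = [
    (1, 10, "click", 1.0, 1_700_000_000),
    (2, 10, "view", 0.5, 1_700_000_060),
    (3, 11, "click", 2.0, 1_700_000_120),
    (4, 11, "click", 1.5, 1_700_000_180),
]

def to_columns(rows):
    """Pivot row tuples into one typed array per column, dictionary-encoding event_type."""
    event_id = array("q")    # 64-bit signed ints
    user_id = array("q")
    type_code = array("B")   # 1-byte codes instead of repeated strings (< 256 types assumed)
    value = array("d")       # 64-bit floats
    event_time = array("q")
    dictionary = {}          # event_type string -> small integer code
    for eid, uid, etype, val, ts in rows:
        event_id.append(eid)
        user_id.append(uid)
        type_code.append(dictionary.setdefault(etype, len(dictionary)))
        value.append(val)
        event_time.append(ts)
    cols = {"event_id": event_id, "user_id": user_id, "type_code": type_code,
            "value": value, "event_time": event_time}
    return cols, dictionary

def sum_value_by_user(cols):
    """Aggregation that touches only the two columns it needs (user_id, value)."""
    totals = {}
    for uid, val in zip(cols["user_id"], cols["value"]):
        totals[uid] = totals.get(uid, 0.0) + val
    return totals

cols, dictionary = to_columns(rows)
print(dictionary)               # {'click': 0, 'view': 1}
print(list(cols["type_code"]))  # [0, 1, 0, 0]
print(sum_value_by_user(cols))  # {10: 1.5, 11: 3.5}
```

Real columnar stores layer further encodings (run-length, delta for sorted timestamps, general-purpose compression) on top of this idea; row storage remains the better fit for point lookups and frequent single-row updates.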

Easy | Technical

Analytic workflow question: when should an ML engineer prefer SQL for analysis over a Python/pandas notebook? Give examples of tasks across the ML lifecycle that are best suited to SQL and tasks that are better done in Python.
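One common split, sketched here under simple assumptions: let SQL do the set-based filtering and aggregation next to the data, then pull the reduced result into Python for model-specific transforms. The table and rows below are hypothetical, SQLite (via `sqlite3`) stands in for the warehouse, and the standard library's `statistics` module stands in for pandas so the example stays self-contained.

```python
import sqlite3
import statistics

# Hypothetical raw events; in practice these would live in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "click"), (1, "click"), (1, "view"),
     (2, "view"),
     (3, "click"), (3, "view"), (3, "view"), (3, "view")],
)

# SQL side: set-based aggregation close to the data,
# so only one small row per user leaves the database.
per_user = conn.execute(
    "SELECT user_id, COUNT(*) AS n_events "
    "FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()

# Python side: model-specific transforms on the reduced result
# (here a z-score; pandas or scikit-learn would play this role in a notebook).
counts = [n for _, n in per_user]
mu = statistics.mean(counts)
sigma = statistics.pstdev(counts)
scaled = {uid: (n - mu) / sigma for uid, n in per_user}
print(per_user)  # [(1, 3), (2, 1), (3, 4)]
```

The boundary is a judgment call: joins, filters, and group-bys over large tables usually belong in SQL, while iterative exploration, plotting, custom feature logic, and model fitting are usually easier in Python.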

Hard | Technical

Time-alignment problem: You have feature updates arriving at irregular times and labels with event timestamps. Write SQL to join each label to the most recent feature snapshot earlier than the label timestamp (a last-known-value, or "as-of", join). Tables:

features(user_id INT, feature_time TIMESTAMP, feature_val NUMERIC)
labels(user_id INT, label_time TIMESTAMP, label INT)

Return label rows with the matched feature_val. Use standard/Postgres SQL and explain performance implications.
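One way to express the last-known-value join is a correlated subquery that picks the latest snapshot strictly before each label. The sketch below runs it on SQLite through Python's `sqlite3` so it is self-contained; timestamps are stored as ISO-8601 text (which sorts chronologically) and the rows are hypothetical. In Postgres the same lookup is commonly written as `LEFT JOIN LATERAL (SELECT f.feature_val FROM features f WHERE f.user_id = l.user_id AND f.feature_time < l.label_time ORDER BY f.feature_time DESC LIMIT 1) f ON true`. Either form depends on a composite index on `(user_id, feature_time)`: with it, each label costs roughly one index seek; without it, each label may scan all of that user's snapshots or the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (user_id INTEGER, feature_time TEXT, feature_val REAL);
CREATE TABLE labels   (user_id INTEGER, label_time  TEXT, label INTEGER);
-- Composite index: each lookup becomes a backward range scan on one user's snapshots.
CREATE INDEX idx_features_user_time ON features (user_id, feature_time);
""")
conn.executemany("INSERT INTO features VALUES (?, ?, ?)", [
    (1, "2024-01-01 00:00:00", 0.1),
    (1, "2024-01-05 00:00:00", 0.5),
    (2, "2024-01-03 00:00:00", 0.9),
])
conn.executemany("INSERT INTO labels VALUES (?, ?, ?)", [
    (1, "2024-01-04 00:00:00", 1),  # expects the 01-01 snapshot
    (1, "2024-01-06 00:00:00", 0),  # expects the 01-05 snapshot
    (2, "2024-01-02 00:00:00", 1),  # no earlier snapshot -> NULL
])

AS_OF_JOIN = """
SELECT l.user_id, l.label_time, l.label,
       (SELECT f.feature_val
          FROM features AS f
         WHERE f.user_id = l.user_id
           AND f.feature_time < l.label_time   -- strictly earlier: avoids label leakage
         ORDER BY f.feature_time DESC
         LIMIT 1) AS feature_val
  FROM labels AS l
 ORDER BY l.user_id, l.label_time
"""
rows = conn.execute(AS_OF_JOIN).fetchall()
for row in rows:
    print(row)
```

Labels with no earlier snapshot keep a NULL `feature_val` rather than being dropped, so the caller can decide whether to impute or filter them.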

Easy | Technical

You have a table training_instances representing labeled examples used to train a model:

training_instances
- instance_id INT PK
- user_id INT
- label INT -- 0 or 1
- created_at TIMESTAMP

Write a SQL query (Postgres-compatible) that returns each user_id and their average label value over the last 30 days (relative to now), only for users with at least 5 instances in that window. Explain any choices about timezone or null handling.
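A minimal sketch of the 30-day query, run on SQLite through Python's `sqlite3` so it executes as-is; the rows are hypothetical and stamped relative to the current time. It assumes `created_at` is stored in UTC, which matches SQLite's `datetime('now')`. For a UTC-valued Postgres `TIMESTAMP` column, the equivalent filter would be `created_at >= (now() AT TIME ZONE 'UTC') - interval '30 days'`; with a bare `now() - interval '30 days'`, Postgres interprets the zone-less `created_at` values in the session time zone, which is only correct if that zone is UTC. `AVG` ignores NULL labels while `COUNT(*)` counts every instance; use `COUNT(label)` instead if only labeled rows should count toward the minimum of 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE training_instances (
    instance_id INTEGER PRIMARY KEY,
    user_id     INTEGER,
    label       INTEGER,  -- 0 or 1 (NULL = not yet labeled)
    created_at  TEXT      -- UTC, 'YYYY-MM-DD HH:MM:SS'
)""")

# Hypothetical rows: (user_id, label, days_ago).
sample = (
    [(1, lab, d) for lab, d in zip([1, 1, 0, 0, 1], range(1, 6))]         # 5 in window
    + [(2, 1, d) for d in range(1, 5)] + [(2, 1, 40)]                      # only 4 in window
    + [(3, lab, d) for lab, d in zip([1, 1, None, 1, 1, 0], range(2, 8))]  # 6 in window, one NULL
)
conn.executemany(
    "INSERT INTO training_instances (user_id, label, created_at) "
    "VALUES (?, ?, datetime('now', ?))",
    [(u, lab, f"-{d} days") for u, lab, d in sample],
)

QUERY = """
SELECT user_id,
       AVG(label) AS avg_label,   -- AVG skips NULL labels
       COUNT(*)   AS n_instances  -- counts every instance, labeled or not
  FROM training_instances
 WHERE created_at >= datetime('now', '-30 days')
 GROUP BY user_id
HAVING COUNT(*) >= 5
 ORDER BY user_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```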

Medium | Technical

Write a BigQuery-specific SQL query to compute per-user feature vectors by pivoting event counts for event_type values (assume up to 10 known types). Table:

events(user_id INT, event_type STRING, event_time TIMESTAMP)

Return user_id and columns cnt_type_a, cnt_type_b, ... for the last 30 days. Use BigQuery functions or standard SQL constructs.
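The pivot is usually written as conditional aggregation, one column per known type. In BigQuery each column could be `COUNTIF(event_type = 'type_a') AS cnt_type_a`, with the filter `event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)`; BigQuery also offers a `PIVOT` operator. The runnable sketch below uses the portable `COUNT(CASE WHEN ... THEN 1 END)` form on SQLite instead. The type names `type_a`, `type_b`, `type_c`, the `pivot_sql` helper, and the sample rows are hypothetical. Because the known types are spliced into SQL text, the helper accepts only identifier-like names.

```python
import sqlite3

# Hypothetical known event types (the question allows up to 10).
KNOWN_TYPES = ["type_a", "type_b", "type_c"]

def pivot_sql(types):
    """Build a conditional-aggregation query with one cnt_<type> column per known type."""
    for t in types:
        # Types are spliced into SQL text, so allow only simple identifiers.
        if not t.isidentifier():
            raise ValueError(f"unsafe event type: {t!r}")
    cols = ",\n       ".join(
        f"COUNT(CASE WHEN event_type = '{t}' THEN 1 END) AS cnt_{t}" for t in types
    )
    return (
        "SELECT user_id,\n"
        f"       {cols}\n"
        "  FROM events\n"
        " WHERE event_time >= datetime('now', '-30 days')\n"
        " GROUP BY user_id\n"
        " ORDER BY user_id"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT, event_time TEXT)")
# Hypothetical rows stamped relative to now (UTC): (user_id, event_type, days_ago).
sample = [
    (1, "type_a", 1), (1, "type_a", 2), (1, "type_c", 3),
    (1, "type_a", 45),  # outside the 30-day window
    (2, "type_b", 5),
]
conn.executemany(
    "INSERT INTO events VALUES (?, ?, datetime('now', ?))",
    [(u, t, f"-{d} days") for u, t, d in sample],
)
rows = conn.execute(pivot_sql(KNOWN_TYPES)).fetchall()
print(rows)  # [(1, 2, 0, 1), (2, 0, 1, 0)]
```

Users with no events in the window produce no row at all; if every user needs a (possibly all-zero) vector, start from a users table and `LEFT JOIN` the aggregated counts, wrapping each column in `COALESCE(..., 0)`.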
