SQL for Data Analysis Questions
Using SQL as a tool for data analysis and reporting. Focuses on writing queries to extract metrics, perform aggregations, join disparate data sources, use subqueries and window functions for trends and rankings, and prepare data for dashboards and reports. Includes best practices for reproducible analytical queries, handling time series and date arithmetic, basic query optimization considerations for analytic workloads, and when to use SQL versus built in reporting tools in analytics platforms.
MediumTechnical
60 practiced
Table `web_events(user_id INT, event_time TIMESTAMP, event_type TEXT)` contains event streams. Using SQL (Postgres or BigQuery), write a query that assigns a `session_id` to each event where a session is defined as consecutive activity separated by no more than 30 minutes of inactivity. Return user_id, event_time, event_type, session_id. Explain your approach and scalability considerations.
MediumTechnical
74 practiced
Table `transactions(transaction_id INT, user_id INT, amount NUMERIC, region TEXT)` contains payments. Write SQL (Postgres or BigQuery) to compute the 95th percentile of transaction amount per region and overall. Explain differences between `percentile_cont`, `percentile_disc`, and approximate functions such as `approx_percentile`, and when you'd use each.
HardSystem Design
56 practiced
Design an analytics data mart for product events to support daily dashboards and ad-hoc SQL queries. Requirements: ingest event stream (Kafka) with evolving schema, support daily aggregates by user/product, backfill capability, and 30-day SLA for historical recomputation on 10 TB raw data. Provide a high-level architecture, propose table schemas and partitioning strategy, and explain trade-offs between pre-aggregation and on-the-fly aggregation.
HardTechnical
69 practiced
Discuss trade-offs between materialized views, precomputed summary tables, and on-the-fly SQL aggregations for analytics workloads. Address freshness, storage cost, recomputation strategy (incremental vs full refresh), query latency, maintenance complexity, and when to pick each approach.
HardTechnical
57 practiced
Implement first-touch, last-touch, and a simple time-decay attribution model in SQL. Tables:Write SQL that attributes conversion revenue to channels for each conversion under the three attribution rules and discuss assumptions and limitations of pure-SQL attribution.
touchpoints(user_id INT, channel TEXT, touch_time TIMESTAMP)
conversions(conversion_id INT, user_id INT, conversion_time TIMESTAMP, revenue NUMERIC)Unlock Full Question Bank
Get access to hundreds of SQL for Data Analysis interview questions and detailed answers.
Sign in to ContinueJoin thousands of developers preparing for their dream job.