Spotify Data Scientist Interview Preparation Guide - Mid Level (2-5 Years)
Spotify's Data Scientist interview process spans 4-6 weeks and moves through screening and technical stages. It begins with a recruiter phone screen to assess background alignment, followed by a technical phone interview covering core programming and data science skills. The final stage consists of four onsite interviews covering programming proficiency, system design, cultural fit, and domain-specific data science expertise. Together, these rounds test whether candidates have the technical depth, problem-solving ability, and collaborative mindset needed to contribute to Spotify's music and audio platform.
Interview Rounds
Recruiter Screening
What to Expect
This 30-minute phone call with a Spotify recruiter serves as the initial gate to assess your background and motivation for the Data Scientist role. The recruiter will review your resume, discuss your professional experience, and explain the role and interview process. This round is non-technical and focuses on cultural alignment and understanding your fit with Spotify's mission and values. The recruiter evaluates whether your background demonstrates the desired technical proficiency and whether your career interests align with the Data Scientist position.
Tips & Advice
Prepare a 2-minute elevator pitch about your professional background that highlights key accomplishments and why you're interested in Spotify specifically. Research Spotify's mission around connecting people through music and demonstrate genuine enthusiasm for this problem space. Tailor your narrative to emphasize relevant experience working with data at scale, machine learning projects, and cross-functional collaboration. When asked 'Why Spotify?', go beyond salary and benefits—discuss specific product features you use, music recommendation challenges that excite you, or Spotify's technical innovations. Be ready to discuss your expectations for the role and what you hope to learn. Prepare 2-3 thoughtful questions about the team, their data challenges, and how data science contributes to product decisions. For mid-level candidates, emphasize your ability to own projects independently and mentor junior team members.
Focus Topics
Career Growth and Learning Goals
Explain what technical skills you want to develop and how this role at Spotify supports your growth as a mid-level professional. Discuss areas where you seek to deepen expertise (e.g., working with massive user behavior datasets, building production ML systems, advanced statistical testing). Show that you're ambitious but realistic about mid-level responsibilities. Mention interest in mentoring junior team members and contributing to team technical decisions.
Cross-Functional Collaboration Experience
Describe concrete examples of working effectively with product managers, engineers, designers, and other data scientists. Highlight how you translated business questions into data analysis, communicated findings to non-technical stakeholders, or influenced product decisions through insights. Show comfort with ambiguous requirements and ability to work with diverse teams. Demonstrate that you can bridge technical and business perspectives.
Understanding of Spotify's Mission and Product
Demonstrate knowledge of Spotify as a platform and show thoughtful understanding of how data science powers key product features like personalized recommendations, playlist curation, and user engagement. Reference specific Spotify products or features you use. Discuss the business impact of data-driven decisions in the music streaming space. Show awareness of Spotify's competitive landscape and technical challenges in handling massive music catalogs and diverse user preferences.
Motivation for Spotify Role
Articulate why you're specifically interested in this Data Scientist position at Spotify. Discuss what attracts you about the role, team, or company beyond compensation. Reference Spotify's product, business model, or data challenges. Show familiarity with Spotify's ecosystem (music recommendations, personalization, user engagement). Explain how this role aligns with your career goals and how working at Spotify will accelerate your growth as a mid-level data scientist.
Professional Background and Experience Summary
Articulate your career journey, key roles, and technical growth from entry to mid-level. Emphasize how each position built relevant skills in data analysis, machine learning, and cross-functional work. Quantify your accomplishments with specific metrics (e.g., 'improved model accuracy by 15%', 'processed datasets with 10M+ records'). For mid-level, highlight 2-3 significant projects where you owned end-to-end components and drove measurable outcomes.
Technical Phone Screen
What to Expect
This 1-hour video call evaluates your technical proficiency with computer science and data science fundamentals. You'll interact with 1-2 Spotify engineers or data scientists who will ask trivia-style questions on CS concepts, Python and SQL capabilities, statistical knowledge, and data analysis problem-solving. This round typically includes hands-on coding problems or SQL queries that assess your ability to write clean, efficient code and solve data manipulation tasks. The goal is to validate that you have the technical chops to succeed in onsite technical interviews.
Tips & Advice
Set up your interview environment with a reliable internet connection and test your video and audio beforehand. Have a code editor ready (most platforms provide a shared IDE like CoderPad or LeetCode). Think aloud while solving problems and explain your approach before coding. For mid-level candidates, interviewers expect clean, readable code that runs efficiently on the first or second attempt. Practice SQL queries and Python problems focused on data manipulation, aggregation, filtering, and transformation. When tackling a problem, clarify requirements first, outline your approach, mention edge cases, and explain the time and space complexity. For SQL, focus on joins, window functions, grouping, and optimization. Be able to explain concepts such as integration versus unit testing, selection bias, and data skew concisely. Prepare to discuss how you approached a real data problem from your past experience. At mid-level, demonstrate not just correctness but also code quality and efficiency.
Focus Topics
Machine Learning Concepts and Algorithms
Understand supervised learning (regression, classification), unsupervised learning (clustering), and basic ensemble methods. Know concepts like overfitting, underfitting, regularization, cross-validation, and feature importance. Understand the bias-variance tradeoff. For mid-level, you should be familiar with common algorithms (linear regression, logistic regression, decision trees, random forests, K-means) and when to use each. Understand evaluation metrics for classification (accuracy, precision, recall, F1-score, AUC-ROC) and regression (RMSE, MAE, R²).
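As a quick refresher on the classification metrics listed above, the following is a minimal sketch in plain Python. The function name `classification_metrics` and the churn labels are hypothetical, chosen only for illustration; in practice a library such as scikit-learn would normally be used.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = user churned, 0 = user stayed.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With these labels, accuracy is 0.8 while precision and recall are both 2/3, which illustrates why accuracy alone can look flattering on an imbalanced target.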
Data Structures and Algorithms
Understand fundamental data structures (arrays, linked lists, trees, graphs, hash tables) and their operations. Know common algorithms for sorting, searching, and graph traversal. While not as deep as a software engineer interview, data scientists should understand algorithmic complexity (Big O notation), trade-offs between different approaches, and when to use which data structure. For mid-level, be prepared to explain how to detect anomalies, handle duplicates, or optimize memory usage in data processing tasks.
Data Analysis Problem-Solving
Apply Python, SQL, and statistics together to solve real data analysis problems. This might include tasks like 'calculate user retention rates', 'identify trends in listening patterns', 'find overlapping subscription periods', or 'detect anomalies in engagement metrics'. Approach problems systematically: clarify requirements, break down the problem, write SQL/Python code, validate results, and explain findings. For mid-level, expect to handle moderately complex scenarios with multiple steps and some ambiguity.
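One of the tasks named above, finding overlapping subscription periods, can be sketched as follows. This is an illustrative approach, not a known Spotify solution: the tuple layout `(period_id, user_id, start, end)` and the sample data are assumptions, and end dates are treated as inclusive.

```python
from datetime import date


def find_overlaps(periods):
    """Return pairs of period ids whose date ranges overlap for the same user.

    periods: list of (period_id, user_id, start, end), end inclusive.
    """
    by_user = {}
    for pid, user, start, end in periods:
        by_user.setdefault(user, []).append((start, end, pid))

    overlaps = []
    for user_periods in by_user.values():
        user_periods.sort()  # order by start date
        for i, (s1, e1, p1) in enumerate(user_periods):
            for s2, e2, p2 in user_periods[i + 1:]:
                if s2 > e1:
                    # Sorted by start, so no later period can overlap p1.
                    break
                overlaps.append((p1, p2))
    return overlaps


# Hypothetical subscription periods.
periods = [
    ("a", 1, date(2024, 1, 1), date(2024, 1, 31)),
    ("b", 1, date(2024, 1, 15), date(2024, 2, 15)),
    ("c", 1, date(2024, 3, 1), date(2024, 3, 31)),
    ("d", 2, date(2024, 1, 10), date(2024, 1, 20)),
]
overlapping = find_overlaps(periods)
```

Sorting by start date lets the inner loop stop early, so the cost is dominated by the sort unless many periods genuinely overlap. It is worth stating the inclusive-end assumption out loud in an interview, since it changes the comparison.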
Statistics and Probability Fundamentals
Master concepts including distributions (normal, binomial, Poisson), hypothesis testing, p-values, confidence intervals, and statistical significance. Understand Type I and Type II errors, and why correlation in observational data does not imply causation. Be familiar with Bayesian thinking. Know how to interpret statistical results and communicate uncertainty. For mid-level, you should be able to discuss selection bias, sampling bias, and skewed data: how they occur and how they affect an analysis.
SQL Query Writing and Optimization
Write efficient SQL queries to retrieve, aggregate, and join data from databases. Master SELECT, WHERE, GROUP BY, HAVING, JOIN (INNER, LEFT, RIGHT, FULL OUTER), subqueries, and Common Table Expressions (CTEs). Understand query optimization—how to index, avoid N+1 query problems, and analyze query execution plans. Practice writing complex queries that involve multiple joins, aggregations, and window functions. Know how to check for overlapping date ranges, calculate running totals, and rank data within groups. For mid-level, you should write optimal queries that run efficiently on large datasets.
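Running totals and ranking within groups, both mentioned above, can be practiced locally with Python's built-in `sqlite3` module. The sketch below assumes a hypothetical `plays` table and an SQLite build with window-function support (version 3.25 or later); syntax details may differ in other SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plays (user_id INTEGER, play_date TEXT, minutes INTEGER)"
)
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 30),
        (1, "2024-01-02", 45),
        (1, "2024-01-03", 10),
        (2, "2024-01-01", 60),
        (2, "2024-01-02", 20),
    ],
)

query = """
SELECT
    user_id,
    play_date,
    minutes,
    -- running total of minutes per user, ordered by date
    SUM(minutes) OVER (PARTITION BY user_id ORDER BY play_date) AS running_minutes,
    -- rank of each day within the user by minutes listened
    RANK() OVER (PARTITION BY user_id ORDER BY minutes DESC) AS rank_in_user
FROM plays
ORDER BY user_id, play_date
"""
rows = conn.execute(query).fetchall()
```

For user 1 the running total goes 30, 75, 85, and the day with 45 minutes gets rank 1. Being able to predict such output by hand is a useful check that the window frame does what you intend.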
Python Programming Fundamentals
Demonstrate proficiency in Python for data manipulation and analysis. Key areas include working with lists, dictionaries, and sets efficiently; understanding list comprehensions; writing clean, readable functions; exception handling; and working with common libraries like NumPy and Pandas. For mid-level, you should write optimized code that handles edge cases gracefully. Know the difference between mutable and immutable objects, understand generators for memory efficiency, and be comfortable with lambda functions and functional programming concepts.
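The mutable versus immutable distinction mentioned above often shows up as the shared default argument pitfall. The following small sketch uses hypothetical function names to contrast the buggy and the safe pattern.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at function definition time,
    # and is then shared across every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Create a fresh list on each call when none is supplied.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


first = add_tag_buggy("rock")
second = add_tag_buggy("jazz")  # same list object as `first`

safe_first = add_tag("rock")
safe_second = add_tag("jazz")   # independent list
```

Here `first` and `second` refer to the same list containing both tags, while `safe_first` and `safe_second` each hold one tag. Using `None` as the sentinel is the conventional fix.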
Onsite Interview - Programming Test
What to Expect
This component of the onsite interviews focuses on your ability to write clean, efficient code to solve data structure and algorithms problems. You'll work on a coding pad or IDE to solve 1-2 problems that may involve data manipulation, memory management, or data analysis. Interviewers evaluate your problem-solving approach, code quality, ability to handle edge cases, and communication throughout the process. For data scientists at mid-level, the focus is on practical coding ability that translates to real data engineering tasks rather than pure algorithmic complexity.
Tips & Advice
Treat this as a collaborative problem-solving session, not a solo coding challenge. Verbalize your thinking process as you code—explain your approach before diving into implementation. Write clean, readable code with meaningful variable names and comments. Test your code mentally against the examples provided and consider edge cases (empty inputs, large datasets, special characters). For a mid-level candidate, the interviewer expects you to solve problems correctly and efficiently on the first or second attempt. If you make a mistake, debug systematically. Ask clarifying questions about input constraints and expected output format. At the end, briefly discuss time and space complexity of your solution. Practice problems similar to Spotify's style: Prime number generation, detecting anomalies, music recommendation logic, subscription overlap detection, etc.
Focus Topics
Memory Management and Efficiency
Understand how to write code that uses memory efficiently, especially when processing large datasets. Avoid creating unnecessary copies of data. Use generators for streaming data. Understand when to trade memory for speed or vice versa. Be aware of how Python handles memory for different data types. For large-scale data processing, efficient memory usage is critical.
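To illustrate the point about generators for streaming data, here is a minimal sketch that computes an average over CSV rows without materializing them in a list. The column name `seconds` and the in-memory sample are assumptions; a real input would typically be an open file handle.

```python
import csv
import io


def iter_play_seconds(lines):
    """Yield play durations one row at a time instead of building a list."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield int(row["seconds"])


def mean_play_seconds(lines):
    """Compute the mean in a single pass using constant extra memory."""
    total = 0
    count = 0
    for seconds in iter_play_seconds(lines):
        total += seconds
        count += 1
    return total / count if count else 0.0


# Hypothetical data; io.StringIO stands in for a large file on disk.
sample = io.StringIO("user_id,seconds\n1,200\n2,100\n3,300\n")
average = mean_play_seconds(sample)
```

Only one row is held in memory at a time, so the same code works whether the input has three rows or many millions. The trade-off is that a generator can be consumed only once.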
Python Code Optimization and Quality
Write Python code that is not just correct but also efficient and maintainable. Use appropriate libraries (NumPy, Pandas for data tasks). Avoid unnecessary loops where vectorization is possible. Write clear function signatures with proper naming. Add comments for complex logic. Use Pythonic constructs like list comprehensions, f-strings, and context managers. Minimize redundancy and follow PEP 8 style guidelines. For mid-level, code quality is just as important as correctness.
Data Analysis with Python Libraries
Use Python libraries effectively for data tasks: Pandas for data manipulation (filtering, grouping, joining), NumPy for numerical operations, and standard library functions for common tasks. Practice aggregating data, applying transformations, handling missing values, and combining datasets. Write code that reads naturally and is easy to understand, not just clever.
Algorithm Problem-Solving and Implementation
Solve algorithmic problems that may include generating sequences (e.g., all prime numbers up to N), detecting patterns, or performing calculations. Demonstrate understanding of time and space complexity. Choose appropriate algorithms based on constraints. For mid-level, problems are typically medium difficulty—not trivial but also not requiring advanced techniques. Break problems into smaller steps and implement incrementally.
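For the example of generating all prime numbers up to N, one standard approach is the Sieve of Eratosthenes. The sketch below is one possible implementation; the function name is illustrative.

```python
def primes_up_to(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Multiples below p*p were already crossed out by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


small_primes = primes_up_to(30)
```

The sieve runs in O(n log log n) time and O(n) space, which is worth contrasting with trial division when discussing complexity.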
Data Structure Implementation and Manipulation
Write code to work with fundamental data structures: arrays, dictionaries, sets, and basic trees. Implement operations like insertion, deletion, searching, and sorting. Handle nested data structures and know when to use which structure for optimal performance. For example, use sets for O(1) lookups, dictionaries to count occurrences, or lists when order matters. Understand memory implications of different choices.
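The structure choices described above (sets for membership tests, dictionaries for counting, lists when order matters) can be sketched with the standard library. The track names are made up for illustration.

```python
from collections import Counter

# Hypothetical play log.
plays = ["track_a", "track_b", "track_a", "track_c", "track_a", "track_b"]

# Dictionary-based counting of occurrences.
play_counts = Counter(plays)
top_two = play_counts.most_common(2)


def first_duplicate(items):
    """Return the first item seen twice, using a set for average O(1) lookups."""
    seen = set()
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None


def dedupe_keep_order(items):
    """Remove duplicates while preserving first-seen order."""
    # dict preserves insertion order in Python 3.7+.
    return list(dict.fromkeys(items))


dup = first_duplicate(plays)
unique_tracks = dedupe_keep_order(plays)
```

Each helper makes a single pass over the input. The set and dictionary cost extra memory proportional to the number of distinct items, which is the usual memory-for-speed trade.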
Onsite Interview - System Design
What to Expect
This interview evaluates your ability to design large-scale data systems and understand technical architecture. You'll discuss how to build a data solution, design database schemas, optimize queries for large datasets, and handle scalability challenges. For data scientists, this focuses on designing data pipelines, recommendation systems, or analytics platforms rather than general software architecture. The interviewer will present a problem (e.g., 'Design a music recommendation system for Spotify') and ask you to think through the system design, data flow, database design, and optimization strategies. This tests your ability to move beyond writing individual queries or models to thinking about end-to-end system architecture.
Tips & Advice
Approach system design problems systematically: (1) Clarify requirements and constraints (data volume, latency requirements, consistency needs), (2) Propose a high-level architecture, (3) Dive into database design and SQL optimization, (4) Discuss scalability and trade-offs, (5) Address potential challenges like handling overlapping data ranges or detecting anomalies. For music recommendation systems, discuss how you'd store user-song interactions, compute similarities, handle cold-start problems, and personalize recommendations. For Spotify, knowledge of music metadata, playlist structures, and user engagement is valuable. Draw diagrams to visualize data flow and database schemas. Discuss SQL optimization techniques: indexing, query execution plans, appropriate join types, partitioning large tables. Know when to use different storage solutions (relational databases, data warehouses). For a mid-level candidate, interviewers expect practical thinking, not necessarily deep expertise in distributed systems, but understanding of scalability considerations.
Focus Topics
Data Processing Trade-offs and Technology Choices
Understand trade-offs between different technologies and approaches: SQL vs NoSQL, batch processing vs streaming, strong vs eventual consistency. For different problems, recommend appropriate tools and explain why. For example, when should you use Python scripts vs SQL vs Spark for data transformation? When is eventual consistency acceptable, and when is strict consistency required? For mid-level, show practical thinking rather than mastery of all technologies, but demonstrate awareness of trade-offs.
Data Warehouse Architecture and Analytics
Understand data warehouse concepts including fact tables, dimension tables, and schemas (star schema, snowflake schema). Discuss how warehouses differ from operational databases. Consider partitioning strategies for fast query performance on large tables. Design fact and dimension tables to support analytics queries efficiently. For Spotify, think about how to structure data for analyzing listening patterns, recommendation performance, user engagement, and subscription dynamics.
Scalability and Performance Optimization
Address how to scale systems to handle growing data volumes and user numbers. Discuss strategies like horizontal scaling, caching or edge delivery of precomputed recommendations, and asynchronous processing. Know the trade-offs between consistency, availability, and partition tolerance (the CAP theorem, at a high level). For data science, focus on optimizing model serving, recommendation systems, and analytics queries. Understand when to use approximations or sampling to trade accuracy for speed.
Large-Scale Data Pipeline Design
Design end-to-end data pipelines that handle massive data ingestion, transformation, and delivery. Outline components: data sources, collection mechanisms, storage, processing steps, and output. Consider data flow for real-time vs batch processing. Discuss how to handle data quality, error handling, and monitoring. For music streaming data, design pipelines that ingest user listening events, process them at scale, and make results available for downstream analytics and personalization systems.
SQL Database Design and Querying at Scale
Design database schemas that support business requirements while maintaining query efficiency. Normalize tables appropriately, choose suitable data types, and create effective indexes. Write optimized queries that join large tables efficiently. Understand query execution plans and how to identify bottlenecks. Know when to denormalize for query performance. For Spotify, design schemas that support tracking user subscriptions, listening history, recommendations, and engagement metrics at scale.
Music Recommendation System Architecture
Design the architecture for a recommendation system like Spotify's Discover Weekly or Release Radar. Discuss how to model user preferences, compute recommendations (collaborative filtering, content-based, hybrid), handle cold-start problems for new users, and serve recommendations in real-time. Address how to track user interactions at massive scale, compute embeddings, and A/B test recommendations. Consider both the offline computation (generating candidate recommendations) and online serving (responding to requests quickly).
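To make the collaborative filtering idea concrete, here is a toy item-item sketch using cosine similarity over binary play data. It is a teaching example under simplifying assumptions, not a description of how Discover Weekly works: the data, function names, and scoring rule are all hypothetical, and a production system would compute this offline at far larger scale.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items from binary user-item interactions.

    interactions: dict mapping user -> set of item ids the user played.
    """
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)

    sims = defaultdict(dict)
    items = sorted(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(item_users[a] & item_users[b])
            if shared:
                norm = math.sqrt(len(item_users[a]) * len(item_users[b]))
                sims[a][b] = sims[b][a] = shared / norm
    return sims


def recommend(user, interactions, sims, k=3):
    """Score unseen items by summed similarity to the user's played items."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


interactions = {
    "u1": {"song_a", "song_b"},
    "u2": {"song_a", "song_b", "song_c"},
    "u3": {"song_b", "song_c"},
    "u4": {"song_d"},
}
sims = item_similarities(interactions)
recs = recommend("u1", interactions, sims)
cold = recommend("u4", interactions, sims)
```

User `u1` is recommended `song_c`, while `u4`, whose only song has no co-listeners, gets an empty list. That empty result is the cold-start problem in miniature and a natural place to discuss content-based or popularity fallbacks.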
Onsite Interview - Behavioral and Cultural Fit
What to Expect
This interview assesses how you work with others, handle challenges, and align with Spotify's culture and values. You'll discuss your past experiences, decision-making approach, collaboration style, and how you've handled ambiguity or failure. The interviewer (often a manager or senior peer) evaluates your ability to work effectively in a cross-functional environment, communicate insights to diverse stakeholders, and demonstrate ownership. For mid-level candidates, the focus is on your ability to own projects end-to-end, collaborate with teams, contribute to technical decisions, and show growth mindset. This round is not just about past accomplishments but demonstrating the behaviors Spotify values: collaboration, user-focus, data-driven decision-making, and continuous improvement.
Tips & Advice
Prepare 4-6 concrete project examples using the STAR method (Situation, Task, Action, Result). For each example, highlight your personal contributions, decision-making process, and outcomes. Include quantified impact where possible (e.g., 'improved retention by 5%', 'reduced latency from 10s to 2s'). At mid-level, emphasize projects where you owned significant components, mentored others, or influenced cross-functional decisions. Practice discussing how you handle ambiguity, learn from mistakes, and adapt. Be specific about challenges you've overcome and lessons learned. When discussing collaboration, emphasize listening to stakeholders, considering multiple perspectives, and building consensus. Show genuine interest in Spotify's culture by discussing alignment with values around music, user experience, and data-driven thinking. When asked about failure, discuss what you learned and how you applied that learning. Use concrete language, avoid corporate jargon, and tell authentic stories. Research Spotify's culture and values beforehand to reference them naturally.
Focus Topics
Learning from Failure and Iteration
Discuss a time when an analysis was incorrect, a model didn't work as expected, or a project took a wrong direction. Explain what went wrong, how you identified the issue, what you learned, and how you applied that learning going forward. Show humility, curiosity, and growth mindset. Avoid blaming others or external circumstances; focus on what you could control. This demonstrates resilience and continuous improvement orientation.
Handling Ambiguity and Complex Problems
Describe situations where requirements were unclear, data quality was poor, or the problem hadn't been solved before. Explain how you approached ambiguity—asking clarifying questions, breaking problems into smaller pieces, defining success metrics, and iterating. Show comfort with complexity and ability to move forward despite incomplete information. For mid-level, this demonstrates readiness to own projects where you must define the problem, not just execute predefined analyses.
Communication and Stakeholder Management
Share examples of presenting findings to non-technical stakeholders, translating complex analyses into actionable insights, and gaining buy-in for data-driven recommendations. Discuss how you tailor communication to different audiences—executives vs product teams vs engineers. Demonstrate ability to tell a compelling story with data, anticipate questions, and address concerns. Include examples where clear communication influenced decisions.
Cross-Functional Team Collaboration
Discuss your experience collaborating with product managers, engineers, designers, and other data scientists. Provide specific examples of working through disagreements, integrating feedback, or aligning diverse perspectives. Show that you listen to stakeholders, understand their constraints and priorities, and translate business needs into technical requirements. Demonstrate your ability to communicate technical concepts to non-technical partners and translate product questions into data science approaches. Share experiences where collaboration led to better outcomes than individual work.
Past Project Impact and Results
Describe 2-3 significant projects where you drove measurable business outcomes through data science. For each, explain the business context, your role and contributions, the technical approach, and quantified results. Emphasize decisions you made and trade-offs you considered. For mid-level, highlight projects where you owned end-to-end components, not just contributed analysis. Show how your work influenced product decisions or improved user experience. Examples might include building a churn prediction model that identified at-risk users, analyzing feature adoption to guide roadmap prioritization, or recommending algorithm improvements that increased engagement.
Onsite Interview - Data Science Technical Interview
What to Expect
This final technical interview focuses on domain-specific data science knowledge and your ability to apply statistics, machine learning, and domain expertise to real problems. You'll face questions ranging from statistical concepts to machine learning model selection, feature engineering, metrics definition, and interpreting data. Problems might include defining metrics for business questions, explaining why a model underperforms, or designing experiments. The interviewer evaluates your statistical rigor, understanding of machine learning end-to-end, and ability to apply these concepts to Spotify's domain (music recommendation, user engagement, subscription dynamics). This round tests the application of knowledge rather than memorized definitions.
Tips & Advice
Review statistics, A/B testing, and machine learning concepts thoroughly. Be prepared to explain concepts at multiple levels—conceptually, mathematically, and practically. When answering conceptual questions, start with the fundamental idea, then provide examples and intuition. For machine learning questions, discuss trade-offs (bias-variance, accuracy-interpretability, computational cost). When working through data problems, think systematically: understand business context, define metrics, identify data requirements, explain approach, discuss validation. Practice explaining why data issues occur (selection bias, survivorship bias, measurement error) and their impact. Know common pitfalls in A/B testing and causal inference. Be ready to discuss how you'd validate a model in production and monitor for degradation. Reference Spotify's domain: how do you measure recommendation quality? How do you test personalization algorithms? Prepare to discuss metrics like retention, engagement, recommendation diversity, cold-start problem solutions.
Focus Topics
Advanced SQL and Data Querying
Write complex SQL queries for analytics and validation. Master window functions (ROW_NUMBER, RANK, LAG, LEAD) for time-series analysis and ranking. Use CTEs for readability and complex nested queries. Understand CASE statements for conditional logic. Know how to write efficient queries that leverage indexes. For data quality checks, write queries to identify duplicates, validate constraints, and profile data distributions. Understand SQL performance tuning and query optimization.
Data Quality Assessment and Preprocessing
Identify and address data quality issues: missing values, outliers, duplicates, inconsistencies. Understand how data quality issues impact analysis and models. Know strategies for handling each type of issue (imputation, removal, transformation). Discuss data profiling and validation checks. For mid-level, demonstrate ability to assess data trustworthiness, catch issues early, and document assumptions. Understand how data quality issues can bias results (selection bias, measurement error, survivorship bias).
Statistical Analysis and A/B Testing
Master statistical foundations essential for data-driven decisions. Understand hypothesis testing, p-values, confidence intervals, and statistical significance. Design and analyze A/B tests: determine sample size, detect when tests are valid, interpret results correctly, understand Type I and Type II errors. Know about multiple comparison problems and when to use corrections. For music products, discuss how to test recommendation changes, playlist algorithms, or feature rollouts. Be familiar with sequential testing and early stopping. Discuss when A/B testing is appropriate vs other experimental designs.
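The sample-size and significance steps above can be sketched with the standard library alone. This is a simplified illustration assuming a two-sided test on conversion rates with the normal approximation; the numbers are hypothetical, and real experiments may need corrections for multiple comparisons or sequential looks.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_base, mde, alpha=0.05, power=0.8):
    """Approximate n per group to detect an absolute lift `mde`."""
    p_new = p_base + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


# Hypothetical test: control converts 200/2000, treatment 250/2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)

# Users needed per arm to detect a lift from 10% to 12%.
n_needed = sample_size_per_group(0.10, 0.02)
```

In this example the z-statistic is about 2.5 with a p-value near 0.012, and detecting a two-point lift from a 10% baseline needs roughly 3,800 users per arm. Halving the detectable effect roughly quadruples the required sample, a relationship interviewers often probe.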
Feature Engineering and Selection
Understand how to create, select, and validate features for machine learning models. Discuss domain knowledge application (e.g., for music, what user behavior signals predict engagement?). Know techniques for feature scaling, encoding categorical variables, and handling missing values. Discuss feature importance and interpretability. For large datasets, understand computational efficiency of feature calculation. Know when to create interaction features or polynomial features vs when to keep features simple. Demonstrate ability to translate business intuition into features.
Machine Learning Model Development and Validation
Understand the full machine learning lifecycle: problem definition, data preparation, model training, validation, and deployment. Know techniques for preventing overfitting (cross-validation, regularization, early stopping). Understand the train/validation/test split and why it matters. Discuss evaluation metrics for classification (precision, recall, F1, AUC-ROC, confusion matrix), regression (RMSE, MAE, R²), and ranking (NDCG, MRR). Know how to validate models on a holdout test set and why this is critical. For mid-level, demonstrate ability to carry out end-to-end model development, not just train individual models.
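The ranking metrics named above are less familiar than precision and recall, so a minimal sketch may help. It assumes graded relevance scores listed in ranked order and computes the ideal ordering from the same list; the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked relevance scores."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0


def mean_reciprocal_rank(ranked_hits):
    """ranked_hits: one list of 0/1 flags per query, in ranked order."""
    total = 0.0
    for hits in ranked_hits:
        for rank, hit in enumerate(hits, start=1):
            if hit:
                total += 1.0 / rank
                break  # only the first relevant result counts
    return total / len(ranked_hits) if ranked_hits else 0.0


perfect = ndcg_at_k([3, 2, 1], k=3)
buried = ndcg_at_k([0, 0, 1], k=3)
mrr = mean_reciprocal_rank([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
```

A perfectly ordered list scores 1.0, while a single relevant track buried in third place scores 0.5. The MRR example also evaluates to 0.5, since the three queries contribute 1/2, 1, and 0.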
Metrics Definition and Business Interpretation
Define appropriate metrics for business questions and machine learning models. Understand leading vs lagging indicators, short-term vs long-term metrics. For Spotify, discuss metrics like user retention, listening frequency, playlist saves, recommendation diversity, and churn. Know how to translate business goals into metrics. Discuss metric sensitivity and specificity. Understand when to use aggregate metrics vs user-level metrics. At mid-level, you should define metrics that accurately capture business value, not just optimize for a single metric.
Frequently Asked Data Scientist Interview Questions
Sample Answer
-- Daily active users (DAU) per local calendar day; '<Z>' is the target time zone
SELECT
    DATE(event_time AT TIME ZONE 'UTC' AT TIME ZONE '<Z>') AS day,
    COUNT(DISTINCT user_id) FILTER (WHERE user_id IS NOT NULL) AS dau_users
FROM events
WHERE event_time >= '2024-01-01' AND event_time < '2024-02-01'
GROUP BY day;

-- Monthly active users (MAU) per local calendar month
SELECT
    DATE_TRUNC('month', event_time AT TIME ZONE 'UTC' AT TIME ZONE '<Z>') AS month,
    COUNT(DISTINCT user_id) FILTER (WHERE user_id IS NOT NULL) AS mau_users
FROM events
GROUP BY month;

-- Select-list fragment: count anonymous devices (events with no user_id)
COUNT(DISTINCT device_id) FILTER (WHERE user_id IS NULL) AS anon_devices
Sample Answer
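from collections import Counter
import scipy.stats as sps

def label_entropy(labels):
    # Shannon entropy (in bits) of the empirical label distribution
    counts = Counter(labels)
    ps = [c / len(labels) for c in counts.values()]
    return sps.entropy(ps, base=2)
Sample Answer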
WITH
-- 1) Map users to signup week
users_week AS (
    SELECT
        user_id,
        DATE_TRUNC(signup_date, WEEK) AS signup_week
    FROM users
    WHERE signup_date IS NOT NULL
),
-- 2) event weeks for users, dedup per user-week to avoid double counting
user_event_weeks AS (
    SELECT DISTINCT
        e.user_id,
        DATE_TRUNC(e.event_date, WEEK) AS event_week
    FROM events e
    JOIN users_week u ON u.user_id = e.user_id
    WHERE e.event_date >= u.signup_week -- optional filter
),
-- 3) cohort-user -> week offset
cohort_activity AS (
    SELECT
        u.signup_week,
        u.user_id,
        DATE_DIFF(uew.event_week, u.signup_week, WEEK) AS week_offset
    FROM users_week u
    LEFT JOIN user_event_weeks uew
        ON u.user_id = uew.user_id
    -- keep offsets >= 0 and <= 12
    WHERE uew.event_week IS NULL
       OR DATE_DIFF(uew.event_week, u.signup_week, WEEK) BETWEEN 0 AND 12
),
-- 4) cohort sizes
cohort_sizes AS (
    SELECT
        signup_week,
        COUNT(DISTINCT user_id) AS cohort_size
    FROM users_week
    GROUP BY signup_week
),
-- 5) pivot counts for week 0..12
cohort_retention AS (
    SELECT
        c.signup_week,
        cs.cohort_size,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 0 THEN ca.user_id END) AS week_0,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 1 THEN ca.user_id END) AS week_1,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 2 THEN ca.user_id END) AS week_2,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 3 THEN ca.user_id END) AS week_3,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 4 THEN ca.user_id END) AS week_4,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 5 THEN ca.user_id END) AS week_5,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 6 THEN ca.user_id END) AS week_6,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 7 THEN ca.user_id END) AS week_7,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 8 THEN ca.user_id END) AS week_8,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 9 THEN ca.user_id END) AS week_9,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 10 THEN ca.user_id END) AS week_10,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 11 THEN ca.user_id END) AS week_11,
        COUNT(DISTINCT CASE WHEN ca.week_offset = 12 THEN ca.user_id END) AS week_12
    FROM (SELECT DISTINCT signup_week FROM users_week) c
    LEFT JOIN cohort_activity ca ON ca.signup_week = c.signup_week
    LEFT JOIN cohort_sizes cs ON cs.signup_week = c.signup_week
    GROUP BY c.signup_week, cs.cohort_size
)
SELECT
    signup_week,
    cohort_size,
    week_0,
    ROUND(100.0 * week_0 / cohort_size, 2) AS pct_week_0,
    week_1,
    ROUND(100.0 * week_1 / cohort_size, 2) AS pct_week_1,
    -- ... repeat for week_2..week_12
    week_12,
    ROUND(100.0 * week_12 / cohort_size, 2) AS pct_week_12
FROM cohort_retention
ORDER BY signup_week;
Sample Answer
WITH cohorts AS (
  SELECT user_id,
         date_trunc('week', signup_date)::date AS cohort_week,
         signup_date::date AS signup_date
  FROM users
),
events_rel AS (
  SELECT c.cohort_week,
         c.signup_date,
         e.user_id,
         (e.event_date::date - c.signup_date::date) AS days_since_signup
  FROM cohorts c
  JOIN events e
    ON e.user_id = c.user_id
   AND e.event_date::date BETWEEN c.signup_date::date AND c.signup_date::date + 7
),
distinct_active AS (
  -- unique users per cohort_week and day 0..7
  SELECT cohort_week, days_since_signup, COUNT(DISTINCT user_id) AS active_users
  FROM events_rel
  GROUP BY cohort_week, days_since_signup
),
cohort_size AS (
  SELECT cohort_week, COUNT(DISTINCT user_id) AS users_in_cohort
  FROM cohorts
  GROUP BY cohort_week
),
retention AS (
  -- ensure 0..7 rows per cohort
  SELECT cs.cohort_week,
         d.day AS day,
         COALESCE(da.active_users, 0) AS active_users,
         cs.users_in_cohort,
         ROUND(100.0 * COALESCE(da.active_users, 0) / cs.users_in_cohort, 2) AS pct_retained
  FROM cohort_size cs
  CROSS JOIN (SELECT generate_series(0, 7) AS day) d
  LEFT JOIN distinct_active da
    ON da.cohort_week = cs.cohort_week AND da.days_since_signup = d.day
)
SELECT cohort_week, day, users_in_cohort, active_users, pct_retained
FROM retention
ORDER BY cohort_week, day;
Recommended Additional Resources
- Designing Data-Intensive Applications by Martin Kleppmann - for system design and distributed systems thinking
- Statistical Rethinking by Richard McElreath - for deep statistical understanding beyond frequentist testing
- LeetCode Medium-level problems (Data Structures, Algorithms) - for coding interview preparation
- An Introduction to Statistical Learning (ISLR) by James, Witten, Hastie, and Tibshirani - free online, covers ML fundamentals
- DataCamp courses on SQL, Python, and machine learning - practical, hands-on learning
- Spotify Engineering Blog (engineering.spotify.com) - insights into Spotify's technical challenges and solutions
- Causal Inference: The Mixtape by Scott Cunningham - understanding causality beyond correlation
- Mode Analytics SQL Tutorial - free interactive SQL learning tailored for analytics
- Kaggle competitions and datasets - real-world practice with music and recommendation system problems
- StatQuest with Josh Starmer YouTube channel - statistics and ML explained clearly
- Blind and Levels.fyi Spotify Data Science section - recent interview experiences and questions
Search Results
Spotify Data Scientist Interview in 2025 (Leaked Questions)
The Spotify Data Scientist interview includes a resume screen, recruiter phone screen, technical phone interview, and onsite interviews, ...
Exhaustive Spotify Data Scientist interview guide (2025) | Prepfully
The Spotify Data Scientist interview has three rounds: recruiter phone, technical phone, and onsite (programming, system design, cultural fit, data interview).
Top 12 Spotify Data Scientist Interview Questions + Guide in 2025
Spotify data scientist interviews cover databases, algorithms, machine learning, and analytics. Questions include database design, SQL queries, ...
Spotify Data Science Interview Process & Top Questions - YouTube
Ace your data science interviews with our complete prep course: https://bit.ly/4mkXQYV In this video, we break down everything you need to ...
9 Spotify SQL Interview Questions (Updated 2025) - DataLemur
Spotify asked these 9 SQL interview questions in recent Data Analyst, Data Science, and Data Engineering job interviews! Can you solve them?
Interview | Life at Spotify
Good? Get to know our hiring process before you apply or find answers to any lingering questions, right here, ...
This interview preparation guide was generated using AI-powered research from the sources listed above. While we strive for accuracy, we recommend verifying critical information from official company sources.