Google Data Scientist Interview Preparation Guide (Mid-Level)

Data Scientist

Google

Mid Level

6 rounds

Updated 6/19/2026

Google's Data Scientist interview process for mid-level candidates (2-5 years experience) consists of multiple rounds designed to assess technical proficiency, statistical thinking, machine learning expertise, product intuition, and cultural alignment. Interviews are conducted virtually through Google Meet with shared code editors, except for onsite rounds which may be in-person at a Google office. The complete process typically spans 4-6 weeks from initial recruiter contact through final feedback. Mid-level candidates are expected to demonstrate ownership of projects, ability to work independently with minimal supervision, understanding of trade-offs in technical decisions, some mentoring capability, and cross-functional collaboration skills.

Interview Rounds

Recruiter Screening

30 min | 3 focus topics | culture fit

What to Expect

Initial conversation with a Google recruiter lasting 20-30 minutes. This preliminary screen assesses your background, interest in the role, and basic fit before advancing to technical interviews. The recruiter will review your resume, discuss your career trajectory, validate your experience with technologies mentioned in the job description, answer your questions about the team and role, and confirm your availability for the interview process. This round determines whether you proceed to phone technical interviews.

Tips & Advice

Research the specific Google team and understand their products and focus area. Prepare a clear 2-minute professional summary highlighting your most relevant data science experience, emphasizing work with large datasets, Python and SQL, statistical analysis, and machine learning. Have 3-4 thoughtful questions about the team, role responsibilities, team composition, and growth opportunities. Show enthusiasm for Google's mission and the specific role. Be concise and clear about your experience and availability. Dress professionally. Have your calendar ready to discuss interview scheduling.

Focus Topics

Interest and Motivation for Google and Specific Team

Demonstrate knowledge of the specific Google team or product area you're joining. Understand their key challenges, how data science contributes, and why this role interests you. Show genuine enthusiasm rather than generic interest in Google.

Career Background and Relevant Experience

Articulate your professional journey with emphasis on data science projects that align with the job description. Highlight experience with large-scale data analysis, model development, A/B testing, SQL and Python expertise, statistical analysis, and cross-functional collaboration. Mention specific projects, technologies used, and business outcomes.

Technical Skills Verification

Be prepared to discuss hands-on experience with Python (pandas, NumPy, scikit-learn), SQL, statistical analysis, machine learning frameworks, and data visualization tools. Mention real projects where you applied these technologies and the outcomes achieved.

Phone Technical Interview - SQL and Python

60 min | 4 focus topics | technical

What to Expect

First technical interview lasting 45-60 minutes conducted virtually via Google Meet with a shared code editor. You will work through 1-2 data manipulation and analysis problems requiring SQL and Python coding. Problems typically involve extracting data from databases, transforming it, performing calculations, and deriving insights. The interviewer evaluates your ability to write clean, efficient code, solve problems methodically, communicate your approach, and handle edge cases. You may be asked to write SQL queries (joins, aggregations, window functions), manipulate data with pandas, or perform statistical calculations. This round tests both technical execution and problem-solving approach.

Tips & Advice

Communicate your approach before writing code. Ask clarifying questions about the problem, data schema, constraints, and expected output. For SQL problems, think about joins, aggregations, filtering, and optimization. For Python, use pandas and NumPy efficiently. Write clean, readable code with meaningful variable names and comments. Start with a correct solution, then optimize for performance if time permits. Explain your logic as you code. Test your solution mentally against sample data or ask the interviewer for test cases. Be prepared to refactor code based on feedback. Manage time carefully - aim to complete at least one full problem and discuss the second if time allows. For mid-level candidates, interviewers expect fluency with these languages and ability to write production-quality code.

Focus Topics

Problem-Solving Methodology and Communication

Break problems into manageable steps. Articulate your approach before writing code. Ask clarifying questions about requirements, data, and constraints. Explain your reasoning as you code. Test your solution against edge cases and example inputs. Discuss trade-offs between approaches. Show how you would validate results and handle potential errors.

Data Analysis and Insight Extraction

Beyond writing queries and code, interpret results and extract meaningful insights. Calculate relevant metrics, identify patterns, anomalies, and trends. Explain what the data tells you about the underlying question. For mid-level, demonstrate how you would present findings to stakeholders and what business actions might follow from the analysis.

Python Data Manipulation with Pandas and NumPy

Write Python code to efficiently manipulate, transform, and analyze data. Master pandas DataFrames: filtering rows and columns, groupby operations, merging datasets, handling missing values, creating new columns, reshaping data (pivot, melt), and aggregation functions. Master NumPy for numerical operations: array indexing, vectorized calculations, mathematical operations. Prefer vectorized pandas and NumPy operations over explicit Python loops on large data; reserve list comprehensions for small, plain-Python collections. Practice reading data files (CSV, JSON) and writing clean, readable code that others can understand.
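
A minimal sketch of several of the operations listed above (missing-value handling, merge, groupby aggregation, a vectorized derived column), assuming pandas is installed. The tables and column names (`orders`, `users`, `amount`, `country`) are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative assumptions.
orders = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "amount": [10.0, None, 5.0, 8.0, 12.0],
})
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "country": ["US", "US", "DE"],
})

# Handle missing values: keep an indicator column, then fill with 0.
orders["amount_missing"] = orders["amount"].isna()
orders["amount"] = orders["amount"].fillna(0.0)

# Merge orders with user attributes, then aggregate per country.
merged = orders.merge(users, on="user_id", how="left")
summary = (
    merged.groupby("country", as_index=False)
    .agg(total_amount=("amount", "sum"), n_orders=("amount", "count"))
)

# Vectorized derived column instead of a Python loop over rows.
summary["avg_order"] = summary["total_amount"] / summary["n_orders"]
```

Whether filling a missing amount with 0 is appropriate depends on what a missing value means in the data; in an interview it is worth stating that assumption out loud.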

SQL Query Writing and Optimization

Practice writing complex SQL queries to extract, filter, aggregate, and transform data. Know the join types (INNER, LEFT, RIGHT, FULL OUTER), GROUP BY with HAVING clauses, window functions (ROW_NUMBER, RANK, LAG, LEAD), CTEs (Common Table Expressions), and subqueries. Practice writing queries for common problems: finding top N items, calculating cumulative metrics, comparing groups over time, handling NULL values properly, and identifying duplicates. Understand query execution and optimization: avoid inefficient patterns such as selecting every column from wide tables, joining before filtering when the filter could be applied first, or using correlated subqueries where a join or window function would do; use indexes effectively; and write readable queries.
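
As an illustration of the "top N per group" pattern with a CTE and ROW_NUMBER, here is a small sketch run through Python's built-in sqlite3 module. The `watch_events` table and its columns are hypothetical, and the query assumes a SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE watch_events (user_id INTEGER, video_id TEXT, watch_seconds INTEGER);
INSERT INTO watch_events VALUES
  (1, 'a', 120), (1, 'b', 300), (1, 'c', 60),
  (2, 'a', 30),  (2, 'd', 45);
""")

# CTE + ROW_NUMBER: the two most-watched videos for each user.
query = """
WITH ranked AS (
  SELECT user_id, video_id, watch_seconds,
         ROW_NUMBER() OVER (
           PARTITION BY user_id ORDER BY watch_seconds DESC
         ) AS rn
  FROM watch_events
)
SELECT user_id, video_id, watch_seconds
FROM ranked
WHERE rn <= 2
ORDER BY user_id, rn;
"""
top_videos = conn.execute(query).fetchall()
```

ROW_NUMBER breaks ties arbitrarily; if tied rows should share a position, RANK or DENSE_RANK is the usual alternative, and it is worth asking the interviewer which behavior is wanted.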

Onsite Interview - Statistics and Experimentation

60 min | 5 focus topics | technical

What to Expect

45-60 minute in-person or virtual interview focusing on your ability to design experiments, conduct statistical analysis, and make causal inferences. You will face questions about designing A/B tests, analyzing experimental results, and making business decisions based on data. Example questions: Design an experiment to test if a new YouTube recommendation algorithm improves watch time. How would you detect if a UI change affects user engagement? Interpret results from an experiment where treatment group shows 5% higher engagement than control. The interviewer assesses understanding of statistical principles, experimental design rigor, ability to identify confounding variables, and balance of statistical validity with practical constraints. This round is critical for product-focused data scientist roles.

Tips & Advice

Use a structured framework for all answers: Problem Definition (what are we trying to learn?), Hypothesis (what do we believe will happen?), Experiment Design (how do we test it?), Metrics (what do we measure?), Analysis (how do we interpret results?), and Trade-offs (what are the constraints?). For A/B tests, discuss control and treatment group creation, randomization strategy, sample size calculation (power analysis), duration of test, and metrics selected. Explain Type I and Type II errors and when each matters. Address confounding variables - discuss how to detect and control for them (stratification, blocking, regression adjustment). Show awareness of business context and practical constraints. For mid-level, demonstrate that you understand both the statistical theory and how to apply it to real product decisions. Mention specific Google products when appropriate. Discuss trade-offs: speed vs. statistical power, directional results vs. statistical significance, simplicity vs. sophistication.

Focus Topics

Metrics Design and Business Impact

Choose appropriate metrics that align with business goals. Discuss guardrail metrics that protect against unintended negative consequences. For YouTube, discuss metrics like watch time, session watch time, satisfaction; for Search, discuss click-through rate and dwell time. Understand the hierarchy: engagement metrics, retention, monetization. Discuss which metrics are leading indicators (predict future success) vs. lagging indicators. For mid-level, demonstrate that you think deeply about what success truly means.

Interpreting Experimental Results and Decision Making

Interpret results from completed experiments. Distinguish between statistical significance and practical significance. Discuss how to present results to stakeholders including confidence intervals and magnitude of effects. Handle ambiguous results where effects are small or inconclusive. Discuss when to continue testing, iterate, or launch features based on data. Show awareness that business decisions involve factors beyond data.
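
To make the distinction between statistical and practical significance concrete, the sketch below computes a Wald confidence interval for the difference in conversion rates and compares it against a minimum effect worth launching. The counts, the 0.005 threshold, and the wording of the labels are all illustrative assumptions.

```python
from statistics import NormalDist

def diff_confidence_interval(x_c, n_c, x_t, n_t, level=0.95):
    """Wald interval for (treatment rate - control rate)."""
    p_c, p_t = x_c / n_c, x_t / n_t
    se = (p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

def interpret(ci, min_practical_effect):
    """Map an interval to a coarse reading; the labels are illustrative."""
    low, high = ci
    if high < 0:
        return "significant harm"
    if low <= 0:
        return "inconclusive"
    if low >= min_practical_effect:
        return "significant and practically meaningful"
    return "significant, practical value uncertain"

# Hypothetical experiment: 10.0% vs 11.0% conversion, 10,000 users per arm.
ci = diff_confidence_interval(1000, 10_000, 1100, 10_000)
reading = interpret(ci, min_practical_effect=0.005)
```

In this example the interval excludes zero but still contains effects smaller than the practical threshold, which is the kind of ambiguous result the paragraph above asks you to reason about.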

A/B Testing and Experimental Design

Design A/B tests to answer product questions for Google products. Define the change to test, identify control and treatment populations, discuss randomization method to avoid selection bias. Determine sample size required for statistical power. Choose test duration to account for time-of-week effects and reach sufficient sample size. Design clear metrics for success. Discuss potential pitfalls: selection bias, temporal confounds, multiple comparisons problem, peeking at results early. For mid-level, show comprehensive thinking about experimental design including implementation challenges.

Hypothesis Testing and Statistical Significance

Understand the null and alternative hypotheses, significance levels (alpha, typically 0.05), p-values, and confidence intervals. Explain Type I error (false positive: rejecting a true null hypothesis) and Type II error (false negative: failing to reject a false null hypothesis). Discuss the power of a test (the probability of rejecting a false null hypothesis) and how to calculate the sample size needed for a desired power. Explain one-tailed vs. two-tailed tests. Discuss practical significance vs. statistical significance, including cases where a result is statistically significant but not practically meaningful.
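
A short worked example of these ideas: a pooled two-proportion z-test that reports both the two-tailed and one-tailed p-values. The counts are hypothetical, and the normal approximation assumes reasonably large samples in both groups.

```python
from statistics import NormalDist

def two_proportion_ztest(x_c, n_c, x_t, n_t):
    """Return (z, two-sided p-value, one-sided p-value for treatment > control)."""
    p_pool = (x_c + x_t) / (n_c + n_t)            # pooled rate under the null
    se = (p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t)) ** 0.5
    z = (x_t / n_t - x_c / n_c) / se
    std_normal = NormalDist()
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    p_one_sided = 1 - std_normal.cdf(z)
    return z, p_two_sided, p_one_sided

# Hypothetical data: 1,000/10,000 conversions in control, 1,100/10,000 in treatment.
z, p_two, p_one = two_proportion_ztest(1000, 10_000, 1100, 10_000)
reject_null = p_two < 0.05   # alpha = 0.05, two-tailed
```

When the observed effect is in the hypothesized direction, the one-tailed p-value is half the two-tailed one, which is why the choice of tails should be fixed before looking at the data.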

Confounding Variables and Causal Inference

Identify potential confounding variables in experiments (seasonality, day-of-week effects, user segment differences, external events). Discuss methods to control for confounds: blocking design, stratified randomization, regression adjustment, instrumental variables. Explain why correlation doesn't imply causation and how proper experimental design establishes causal relationships. Discuss external validity and generalizability - when results from one population may not apply to others. For mid-level, show systematic thinking about potential confounds and how to detect them.
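
One of the controls named above, stratified randomization, can be sketched as follows: units are shuffled within each stratum and alternately assigned, so every stratum ends up (nearly) balanced across arms. The user ids and the mobile/desktop strata are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assign(units, stratum_of, seed=0):
    """Assign units to arms so that each stratum is split as evenly as possible."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of[unit]].append(unit)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)                      # random order within the stratum
        for i, unit in enumerate(members):
            assignment[unit] = "treatment" if i % 2 == 0 else "control"
    return assignment

# Hypothetical population: 8 mobile users and 4 desktop users.
units = [f"u{i}" for i in range(12)]
stratum_of = {u: ("mobile" if i < 8 else "desktop") for i, u in enumerate(units)}
assignment = stratified_assign(units, stratum_of)
```

With an odd-sized stratum this sketch gives treatment one extra unit; a production system would also need to handle users arriving over time, which this batch version does not attempt.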

Onsite Interview - Machine Learning and Applied Modeling

60 min | 6 focus topics | technical

What to Expect

45-60 minute interview evaluating your ability to build, evaluate, and optimize machine learning models for real business problems. You may be asked to design a model for a specific use case (predicting user engagement, ranking search results, recommending content, predicting ad click-through rates) or to explain machine learning concepts and discuss trade-offs. The interviewer assesses your understanding of feature engineering, model selection, performance evaluation, preventing overfitting, hyperparameter tuning, and model deployment considerations. For mid-level candidates, emphasis is on practical application and judgment rather than theoretical depth. Show that you balance model complexity with interpretability and understand real-world constraints.

Tips & Advice

Structure your answer using a problem-solving framework: Problem Understanding (what are we predicting?), Data Preparation (what features matter?), Model Development (what algorithm?), Evaluation (how do we validate?), and Iteration (how do we improve?). Start by proposing a simple baseline model before considering complex approaches. Discuss trade-offs explicitly: bias-variance trade-off, interpretability vs. performance, simplicity vs. accuracy, training time vs. accuracy. For mid-level, show judgment about when to favor a simple logistic regression for interpretability vs. when a complex model is justified. Discuss handling practical challenges: class imbalance, missing data, data quality issues. Explain your evaluation strategy: train/validation/test split, cross-validation, appropriate metrics for the problem. Discuss how you would deploy and monitor the model in production. For personalization problems, outline the full system: candidate generation, ranking, re-ranking for diversity and freshness.

Focus Topics

Building Personalization and Recommendation Systems

Design a recommendation system for a Google product (YouTube recommendations, search results ranking, ad selection, Google Maps). Describe the overall architecture: candidate generation phase (collaborative filtering, content-based filtering, or deep learning models), ranking phase to score candidate items. Discuss how to incorporate user signals (history, demographics, context), item metadata (content type, popularity, freshness), contextual information (time of day, device). Explain how to handle cold-start problem for new users or items. Discuss evaluation metrics (watch time, click-through rate, user satisfaction). Address fairness and diversity concerns: filter bubbles, bias toward popular content, user privacy. For mid-level, show system-level thinking about how data science fits into a complex pipeline.

Hyperparameter Tuning and Model Optimization

Discuss strategies for finding good hyperparameters: grid search, random search, Bayesian optimization. Understand which hyperparameters matter most for different algorithms (learning rate for gradient boosting, regularization for linear models, tree depth for tree models). Discuss computational trade-offs of tuning - how much time is worthwhile? For mid-level, show awareness of tuning techniques and practical judgment about when additional tuning is worth the effort.
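
The difference between grid search and random search can be shown with a toy objective. Here `validation_score` is a hypothetical stand-in for "train a model and score it on validation data", and the search ranges are arbitrary; the point is only the shape of the two loops, including sampling the learning rate on a log scale.

```python
import itertools
import math
import random

def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return -(math.log10(learning_rate) + 2.2) ** 2 - 0.1 * (max_depth - 5) ** 2

def grid_search(learning_rates, depths):
    """Evaluate every combination on a fixed grid."""
    candidates = itertools.product(learning_rates, depths)
    return max(candidates, key=lambda params: validation_score(*params))

def random_search(n_trials, seed=0):
    """Sample the same budget of points, with learning rate drawn on a log scale."""
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-4, 0), rng.randint(2, 10)) for _ in range(n_trials)]
    return max(trials, key=lambda params: validation_score(*params))

best_grid = grid_search([1e-3, 1e-2, 1e-1], [3, 6, 9])   # 9 evaluations
best_random = random_search(n_trials=9)                   # same budget
```

With the same budget, the grid tries only three distinct learning rates while random search tries nine, which is the usual argument for random search when only a few hyperparameters really matter.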

Preventing Overfitting and Understanding Bias-Variance Trade-off

Understand the bias-variance trade-off: high bias (underfitting) means the model is too simple and misses patterns; high variance (overfitting) means the model fits noise in the training data and does not generalize. Detect overfitting by comparing training vs. validation performance: training error that keeps falling while validation error stalls or rises is the typical sign. Discuss regularization techniques: L1/L2 regularization, dropout for neural networks, early stopping. Explain why simpler models often generalize better when data is limited or noisy. For mid-level, demonstrate practical judgment about when to add complexity.
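
Early stopping, one of the regularization techniques mentioned, reduces to a small piece of bookkeeping on the validation loss. The sketch below is framework-independent; the `patience` and `min_delta` parameters follow common usage, and the loss values are invented to show a model that starts overfitting after the third epoch.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch      # validation improved
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch                 # no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1

# Hypothetical curves: training loss keeps falling, validation loss turns upward.
train_losses = [1.00, 0.70, 0.50, 0.40, 0.30, 0.20]
val_losses = [1.10, 0.80, 0.70, 0.72, 0.75, 0.90]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
```

The model checkpoint from `best_epoch` is the one to keep; the widening gap between the two curves after that point is the overfitting signal described above.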

Model Selection and Algorithm Trade-offs

Understand when to use different algorithms: logistic regression for simplicity and interpretability, tree-based models (random forests, gradient boosting) for capturing non-linear relationships and handling categorical features well, neural networks for complex patterns with large data, k-means or other clustering for unsupervised learning. Discuss fundamental trade-offs: linear vs. non-linear, parametric vs. non-parametric, bias-variance trade-off. Explain why you might choose a simpler model even if a complex model shows slightly better performance. For mid-level, demonstrate judgment about algorithm selection based on problem constraints.

Feature Engineering and Data Preprocessing

Transform raw data into meaningful features that improve model performance. Handle missing values appropriately (imputation, removal, creating missing indicators). Scale numerical features when needed. Encode categorical variables (one-hot encoding, ordinal encoding, target encoding). Create interaction terms when domain knowledge suggests they matter. Apply dimensionality reduction when dealing with high-dimensional data. Discuss trade-offs: more features may improve performance but increase complexity and training time. For mid-level, explain why feature quality matters and give concrete examples from real data.
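
A dependency-free sketch of three of these steps (mean imputation with a missing indicator, standard scaling, and one-hot encoding). The `age` and `device` fields are hypothetical; in practice the same transformations are usually done with a library such as scikit-learn, and the statistics should be fitted on training data only to avoid leakage.

```python
from statistics import mean, pstdev

# Hypothetical raw rows with one missing numeric value.
rows = [
    {"age": 25, "device": "mobile"},
    {"age": None, "device": "desktop"},
    {"age": 35, "device": "mobile"},
]

def build_features(rows):
    """Return [scaled_age, age_missing, one-hot(device)...] for each row."""
    observed = [r["age"] for r in rows if r["age"] is not None]
    age_mean, age_std = mean(observed), pstdev(observed)
    devices = sorted({r["device"] for r in rows})     # fixed column order

    features = []
    for r in rows:
        missing = r["age"] is None
        age = age_mean if missing else r["age"]       # mean imputation
        scaled = (age - age_mean) / age_std if age_std else 0.0
        one_hot = [1.0 if r["device"] == d else 0.0 for d in devices]
        features.append([scaled, float(missing)] + one_hot)
    return features

features = build_features(rows)
```

Keeping the missing indicator lets a model learn whether missingness itself is informative, which plain imputation would hide.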

Model Evaluation and Validation Strategy

Select appropriate evaluation metrics for your problem: for classification use accuracy, precision, recall, F1-score, ROC-AUC; for regression use MAE, RMSE, R-squared. Understand why different metrics matter for different problems; for example, accuracy can look high on heavily imbalanced classes even when the model never finds the rare class, so precision and recall are more informative there. Implement a proper validation strategy: train/validation/test split, k-fold cross-validation, time-series cross-validation for temporal data, stratified sampling for imbalanced data. Explain why you need separate train and test sets. Discuss how to detect overfitting vs. underfitting from learning curves.
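
The classification metrics can be computed directly from the confusion-matrix counts. The sketch below uses made-up labels with a 5% positive rate to show how a model that predicts the majority class everywhere scores on each metric.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = classification_metrics(y_true, [0] * 100)
some_model = classification_metrics(y_true, [1, 1, 1, 0, 0] + [1] * 5 + [0] * 90)
```

The always-negative baseline has the higher accuracy of the two but zero recall, which is why accuracy alone is a poor choice for rare-event problems.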

Onsite Interview - Product and Business Sense

60 min | 6 focus topics | case study

What to Expect

45-60 minute interview assessing your ability to connect data analysis to product strategy and drive business decisions. You face open-ended questions requiring structured thinking about metrics, product evaluation, and trade-offs. Example questions: How would you measure success of a new YouTube recommendation algorithm? What data would you track to improve user engagement in Google Maps? How would you evaluate the impact of a UI redesign? The interviewer evaluates whether you think strategically about product impact, understand user needs, define measurable success criteria, and can balance competing priorities. For mid-level, demonstrate end-to-end thinking from identifying metrics through driving business impact.

Tips & Advice

Use a structured framework for all answers: Understand Context (what is the business goal and user need?), Define Metrics (what indicates success at multiple levels?), Identify Trade-offs (what are the competing priorities?), Propose Approach (how would you measure impact?), Discuss Implementation (what are the practical considerations?). For metric design, think beyond vanity metrics to metrics that drive business value. Discuss guardrail metrics that prevent negative side effects. For mid-level candidates, show that you think about user segments differently - not all users may be affected equally. Ask clarifying questions about business context, user types, and constraints. Reference Google products strategically. Discuss trade-offs: short-term revenue vs. long-term engagement, personalization vs. privacy, speed vs. quality. Show collaboration mindset - discuss how you would work with product managers, engineers, and other teams. For mid-level, mention how you would influence decisions through clear data storytelling.

Focus Topics

User Segmentation and Heterogeneous Impact Analysis

Segment users to understand different behaviors, needs, and how product changes affect different groups. Identify high-value users or at-risk users. Analyze how a feature affects different segments - does it help or harm certain groups? For mid-level, show that you think about heterogeneous impacts rather than just average effects across all users.
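
A small sketch of a heterogeneous-impact check: compute the treatment-minus-control difference per segment and compare it with the overall difference. The segments ("new", "power") and metric values are invented so that the average effect hides a harm to one group.

```python
from collections import defaultdict

def lift_by_segment(records):
    """records: iterable of (segment, arm, value). Returns {segment: treatment mean - control mean}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, arm, value in records:
        totals[(segment, arm)] += value
        counts[(segment, arm)] += 1

    lifts = {}
    for segment in sorted({seg for seg, _ in counts}):
        t, c = (segment, "treatment"), (segment, "control")
        if counts.get(t) and counts.get(c):          # skip segments missing an arm
            lifts[segment] = totals[t] / counts[t] - totals[c] / counts[c]
    return lifts

# Hypothetical per-user engagement minutes.
records = [
    ("new", "control", 10), ("new", "control", 12),
    ("new", "treatment", 15), ("new", "treatment", 17),
    ("power", "control", 60), ("power", "control", 64),
    ("power", "treatment", 58), ("power", "treatment", 60),
]
segment_lifts = lift_by_segment(records)
overall_lift = lift_by_segment(("all", arm, v) for _, arm, v in records)["all"]
```

Here the overall lift is positive while the power-user segment moves in the opposite direction. With real data each segment estimate also needs its own uncertainty, and slicing many segments raises the multiple-comparisons concern.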

Trade-offs and Strategic Prioritization

Navigate competing priorities: short-term revenue vs. long-term user engagement, personalization that may reduce serendipity, privacy considerations vs. personalization, speed vs. quality, breadth vs. depth of recommendations. Discuss how to quantify trade-offs with data. For mid-level, demonstrate ability to present both sides of a trade-off and provide data-driven recommendations that balance competing interests.

Data-Driven Product Strategy and Stakeholder Communication

Present data analysis and recommendations to non-technical stakeholders including product managers, engineers, and executives. Translate complex analytical findings into clear business implications. Show confidence intervals or uncertainty ranges when appropriate. Discuss limitations of analysis and confidence in recommendations. Use data visualization effectively. Tell a compelling narrative with data. For mid-level, demonstrate ability to influence product decisions through clear communication and data storytelling.

Product Feature Evaluation and Impact Assessment

Design approaches to evaluate whether new product features drive intended outcomes. Use experimentation when possible (A/B tests). Consider observational analysis when experimentation isn't feasible. Discuss intended and unintended consequences of features. Consider indirect effects - a feature that increases engagement might decrease monetization or create other trade-offs. For mid-level, show awareness of complex interactions and ability to measure impact holistically.

Metric Definition and Design for Decision-Making

Define metrics aligned with business goals and user needs. Distinguish between leading indicators (predict future success, enable faster learning) and lagging indicators (measure past performance). Discuss aggregation levels: user-level, session-level, daily, weekly. Explain why simple metrics like DAU might miss important insights about quality or satisfaction. Design metrics that are actionable (can product team influence it?), interpretable (clear what it measures), and robust (not easily gamed). For mid-level, show thoughtfulness about which metrics matter for strategic decisions.

Google Product Metrics and Key Performance Indicators

Understand key metrics for Google's major products: YouTube (watch time, session watch time, click-through rate, user satisfaction surveys), Search (click-through rate, dwell time, zero-result rate, search quality ratings), Google Ads (click-through rate, conversion rate, return on ad spend), Google Maps (user engagement, navigation conversions, rating and review engagement). For each product, understand the hierarchy - how does a metric connect to business objectives? Discuss guardrail metrics that ensure improvements in one area don't harm others.

Onsite Interview - Behavioral and Culture Fit

60 min | 6 focus topics | behavioral

What to Expect

45-60 minute interview focused on your past experiences, work style, collaboration approach, and alignment with Google's culture and values. Interviewers assess how you've tackled challenges, collaborated across teams, influenced decisions through data, handled setbacks, and grown professionally. The interviewer wants to understand your thinking, motivations, and whether you'll contribute positively to team dynamics. Use the STAR method (Situation, Task, Action, Result) to structure responses. For mid-level candidates, emphasize owning projects end-to-end, mentoring junior colleagues, driving impact through cross-functional collaboration, continuous learning, and leadership through influence.

Tips & Advice

Prepare 5-7 strong project stories covering: a complex technical challenge you solved, impact you drove, challenge you overcame, disagreement you navigated, time you influenced others with data, mentoring experience, and learning from failure. Use STAR structure consistently. Quantify impact where possible (e.g., 15% improvement in model accuracy, feature launched to 50M users, mentored 2 junior data scientists). For mid-level, emphasize your leadership - how did you influence outcomes? Did you mentor others? Show growth trajectory. Be genuine and avoid over-rehearsed answers. Discuss how you stay current with data science trends - mention papers read, techniques learned, communities engaged. Ask thoughtful questions about team culture, growth opportunities, and how they measure success for data scientists. Discuss what excites you about Google's mission and the specific role. Show authentic curiosity.

Focus Topics

Mentoring, Teaching, and Developing Others

Describe how you've mentored, taught, or helped junior colleagues grow. Examples might include code reviews, guidance on technical approaches, explaining statistical concepts, or helping someone through a challenging project. Explain what they learned and how they improved. For mid-level, demonstrate investment in team capability development and leadership through teaching.

Continuous Learning and Staying Current

Discuss how you stay updated with data science advancements: papers you've read, techniques you've learned, communities you engage with, courses you've taken. Mention specific recent learning and how you've applied it to work. For mid-level, balance depth (becoming expert in specific areas) with breadth (staying current across the field). Show genuine curiosity about advancing your skills.

Overcoming Technical Challenges and Problem-Solving

Describe a technical obstacle you encountered (data quality issues, model performance bottleneck, scalability limitation, tool constraint). Explain how you diagnosed the root cause and implemented a solution. Discuss what you learned and how it made you more effective. For mid-level, show systematic problem-solving, resourcefulness, and resilience when facing complex issues.

Cross-Functional Collaboration and Communication

Describe a project involving collaboration with product managers, engineers, or other teams. Explain how you aligned on goals, handled different perspectives, resolved disagreements, and worked toward shared objectives. Show that you can communicate with non-technical stakeholders and understand different team perspectives. For mid-level, demonstrate ability to work effectively across functions and contribute to team success.

Influencing Business Decisions with Data

Describe a time your data analysis influenced an important business or product decision. Explain how you framed the business question, conducted the analysis, and communicated findings. Discuss what decision was made and the outcome. For mid-level, show that you can influence decisions across organizational boundaries and that you think about business context, not just technical analysis.

Project Ownership and End-to-End Impact

Describe a substantial data science project you owned: problem identification, stakeholder alignment, data collection and exploration, analysis, model development, presenting findings, and driving business impact. Emphasize your leadership in the project. Explain how your work changed decisions or outcomes. Quantify impact with metrics when possible. For mid-level, demonstrate ownership of complex projects with significant business impact and minimal supervision.

Frequently Asked Data Scientist Interview Questions

A and B Test Design | Hard | System Design

Design a scalable experimentation platform that supports feature flagging, deterministic randomization across services, event collection with exactly-once aggregation semantics, real-time monitoring dashboards, sequential testing, safe ramping, and automatic rollback. Target scale: 200M monthly users, 1000 concurrent experiments, 100k events/sec. Describe core components, data pipelines, storage, and how you prevent contamination and ensure assignment consistency.

Sample Answer

Requirements & constraints:
- Functional: feature flags, deterministic assignment across services, event ingestion, sequential (adaptive) testing, safe ramping, automatic rollback, real-time dashboards.
- Scale targets: 200M monthly users, 1000 concurrent experiments, 100k events/sec.
- Non-functional: low-latency assignment, assignment consistency, contamination prevention, exactly-once aggregation, near real-time metrics (<30s).

High-level architecture:
Client SDKs & Gateways → Deterministic Assignment Service → Feature Flag Config Store (CDN + authoritative control plane) → Event Collection (ingest) → Stream Processing (stateful real-time aggregation) → Experiment Evaluation Engine → Monitoring/Alerting & Dashboards → Data Warehouse for long-term analysis

Core components:
1. Control Plane: UI + API to define experiments, variants, sequential rules, ramp policies, rollback thresholds. Stores configs in a strongly consistent DB (Postgres/Spanner).
2. Config Distribution: CDN-backed configuration plus per-region cache (Redis). SDKs poll or use push (SSE) for near-real-time updates.
3. Deterministic Assignment: Hash-based allocator using a stable experiment namespace and user id + salt. Example: bucket = HMAC_SHA256(salt || experiment_id || user_id) % 10000. SDKs compute locally to avoid a network hop; the server-side library uses the same algorithm. Keep allocation metadata (seed, traffic split) in the config store to ensure consistency across services and versions.
4. Contamination prevention: Mutual exclusion via targeting rules; holdout groups; namespace isolation (one primary experiment per user-feature pair). Use assignment tiers (user-level vs session-level) and locking in the control plane to reject overlapping conflicting experiments. Deterministic bucketing ensures consistent exposure across services and devices.
5. Event Collection & Exactly-once Aggregation:
   - Ingest via idempotent HTTP with client-generated event_id and user_id to Kafka (partition by user_id).
   - Deduplicate in the stream layer: the stream processor (Flink) maintains a stateful cache of recent event_ids (TTL window) and uses checkpointing for fault tolerance. For durable exactly-once, use Kafka transactions + Flink’s two-phase commit to update aggregation sinks (OLAP store) atomically.
6. Real-time processing: Flink jobs compute metrics (counts, sums, CTRs) per experiment/variant in rolling windows and persistent state (RocksDB). Emit to Materialized Views (Presto/Trino or Pinot/Druid) for dashboards.
7. Dashboards & Alerting: Pre-aggregated low-latency store (Pinot/Druid) for sub-second queries; Grafana for visualization. Alert rules based on statistical thresholds and safety checks (minimum sample size, effect size, sequential p-value control like alpha spending or Bayesian posterior checks).
8. Sequential testing & safe ramping: Control plane supports alpha spending (e.g., O’Brien-Fleming) or Bayesian sequential decision criteria. Ramping is automated via a policy engine: when early metrics pass safety guards (no regression, min N, lower bound CI within tolerance), ramp to the next percentage. Rollback triggers if loss exceeds a threshold with sufficient power.
9. Automatic Rollback: Orchestrator calls the control-plane API to change the flag to its previous state; SDKs receive the change via push. Maintain an audit trail so that impact can be recomputed with a backfill.
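
The deduplication idea in component 5 can be sketched in a single process: remember recently seen event_ids for a TTL window and count an event only the first time it appears. This is a plain-Python illustration of the logic, not Flink or Kafka code; the class name, the explicit `now` argument, and the assumption that timestamps arrive in non-decreasing order are all simplifications.

```python
from collections import OrderedDict, defaultdict

class DedupAggregator:
    """Counts events per (experiment, variant), ignoring repeats of an event_id within a TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.seen = OrderedDict()           # event_id -> first-seen time, oldest first
        self.counts = defaultdict(int)      # (experiment, variant) -> event count

    def _evict_expired(self, now):
        while self.seen:
            _, first_seen = next(iter(self.seen.items()))
            if now - first_seen <= self.ttl:
                break
            self.seen.popitem(last=False)   # drop the oldest entry

    def process(self, event_id, experiment, variant, now):
        """Return True if the event was counted, False if it was a duplicate."""
        self._evict_expired(now)
        if event_id in self.seen:
            return False
        self.seen[event_id] = now
        self.counts[(experiment, variant)] += 1
        return True

agg = DedupAggregator(ttl_seconds=60)
first = agg.process("e1", "exp", "A", now=0)
retry = agg.process("e1", "exp", "A", now=10)          # client retry inside the window
other = agg.process("e2", "exp", "B", now=20)
late_retry = agg.process("e1", "exp", "A", now=100)    # retry after the window expired
```

The last call is counted again because the id has already been evicted, which shows why the retention window has to cover the longest plausible retry delay and why a longer window costs more state.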

Storage choices:
- Config: strongly consistent SQL (Spanner/Postgres)
- Runtime caches: Redis (regional) + CDN
- Event log: Kafka (multi-AZ)
- Real-time state: Flink + RocksDB
- Low-latency analytics: Pinot/Druid
- Long-term: S3 + Parquet + Hive/BigQuery for offline analysis

Scalability & performance:
- Partition Kafka by user_id to scale to 100k events/s.
- Horizontally scale the Flink cluster; use RocksDB for large state.
- CDN + client-side deterministic assignment minimizes control-plane load.
- Shard experiments by namespace to limit per-job state.

Preventing contamination & ensuring assignment consistency:
- Use deterministic bucketing with stable seeds stored in the control plane and versioned configs.
- Enforce namespace and targeting constraints at creation time.
- Sticky assignment: the bucket maps to a unit (user_id), is optionally persisted client-side, and is re-evaluated identically across services.
- Cross-device: use a canonical user_id; fallback logic for anonymous sessions.
- Audit logs and reproducibility: every assignment can be re-derived from the stored seed/config and user_id.
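The deterministic bucketing described above can be sketched in a few lines. This is an illustrative sketch, not a reference implementation: using the salt as the HMAC key, joining experiment_id and user_id with a ":" delimiter, taking the first 8 digest bytes, and the `assign_variant` helper with its split format are all assumptions. The answer only specifies bucket = HMAC_SHA256(salt || experiment_id || user_id) % 10000.

```python
import hashlib
import hmac
from typing import Optional, Sequence, Tuple

NUM_BUCKETS = 10_000  # matches the "% 10000" in the answer


def bucket(salt: bytes, experiment_id: str, user_id: str) -> int:
    """Map (experiment, user) to a stable bucket in [0, NUM_BUCKETS)."""
    # Assumption: salt is the HMAC key; ":" keeps ("ab", "c") distinct from ("a", "bc").
    message = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hmac.new(salt, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign_variant(b: int, splits: Sequence[Tuple[str, int]]) -> Optional[str]:
    """Walk cumulative bucket ranges; return None if the bucket is outside the ramp."""
    upper = 0
    for name, width in splits:
        upper += width
        if b < upper:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical 10% ramp: 500 buckets for control, 500 for treatment.
    b = bucket(b"stable-seed-v1", "checkout_button_color", "user-42")
    print(b, assign_variant(b, [("control", 500), ("treatment", 500)]))
```

Because the result depends only on the salt, experiment id, and user id, any service or SDK holding the same config version computes the same assignment, and an audit job can recompute it later from stored inputs.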

Failure modes & trade-offs:
- Exactly-once requires careful event_id design and a retention window; a long dedupe window increases state size, while a short one lets late duplicates through.
- Client-side assignment reduces latency but needs secure config delivery to prevent tampering.
- Using transactional stream processing increases complexity but provides the correctness needed for experiments.
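The dedupe-window trade-off can be made concrete with a small in-memory sketch. This is not Flink code: the `TtlDeduplicator` class is hypothetical, and it assumes timestamps passed to `is_new` never decrease. It only shows why state grows with the window and why a duplicate arriving after the TTL is counted again.

```python
from collections import OrderedDict


class TtlDeduplicator:
    """Remember event_ids for ttl_seconds; report whether an event is new."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._seen = OrderedDict()  # event_id -> first-seen time, oldest first

    def is_new(self, event_id: str, now: float) -> bool:
        self._evict(now)
        if event_id in self._seen:
            return False  # duplicate inside the window: drop it
        self._seen[event_id] = now
        return True

    def _evict(self, now: float) -> None:
        # Entries are in insertion order, so expired ids are always at the front.
        while self._seen:
            _, first_seen = next(iter(self._seen.items()))
            if now - first_seen < self.ttl:
                break
            self._seen.popitem(last=False)

    def __len__(self) -> int:
        return len(self._seen)


if __name__ == "__main__":
    dedup = TtlDeduplicator(ttl_seconds=10)
    print(dedup.is_new("evt-1", now=0))   # first delivery
    print(dedup.is_new("evt-1", now=5))   # retry inside the window
    print(dedup.is_new("evt-1", now=11))  # retry after the window has expired
```

State size is roughly (event rate) x (TTL), so at 100k events/sec each extra minute of window adds about six million ids to keep.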

This design balances low-latency assignment, consistent deterministic bucketing, exactly-once aggregation via transactional stream processing, and automation for ramping/rollback suitable for 200M users and 100k events/sec.

Cross Functional Collaboration and Coordination | Hard | System Design

79 practiced

Design a scalable process for feature ownership and handoff across dozens of models to avoid duplication, ensure canonical sources, and manage feature lifecycle. Include ownership model, tooling, onboarding, and incentives for maintaining feature quality.

Sample Answer

Requirements:
- Prevent duplicate feature implementations; single canonical source per feature
- Support dozens of models, teams, and fast iteration
- Track feature lifecycle (proposal → production → deprecation)
- Low-latency streaming and batch access, observability, and governance

High-level architecture:
Feature Registry & Governance Service ←→ Feature Store (online + offline) ←→ CI/CD + Catalog UI ←→ Model Infra / Consumers
Metadata DB stores ownership, lineage, schemas, tests, versions, and lifecycle state.

Key components and responsibilities:
1. Feature Registry (catalog UI + API)
- Stores canonical feature definitions, owners, description, SLA, provenance, access controls.
- Enforces uniqueness via hash on (name, transformation, source).
2. Feature Store
- Offline store (Parquet/warehouse) for training; online store (Redis/Cassandra) for serving.
- Materialized views managed by orchestration (Dagster/Airflow).
3. Governance & CI
- Automated checks: schema, freshness, cardinality, skew, unit tests.
- PR gating: new/changed features require metadata + tests + owner approval.
4. Lineage & Observability
- Instrumentation for feature usage counts, model-dependency graph, alerting on drift.
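The uniqueness check on (name, transformation, source) might look like the sketch below. The `feature_fingerprint` function, the `FeatureRegistry` class, and the normalization rules (lowercasing names and sources, collapsing whitespace in the transformation) are assumptions for illustration; a real registry would persist fingerprints in the metadata DB rather than in memory.

```python
import hashlib
import json


def feature_fingerprint(name: str, transformation: str, source: str) -> str:
    """Hash a normalized (name, transformation, source) triple."""
    canonical = json.dumps(
        {
            "name": name.strip().lower(),
            # Collapse whitespace so formatting changes do not create a "new" feature.
            "transformation": " ".join(transformation.split()),
            "source": source.strip().lower(),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FeatureRegistry:
    """In-memory stand-in for the registry's uniqueness constraint."""

    def __init__(self):
        self._owner_by_fingerprint = {}

    def register(self, name: str, transformation: str, source: str, owner: str) -> str:
        fp = feature_fingerprint(name, transformation, source)
        existing_owner = self._owner_by_fingerprint.get(fp)
        if existing_owner is not None:
            raise ValueError(f"feature already registered by {existing_owner}")
        self._owner_by_fingerprint[fp] = owner
        return fp


if __name__ == "__main__":
    registry = FeatureRegistry()
    registry.register("user_7d_orders", "COUNT(order_id) OVER 7d", "orders", "team-growth")
    try:
        registry.register("User_7d_Orders", "COUNT(order_id)   OVER 7d", "orders", "team-ads")
    except ValueError as err:
        print("rejected:", err)
```

A hash that includes the name only catches exact re-registrations; the same transformation published under a different name would pass, so steward review remains necessary for near-duplicates.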

Ownership model:
- Feature Owner: single person/team responsible for correctness, tests, SLAs.
- Stewardship: domain stewards review proposals; cross-functional review board for conflicts.
- Ownership recorded in Registry; permissions enforced for edits.

Onboarding & workflow:
- Proposal: create feature spec in Registry (template: intent, source, derivation, tests).
- Auto-lint + CI validation; steward review within SLA (e.g., 48h).
- Once approved, feature is materialized, versioned, and discoverable.
- Deprecation flow: mark deprecated, set sunset date, notify dependent models, auto-block new uses after sunset.

Tooling:
- Web Catalog UI with search, lineage graph, and usage metrics.
- CLI/SDK for defining features as code (Python protos); templates and cookiecutter.
- Orchestration (Dagster), testing framework, alerting (Prometheus/Grafana), access control (IAM).
- Integrations with experiment tracking and model registry.

Incentives & SLA:
- Ownership KPIs: number of dependent models, test coverage, freshness SLA, incident frequency.
- Recognition & reward: visibility in dashboards, bonuses for high-quality features, “feature champions”.
- Cost allocation: teams pay for materialization/serving resources to discourage redundant features.

Scalability & trade-offs:
- Partition features by domain to scale metadata and stewardship.
- Lazy materialization to reduce storage costs (compute-on-read for low-use features).
- Trade-off: stricter governance slows velocity; mitigate with clear templates, fast review SLAs, and automation.

This process minimizes duplication, provides a single canonical source per feature, and creates clear accountability and incentives to maintain feature quality while scaling across many models.

Model Evaluation and Validation | Easy | Technical

93 practiced

You built a multiclass classifier (5 classes). Explain the difference between macro, micro, and weighted averaging when computing F1 scores. Provide an example scenario where macro F1 is preferable to weighted F1.

Data Storytelling and Insight Communication | Medium | Technical

88 practiced

Describe three numerical techniques (for example, confidence intervals, bootstrapped estimates) and three visual techniques (for example, error bars, fan charts) you would use to communicate model uncertainty to product managers, and give a one-line example of how each technique aids decision-making.

Hypothesis Testing and Inference | Medium | Technical

31 practiced

In which business situations would you prefer Bayesian inference over classical frequentist hypothesis testing? Describe how you would choose priors, perform sensitivity analysis, and communicate posterior summaries and credible intervals versus confidence intervals to non-technical stakeholders.

Advanced Querying with Structured Query Language | Medium | Technical

20 practiced

You have customers_master(customer_id) and customers_active(customer_id, last_active_date). Write SQL to find customers in master who have no active record in the last 12 months. Compare three approaches: LEFT JOIN ... WHERE active.customer_id IS NULL, NOT EXISTS, and EXCEPT (or MINUS). Discuss performance trade-offs and which you would prefer.

Sample Answer

Approach summary: filter customers_master for those that do NOT have a customers_active row within the last 12 months. Three common SQL patterns (LEFT JOIN anti-join, NOT EXISTS, and EXCEPT/MINUS) are shown, with trade-offs and my preference.

LEFT JOIN ... IS NULL

sql

SELECT m.customer_id
FROM customers_master m
LEFT JOIN (
  SELECT DISTINCT customer_id
  FROM customers_active
  WHERE last_active_date >= CURRENT_DATE - INTERVAL '12 months'
) a ON m.customer_id = a.customer_id
WHERE a.customer_id IS NULL;

NOT EXISTS

sql

SELECT m.customer_id
FROM customers_master m
WHERE NOT EXISTS (
  SELECT 1
  FROM customers_active a
  WHERE a.customer_id = m.customer_id
    AND a.last_active_date >= CURRENT_DATE - INTERVAL '12 months'
);

EXCEPT (or MINUS)

sql

-- PostgreSQL / standard
SELECT customer_id FROM customers_master
EXCEPT
SELECT customer_id
FROM customers_active
WHERE last_active_date >= CURRENT_DATE - INTERVAL '12 months';

Key points and trade-offs:
- Correctness: All three work if you dedupe customers_active or use DISTINCT/EXISTS. EXCEPT removes duplicates implicitly.
- NULL/duplicate behavior: LEFT JOIN can produce duplicates if customers_master has duplicates; DISTINCT helps. NOT EXISTS naturally handles duplicates in the right side. EXCEPT compares sets.
- Performance: Most engines optimize NOT EXISTS into an anti-join; it's often the most readable and reliably performant. LEFT JOIN ... IS NULL can be optimized similarly, but historically performed worse when the planner chooses a hashed join and large intermediate result sets appear. EXCEPT may require sorting/hashing to dedupe both sets, which is potentially expensive for large tables.
- Indexing: Ensure customers_active.customer_id is indexed (and ideally last_active_date included or partitioned by date) so the WHERE last_active_date filter is selective and indexable.
- Scale considerations: For very large active tables, use a prefiltered or materialized list of active customer_ids (daily refresh) or partition-pruned queries to avoid full scans.

My preference: NOT EXISTS. It is clear, avoids unnecessary intermediate rows, and most modern optimizers turn it into an efficient anti-join. If profiling shows a different plan, choose the variant the DB’s optimizer handles best and ensure proper indexing or pre-aggregation.
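To check that the three patterns agree, they can be run side by side against a small in-memory SQLite database. This is a sketch for local verification only: the sample rows and helper names are made up, and the date arithmetic uses SQLite's date('now', '-12 months') instead of the PostgreSQL INTERVAL syntax shown above.

```python
import sqlite3

CUTOFF = "date('now', '-12 months')"  # SQLite equivalent of CURRENT_DATE - INTERVAL '12 months'

QUERIES = {
    "left_join": f"""
        SELECT m.customer_id
        FROM customers_master m
        LEFT JOIN (
          SELECT DISTINCT customer_id
          FROM customers_active
          WHERE last_active_date >= {CUTOFF}
        ) a ON m.customer_id = a.customer_id
        WHERE a.customer_id IS NULL""",
    "not_exists": f"""
        SELECT m.customer_id
        FROM customers_master m
        WHERE NOT EXISTS (
          SELECT 1 FROM customers_active a
          WHERE a.customer_id = m.customer_id
            AND a.last_active_date >= {CUTOFF}
        )""",
    "except": f"""
        SELECT customer_id FROM customers_master
        EXCEPT
        SELECT customer_id FROM customers_active
        WHERE last_active_date >= {CUTOFF}""",
}


def build_demo_db() -> sqlite3.Connection:
    """Customer 1 is recently active (twice); 2 and 3 are stale; 4 has no activity rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers_master (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE customers_active (customer_id INTEGER, last_active_date TEXT);
        INSERT INTO customers_master VALUES (1), (2), (3), (4);
        INSERT INTO customers_active VALUES
          (1, date('now', '-1 months')),
          (1, date('now', '-2 months')),
          (2, date('now', '-3 years')),
          (3, date('now', '-13 months'));
    """)
    return conn


def inactive_customers(conn: sqlite3.Connection) -> dict:
    """Run each pattern and return its sorted customer_ids."""
    return {name: sorted(row[0] for row in conn.execute(sql)) for name, sql in QUERIES.items()}


if __name__ == "__main__":
    for name, ids in inactive_customers(build_demo_db()).items():
        print(name, ids)
```

Running the same harness with EXPLAIN QUERY PLAN in front of each statement is a quick way to see whether the engine treats the variants differently before committing to one.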

A and B Test Design | Medium | Technical

50 practiced

You are running an A/B/n test with one control and five variants. Describe practical options to control familywise error rate or false discovery rate across variants. Compare Bonferroni, Holm-Bonferroni, Benjamini-Hochberg, and hierarchical (gatekeeping) approaches and recommend one for an exploratory growth experiment with many metrics.

Sample Answer

Start by distinguishing targets:
- Familywise error rate (FWER) = probability of any false positive across tests.
- False discovery rate (FDR) = expected proportion of false positives among rejected hypotheses.
Which to control depends on tolerance for false alarms vs power.

Methods compared (5 variants + control => 5 tests):

1) Bonferroni
- How: divide α by m (α/m) or multiply p-values by m.
- Pros: simple, controls FWER under any dependence.
- Cons: very conservative when m is moderate/large → low power; poor for exploratory work.

2) Holm–Bonferroni
- How: step-down procedure that orders p-values and compares the k-th smallest to α/(m−k+1), stopping at the first failure.
- Pros: controls FWER, uniformly more powerful than Bonferroni.
- Cons: still conservative for many tests; modest added complexity.

3) Benjamini–Hochberg (BH)
- How: order p-values, find the largest k with p_(k) ≤ (k/m)·q, reject the k smallest.
- Pros: controls FDR (under independence or positive dependence); much greater power than FWER methods; well-suited when some false positives are tolerable.
- Cons: accepts some expected false discoveries; assumptions about dependence matter (there are robust variants like BY).

4) Hierarchical / gatekeeping
- How: pre-specify a primary family (e.g., revenue) and test it at α; only if the primary shows an effect do you test the secondary family.
- Pros: keeps α focused, protects key metrics, interpretable prioritization.
- Cons: requires pre-specification of priority, less flexible mid-experiment.

Recommendation for an exploratory growth experiment with many metrics:
- Pre-specify a small number of primary metrics and evaluate them first (use unadjusted or FWER control if critical).
- For the broader set of exploratory secondary metrics, use Benjamini–Hochberg to control FDR (e.g., q=0.05). BH preserves power and yields actionable leads while keeping the expected false discovery proportion acceptable.
- Complement with practical safeguards: pre-registration, report both raw and adjusted p-values, show effect sizes and CIs, and replicate promising signals in follow-up experiments.
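The three p-value procedures compared above are short enough to write out directly, which makes the power differences easy to see. This is a plain-Python sketch for illustration; the five p-values are made up, and in practice a library routine such as statsmodels' multipletests would normally be used instead.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: compare the k-th smallest p-value to alpha / (m - k + 1); stop at first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if pvals[i] > alpha / (m - rank + 1):
            break
        reject[i] = True
    return reject


def benjamini_hochberg(pvals, q=0.05):
    """Step-up: find the largest k with p_(k) <= (k / m) * q and reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for five variant-vs-control comparisons.
    pvals = [0.004, 0.011, 0.02, 0.03, 0.6]
    print("Bonferroni:", bonferroni(pvals))
    print("Holm:      ", holm(pvals))
    print("BH:        ", benjamini_hochberg(pvals))
```

On these inputs Bonferroni rejects one hypothesis, Holm two, and BH four, which mirrors the ordering of power described in the comparison.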

Cross Functional Collaboration and Coordination | Easy | Behavioral

45 practiced

Describe a time when you collaborated with a product manager to define success metrics for a machine learning feature. Explain the context, the specific model and business KPIs you proposed, how you translated technical metrics (e.g., AUC, precision) into business impact, and how you aligned on acceptance criteria and rollout gates.

Model Evaluation and Validation | Easy | Technical

69 practiced

Explain what stratified sampling achieves in cross-validation. Give an example using a 10-fold stratified CV for a binary classification task with 1% positives. Why is stratification important for rare classes?

Data Storytelling and Insight Communication | Easy | Technical

98 practiced

You are shown a bar chart with a truncated y-axis that makes small differences look large. Describe three concrete changes you would make to the chart, explain how each change improves clarity for non-technical stakeholders, and provide a one-line example of an improved headline after your changes.

