InterviewStack.io LogoInterviewStack.io

Lyft-Specific Data Modeling & Analytics Requirements Questions

Lyft-specific data modeling and analytics requirements for data platforms, including ride event data, trip-level schemas, driver and rider dimensions, pricing and surge data, geospatial/location data, and analytics needs such as reporting, dashboards, and real-time analytics. Covers analytic schema design (star/snowflake), ETL/ELT patterns, data quality and governance at scale, data lineage, privacy considerations, and integration with the broader data stack (data lake/warehouse, streaming pipelines).

MediumTechnical
0 practiced
Explain SCD Type 1, Type 2, and Type 3 with practical examples, and recommend which SCD type to apply for driver dimension attributes such as current_rating (fast-changing), hometown (rarely changes), and preferred_vehicle (can change). Discuss implications for storage growth and query patterns.
HardTechnical
0 practiced
You are leading a migration of hundreds of operational dashboards from Tableau to Looker. Describe a migration plan covering inventory and prioritization, mapping visualizations and metrics, preserving schedules and extracts, validating metric parity, user communication, rollback strategy, automated/manual validation steps, and staffing considerations for the migration effort.
MediumTechnical
0 practiced
Write a SQL query (standard SQL/BigQuery) that flags daily city-level price outliers where a trip's price is greater than the city's daily mean plus 3 standard deviations. Use trips(trip_id, city, price, started_at). Include logic to ignore days with fewer than N trips (use N=10) to avoid spurious flags.
HardSystem Design
0 practiced
Describe how to implement column-level end-to-end data lineage for derived metrics that combine multiple sources and SQL transformations. Explain the instrumentation required in ETL frameworks, how to parse SQL to infer column mappings, the storage format for lineage metadata, and how to expose lineage to analysts for impact analysis and debugging.
HardTechnical
0 practiced
Design a storage lifecycle and compression strategy to reduce costs for storing raw ride event data while preserving the ability to re-compute derived datasets. Recommend compression formats (Parquet/ORC), partitioning strategies (ingestion_date), columnar vs row storage choices, hot/cold tiering and archival, and retrieval/rehydration patterns for infrequent recomputation requests.

Unlock Full Question Bank

Get access to hundreds of Lyft-Specific Data Modeling & Analytics Requirements interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.