InterviewStack.io LogoInterviewStack.io

Lyft-Specific Data Modeling & Analytics Requirements Questions

Lyft-specific data modeling and analytics requirements for data platforms, including ride event data, trip-level schemas, driver and rider dimensions, pricing and surge data, geospatial/location data, and analytics needs such as reporting, dashboards, and real-time analytics. Covers analytic schema design (star/snowflake), ETL/ELT patterns, data quality and governance at scale, data lineage, privacy considerations, and integration with the broader data stack (data lake/warehouse, streaming pipelines).

HardSystem Design
75 practiced
You need to maintain global data privacy preferences and consent flags for riders that affect which features can be computed and stored. Architect a permission/consent service integrated into ingestion and downstream pipelines that enforces consent at ingestion, supports retroactive revocation, and provides verifiable audit logs for compliance teams.
MediumTechnical
88 practiced
Given a trips table schema trips(trip_id PK, rider_id, driver_id, start_time TIMESTAMP, end_time TIMESTAMP, start_city TEXT, end_city TEXT, status TEXT, fare_amount NUMERIC), write a Postgres-compatible SQL query to compute average trip duration and 95th percentile duration per start_city for the last 30 days, excluding canceled trips and trips with duration < 30 seconds. Explain assumptions about timezones and null end_time handling.
HardTechnical
66 practiced
Design an ML-ready feature pipeline that computes a rolling driver reliability score (based on cancellations, on-time arrivals, and ratings) in both offline batch for model training and online for serving. Explain state storage choices, how to ensure eventual consistency, handling of late events, and backfill strategies for historical training data.
MediumTechnical
131 practiced
How would you implement end-to-end data lineage for features used in ML training so a model prediction can be traced back to the raw event and ETL transformations? Describe tools, metadata capture points, versioning of transforms, and how you'd present lineage to auditors and model owners.
MediumTechnical
72 practiced
Write a PySpark approach (pseudocode acceptable) to compute, for each driver, the 7-day rolling average earnings per active hour. Store the results in a partitioned Parquet table and describe your partitioning/compaction choices to optimize queries by driver_id and date range filters.

Unlock Full Question Bank

Get access to hundreds of Lyft-Specific Data Modeling & Analytics Requirements interview questions and detailed answers.

Sign in to Continue

Join thousands of developers preparing for their dream job.