Airbnb-Specific Data Patterns Questions
Domain-specific data modeling and analytics patterns used in Airbnb-scale product analytics. Covers data schema design, event and transaction patterns, feature engineering templates for predictive models, cohort and lifecycle analytics, geospatial and temporal data patterns, price and demand forecasting signals, A/B testing data patterns, and data quality, governance, and lineage considerations relevant to Airbnb data.
Easy · Technical
What event fields and metadata would you attach to every event to support robust A/B experimentation at Airbnb? Consider assignment_id, experiment_id, treatment_label, exposure_timestamp, bucketing_seed, override_flags, and impression vs conversion markers. Explain how these fields help ensure unbiased experiment analysis.
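As a starting point for an answer, the fields named in the question can be sketched as a small event record plus a deterministic bucketing function. This is only an illustrative sketch: the `ExperimentEvent` class, the `assign_treatment` helper, and the salted SHA-256 hashing scheme are assumptions made for this example, not a description of Airbnb's actual experimentation platform.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


def assign_treatment(unit_id: str, experiment_id: str, bucketing_seed: str,
                     treatments: Sequence[str]) -> str:
    """Deterministically map a unit (e.g. a user) to a treatment.

    Hypothetical helper: hashes seed + experiment + unit so the same unit
    always lands in the same arm, and different experiments are decorrelated.
    """
    key = f"{bucketing_seed}:{experiment_id}:{unit_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return treatments[bucket % len(treatments)]


@dataclass(frozen=True)
class ExperimentEvent:
    """Experiment metadata attached to an event (field names from the question)."""
    assignment_id: str            # unique id of this assignment record
    experiment_id: str
    treatment_label: str
    exposure_timestamp: datetime  # when the unit actually saw the treatment
    bucketing_seed: str           # lets analysts re-derive the assignment
    override_flag: bool           # True if the arm was forced (QA, staff)
    event_kind: str               # "impression" or "conversion"


arms = ["control", "treatment"]
label = assign_treatment("user-123", "exp-search-ranking", "seed-v1", arms)
event = ExperimentEvent(
    assignment_id="asg-1",
    experiment_id="exp-search-ranking",
    treatment_label=label,
    exposure_timestamp=datetime.now(timezone.utc),
    bucketing_seed="seed-v1",
    override_flag=False,
    event_kind="impression",
)
```

Because the assignment can be recomputed from `bucketing_seed`, `experiment_id`, and the unit id, an analyst can check logged labels against recomputed ones, and can drop rows where `override_flag` is set or where a conversion precedes `exposure_timestamp`.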
MediumTechnical
Given listings(listing_id, lat FLOAT, lon FLOAT, city STRING) and bookings(booking_id, listing_id, occurred_at TIMESTAMP), write SQL (Postgres/PostGIS or BigQuery) to compute conversion rate per 1km grid cell within a specified city and return the top-3 hotspot cells by conversion rate, requiring at least 50 listings per cell. Describe how you generate the grid and join listings to cells.
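The question asks for SQL, but the grid logic itself can be illustrated in a few lines of Python. The sketch below rests on several assumptions: cells are approximated with an equirectangular projection around a reference latitude for the city (adequate at city scale, not exact), "conversion rate" is taken to mean the share of listings in a cell with at least one booking, and the names `cell_of` and `top_cells` are hypothetical.

```python
import math
from collections import defaultdict

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def cell_of(lat, lon, ref_lat, cell_km=1.0):
    """Return the (x, y) index of the ~cell_km square containing the point."""
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    return (math.floor(lon * km_per_deg_lon / cell_km),
            math.floor(lat * KM_PER_DEG_LAT / cell_km))


def top_cells(listings, booked_listing_ids, ref_lat, min_listings=50, k=3):
    """Rank cells by share of listings that have at least one booking.

    listings: iterable of (listing_id, lat, lon) already filtered to one city.
    booked_listing_ids: set of listing_ids appearing in the bookings table.
    """
    totals = defaultdict(int)
    booked = defaultdict(int)
    for listing_id, lat, lon in listings:
        cell = cell_of(lat, lon, ref_lat)
        totals[cell] += 1
        if listing_id in booked_listing_ids:
            booked[cell] += 1
    rates = [(cell, booked[cell] / n, n)
             for cell, n in totals.items() if n >= min_listings]
    rates.sort(key=lambda r: r[1], reverse=True)
    return rates[:k]


# Tiny illustrative input; the threshold is lowered so the example is visible.
sample = [(1, 48.8566, 2.3522), (2, 48.8566, 2.3522), (3, 48.9500, 2.5000)]
result = top_cells(sample, {1}, ref_lat=48.8566, min_listings=2)
```

In SQL the same idea becomes a `FLOOR` on scaled coordinates (or a PostGIS grid function) used as the `GROUP BY` key, with the minimum-listings filter in a `HAVING` clause.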
MediumTechnical
Describe how you would implement deduplication and late-arrival handling for booking events using Kafka as the source and Spark Structured Streaming as the processor. Include watermark settings, state TTLs, idempotent writes to sinks (e.g., Delta), and strategies for when the required reprocessing window exceeds the watermark.
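The interaction between a watermark and deduplication state can be illustrated without Spark. The following is a simplified plain-Python simulation, not Spark's implementation: the `WatermarkDeduplicator` class is hypothetical, timestamps are plain integers, and the watermark advances per event rather than per micro-batch.

```python
class WatermarkDeduplicator:
    """Drop duplicate event ids, bounding state with an event-time watermark."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.max_event_time = None
        self.seen = {}  # event_id -> event_time, only for events >= watermark

    @property
    def watermark(self):
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def process(self, event_id, event_time):
        """Return 'emit', 'duplicate', or 'late' for one incoming event."""
        wm = self.watermark
        if wm is not None and event_time < wm:
            # Too old: state for this time range may already be evicted,
            # so the event cannot be safely deduplicated in-stream.
            return "late"
        if event_id in self.seen:
            return "duplicate"
        self.seen[event_id] = event_time
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
            self._evict()
        return "emit"

    def _evict(self):
        """Forget ids whose event time has fallen behind the watermark."""
        wm = self.watermark
        self.seen = {k: t for k, t in self.seen.items() if t >= wm}


dedup = WatermarkDeduplicator(allowed_lateness=10)
outcomes = [
    dedup.process("a", 100),  # first sighting
    dedup.process("a", 100),  # replayed message
    dedup.process("b", 120),  # advances the watermark to 110
    dedup.process("c", 105),  # behind the watermark
    dedup.process("d", 115),  # within allowed lateness
]
```

Events classified as late are the ones that need a separate path, such as a batch backfill with idempotent upserts keyed on the booking id, since the streaming state can no longer tell whether they are duplicates.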
MediumTechnical
For a 'similar listings' recommendation feature, outline a data engineering pipeline that builds feature vectors combining geospatial, price, amenity, and textual embedding features. Include offline batch computation, feature storage format, ANN index strategy (e.g., Faiss/Annoy), update cadence, and considerations for online serving and freshness.
Medium · System Design
Design an event ingestion pipeline that can handle up to 1M events/sec globally with end-to-end ingestion latency under 5 seconds for analytics. Specify components (client SDK, edge ingestion, message broker, stream processing, schema registry, storage), how you'd handle schema evolution, idempotency, backpressure, and region-aware routing. What trade-offs would you make to control cost and ensure reliability?