End-to-end topic covering the precise definition, computation, transformation, implementation, validation, documentation, and monitoring of business metrics. Candidates should demonstrate how to translate business requirements into reproducible metric definitions and formulas, choose aggregation methods and time windows, set filtering and deduplication rules, convert event-level data to user-level metrics, and compute cohorts, retention, attribution, and incremental impact. The work includes data-transformation skills such as normalizing and formatting date and identifier fields, handling null values and edge cases, creating calculated fields and measures, combining and grouping tables at appropriate levels, and choosing between percentages and absolute numbers. Implementation details include writing reliable SQL code or scripts, selecting instrumentation and data sources, and considering aggregation strategy, sampling, and margin of error so that pipelines produce reproducible results. Validation and quality practices include spot checks, comparison to known totals, automated tests, monitoring and alerting, naming conventions and versioning, and clear documentation so that all calculations are auditable and maintainable.
Medium · Technical
You need to estimate incremental impact from an A/B test using a difference-in-differences (DID) approach because randomization was imperfect. Outline the assumptions required for DID to be valid, provide the DID formula, and describe a concrete SQL or pseudocode implementation to compute the incremental lift and its confidence interval.
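A minimal Python sketch of the DID estimate under the parallel-trends assumption (toy per-user metric values, which are hypothetical; the 1.96 multiplier assumes a normal approximation with independent cells):

```python
import math

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: (treatment change) - (control change),
    with a normal-approximation 95% confidence interval.
    Each argument is a list of per-user metric values for that cell."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    did = (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))
    # Standard error of a sum of four independent cell means.
    se = math.sqrt(sum(var(c) / len(c) for c in (treat_pre, treat_post, ctrl_pre, ctrl_post)))
    return did, (did - 1.96 * se, did + 1.96 * se)

lift, ci = did_estimate([1.0, 2.0], [3.0, 4.0], [1.0, 2.0], [1.5, 2.5])
# Treatment moved +2.0, control moved +0.5, so the DID lift is 1.5.
```

In SQL the same four cell means would come from a single GROUP BY over (group, period), with the arithmetic done in an outer query or in application code.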
Easy · Technical
Define the difference between event-level and user-level aggregation. Provide examples of metrics that should be computed at the event level and those that must be aggregated to the user level (e.g., conversion rate vs. average sessions per user). Explain the pitfalls of mixing aggregation levels in the same report.
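A toy illustration of why the two levels diverge (hypothetical visit/purchase events; heavy users inflate the event-level ratio):

```python
# Toy events: (user_id, event_type)
events = [
    ("u1", "visit"), ("u1", "visit"), ("u1", "purchase"),
    ("u2", "visit"),
    ("u3", "visit"), ("u3", "purchase"), ("u3", "purchase"),
]

# Event-level ratio: purchases / visits — dominated by high-activity users.
visits = sum(1 for _, t in events if t == "visit")
purchases = sum(1 for _, t in events if t == "purchase")
event_level = purchases / visits          # 3 / 4 = 0.75

# User-level conversion rate: share of users with at least one purchase.
users = {u for u, _ in events}
converters = {u for u, t in events if t == "purchase"}
user_level = len(converters) / len(users)  # 2 / 3 ≈ 0.667
```

Mixing the two in one report (e.g., dividing an event-level numerator by a user-level denominator) produces a number with no clean interpretation, which is the pitfall the question asks about.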
Medium · Technical
Write a SQL query (specify the dialect) that computes cohort LTV (cumulative revenue per user) for each signup cohort over the first 90 days post-signup, using an orders table (order_id, user_id, amount, order_time) and a signups table (user_id, signup_time). Include handling for refunds (negative amounts) and explain how to handle users with a missing signup_time.
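A runnable sketch of one possible answer, using SQLite in place of a production dialect (toy data; `julianday` stands in for interval arithmetic). The LEFT JOIN keeps non-purchasing signups in the cohort denominator, refunds enter as negative amounts, and rows with a NULL signup_time are excluded because they cannot be assigned to a cohort:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE signups (user_id TEXT, signup_time TEXT);
CREATE TABLE orders  (order_id TEXT, user_id TEXT, amount REAL, order_time TEXT);
INSERT INTO signups VALUES ('u1', '2024-01-01'), ('u2', '2024-01-15'), ('u3', NULL);
INSERT INTO orders VALUES
  ('o1', 'u1',  50.0, '2024-01-10'),
  ('o2', 'u1', -10.0, '2024-02-01'),   -- refund: negative amount nets out
  ('o3', 'u2',  30.0, '2024-06-01'),   -- outside the 90-day window
  ('o4', 'u3',  99.0, '2024-01-05');   -- NULL signup_time: excluded
""")

ltv = conn.execute("""
SELECT strftime('%Y-%m', s.signup_time) AS cohort_month,
       SUM(o.amount) * 1.0 / COUNT(DISTINCT s.user_id) AS ltv_90d
FROM signups s
LEFT JOIN orders o
  ON o.user_id = s.user_id
 AND julianday(o.order_time) BETWEEN julianday(s.signup_time)
                                 AND julianday(s.signup_time) + 90
WHERE s.signup_time IS NOT NULL
GROUP BY cohort_month
""").fetchall()
# Cohort 2024-01: (50 - 10) revenue over 2 signed-up users → 20.0 per user.
```

Note the window predicate lives in the join condition, not the WHERE clause — moving it to WHERE would silently turn the LEFT JOIN back into an inner join and drop non-purchasers from the denominator.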
Easy · Technical
Given the following events table schema in PostgreSQL: events(event_id UUID PK, user_id UUID NULL, order_id UUID NULL, event_type TEXT, occurred_at TIMESTAMP WITH TIME ZONE, device_id TEXT NULL, is_bot BOOLEAN DEFAULT FALSE). Write a SQL query (Postgres) that computes each user's number of unique purchases in the previous 30 days as of a provided reference_date parameter. Requirements: deduplicate by order_id when present, coalesce device and user identifiers when user_id is NULL, ignore events flagged as bots, and treat rows with a null order_id as separate purchases only when event_type = 'purchase' and no order_id is available. Return user_id, purchases_30d.
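A runnable sketch of the core logic, again with SQLite substituting for Postgres on toy data (in Postgres the date filter would use `occurred_at > reference_date - interval '30 days'` instead of `julianday`). `COALESCE(order_id, event_id)` inside `COUNT(DISTINCT ...)` handles both requirements at once: duplicate order_ids collapse, while order_id-less purchases each count via their unique event_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
  event_id TEXT, user_id TEXT, order_id TEXT,
  event_type TEXT, occurred_at TEXT, device_id TEXT, is_bot INTEGER DEFAULT 0
);
INSERT INTO events VALUES
  ('e1', 'u1', 'ord1', 'purchase', '2024-03-10', 'd1', 0),
  ('e2', 'u1', 'ord1', 'purchase', '2024-03-10', 'd1', 0),  -- duplicate order_id
  ('e3', NULL, NULL,   'purchase', '2024-03-12', 'd2', 0),  -- device fallback
  ('e4', 'u1', 'ord2', 'purchase', '2024-01-01', 'd1', 0),  -- outside window
  ('e5', 'u1', 'ord3', 'purchase', '2024-03-15', 'd1', 1);  -- bot: ignored
""")

reference_date = "2024-03-31"
rows = conn.execute("""
SELECT COALESCE(user_id, device_id) AS user_key,
       COUNT(DISTINCT COALESCE(order_id, event_id)) AS purchases_30d
FROM events
WHERE event_type = 'purchase'
  AND is_bot = 0
  AND julianday(occurred_at) >  julianday(?) - 30
  AND julianday(occurred_at) <= julianday(?)
GROUP BY user_key
ORDER BY user_key
""", (reference_date, reference_date)).fetchall()
# u1 → 1 (ord1 deduped, ord2 too old, ord3 is a bot); d2 → 1 (event_id used)
```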
Medium · Technical
You have event data with multiple identity columns: user_id (nullable), device_id, cookie_id. Write SQL (your choice of SQL dialect) to create a canonical identifier column `canonical_id` applying deterministic rules: prefer user_id if present, else choose device_id, else cookie_id. Then write logic to deduplicate sessions per canonical_id within 30 minutes of inactivity. Provide assumptions and sample SQL implementation.
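A pure-Python sketch of the precedence rule and 30-minute sessionization on hypothetical events (in SQL the same gap logic is usually expressed with `LAG(ts) OVER (PARTITION BY canonical_id ORDER BY ts)` followed by a running sum of session-start flags):

```python
from datetime import datetime, timedelta

# Toy events: (user_id, device_id, cookie_id, timestamp)
events = [
    (None, "dev1", "ck1", "2024-01-01 10:00"),
    (None, "dev1", "ck1", "2024-01-01 10:20"),  # gap <= 30 min → same session
    (None, "dev1", "ck1", "2024-01-01 11:30"),  # gap > 30 min  → new session
    ("u9", None,   "ck2", "2024-01-01 10:05"),
]

def canonical_id(user_id, device_id, cookie_id):
    # Deterministic precedence: user_id, then device_id, then cookie_id.
    return user_id or device_id or cookie_id

# Sort per canonical_id by time; a gap over 30 minutes starts a new session.
parsed = sorted(
    ((canonical_id(u, d, c), datetime.strptime(ts, "%Y-%m-%d %H:%M"))
     for u, d, c, ts in events),
    key=lambda x: (x[0], x[1]),
)
sessions = {}
last_seen = {}
for cid, ts in parsed:
    if cid not in last_seen or ts - last_seen[cid] > timedelta(minutes=30):
        sessions[cid] = sessions.get(cid, 0) + 1
    last_seen[cid] = ts
# dev1 gets 2 sessions, u9 gets 1
```

The sketch assumes each event resolves to exactly one identifier; a full answer would also state how to handle the same person appearing under different canonical_ids over time (identity stitching), which simple column precedence cannot fix.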