Feature Engineering and Feature Stores Questions

This topic covers designing, building, and operating feature engineering pipelines and feature store platforms that enable large-scale machine learning. Core skills include feature design and selection, offline and online feature computation, batch versus real-time ingestion and serving, storage and serving architectures, client libraries and serving APIs, materialization strategies and caching, and consistent feature semantics between training and serving. Candidates should understand feature freshness and staleness tradeoffs, feature versioning and lineage, dependency graphs for feature computation, cost-aware and incremental computation strategies, and techniques to prevent label leakage and data leakage. At scale this also covers lifecycle management for thousands to millions of features, orchestration and scheduling, validation and quality gates for features, monitoring and observability of feature pipelines, and metadata governance, discoverability, and access control. For senior and staff levels, evaluate platform design across multiple teams, including feature reuse and sharing, feature catalogs and discoverability, handling metric and naming collisions, data governance and auditability, service level objectives and guarantees for serving and materialization, client library and API design, feature promotion and versioning workflows, and compliance and privacy considerations.

Medium · Technical

Write a PostgreSQL query that computes a per-user z-score for daily purchase amount using a 90-day window. Given table `transactions(user_id, amount, occurred_at)`, output (user_id, occurred_date, amount_zscore_90d). Explain how you handle users with fewer than 2 days of history.
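A small reference implementation can help check a SQL answer to this question against known inputs. The sketch below is in Python rather than SQL and fills in several points the question leaves open, all of which are assumptions: amounts are summed per user per day, the window is the trailing 90 calendar days including the current day, days without purchases are skipped rather than counted as zero, and the sample standard deviation is used. The z-score is returned as None when the window holds fewer than 2 days of history or when the standard deviation is zero. The function name `zscore_90d` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def zscore_90d(transactions, window_days=90):
    """transactions: iterable of (user_id, amount, occurred_at datetime).

    Returns (user_id, occurred_date, amount_zscore_90d) tuples, sorted by
    user and date. The z-score is None when it cannot be computed.
    """
    # Step 1: collapse transactions into one total per user per day.
    per_user = defaultdict(lambda: defaultdict(float))
    for user_id, amount, occurred_at in transactions:
        per_user[user_id][occurred_at.date()] += amount

    rows = []
    for user_id in sorted(per_user):
        days = per_user[user_id]
        for day in sorted(days):
            # Step 2: trailing window of calendar days, current day included.
            start = day - timedelta(days=window_days - 1)
            window = [v for d, v in days.items() if start <= d <= day]
            if len(window) < 2:
                z = None  # fewer than 2 days of history: stddev undefined
            else:
                sd = stdev(window)  # sample standard deviation
                z = None if sd == 0 else (days[day] - mean(window)) / sd
            rows.append((user_id, day, z))
    return rows


sample = [
    ("u1", 10.0, datetime(2024, 1, 1, 9, 0)),
    ("u1", 20.0, datetime(2024, 1, 2, 9, 0)),
    ("u1", 30.0, datetime(2024, 1, 3, 9, 0)),
]
for row in zscore_90d(sample):
    print(row)
```

Returning None instead of 0 for short histories keeps "no signal" distinguishable from "exactly average", which matters if a downstream model imputes missing values.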

Hard · Technical

Design a strategy to detect and reconcile metric collisions when different teams publish similarly named metrics (e.g., 'monthly_active_users') but with different definitions. Include detection algorithms, human-in-the-loop reconciliation, and automated mapping or aliasing approaches.
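As one possible starting point for the detection step, the sketch below flags pairs of metrics from different teams whose normalized names are near-identical but whose definitions differ. It is a minimal illustration, not a complete design: the record layout (`team`, `name`, `definition`), the similarity threshold of 0.85, and the use of a whitespace-insensitive hash as the definition fingerprint are all assumptions. Flagged pairs would go to a human reviewer rather than being merged automatically.

```python
import hashlib
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_name(name):
    """Lowercase and join alphanumeric tokens so 'Monthly-Active-Users'
    and 'monthly_active_users' compare as equal."""
    return "_".join(re.findall(r"[a-z0-9]+", name.lower()))


def definition_fingerprint(definition):
    """Hash the definition text, ignoring case and whitespace differences."""
    collapsed = " ".join(definition.lower().split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def find_collisions(metrics, threshold=0.85):
    """Return cross-team pairs with similar names but different definitions."""
    flagged = []
    for a, b in combinations(metrics, 2):
        if a["team"] == b["team"]:
            continue
        score = SequenceMatcher(
            None, normalize_name(a["name"]), normalize_name(b["name"])
        ).ratio()
        same_definition = (
            definition_fingerprint(a["definition"])
            == definition_fingerprint(b["definition"])
        )
        if score >= threshold and not same_definition:
            flagged.append((a["team"], a["name"], b["team"], b["name"], round(score, 2)))
    return flagged


metrics = [
    {"team": "growth", "name": "monthly_active_users",
     "definition": "COUNT(DISTINCT user_id) over logins in the last 30 days"},
    {"team": "ads", "name": "Monthly-Active-Users",
     "definition": "COUNT(DISTINCT user_id) over ad clicks in the calendar month"},
    {"team": "finance", "name": "gross_revenue",
     "definition": "SUM(amount) over settled orders"},
]
print(find_collisions(metrics))
```

A text hash only catches definitions that are textually identical. Comparing definitions that are equivalent but written differently would need something stronger, such as comparing computed values on a shared sample.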

Medium · Technical

A feature's schema changes upstream (column renamed, dtype changed). Describe a robust process to handle schema evolution in the feature store so that production models are not broken. Include schema migration steps, compatibility checks, and rollout strategy.
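The compatibility-check part of such a process could look roughly like the sketch below, which compares an old and a new schema and sorts the differences into breaking and compatible changes. Everything here is illustrative: schemas are modeled as plain dicts of column name to dtype string, the `SAFE_WIDENINGS` set is a hypothetical policy, and a rename is only treated as compatible when the caller declares it in a `renames` mapping (implying the store would serve the old name as an alias).

```python
# Hypothetical policy: dtype changes that existing consumers can tolerate.
SAFE_WIDENINGS = {("int32", "int64"), ("float32", "float64")}


def check_compatibility(old_schema, new_schema, renames=None):
    """Classify schema differences as breaking or compatible.

    old_schema / new_schema: dict of column name -> dtype string.
    renames: optional dict of old column name -> new column name.
    """
    renames = renames or {}
    report = {"breaking": [], "compatible": []}
    mapped = set()

    for col, old_type in old_schema.items():
        new_col = renames.get(col, col)
        if new_col not in new_schema:
            # An undeclared rename looks the same as a dropped column.
            report["breaking"].append(f"missing column: {col}")
            continue
        mapped.add(new_col)
        if new_col != col:
            report["compatible"].append(f"renamed with alias: {col} -> {new_col}")
        new_type = new_schema[new_col]
        if new_type == old_type:
            continue
        if (old_type, new_type) in SAFE_WIDENINGS:
            report["compatible"].append(f"widened: {col} {old_type} -> {new_type}")
        else:
            report["breaking"].append(f"type change: {col} {old_type} -> {new_type}")

    for col in new_schema:
        if col not in mapped:
            report["compatible"].append(f"added column: {col}")
    return report


old = {"user_id": "int64", "spend": "float32", "country": "string"}
new = {"user_id": "int64", "spend_usd": "float64", "country": "int32"}
print(check_compatibility(old, new))
print(check_compatibility(old, new, renames={"spend": "spend_usd"}))
```

A check like this could run in CI before a new feature version is registered, with any entry under `breaking` forcing a new feature version instead of an in-place change.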

Hard · Technical

Design a large-scale automated validation and QA framework that prevents label leakage and data quality regressions when teams submit new or modified features. Include unit/integration tests, sample-based checks, schema checks, model shadowing, and promotion gates.
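To make the idea of a promotion gate concrete, here is a minimal sketch of two sample-based checks: a null-rate threshold and a point-in-time check that catches one common form of label leakage. The row layout (`value`, `feature_ts`, `label_ts`), the function name `run_promotion_gate`, and the 5% null-rate threshold are assumptions for illustration. The sketch also assumes a feature value must be observed strictly before the label time to count as safe.

```python
from datetime import datetime


def run_promotion_gate(rows, max_null_rate=0.05):
    """Run sample-based checks on joined training rows.

    rows: list of dicts with keys 'value', 'feature_ts', 'label_ts'.
    Returns a list of failure messages; an empty list means the gate passes.
    """
    if not rows:
        return ["no sample rows to validate"]

    failures = []

    # Data quality check: share of missing feature values in the sample.
    null_rate = sum(r["value"] is None for r in rows) / len(rows)
    if null_rate > max_null_rate:
        failures.append(f"null rate {null_rate:.2f} exceeds {max_null_rate:.2f}")

    # Leakage check: a feature observed at or after the label time could
    # encode the outcome it is supposed to predict.
    leaked = [r for r in rows if r["feature_ts"] >= r["label_ts"]]
    if leaked:
        failures.append(
            f"{len(leaked)} rows use feature values observed at or after the label time"
        )
    return failures


clean = [
    {"value": 1.5, "feature_ts": datetime(2024, 1, 1), "label_ts": datetime(2024, 1, 2)},
    {"value": 2.0, "feature_ts": datetime(2024, 1, 3), "label_ts": datetime(2024, 1, 5)},
]
leaky = clean + [
    {"value": 9.9, "feature_ts": datetime(2024, 1, 9), "label_ts": datetime(2024, 1, 8)},
]
print(run_promotion_gate(clean))
print(run_promotion_gate(leaky))
```

Timestamp checks only catch temporal leakage. Features derived from the label itself would need other signals, such as an implausibly high single-feature correlation with the target.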

Easy · Technical

Write a SQL query (compatible with PostgreSQL) to compute a sliding 7-day average 'daily_spend_avg_7d' per user from a `transactions` table (transaction_id, user_id, amount, occurred_at TIMESTAMP). The result should have columns (user_id, occurred_date, daily_spend_avg_7d) and exclude the current day from the 7-day window.
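The intended window semantics can be pinned down with a short reference implementation, which is also useful for checking a SQL answer against known inputs. The Python sketch below rests on assumptions the question does not settle: amounts are first summed per user per day, the window covers the 7 calendar days before the current day (d-7 through d-1), days without transactions are skipped rather than counted as zero, and the result is None when the window is empty. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_spend_avg_7d(transactions):
    """transactions: iterable of (transaction_id, user_id, amount, occurred_at).

    Returns (user_id, occurred_date, daily_spend_avg_7d) tuples, sorted by
    user and date.
    """
    # Step 1: total spend per user per calendar day.
    daily = defaultdict(float)
    for _txn_id, user_id, amount, occurred_at in transactions:
        daily[(user_id, occurred_at.date())] += amount

    rows = []
    for user_id, day in sorted(daily):
        # Step 2: daily totals for d-7 .. d-1; the current day is excluded.
        prior = [
            daily[(user_id, day - timedelta(days=k))]
            for k in range(1, 8)
            if (user_id, day - timedelta(days=k)) in daily
        ]
        avg = sum(prior) / len(prior) if prior else None
        rows.append((user_id, day, avg))
    return rows


sample = [
    (1, "u1", 10.0, datetime(2024, 1, 1, 8, 0)),
    (2, "u1", 5.0, datetime(2024, 1, 2, 9, 0)),
    (3, "u1", 15.0, datetime(2024, 1, 2, 18, 0)),
    (4, "u1", 30.0, datetime(2024, 1, 3, 12, 0)),
    (5, "u1", 100.0, datetime(2024, 1, 10, 12, 0)),
]
for row in daily_spend_avg_7d(sample):
    print(row)
```

If zero-spend days should pull the average down, the divisor would be a fixed 7 instead of the number of active days. A good answer states which of the two it implements.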
