Data Engineering & Analytics Infrastructure Topics
Data pipeline design, ETL/ELT processes, streaming architectures, data warehousing infrastructure, analytics platform design, and real-time data processing. Covers event-driven systems, batch and streaming trade-offs, data quality and governance at scale, schema design for analytics, and infrastructure for big data processing. Distinct from Data Science & Analytics (which focuses on statistical analysis and insights) and from Cloud & Infrastructure (platform-focused rather than data-flow focused).
Data Observability and Governance
Covers the design of monitoring, alerting, governance, and metadata practices that maintain long-term data reliability. Topics include building observability for data pipelines with logs, metrics, and traces, setting service level agreements (SLAs) and data quality service level indicators (SLIs), anomaly detection for data and metrics, automated validation and alerting, lineage and provenance tracking, metadata and cataloging, data contracts, access controls for sensitive data, and processes for governance and compliance. Candidates should be able to design end-to-end frameworks that combine validation checks, anomaly detection, monitoring dashboards, incident workflows, and documentation to ensure trust in data products.
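As an illustration of the data quality indicators and anomaly checks described above, here is a minimal sketch in Python. The function names, the z-score approach, and the threshold of 3.0 are assumptions chosen for illustration, not a prescribed method. A production framework would typically read these inputs from pipeline metadata and route failures to an alerting system.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def freshness_ok(last_loaded_at, now, max_lag):
    """Freshness SLI: True if the latest load is within the allowed lag."""
    return (now - last_loaded_at) <= max_lag


def null_rate(rows, column):
    """Completeness SLI: fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def is_volume_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it is more than `z_threshold` sample
    standard deviations away from the mean of the historical counts."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # any change from a constant history is flagged
    return abs(today - mu) / sigma > z_threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(freshness_ok(now - timedelta(hours=1), now, timedelta(hours=2)))
    print(null_rate([{"user_id": 1}, {"user_id": None}], "user_id"))
    print(is_volume_anomaly([100, 102, 98, 101, 99], 500))
```

A simple z-score is easy to explain in an incident review but assumes roughly stable volumes. Seasonal or trending data usually needs a baseline that accounts for day of week or growth.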
Data Governance and Classification
Covers frameworks and practices for classifying and governing organizational data. Candidates should demonstrate how to define classification schemes such as public, internal, confidential, and restricted, and how to identify sensitivity categories including personally identifiable information, protected health information, payment and financial data, biometric data, and location data. They should explain how classification drives data handling requirements, including storage location choices, encryption and access control policies, retention and deletion schedules, data minimization, and cross-border data handling. They should also discuss implementation patterns such as metadata and labeling, automated discovery and classification, integration with data pipelines and applications, policy enforcement and auditing, roles and responsibilities for data stewardship, and how to align classification with legal, regulatory, and privacy requirements.
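The way a classification level drives handling requirements can be sketched as a lookup from sensitivity tags to a level, and from a level to a policy. Everything specific here is a hypothetical placeholder: the tag names, the tag-to-level mapping, the retention periods, and the `HandlingPolicy` fields. Real values come from an organization's legal and compliance requirements.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Levels ordered so that a higher value means stricter handling."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class HandlingPolicy:
    encrypt_at_rest: bool
    retention_days: int
    cross_border_allowed: bool


# Hypothetical policy values, for illustration only.
POLICIES = {
    Classification.PUBLIC: HandlingPolicy(
        encrypt_at_rest=False, retention_days=3650, cross_border_allowed=True),
    Classification.INTERNAL: HandlingPolicy(
        encrypt_at_rest=True, retention_days=1825, cross_border_allowed=True),
    Classification.CONFIDENTIAL: HandlingPolicy(
        encrypt_at_rest=True, retention_days=730, cross_border_allowed=False),
    Classification.RESTRICTED: HandlingPolicy(
        encrypt_at_rest=True, retention_days=365, cross_border_allowed=False),
}

# Hypothetical mapping from sensitivity tags to the minimum level they imply.
TAG_FLOOR = {
    "pii": Classification.CONFIDENTIAL,
    "location": Classification.CONFIDENTIAL,
    "phi": Classification.RESTRICTED,
    "payment": Classification.RESTRICTED,
    "biometric": Classification.RESTRICTED,
}


def classify(tags, default=Classification.INTERNAL):
    """Return the strictest level implied by the tags, never below `default`.
    Unknown tags are ignored."""
    levels = [TAG_FLOOR[tag] for tag in tags if tag in TAG_FLOOR]
    return max([default, *levels])


def policy_for(tags):
    """Look up the handling policy for a dataset or column given its tags."""
    return POLICIES[classify(tags)]


if __name__ == "__main__":
    print(classify(["pii", "phi"]).name)
    print(policy_for(["location"]))
```

Taking the strictest level across all tags means a dataset that mixes categories inherits the tightest controls. This is a common conservative default, though some organizations classify per column instead.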