Data Engineer with 3+ years of experience across logistics and aviation, building scalable data pipelines and data quality frameworks. Hands-on experience with Python, SQL, and PySpark for large-scale data processing and validation across legacy enterprise systems (SAP HANA, BW) and modern cloud environments (Databricks). Focused on data reliability, robust analytics workflows, and lakehouse architectures.
- Built AeroAtsu, a modular Python validation framework automating FDR data quality checks across heterogeneous aircraft datasets. It detects schema inconsistencies, out-of-range values, frozen signals, and spike anomalies before ingestion into SkyBreathe®, reducing manual review effort by 50% (see the first sketch after this list).
- Implemented rule-based validation logic and statistical trend analysis (pandas, NumPy) to enforce data contracts and flag data drift.
- Developed a Streamlit/Plotly interactive validation app to replace manual data quality sign-off (second sketch after this list), enabling onboarding validation for Nippon Cargo Airlines and supporting data quality investigations for TAP Air Portugal and Atlas Air.
- Translated field-level data issues into structured validation rules and functional specifications, working directly with implementation teams across airline onboarding projects.
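
A minimal sketch of the kinds of checks described above, in plain pandas/NumPy. The function names, thresholds, and the synthetic altitude signal are illustrative, not the framework's actual API:

```python
import numpy as np
import pandas as pd

def check_range(s: pd.Series, lo: float, hi: float) -> pd.Series:
    """Flag samples outside the physically plausible [lo, hi] envelope."""
    return (s < lo) | (s > hi)

def check_frozen(s: pd.Series, window: int = 30) -> pd.Series:
    """Flag stretches where a signal repeats the same value for `window` samples."""
    return s.diff().abs().rolling(window).sum() == 0

def check_spikes(s: pd.Series, z: float = 6.0) -> pd.Series:
    """Flag isolated spikes using a robust (median/MAD) z-score."""
    med = s.median()
    mad = (s - med).abs().median() or 1e-9   # guard against a zero MAD
    return (s - med).abs() / (1.4826 * mad) > z

# Illustrative usage: a climb, a frozen stretch, and one spike.
altitude = pd.Series(np.r_[np.linspace(0, 30000, 200), [30000] * 50, [200000]])
report = pd.DataFrame({
    "out_of_range": check_range(altitude, -1000, 50000),
    "frozen": check_frozen(altitude),
    "spike": check_spikes(altitude),
})
print(report.sum())   # count of flagged samples per check
```

In this shape, each check is a pure Series-to-boolean-mask function, so new rules compose into a per-signal report without touching the runner.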
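A sketch of what such a sign-off app can look like. The widget layout, the inline spike rule, and the assumption of an uploaded CSV extract are illustrative, not the app's actual design:

```python
import pandas as pd
import plotly.express as px
import streamlit as st

st.title("FDR data quality sign-off")   # hypothetical title/layout

uploaded = st.file_uploader("Decoded FDR extract (CSV)", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    signal = st.selectbox("Signal to review", df.select_dtypes("number").columns)
    s = df[signal]
    med = s.median()
    mad = (s - med).abs().median() or 1e-9
    flags = (s - med).abs() / (1.4826 * mad) > 6.0   # same robust spike rule as above
    st.metric("Flagged samples", int(flags.sum()))
    st.plotly_chart(px.line(s, title=f"{signal} ({int(flags.sum())} potential spikes)"))
```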
- Designed and maintained enterprise-scale ETL/ELT pipelines on SAP HANA and BW, processing high-volume logistics data with emphasis on data lineage, transformation traceability, and end-to-end reliability.
- Developed SQL transformation logic, process monitoring workflows, and analytical data warehouse models underpinning KPI dashboards and governance reporting across CMA CGM's global operations, ensuring decision-makers worked from trusted, reconciled data.
- Contributed to the SAP-to-cloud migration, supporting schema alignment, data reconciliation, and cross-environment validation against Snowflake (see the reconciliation sketch after this list).
- Provided L2/L3 production support and produced functional and technical documentation covering business rules, data flows, KPI definitions, and transformation logic.
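
A hedged sketch of the cross-environment reconciliation step. The connection objects, table and column names, and the tolerance are all placeholders (e.g. DBAPI connections from hdbcli and the Snowflake Python connector), not the project's actual code:

```python
import pandas as pd

def _fetch(conn, query: str, key: str) -> pd.DataFrame:
    df = pd.read_sql(query, conn)
    df.columns = df.columns.str.lower()   # both engines upper-case unquoted identifiers
    return df.set_index(key.lower())

def reconcile(hana_conn, sf_conn, table: str, key: str, measure: str) -> pd.DataFrame:
    """Compare per-key row counts and a summed measure across environments."""
    q = (f"SELECT {key}, COUNT(*) AS n_rows, SUM({measure}) AS total "
         f"FROM {table} GROUP BY {key}")
    both = _fetch(hana_conn, q, key).join(
        _fetch(sf_conn, q, key), how="outer", lsuffix="_hana", rsuffix="_sf")
    both["rows_ok"] = both["n_rows_hana"] == both["n_rows_sf"]
    both["total_ok"] = (both["total_hana"] - both["total_sf"]).abs() < 0.01
    return both[~(both["rows_ok"] & both["total_ok"])]   # mismatched keys only

# Illustrative call with placeholder names:
# mismatches = reconcile(hana_conn, sf_conn, "SHIPMENTS", "ORIGIN_REGION", "TEU_COUNT")
```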
End-to-end medallion pipeline (Bronze → Silver → Gold) on Databricks using Apache Spark, Delta Live Tables, and Unity Catalog, enforcing data quality at each layer via native DLT expectations. Implemented rolling z-score anomaly detection with Spark window functions to surface sensor degradation patterns. Delivered a 4-page AI/BI monitoring dashboard and a 15-check data quality audit framework for full pipeline observability across runs.
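
A condensed sketch of the Silver and Gold steps. The table and column names (bronze_sensors, sensor_id, ts, reading), the thresholds, and the 100-row window are placeholders, and only two of the expectations are shown:

```python
import dlt
from pyspark.sql import functions as F
from pyspark.sql.window import Window

@dlt.table(comment="Silver: validated sensor readings")
@dlt.expect("ts_not_null", "ts IS NOT NULL")                      # warn-level expectation
@dlt.expect_or_drop("reading_in_range", "reading BETWEEN -50 AND 150")
def silver_sensors():
    return dlt.read_stream("bronze_sensors")

# Rolling stats over the trailing 100 readings per sensor, via a window function.
w = Window.partitionBy("sensor_id").orderBy("ts").rowsBetween(-99, 0)

@dlt.table(comment="Gold: readings scored with a rolling z-score")
def gold_anomaly_scores():
    return (dlt.read("silver_sensors")
        .withColumn("mu", F.avg("reading").over(w))
        .withColumn("sigma", F.stddev("reading").over(w))
        .withColumn("z", F.when(F.col("sigma") > 0,               # null until the window
                                (F.col("reading") - F.col("mu"))  # has enough variance
                                / F.col("sigma")))
        .withColumn("is_anomaly", F.abs(F.col("z")) > 3))
```

DLT records expectation pass/fail counts in its event log, which is a natural source for the kind of run-level observability the audit framework and dashboard provide.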