Ojasvi Shelar
Data Engineer · Databricks · SAP · Python · PySpark

Data Engineer with 3+ years of experience across logistics and aviation, building scalable data pipelines and data quality frameworks. Hands-on experience with Python, SQL, and PySpark for large-scale data processing and validation across legacy enterprise systems (SAP HANA, BW) and modern cloud environments (Databricks). Focused on data reliability, robust analytics workflows, and lakehouse architectures.

Flight Data Engineering Intern — Customer Implementation
Jul 2025 - Dec 2025
OpenAirlines · Toulouse, France
  • Built AeroAtsu, a modular Python validation framework automating FDR data quality checks across heterogeneous aircraft datasets — detecting schema inconsistencies, out-of-range values, frozen signals, and spike anomalies before ingestion into SkyBreathe®, reducing manual review effort by 50%.
  • Implemented rule-based validation logic and statistical trend analysis (pandas, NumPy) to enforce data contracts and flag data drift.
  • Developed a Streamlit/Plotly interactive validation app to replace manual data quality sign-off, enabling onboarding validation for Nippon Cargo Airlines and supporting data quality investigations for TAP Air Portugal and Atlas Air.
  • Translated field-level data issues into structured validation rules and functional specifications, working directly with implementation teams across airline onboarding projects.
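The validation checks described above (out-of-range values, frozen signals, spike anomalies) can be sketched in plain pandas; the column name, thresholds, and window size below are illustrative, not the framework's actual rules:

```python
import pandas as pd

# Toy FDR-style frame; values chosen to trigger each check.
df = pd.DataFrame({"altitude_ft": [0, 500, 1000, 1000, 1000, 1000, 1000, 9000, 1200]})
s = df["altitude_ft"]

# Out-of-range: values outside a plausible physical envelope.
out_of_range = ~s.between(-1000, 60000)

# Frozen signal: the same value repeated over 5 consecutive samples.
frozen = s.rolling(5).apply(lambda w: w.nunique() == 1, raw=False).eq(1)

# Spike: a sample-to-sample jump larger than a threshold.
spike = s.diff().abs().gt(5000)

df["flagged"] = out_of_range | frozen | spike
```

Each check yields a boolean mask, so rules compose with `|` and individual rule hits remain traceable per sample.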
SAP Data Engineer & Analytics Consultant
Mar 2021 - Jul 2024
Infosys · Client: CMA CGM (Global Shipping & Logistics)
  • Designed and maintained enterprise-scale ETL/ELT pipelines on SAP HANA and BW, processing high-volume logistics data with emphasis on data lineage, transformation traceability, and end-to-end reliability.
  • Developed SQL transformation logic, process monitoring workflows, and analytical data warehouse models underpinning KPI dashboards and governance reporting across CMA CGM's global operations, ensuring decision-makers worked from trusted, reconciled data.
  • Contributed to an SAP-to-cloud migration to Snowflake, supporting schema alignment, data reconciliation, and cross-environment validation.
  • Provided L2/L3 production support and produced functional and technical documentation covering business rules, data flows, KPI definitions, and transformation logic.
IoT Sensor Data Pipeline — Medallion Architecture on Databricks
GitHub ↗

End-to-end medallion pipeline (Bronze → Silver → Gold) on Databricks using Apache Spark, Delta Live Tables, and Unity Catalog — enforcing data quality at each layer via DLT native expectations. Implemented rolling z-score anomaly detection via Spark window functions to surface sensor degradation patterns. Delivered a 4-page AI/BI monitoring dashboard and a 15-check data quality audit framework for full pipeline observability across runs.

Databricks · Delta Live Tables · Unity Catalog · PySpark · Apache Spark · Anomaly detection · AI/BI dashboard
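In the project, the rolling z-score runs via Spark window functions; the same idea is sketched here compactly in pandas, with made-up readings and an illustrative window of 5 preceding samples:

```python
import pandas as pd

# Toy sensor readings; the spike at index 5 should be surfaced.
readings = pd.Series([10.0, 11, 10, 12, 11, 50, 10, 11])

# Rolling mean/std over the 5 preceding samples, shifted by one so the
# current sample is excluded from its own baseline.
mu = readings.rolling(5).mean().shift(1)
sigma = readings.rolling(5).std().shift(1)

z = (readings - mu) / sigma
anomaly = z.abs().gt(3)  # |z| > 3 flags the sample as anomalous
```

Excluding the current sample from its baseline keeps a large spike from inflating its own rolling statistics; in Spark the equivalent is a window like `rowsBetween(-5, -1)` partitioned by sensor.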
Advanced Master's — AI & Business Transformation
ISAE-SUPAERO · Toulouse, France
Sep 2024 - Dec 2025
Data Integration, Machine Learning, Big Data Processing, NLP & LLMs, Reinforcement Learning, Data Governance, Analytics applied to industrial performance optimization.
B.E. — Aeronautical Engineering
Shivaji University · India
2016 - 2020