Data Scientist · ML Engineer · Researcher
Hi, I'm Arpit Baranwal
Building ML pipelines, probabilistic models, and applied AI systems that solve real problems, from energy research at Fraunhofer ISE to healthcare data at scale.
📍 Freiburg, Germany · Open to opportunities · No sponsorship needed

Presenting my Master's thesis to a global audience
I'll be presenting my research on data-driven spatial risk assessment of indoor gas dispersion at the IEA HPT Project 64 Webinar: Flammable Refrigerant Safety.
About
Who I am
I'm a Data Scientist with over 5 years of experience building production ML systems across healthcare, energy research, and cloud infrastructure. Currently based in Freiburg, Germany, I recently completed my Master's in Embedded Systems Engineering (AI specialization) at the University of Freiburg.
At Fraunhofer ISE, I developed probabilistic surrogate models, uncertainty-aware prediction frameworks, and a RAG knowledge system used across three research teams. Before that, I built large-scale healthcare ETL pipelines and serverless AWS architectures at Legato Health Technology.
I care about the full stack, from clean data pipelines and rigorous model validation to deployment and stakeholder communication. I'm a published researcher, a part-time chef, and an event coordinator. I believe good data work requires both technical depth and clear storytelling.
Experience
Where I've worked
Independent Data Engineer & ML Developer
Self-employed
- ▸Built an end-to-end Databricks pipeline (Bronze → Silver → Gold) that ingests 3 data sources (POS, weather, operational) and processes 6K+ transactions per month.
- ▸Engineered a KPI dashboard tracking 15+ metrics and a sentiment analysis pipeline covering 200+ customer reviews.
- ▸Developing a demand forecasting model for 30+ menu items using 12 months of historical order data and weather signals.
Student Research Assistant
Fraunhofer ISE, Freiburg
- ▸Developed ML models for risk prediction and anomaly detection across 3+ physical system configurations, integrating 30+ real-world sensor signals.
- ▸Built a probabilistic surrogate model (TabPFN), trained on ~500 simulation samples, that delivers uncertainty-aware predictions ~100× faster than full physical simulations.
- ▸Designed a Kriging-Monte Carlo prediction framework run over 10,000+ iterations, reaching a cross-validated R² of 0.81.
- ▸Deployed a RAG knowledge system indexing 1,000+ internal technical documents for 3 research teams, reducing manual retrieval time by ~30%.
Associate Cloud Intern
RealWorldOne
- ▸Implemented secure, scalable ML infrastructure on AWS, contributing to model deployment workflows across dev and production environments.
- ▸Designed IAM-based access control policies ensuring compliant, role-based data handling across the engineering team.
- ▸Contributed to CI/CD pipelines that improved deployment reliability and release consistency for ML services.
AWS Python Developer
Legato Health Technology
- ▸Developed 5+ backend services and data processing APIs in C#/.NET, forming the foundational healthcare data layer that feeds downstream ML pipelines.
- ▸Engineered 3+ ML-driven ETL pipelines processing 2M+ healthcare records monthly with built-in anomaly detection.
- ▸Built an automated OCR identity-verification system processing thousands of documents monthly, eliminating manual review across 2+ business units.
- ▸Designed 4+ serverless ingestion architectures (AWS Glue, Lambda, AppSync) provisioned via Terraform across 3 environments.
Education
Where I studied
Master's in Embedded Systems Engineering
Specialization in AI
Freiburg, Germany
Coursework in deep learning, reinforcement learning, computer vision, robotics, microsystems engineering, and optimization, building strong skills in applied ML and system modeling.
Bachelor's in Electronics & Communication Engineering
Asansol, India
Foundation in signal processing, electronics, and communication systems with a focus on applied engineering and mathematics.
Skills
What I work with
Languages & Databases
ML & Data Libraries
Cloud, MLOps & Tools
Methods & Specializations
Projects
Things I've built
Spatial Risk Assessment for Gas Dispersion
Data-driven spatial risk assessment of indoor R-290 gas dispersion using Kriging regression and Monte Carlo uncertainty analysis. Achieved a cross-validated R² of 0.81, with full uncertainty quantification over 10,000+ Monte Carlo iterations.
RAG Knowledge System (Fraunhofer ISE)
Deployed a Retrieval-Augmented Generation system indexing 1,000+ internal technical documents for 3 research teams. Reduced manual information retrieval time by ~30% using vector search and LLM-based Q&A.
EEG Signal Analysis (Vision Transformers)
Applied Vision Transformer models to analyze EEG signals, enhancing feature extraction through attention mechanisms. Developed techniques to capture complex temporal and spatial patterns in neural data for improved classification.
Probabilistic Surrogate Model (TabPFN)
Trained a probabilistic surrogate model on ~500 simulation samples that delivers uncertainty-aware predictions ~100× faster than full physical simulations. Integrated 30+ real-world sensor signals per physical setup.
Healthcare ETL Pipeline
Engineered 3+ ML-driven ETL pipelines processing 2M+ healthcare records monthly with built-in anomaly detection. Built an automated OCR identity-verification system, eliminating manual review across 2+ business units.
Rocket Landing Control (MPC)
Implemented a nonlinear MPC pipeline using CasADi to stabilize and land a rocket in simulation. Modeled the system dynamics from first principles and validated performance in a Pymunk physics simulation.
Publications
Research work
Data-Driven Spatial Risk Assessment of Indoor Gas Dispersion using Kriging Regression and Monte Carlo Uncertainty Analysis
IEA Heat Pumping Technologies TCP – Project 64 Annual Report 2026
A Diagnostic Framework for CFD Validation of R-290 Dispersion in an Indoor Heat Pump Installation Room
IIR Gustav Lorentzen Conference on Natural Refrigerants (GL2026), Hamilton, New Zealand
Ensemble Method of Feature Selection using Filter and Wrapper Techniques with Evolutionary Learning
Springer Nature
Chewing Sound Interpretation by Deep Learning
Cosima Conference, Dresden
End-to-end AI-based signal processing pipeline for healthcare diagnostics using sensor-equipped eyewear.
Writing
From my blog
I write about machine learning, data science, and the practical lessons I pick up building real systems. Read the full collection on Medium.
Contact
Let's talk
I'm open to Data Scientist, ML Engineer, and Research roles in Germany and Europe. Feel free to reach out; I respond within 24 hours.
Location
Freiburg, Germany · Open to remote & hybrid