Hi, I'm Arpit Baranwal

Building ML pipelines, probabilistic models, and applied AI systems that solve real problems, from energy research at Fraunhofer ISE to healthcare data at scale.

📍 Freiburg, Germany  ·  Open to opportunities  ·  No sponsorship needed

Upcoming Talk · Jun 25, 2026 · Global webinar

Presenting my Master's thesis to a global audience

I'll be presenting my research on data-driven spatial risk assessment of indoor gas dispersion at the IEA HPT Project 64 Webinar: Flammable Refrigerant Safety.

Who I am

I'm a Data Scientist with over 5 years of experience building production ML systems across healthcare, energy research, and cloud infrastructure. Currently based in Freiburg, Germany, I recently completed my Master's in Embedded Systems Engineering (AI specialization) at the University of Freiburg.

At Fraunhofer ISE, I developed probabilistic surrogate models, uncertainty-aware prediction frameworks, and a RAG knowledge system used across three research teams. Before that, I built large-scale healthcare ETL pipelines and serverless AWS architectures at Legato Health Technology.

I care about the full stack, from clean data pipelines and rigorous model validation to deployment and stakeholder communication. I'm a published researcher, an active part-time chef, and an event coordinator. I believe good data work requires both technical depth and clear storytelling.

5+
Years Experience
3
Publications
2M+
Records Processed
R²=0.81
Best Model Score

Where I've worked

Independent Data Engineer & ML Developer

Self-employed

Apr 2026 – Present
Databricks · PySpark · PowerBI · NLP · Forecasting
  • Built an end-to-end Databricks pipeline (Bronze → Silver → Gold) that ingests 3 data sources (POS, weather, operational) and processes 6K+ monthly transactions.
  • Engineered a KPI dashboard tracking 15+ metrics and a sentiment analysis pipeline across 200+ customer reviews.
  • Developing a dish demand forecasting model across 30+ menu items using 12 months of historical order data and weather signals.

Student Research Assistant

Fraunhofer ISE, Freiburg

Jun 2024 – Apr 2026
TabPFN · Kriging · Monte Carlo · RAG · Python · MATLAB
  • Developed ML models for risk prediction and anomaly detection across 3+ physical system configurations, integrating 30+ real-world sensor signals.
  • Built a probabilistic surrogate model (TabPFN) trained on ~500 simulation samples, delivering uncertainty-aware predictions ~100× faster than full physical simulations.
  • Designed a Kriging-Monte Carlo prediction framework run over 10,000+ iterations, achieving R²=0.81 under rigorous cross-validation.
  • Deployed a RAG knowledge system indexing 1,000+ internal technical documents across 3 research teams, reducing manual retrieval time by ~30%.

Associate Cloud Intern

RealWorldOne

Sep 2023 – Feb 2024
AWS · IAM · CI/CD · ML Deployment
  • Implemented secure, scalable ML infrastructure on AWS, contributing to model deployment workflows across dev and production environments.
  • Designed IAM-based access control policies ensuring compliant, role-based data handling across the engineering team.
  • Contributed to CI/CD pipelines improving deployment reliability and release consistency for ML services.

AWS Python Developer

Legato Health Technology

Jun 2020 – May 2022
C# .NET · AWS Glue · Lambda · AppSync · Terraform · OCR
  • Developed 5+ backend services and data processing APIs in C# .NET, building the foundational healthcare data layer that feeds ML pipelines.
  • Engineered 3+ ML-driven ETL pipelines processing 2M+ healthcare records monthly with built-in anomaly detection.
  • Built an automated OCR identity-verification system processing thousands of documents monthly, eliminating manual review across 2+ business units.
  • Designed 4+ serverless ingestion architectures (AWS Glue, Lambda, AppSync) provisioned via Terraform across 3 environments.

Where I studied

Master's in Embedded Systems Engineering

Specialization in AI

Albert Ludwigs University of Freiburg · Apr 2022 – Mar 2026

Freiburg, Germany

Coursework in deep learning, reinforcement learning, computer vision, robotics, microsystems engineering, and optimization, building strong skills in applied ML and system modeling.

Bachelor's in Electronics & Communication Engineering

Asansol Engineering College · 2016 – 2020

Asansol, India

Foundation in signal processing, electronics, and communication systems with a focus on applied engineering and mathematics.

What I work with

Languages & Databases

Python
C++
C#
.NET
PostgreSQL
MySQL
Bash

ML & Data Libraries

PyTorch
TensorFlow
scikit-learn
Pandas
NumPy
OpenCV
MATLAB

Cloud, MLOps & Tools

AWS
Azure
GCP
Docker
Terraform
Git
Jupyter
GraphQL
Linux

Methods & Specializations

RAG · TabPFN · Transformers · Monte Carlo Methods · Kriging Regression · Uncertainty Quantification · Probabilistic Modeling · Anomaly Detection · Time Series · Predictive Modeling · A/B Testing · ETL Pipelines · Databricks · PySpark · PowerBI · n8n · AWS Glue · Lambda · AppSync · CI/CD

Things I've built

Published · IEA Heat Pumping TCP 2026

Spatial Risk Assessment for Gas Dispersion

Data-driven spatial risk assessment of indoor R-290 gas dispersion using Kriging Regression and Monte Carlo Uncertainty Analysis. Achieved R²=0.81 across 10,000+ Monte Carlo iterations with full uncertainty quantification.

Kriging · Monte Carlo · Python · MATLAB · Uncertainty Quantification
30% reduction in retrieval time

RAG Knowledge System (Fraunhofer ISE)

Deployed a Retrieval-Augmented Generation system indexing 1,000+ internal technical documents across 3 research teams. Reduced manual information retrieval time by ~30% using vector search and LLM-based Q&A.

RAG · LLM · Vector DB · Python · NLP

EEG Signal Analysis (Vision Transformers)

Applied Vision Transformer models to analyze EEG signals, enhancing feature extraction through attention mechanisms. Developed techniques to capture complex temporal and spatial patterns in neural data for improved classification.

Vision Transformers · PyTorch · EEG · Deep Learning · Signal Processing
100× faster than physical simulation

Probabilistic Surrogate Model (TabPFN)

Trained a probabilistic surrogate model on ~500 simulation samples, delivering uncertainty-aware predictions ~100× faster than full physical simulations. Integrated 30+ real-world sensor signals per physical setup.

TabPFN · Probabilistic ML · Uncertainty · Python · Sensor Data
2M+ records/month

Healthcare ETL Pipeline

Engineered 3+ ML-driven ETL pipelines processing 2M+ healthcare records monthly with built-in anomaly detection. Built an automated OCR identity-verification system, eliminating manual review across 2+ business units.

AWS Glue · Lambda · Terraform · C# .NET · OCR · ETL

Rocket Landing Control (MPC)

Implemented a nonlinear MPC pipeline using CasADi to stabilize and land a rocket in simulation. Modeled system dynamics from first principles and validated performance through PyMunk physics simulation.

MPC · CasADi · Python · Control Systems · PyMunk

Research work

2026
Journal / Report · In Press (Master's Thesis)

Data-Driven Spatial Risk Assessment of Indoor Gas Dispersion using Kriging Regression and Monte Carlo Uncertainty Analysis

IEA Heat Pumping Technologies TCP – Project 64 Annual Report 2026

Aug 2026
Conference Paper · Submitted

A Diagnostic Framework for CFD Validation of R-290 Dispersion in an Indoor Heat Pump Installation Room

IIR Gustav Lorentzen Conference on Natural Refrigerants (GL2026), Hamilton, New Zealand

Apr 2022
Journal Paper · Published

Ensemble Method of Feature Selection using Filter and Wrapper Techniques with Evolutionary Learning

Springer Nature

2023
Conference Proceeding · Published

Chewing Sound Interpretation by Deep Learning

Cosima Conference, Dresden

End-to-end AI-based signal processing pipeline for healthcare diagnostics using sensor-equipped eyewear.

Let's talk

I'm open to Data Scientist, ML Engineer, and Research roles in Germany and Europe. Feel free to reach out; I respond within 24 hours.

Location

Freiburg, Germany · Open to remote & hybrid

What I'm looking for

Data Scientist / ML Engineer roles
Research-oriented positions in AI/ML
Projects involving uncertainty quantification or probabilistic models
Teams that value rigorous validation and clean pipelines
Based in Germany, open to Europe-wide remote work