Open to senior data science roles
United States · Remote-friendly

Data Scientist
shipping ML systems
that move the needle.

Currently
Data Scientist, AI · AIG

Five years architecting and deploying machine learning across healthcare and enterprise — predictive modeling, transformer-based NLP, and end-to-end MLOps. I build models that make it past the notebook.

01 · About

I work the last mile of machine learning — where models meet production, regulation, and people.

I'm a data scientist with five years of experience building ML systems in regulated, high-stakes domains — primarily healthcare claims, clinical text, and fraud analytics. My day-to-day is somewhere between research and engineering: framing the problem, shaping the data, picking the right model, and making sure it lands in production with monitoring, audit trails, and humans who actually trust it.

I lean toward gradient boosting and transformer architectures, with deep practice in PySpark and Snowflake at terabyte scale. I care a lot about measurement — the gap between offline metrics and business outcomes is where most projects quietly die, and I try to keep mine honest.

02 · Experience

A short tour through the systems I've built and shipped.

Jan 2024 — Present

Data Scientist, AI — AIG

Healthcare risk · Clinical NLP · Fraud analytics
  • Led gradient boosting risk-adjustment models on claims and clinical data, lifting scoring precision +28% for value-based care underwriting.
  • Orchestrated PySpark/Snowflake pipelines across 2 TB/day of healthcare data, cutting feature engineering latency by 37%.
  • Designed transformer-based NLP to extract structured medical entities from unstructured clinical notes, boosting accuracy +26%.
  • Established an MLflow + CI/CD experimentation flow, shortening model deployment timelines by 43% with full audit traceability.
  • Built unsupervised anomaly detection for fraud, raising detection precision +24% on large-scale claim systems.
  • Containerized inference with Docker / Kubernetes for batch and real-time prediction across cloud environments.
Python · PySpark · Snowflake · Transformers · MLflow · Docker · K8s · AWS
Jan 2019 — Jul 2022

Machine Learning Engineer — Adons Softech

Recommenders · Forecasting · Data platform
  • Built collaborative-filtering & ranking recommenders, lifting customer engagement +25% across high-traffic platforms.
  • Constructed Spark/Hadoop pipelines on 600 GB+ datasets, raising transformation throughput +39%.
  • Created ensemble churn-prediction classifiers, improving targeting accuracy +27% for retention campaigns.
  • Engineered time-series forecasting for revenue, lifting forecast accuracy +33% for finance and ops planning.
  • Automated ingestion / validation / preprocessing in Python, improving data reliability +44% downstream.
  • Delivered Power BI and Plotly dashboards that accelerated reporting cycles across business units.
Python · Spark · Hadoop · XGBoost · LightGBM · Power BI · Plotly · SQL · ETL
03 · Selected work

Four problems and the numbers behind them.

Case 01 · Healthcare

Risk adjustment for value-based care

Gradient-boosted ensembles on claims + clinical features replaced a brittle linear scoring system. Calibration and reason codes made the model usable by underwriters, not just data teams.

+28% scoring precision
XGBoost · SHAP · Snowflake · Calibration
Case 02 · Clinical NLP

Extracting structure from unstructured notes

Domain-tuned transformer NER over physician notes pulled out diagnoses, procedures, and meds for downstream decision support. An active learning loop kept labelling cost in check.

+26% extraction accuracy
Transformers · NER · Active learning · PyTorch
Case 03 · Fraud analytics

Anomaly detection for billing irregularities

Unsupervised models surfaced suspicious billing patterns across millions of claim lines. Tiered-confidence routing kept investigator workload realistic while raising the true-positive rate.

+24% detection precision
Isolation Forest · Autoencoders · PySpark
Case 04 · MLOps

From notebook to production, faster

MLflow experimentation + CI/CD for training and Docker/K8s for serving. Rebuilt the path from research model to audited deployment so the team could ship without breaking compliance.

−43% deploy timeline
MLflow · Docker · Kubernetes · CI/CD
04 · Toolkit

What I reach for, organized by where it lives.

Languages
Python · R · SQL
ML / DL frameworks
PyTorch · TensorFlow · Scikit-learn · XGBoost · LightGBM
Data & big data
Pandas · NumPy · PySpark · Dask · Hadoop · Hive
NLP / DL
Transformers · LLMs · NER · CNN · LSTM
Databases
PostgreSQL · MySQL · MongoDB · Snowflake
MLOps
MLflow · Docker · Kubernetes · CI/CD · Model monitoring · REST APIs
Cloud
AWS (S3, EC2, SageMaker) · Azure ML · GCP (Vertex AI, BigQuery)
Statistics
Hypothesis testing · Time-series forecasting · Bayesian inference
Visualization
Tableau · Power BI · Plotly · Matplotlib · Seaborn
Practice
Feature engineering · Hyperparameter optimization · Supervised / unsupervised
05 · Background

Education and a few certifications.

Education

M.S., Data Science
The University of Texas at Arlington
Aug 2022 — May 2024
B.Tech, Information Technology
Gujarat Technological University, India
Jun 2015 — May 2019
06 · Contact

Let's build
something real.
Say hello →

I'm currently open to senior data science and applied ML roles — especially in healthcare, fintech, or anywhere with hard, regulated data problems.