jazeelmohd committed
Commit 13e5d98 · verified · 1 Parent(s): 72ba731

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -34,3 +34,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+assets/medix-r1_arch.png filter=lfs diff=lfs merge=lfs -text
+assets/microscopy_qualitative.png filter=lfs diff=lfs merge=lfs -text
+assets/reward_design_graph.png filter=lfs diff=lfs merge=lfs -text
+assets/xray_qualitative.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,143 @@
# MediX-R1: Open-Ended Medical Reinforcement Learning

<p align="center">
  <img src="assets/logo_black_no_bg.png" alt="MediX-R1" width="200">
</p>

<p align="center">
  <img src="https://i.imgur.com/waxVImv.png" alt="MediX-R1">
</p>

#### [Sahal Shaji Mullappilly](https://scholar.google.com/citations?user=LJWxVpUAAAAJ&hl=en)\*, [Mohammed Irfan K](https://scholar.google.com/citations?user=GJp0keYAAAAJ&hl=en)\*, [Omair Mohamed](https://scholar.google.com), [Mohamed Zidan](https://scholar.google.com), [Fahad Khan](https://sites.google.com/view/fahadkhans/home), [Salman Khan](https://salman-h-khan.github.io/), [Rao Muhammad Anwer](https://scholar.google.com/citations?hl=en&authuser=1&user=_KlvMVoAAAAJ), and [Hisham Cholakkal](https://scholar.google.com/citations?hl=en&user=bZ3YBRcAAAAJ)

\**Equally contributing first authors*

#### **Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE**

[![Website](https://img.shields.io/badge/Project-Website-87CEEB)](https://medix.cvmbzuai.com)
[![Paper](https://img.shields.io/badge/arXiv-Paper-red.svg)](#)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Page-F9D371)](https://huggingface.co/collections/MBZUAI/medix-r1)
[![Leaderboard](https://img.shields.io/badge/MediX-Leaderboard-green)](https://medix.cvmbzuai.com/leaderboard)

23
+
24
+ ## Overview
25
+
26
+ MediX-R1 is an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes vision-language backbones with Group-Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward, a medical embedding-based semantic reward, and lightweight format and modality rewards that enforce interpretable reasoning.
27
+
28
+ Despite using only ~50K instruction examples, MediX-R1 achieves excellent results across standard medical LLM and VLM benchmarks, outperforming strong open-source baselines.
29
+
30
+ **Highlights:**
31
+ - Our **8B** model achieves an overall average of **68.8%**, outperforming the much larger 27B MedGemma (68.4%).
32
+ - Our **30B** model achieves the best overall score of **73.6%**, demonstrating the effectiveness of our composite reward design.
33
+
---

## Contributions

- We introduce an **open-ended RL framework** for medical MLLMs that produces clinically grounded, free-form answers beyond MCQ formats.
- We design a **composite reward** combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition, providing stable and informative feedback where traditional verifiable or MCQ-only rewards fall short.
- We propose a **unified evaluation framework** for both text-only and image+text tasks using a Reference-based LLM-as-judge, capturing semantic correctness, reasoning, and contextual alignment.
- Despite using only **~50K** instruction examples, MediX-R1 achieves state-of-the-art results across diverse medical LLM and VLM benchmarks, with particularly large gains on open-ended clinical tasks.

---

## Architecture

<p align="center">
  <img src="assets/medix-r1_arch.png" alt="MediX-R1 Architecture" width="100%">
</p>

---

## Composite Reward Design

MediX-R1 uses a multi-signal reward combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition. This stabilizes training and prevents reward hacking compared to single-signal approaches.

<p align="center">
  <img src="assets/reward_design_graph.png" alt="Reward Design" width="60%">
</p>

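As a concrete illustration, the combination can be sketched as a weighted sum of the four signals. The weights, the `<think>`/`<answer>` template, and the component names below are illustrative assumptions, not the exact design from the paper:

```python
import re
from dataclasses import dataclass


@dataclass
class RewardComponents:
    accuracy: float  # LLM-judge correctness score, assumed in [0, 1]
    semantic: float  # medical-embedding similarity, assumed in [0, 1]
    fmt: float       # 1.0 if the response follows the reasoning template
    modality: float  # 1.0 if the imaging modality is named correctly


def format_reward(response: str) -> float:
    """Assumed template: a <think> block followed by an <answer> block."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, response.strip(), re.DOTALL) else 0.0


def composite_reward(c: RewardComponents,
                     w_acc: float = 0.5, w_sem: float = 0.3,
                     w_fmt: float = 0.1, w_mod: float = 0.1) -> float:
    """Weighted combination of the four reward signals (weights are illustrative)."""
    return (w_acc * c.accuracy + w_sem * c.semantic
            + w_fmt * c.fmt + w_mod * c.modality)
```

Because the accuracy and semantic terms dominate, a well-formatted but clinically wrong answer still receives a low reward, while the smaller format and modality terms keep the policy from drifting away from interpretable outputs.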
---

## Qualitative Examples

<p align="center">
  <img src="assets/microscopy_qualitative.png" alt="Microscopy Example" width="85%">
  <img src="assets/xray_qualitative.png" alt="X-ray Example" width="85%">
</p>

---

## Training

We provide training configs for all model sizes using the GRPO and DAPO algorithms. The training pipeline uses a vLLM-based reward server for LLM-as-judge scoring during RL training.

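For orientation, the group-based idea behind GRPO-style training can be sketched as normalizing each rollout's reward against the other rollouts sampled for the same prompt. This simplified snippet is an illustration of that idea, not the EasyR1 implementation:

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: z-score each rollout's reward within its group."""
    mu = mean(rewards)       # group baseline: mean reward over the rollouts
    sigma = pstdev(rewards)  # population std; eps guards identical-reward groups
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Rollouts scoring above the group mean get positive advantages and are reinforced; below-mean rollouts are penalized, so no separate value network is needed.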
```bash
cd training
pip install -e .
bash vllm_serve.sh   # Step 1: Start the reward server
bash run_train.sh    # Step 2: Launch RL training
bash merge_model.sh  # Step 3: Merge FSDP checkpoints
```

Training data: [MBZUAI/medix-rl-data](https://huggingface.co/datasets/MBZUAI/medix-rl-data) (~51K train, ~2.5K test samples)

See [`training/README.md`](training/README.md) for detailed setup, configuration options, and per-model scripts.

## Evaluation

We propose a unified evaluation framework for both text-only (LLM) and image+text (VLM) tasks using a Reference-based LLM-as-judge across 17 medical benchmarks.

```bash
cd eval
pip install uv && uv pip install -r requirements.txt
bash eval.sh  # Run all phases: generate, evaluate, score
```

Supports self-hosted judge models via vLLM, or [OpenRouter](https://openrouter.ai/) as a remote alternative. Results can be submitted to the [MediX Leaderboard](https://medix.cvmbzuai.com/leaderboard).

See [`eval/README.md`](eval/README.md) for task selection, CLI reference, and MMMU-Medical evaluation.

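Conceptually, the reference-based judging step prompts a judge model with the question, the reference answer, and the model's answer, then parses a numeric verdict. The prompt template and the `Score:` output format below are hypothetical, shown only to make the flow concrete; the real prompts live in the eval scripts:

```python
import re

# Hypothetical judge prompt; the actual template is defined by the eval pipeline.
JUDGE_TEMPLATE = (
    "You are a medical evaluation judge. Compare the model answer to the "
    "reference answer and rate semantic correctness from 0 to 10.\n"
    "Question: {question}\nReference: {reference}\nModel answer: {answer}\n"
    "Reply in the form 'Score: <number>'."
)


def build_judge_prompt(question: str, reference: str, answer: str) -> str:
    """Fill the judge template with one evaluation sample."""
    return JUDGE_TEMPLATE.format(question=question, reference=reference, answer=answer)


def parse_judge_score(judge_output: str) -> float:
    """Extract a 'Score: N' verdict and normalize it to [0, 1]."""
    m = re.search(r"Score:\s*([0-9]+(?:\.[0-9]+)?)", judge_output)
    if not m:
        return 0.0  # unparseable verdicts count as incorrect
    return min(max(float(m.group(1)) / 10.0, 0.0), 1.0)
```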
---

## Model Zoo

| Model | HuggingFace |
|-------|-------------|
| MediX-R1-2B | [MBZUAI/MediX-R1-2B](https://huggingface.co/MBZUAI/MediX-R1-2B) |
| MediX-R1-8B | [MBZUAI/MediX-R1-8B](https://huggingface.co/MBZUAI/MediX-R1-8B) |
| MediX-R1-30B | [MBZUAI/MediX-R1-30B](https://huggingface.co/MBZUAI/MediX-R1-30B) |

---

## Citation

If you use MediX-R1 in your research, please cite our work as follows:

```bibtex
@misc{mullappilly2025medixr1,
  title        = {MediX-R1: Open-Ended Medical Reinforcement Learning},
  author       = {Sahal Shaji Mullappilly and Mohammed Irfan Kurpath and Omair Mohamed and Mohamed Zidan and Fahad Khan and Salman Khan and Rao Anwer and Hisham Cholakkal},
  year         = {2025},
  howpublished = {\url{https://github.com/mbzuai-oryx/MediX-R1}}
}
```

---

## License

This project is released for **research purposes only** under the [*CC-BY-NC-SA 4.0*](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.en) license. It is not intended for clinical or commercial use.

Users are urged to employ MediX-R1 responsibly, especially when applying its outputs in real-world medical scenarios. The model's advice must be verified by qualified healthcare professionals and must not be relied on for medical diagnoses or treatment decisions.

---

## Acknowledgements

We thank [EasyR1](https://github.com/hiyouga/EasyR1) (a fork of [veRL](https://github.com/volcengine/verl)) for their open-source RL training framework.

This work was partially supported by the *NVIDIA Academic Grant 2025* and the *MBZUAI-IITD* Research Collaboration Seed Grant.

We are grateful to [MBZUAI](https://mbzuai.ac.ae/) for compute and support.
assets/logo_black_no_bg.png ADDED
assets/medix-r1_arch.png ADDED

Git LFS Details

  • SHA256: c0f3d730d9ba0edfe5d28ece7aaf9660b4645d8bf7d616aced497ba9e3afc2d7
  • Pointer size: 131 Bytes
  • Size of remote file: 407 kB
assets/microscopy_qualitative.png ADDED

Git LFS Details

  • SHA256: 76debe9209a726d3ea4d2ae7c14e478474d3fea2e568ec03724f0935f064b2ff
  • Pointer size: 132 Bytes
  • Size of remote file: 2.34 MB
assets/reward_design_graph.png ADDED

Git LFS Details

  • SHA256: 90b55ae964d8a3020b77c0816841046a0753560d72cfa1a626f30484cdb056d7
  • Pointer size: 131 Bytes
  • Size of remote file: 104 kB
assets/xray_qualitative.png ADDED

Git LFS Details

  • SHA256: 83ffba4616b216208a5ff06d64a62a4a57c8acfda55f1e3337416bc90654d1df
  • Pointer size: 131 Bytes
  • Size of remote file: 692 kB