Upload folder using huggingface_hub
- .gitattributes +4 -0
- README.md +143 -0
- assets/logo_black_no_bg.png +0 -0
- assets/medix-r1_arch.png +3 -0
- assets/microscopy_qualitative.png +3 -0
- assets/reward_design_graph.png +3 -0
- assets/xray_qualitative.png +3 -0
.gitattributes
CHANGED

```diff
@@ -34,3 +34,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+assets/medix-r1_arch.png filter=lfs diff=lfs merge=lfs -text
+assets/microscopy_qualitative.png filter=lfs diff=lfs merge=lfs -text
+assets/reward_design_graph.png filter=lfs diff=lfs merge=lfs -text
+assets/xray_qualitative.png filter=lfs diff=lfs merge=lfs -text
```
README.md
ADDED
# MediX-R1: Open-Ended Medical Reinforcement Learning

<p align="center">
  <img src="assets/logo_black_no_bg.png" alt="MediX-R1" width="200">
</p>

<p align="center">
  <img src="https://i.imgur.com/waxVImv.png" alt="MediX-R1">
</p>

#### [Sahal Shaji Mullappilly](https://scholar.google.com/citations?user=LJWxVpUAAAAJ&hl=en)\*, [Mohammed Irfan K](https://scholar.google.com/citations?user=GJp0keYAAAAJ&hl=en)\*, [Omair Mohamed](https://scholar.google.com), [Mohamed Zidan](https://scholar.google.com), [Fahad Khan](https://sites.google.com/view/fahadkhans/home), [Salman Khan](https://salman-h-khan.github.io/), [Rao Muhammad Anwer](https://scholar.google.com/citations?hl=en&authuser=1&user=_KlvMVoAAAAJ), and [Hisham Cholakkal](https://scholar.google.com/citations?hl=en&user=bZ3YBRcAAAAJ)

\**Equally contributing first authors*

#### **Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE**

[Website](https://medix.cvmbzuai.com) | [Paper](#) | [HuggingFace Collection](https://huggingface.co/collections/MBZUAI/medix-r1) | [Leaderboard](https://medix.cvmbzuai.com/leaderboard)

---

## Overview

MediX-R1 is an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes vision-language backbones with Group-Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward, a medical embedding-based semantic reward, and lightweight format and modality rewards that enforce interpretable reasoning.
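Group-Based RL scores each sampled answer relative to the other answers drawn for the same prompt, so no learned value model is needed. A minimal sketch of the group-relative advantage computation, assuming per-answer rewards are already available (the function name and the epsilon value are illustrative, not taken from the MediX-R1 code):

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std (GRPO-style).

    rewards: composite reward for each answer sampled for one prompt.
    eps is an illustrative stabilizer against zero-variance groups.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, each already scored by the composite reward:
advs = group_relative_advantages([0.9, 0.4, 0.4, 0.1])
```

Answers scoring above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged.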
Despite using only ~50K instruction examples, MediX-R1 achieves excellent results across standard medical LLM and VLM benchmarks, outperforming strong open-source baselines.

**Highlights:**

- Our **8B** model achieves an overall average of **68.8%**, outperforming the much larger 27B MedGemma (68.4%).
- Our **30B** model achieves the best overall score of **73.6%**, demonstrating the effectiveness of our composite reward design.

---
## Contributions

- We introduce an **open-ended RL framework** for medical MLLMs that produces clinically grounded, free-form answers beyond MCQ formats.
- We design a **composite reward** combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition, providing stable and informative feedback where traditional verifiable or MCQ-only rewards fall short.
- We propose a **unified evaluation framework** for both text-only and image+text tasks using a reference-based LLM-as-judge, capturing semantic correctness, reasoning, and contextual alignment.
- Despite using only **~50K** instruction examples, MediX-R1 achieves state-of-the-art results across diverse medical LLM and VLM benchmarks, with particularly large gains on open-ended clinical tasks.

---

## Architecture

<p align="center">
  <img src="assets/medix-r1_arch.png" alt="MediX-R1 Architecture" width="100%">
</p>

---

## Composite Reward Design

MediX-R1 uses a multi-signal reward combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition. This stabilizes training and mitigates reward hacking relative to single-signal approaches.

<p align="center">
  <img src="assets/reward_design_graph.png" alt="Reward Design" width="60%">
</p>
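The combination above can be sketched as a weighted sum of the four signals. The weights, signal ranges, and function name here are illustrative placeholders, not the values used in MediX-R1:

```python
def composite_reward(accuracy, semantic, fmt, modality,
                     weights=(0.5, 0.3, 0.1, 0.1)):
    """Combine four reward signals into one scalar (all inputs in [0, 1]).

    accuracy : LLM-as-judge correctness score
    semantic : medical-embedding similarity to the reference answer
    fmt      : 1.0 if the expected response format is followed, else 0.0
    modality : 1.0 if the imaging modality is correctly identified
    The weights are illustrative, not the paper's values.
    """
    signals = (accuracy, semantic, fmt, modality)
    return sum(w * s for w, s in zip(weights, signals))

# A mostly correct, well-formatted answer with the right modality:
r = composite_reward(accuracy=0.8, semantic=0.9, fmt=1.0, modality=1.0)
```

Because no single signal dominates, a policy cannot maximize the reward by, say, matching the output format alone while giving a clinically wrong answer.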

---

## Qualitative Examples

<p align="center">
  <img src="assets/microscopy_qualitative.png" alt="Microscopy Example" width="85%">
  <img src="assets/xray_qualitative.png" alt="X-ray Example" width="85%">
</p>

---

## Training

We provide training configs for all model sizes using the GRPO and DAPO algorithms. The training pipeline uses a vLLM-based reward server for LLM-as-judge scoring during RL training.

```bash
cd training
pip install -e .
bash vllm_serve.sh   # Step 1: Start the reward server
bash run_train.sh    # Step 2: Launch RL training
bash merge_model.sh  # Step 3: Merge FSDP checkpoints
```

Training data: [MBZUAI/medix-rl-data](https://huggingface.co/datasets/MBZUAI/medix-rl-data) (~51K train, ~2.5K test samples)

See [`training/README.md`](training/README.md) for detailed setup, configuration options, and per-model scripts.

---

## Evaluation

We propose a unified evaluation framework for both text-only (LLM) and image+text (VLM) tasks using a reference-based LLM-as-judge across 17 medical benchmarks.

```bash
cd eval
pip install uv && uv pip install -r requirements.txt
bash eval.sh   # Run all phases: generate, evaluate, score
```

The judge model can be self-hosted via vLLM or accessed remotely through [OpenRouter](https://openrouter.ai/). Results can be submitted to the [MediX Leaderboard](https://medix.cvmbzuai.com/leaderboard).

See [`eval/README.md`](eval/README.md) for task selection, CLI reference, and MMMU-Medical evaluation.
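A reference-based judge grades a model's free-form answer against a gold reference rather than checking an option letter. A minimal sketch of building such a grading request (the prompt wording, score scale, and function name are assumptions for illustration, not the exact MediX-R1 judge prompt):

```python
def build_judge_prompt(question, reference, prediction):
    """Format a reference-based grading request for an LLM judge."""
    return (
        "You are a medical expert grader.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {prediction}\n"
        "Score the candidate from 0 to 1 for semantic correctness, "
        "clinical reasoning, and contextual alignment. "
        "Reply with the score only."
    )

prompt = build_judge_prompt(
    "What imaging modality is shown?",
    "Chest X-ray",
    "This is a chest radiograph (X-ray).",
)
```

The same prompt shape works for text-only and image+text tasks, since the judge only sees the question, the reference, and the candidate text.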
| 101 |
+
|
| 102 |
+
---
|
| 103 |
+
|
| 104 |
+
## Model Zoo
|
| 105 |
+
|
| 106 |
+
| Model | HuggingFace |
|
| 107 |
+
|-------|-------------|
|
| 108 |
+
| MediX-R1-2B | [MBZUAI/MediX-R1-2B](https://huggingface.co/MBZUAI/MediX-R1-2B) |
|
| 109 |
+
| MediX-R1-8B | [MBZUAI/MediX-R1-8B](https://huggingface.co/MBZUAI/MediX-R1-8B) |
|
| 110 |
+
| MediX-R1-30B | [MBZUAI/MediX-R1-30B](https://huggingface.co/MBZUAI/MediX-R1-30B) |
|
| 111 |
+
|
| 112 |
+
---
|
| 113 |
+
|
| 114 |
+
## Citation
|
| 115 |
+
|
| 116 |
+
If you use MediX-R1 in your research, please cite our work as follows:
|
| 117 |
+
|
| 118 |
+
```bibtex
|
| 119 |
+
@misc{mullappilly2025medixr1,
|
| 120 |
+
title = {MediX-R1: Open-Ended Medical Reinforcement Learning},
|
| 121 |
+
author = {Sahal Shaji Mullappilly and Mohammed Irfan Kurpath and Omair Mohamed and Mohamed Zidan and Fahad Khan and Salman Khan and Rao Anwer and Hisham Cholakkal},
|
| 122 |
+
year = {2025},
|
| 123 |
+
howpublished = {\url{https://github.com/mbzuai-oryx/MediX-R1}}
|
| 124 |
+
}
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
---
|
| 128 |
+
|
| 129 |
+
## License
|
| 130 |
+
|
| 131 |
+
This project is released for **research purposes only** under [*CC-BY-NC-SA 4.0*](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.en) License. It is not intended for clinical or commercial use.
|
| 132 |
+
|
| 133 |
+
Users are urged to employ MediX-R1 responsibly, especially when applying its outputs in real-world medical scenarios. It is imperative to verify the model's advice with qualified healthcare professionals and not rely on it for medical diagnoses or treatment decisions.
|
| 134 |
+
|
| 135 |
+
---
|
| 136 |
+
|
| 137 |
+
## Acknowledgements
|
| 138 |
+
|
| 139 |
+
We are thankful to [EasyR1](https://github.com/hiyouga/EasyR1) (a fork of [veRL](https://github.com/volcengine/verl)) for their open-source RL training framework.
|
| 140 |
+
|
| 141 |
+
This work was partially supported with *NVIDIA Academic Grant 2025* and *MBZUAI-IITD* Research Collaboration Seed Grant.
|
| 142 |
+
|
| 143 |
+
We are grateful to [MBZUAI](https://mbzuai.ac.ae/) for compute and support.
|
assets/logo_black_no_bg.png ADDED
assets/medix-r1_arch.png ADDED (Git LFS)
assets/microscopy_qualitative.png ADDED (Git LFS)
assets/reward_design_graph.png ADDED (Git LFS)
assets/xray_qualitative.png ADDED (Git LFS)