jazeelmohd commited on
Commit
f0cf9fd
·
verified ·
1 Parent(s): d198250

Add files using upload-large-folder tool

Browse files
.gitattributes CHANGED
@@ -33,3 +33,27 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ MediX-R1-8B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
37
+ MediX-R1-8B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
38
+ MediX-R1-8B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
39
+ MediX-R1-8B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
40
+ MediX-R1-8B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
41
+ MediX-R1-8B-Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
42
+ MediX-R1-8B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
43
+ MediX-R1-8B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
44
+ MediX-R1-8B-Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
45
+ MediX-R1-8B-Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
46
+ MediX-R1-8B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
47
+ MediX-R1-8B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
48
+ MediX-R1-8B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
49
+ MediX-R1-8B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
50
+ MediX-R1-8B-IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
51
+ MediX-R1-8B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
52
+ MediX-R1-8B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
53
+ MediX-R1-8B-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
54
+ mmproj-MediX-R1-8b-F16.gguf filter=lfs diff=lfs merge=lfs -text
55
+ assets/medix-r1_arch.png filter=lfs diff=lfs merge=lfs -text
56
+ assets/reward_design_graph.png filter=lfs diff=lfs merge=lfs -text
57
+ assets/microscopy_qualitative.png filter=lfs diff=lfs merge=lfs -text
58
+ assets/xray_qualitative.png filter=lfs diff=lfs merge=lfs -text
59
+ MediX-R1-8B-F16.gguf filter=lfs diff=lfs merge=lfs -text
MediX-R1-8B-F16.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71bfba6d155e3f7a9eb90ea04900389b6b274da464c59143b46769bcaa2cb6c5
3
+ size 16388045088
MediX-R1-8B-IQ3_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7157ae1d306b9a18d9bc24279d85645c9c671668176efeb6b4dfe1f8080a9113
3
+ size 3896621088
MediX-R1-8B-IQ3_S.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:756eae053820440015fcbae6ca01c21d3edd95e8f90db02d41177918f00dc2c8
3
+ size 3789666336
MediX-R1-8B-IQ4_NL.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a13c9fa58b68b7999a1f042a872878505337f724e5bfb64abb23aee21ad01569
3
+ size 4818790432
MediX-R1-8B-IQ4_XS.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66aff9e4de2b69f13c6afec950c035a10232aaa7759386a81c055768179f2afe
3
+ size 4593297440
MediX-R1-8B-Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e0e811b7a4b0315e6eeeeb81bb413c204ea143e934058157854e8313fe03e1b2
3
+ size 3281733664
MediX-R1-8B-Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6e1ba52e4aaa6e58a628e1c60e8c69d91a9defbc68c030316b54df75cead6393
3
+ size 4431394848
MediX-R1-8B-Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:256937adf103049202d2d5c86d90c74dfcdbfc307c536db61573677c401aa74c
3
+ size 4124162080
MediX-R1-8B-Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:666c38e6ffdbf29dc0b290696b87d8455487420fc6895cd662e27b19f288b18b
3
+ size 3769612320
MediX-R1-8B-Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab882436f70e8365178e817b1dc6a79426137ae14f0370bc214c240080065670
3
+ size 4774750240
MediX-R1-8B-Q4_1.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cfc54373889d37ac15b5fa3ffe7bc1681dc0db804cc7a737eead7ec66bbc5720
3
+ size 5247756320
MediX-R1-8B-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c6728943aa60f090e6396159e5064d3290d03c05060e3129b9271f474f0d7e88
3
+ size 5027784736
MediX-R1-8B-Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5df976fe40738153d00173f1998e68369bba25c2c0f2754c48028c722bfe1641
3
+ size 4802013216
MediX-R1-8B-Q5_0.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9d8930e32fdc54b1f42125cd68a8f932c0beae81b3d7942651a049b3544a7d6c
3
+ size 5720762400
MediX-R1-8B-Q5_1.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:62b5b20bb7436d64028f17eed91c0d7e072a73aaef988644033f6d38fb0254ce
3
+ size 6193768480
MediX-R1-8B-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bf6849c38f39fdd341954aab94c2f415dea063df4ab56ac65c2193afa2a6afc
3
+ size 5851113504
MediX-R1-8B-Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c8c8bf5edfa53c981a99d6374204cea396a14980453a8d2acd11449ee3392a9a
3
+ size 5720762400
MediX-R1-8B-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b4096c7bfa2ce29876be5765e4552a4de16d87afaa93ff41f9a7e5103f1b4241
3
+ size 6725900320
MediX-R1-8B-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2245342474d96a124e07d6590492a3f946712745331a9174bcde526d2c094485
3
+ size 8709519392
README.md ADDED
@@ -0,0 +1,159 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ language:
4
+ - en
5
+ tags:
6
+ - medical
7
+ - reinforcement-learning
8
+ - multimodal
9
+ - vision-language
10
+ - qwen3-vl
11
+ pipeline_tag: image-text-to-text
12
+ library_name: transformers
13
+ ---
14
+ # MediX-R1: Open-Ended Medical Reinforcement Learning
15
+
16
+ <p align="center">
17
+ <img src="assets/logo_white_no_bg.png" alt="MediX-R1" width="200">
18
+ </p>
19
+
20
+ <p align="center">
21
+ <img src="https://i.imgur.com/waxVImv.png" alt="MediX-R1">
22
+ </p>
23
+
24
+ #### [Sahal Shaji Mullappilly](https://scholar.google.com/citations?user=LJWxVpUAAAAJ&hl=en)\*, [Mohammed Irfan K](https://scholar.google.com/citations?user=GJp0keYAAAAJ&hl=en)\*, [Omair Mohamed](https://scholar.google.com), [Mohamed Zidan](https://scholar.google.com), [Fahad Khan](https://sites.google.com/view/fahadkhans/home), [Salman Khan](https://salman-h-khan.github.io/), [Rao Muhammad Anwer](https://scholar.google.com/citations?hl=en&authuser=1&user=_KlvMVoAAAAJ), and [Hisham Cholakkal](https://scholar.google.com/citations?hl=en&user=bZ3YBRcAAAAJ)
25
+
26
+ \**Equally contributing first authors*
27
+
28
+ #### **Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE**
29
+
30
+ [![Website](https://img.shields.io/badge/Project-Website-87CEEB)](https://medix.cvmbzuai.com)
31
+ [![Paper](https://img.shields.io/badge/arXiv-Paper-red.svg)](https://arxiv.org/pdf/2602.23363)
32
+ [![HuggingFace](https://img.shields.io/badge/HuggingFace-Page-F9D371)](https://huggingface.co/collections/MBZUAI/medix-r1)
33
+ [![Leaderboard](https://img.shields.io/badge/MediX-Leaderboard-green)](https://medix.cvmbzuai.com/leaderboard)
34
+
35
+ ---
36
+
37
+ ## Overview
38
+
39
+ MediX-R1 is an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes vision-language backbones with Group-Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward, a medical embedding-based semantic reward, and lightweight format and modality rewards that enforce interpretable reasoning.
40
+
41
+ Despite using only ~50K instruction examples, MediX-R1 achieves excellent results across standard medical LLM and VLM benchmarks, outperforming strong open-source baselines.
42
+
43
+ **Highlights:**
44
+ - Our **8B** model achieves an overall average of **68.8%**, outperforming the much larger 27B MedGemma (68.4%).
45
+ - Our **30B** model achieves the best overall score of **73.6%**, demonstrating the effectiveness of our composite reward design.
46
+
47
+ ---
48
+
49
+ ## Contributions
50
+
51
+ - We introduce an **open-ended RL framework** for medical MLLMs that produces clinically grounded, free-form answers beyond MCQ formats.
52
+ - We design a **composite reward** combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition, providing stable and informative feedback where traditional verifiable or MCQ-only rewards fall short.
53
+ - We propose a **unified evaluation framework** for both text-only and image+text tasks using a Reference-based LLM-as-judge, capturing semantic correctness, reasoning, and contextual alignment.
54
+ - Despite using only **~50K** instruction examples, MediX-R1 achieves state-of-the-art results across diverse medical LLM and VLM benchmarks, with particularly large gains on open-ended clinical tasks.
55
+
56
+ ---
57
+
58
+ ## Architecture
59
+
60
+ <p align="center">
61
+ <img src="assets/medix-r1_arch.png" alt="MediX-R1 Architecture" width="100%">
62
+ </p>
63
+
64
+ ---
65
+
66
+ ## Composite Reward Design
67
+
68
+ MediX-R1 uses a multi-signal reward combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition. This stabilizes training and prevents reward hacking compared to single-signal approaches.
69
+
70
+ <p align="center">
71
+ <img src="assets/reward_design_graph.png" alt="Reward Design" width="60%">
72
+ </p>
73
+
74
+ ---
75
+
76
+ ## Qualitative Examples
77
+
78
+ <p align="center">
79
+ <img src="assets/microscopy_qualitative.png" alt="Microscopy Example" width="85%">
80
+ <img src="assets/xray_qualitative.png" alt="X-ray Example" width="85%">
81
+ </p>
82
+
83
+ ---
84
+
85
+ ## Training
86
+
87
+ We provide training configs for all model sizes using GRPO and DAPO algorithms. The training pipeline uses a vLLM-based reward server for LLM-as-judge scoring during RL training.
88
+
89
+ ```bash
90
+ cd training
91
+ pip install -e .
92
+ bash vllm_serve.sh # Step 1: Start the reward server
93
+ bash run_train.sh # Step 2: Launch RL training
94
+ bash merge_model.sh # Step 3: Merge FSDP checkpoints
95
+ ```
96
+
97
+ Training data: [MBZUAI/medix-rl-data](https://huggingface.co/datasets/MBZUAI/medix-rl-data) (~51K train, ~2.5K test samples)
98
+
99
+ See [`training/README.md`](training/README.md) for detailed setup, configuration options, and per-model scripts.
100
+
101
+ ## Evaluation
102
+
103
+ We propose a unified evaluation framework for both text-only (LLM) and image+text (VLM) tasks using a Reference-based LLM-as-judge across 17 medical benchmarks.
104
+
105
+ ```bash
106
+ cd eval
107
+ pip install uv && uv pip install -r requirements.txt
108
+ bash eval.sh # Run all phases: generate, evaluate, score
109
+ ```
110
+
111
+ Supports self-hosted judge models via vLLM or [OpenRouter](https://openrouter.ai/) as a remote alternative. Results can be submitted to the [MediX Leaderboard](https://medix.cvmbzuai.com/leaderboard).
112
+
113
+ See [`eval/README.md`](eval/README.md) for task selection, CLI reference, and MMMU-Medical evaluation.
114
+
115
+ ---
116
+
117
+ ## Model Zoo
118
+
119
+ | Model | HuggingFace |
120
+ |-------|-------------|
121
+ | MediX-R1-2B | [MBZUAI/MediX-R1-2B](https://huggingface.co/MBZUAI/MediX-R1-2B) |
122
+ | MediX-R1-8B | [MBZUAI/MediX-R1-8B](https://huggingface.co/MBZUAI/MediX-R1-8B) |
123
+ | MediX-R1-30B | [MBZUAI/MediX-R1-30B](https://huggingface.co/MBZUAI/MediX-R1-30B) |
124
+
125
+ ---
126
+
127
+ ## Citation
128
+
129
+ If you use MediX-R1 in your research, please cite our work as follows:
130
+
131
+ ```bibtex
132
+ @misc{mullappilly2026medixr1openendedmedical,
133
+ title={MediX-R1: Open Ended Medical Reinforcement Learning},
134
+ author={Sahal Shaji Mullappilly and Mohammed Irfan Kurpath and Omair Mohamed and Mohamed Zidan and Fahad Khan and Salman Khan and Rao Anwer and Hisham Cholakkal},
135
+ year={2026},
136
+ eprint={2602.23363},
137
+ archivePrefix={arXiv},
138
+ primaryClass={cs.CV},
139
+ url={https://arxiv.org/abs/2602.23363},
140
+ }
141
+ ```
142
+
143
+ ---
144
+
145
+ ## License
146
+
147
+ This project is released for **research purposes only** under the [*CC-BY-NC-SA 4.0*](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.en) license. It is not intended for clinical or commercial use.
148
+
149
+ Users are urged to employ MediX-R1 responsibly, especially when applying its outputs in real-world medical scenarios. Always verify the model's outputs with qualified healthcare professionals, and do not rely on the model for medical diagnoses or treatment decisions.
150
+
151
+ ---
152
+
153
+ ## Acknowledgements
154
+
155
+ We are thankful to [EasyR1](https://github.com/hiyouga/EasyR1) (a fork of [veRL](https://github.com/volcengine/verl)) for their open-source RL training framework.
156
+
157
+ This work was partially supported by the *NVIDIA Academic Grant 2025* and the *MBZUAI-IITD* Research Collaboration Seed Grant.
158
+
159
+ We are grateful to [MBZUAI](https://mbzuai.ac.ae/) for compute and support.
assets/logo_black_no_bg.png ADDED
assets/logo_white_no_bg.png ADDED
assets/medix-r1_arch.png ADDED

Git LFS Details

  • SHA256: c0f3d730d9ba0edfe5d28ece7aaf9660b4645d8bf7d616aced497ba9e3afc2d7
  • Pointer size: 131 Bytes
  • Size of remote file: 407 kB
assets/microscopy_qualitative.png ADDED

Git LFS Details

  • SHA256: 76debe9209a726d3ea4d2ae7c14e478474d3fea2e568ec03724f0935f064b2ff
  • Pointer size: 132 Bytes
  • Size of remote file: 2.34 MB
assets/reward_design_graph.png ADDED

Git LFS Details

  • SHA256: 90b55ae964d8a3020b77c0816841046a0753560d72cfa1a626f30484cdb056d7
  • Pointer size: 131 Bytes
  • Size of remote file: 104 kB
assets/xray_qualitative.png ADDED

Git LFS Details

  • SHA256: 83ffba4616b216208a5ff06d64a62a4a57c8acfda55f1e3337416bc90654d1df
  • Pointer size: 131 Bytes
  • Size of remote file: 692 kB
mmproj-MediX-R1-8b-F16.gguf ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8d2b9604c5c483e7785003f630f99059f15ecd3db2592656296ed6e6f0bd1146
3
+ size 1159030080