Instructions for using MBZUAI/MediX-R1-30B-GGUF with libraries and local apps.

Libraries

How to use MBZUAI/MediX-R1-30B-GGUF with llama-cpp-python:

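# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the quantized weights from the Hugging Face Hub and load them.
llm = Llama.from_pretrained(
    repo_id="MBZUAI/MediX-R1-30B-GGUF",
    filename="MediX-R1-30B-IQ3_M.gguf",
)

# Send one user turn containing a text prompt and an image URL.
# Note: image input in llama-cpp-python may require a multimodal chat handler
# and a matching projector file; check that your installed version supports
# this model's architecture before relying on the image part of the message.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)

# The reply text is in the first choice of the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])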
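The example above passes a remote image URL. For a local file, a common approach with OpenAI-style chat messages is to embed the image as a base64 data URI. The sketch below assumes the installed llama-cpp-python build accepts `data:` URIs for this model; the helper names `image_to_data_uri` and `build_image_message` are illustrative and not part of the library.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Fallback when the file extension is not recognized.
        mime = "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_image_message(prompt: str, image_url: str) -> dict:
    """Build a user message with the same shape as the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The resulting message could then be passed as `messages=[build_image_message("Describe this image in one sentence.", image_to_data_uri("scan.png"))]`, where `scan.png` is a placeholder file name.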

Local Apps

llama.cpp

How to use MBZUAI/MediX-R1-30B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

vLLM

How to use MBZUAI/MediX-R1-30B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MBZUAI/MediX-R1-30B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MBZUAI/MediX-R1-30B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
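The same request can be issued from Python using only the standard library. This is a sketch of a client for the OpenAI-compatible endpoint shown in the curl call; the function names are illustrative, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, model: str, prompt: str, image_url: str
) -> urllib.request.Request:
    """Build a POST request mirroring the curl example above."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 300.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI chat-completions response layout.
    return body["choices"][0]["message"]["content"]
```

With the server from the previous step running, `send_chat_request(build_chat_request("http://localhost:8000/v1", "MBZUAI/MediX-R1-30B-GGUF", "Describe this image in one sentence.", image_url))` would return the model's answer as a string.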

Use Docker

docker model run hf.co/MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Ollama
How to use MBZUAI/MediX-R1-30B-GGUF with Ollama:
```
ollama run hf.co/MBZUAI/MediX-R1-30B-GGUF:Q4_K_M
```

Unsloth Studio

How to use MBZUAI/MediX-R1-30B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MBZUAI/MediX-R1-30B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MBZUAI/MediX-R1-30B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MBZUAI/MediX-R1-30B-GGUF to start chatting

Pi

How to use MBZUAI/MediX-R1-30B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "MBZUAI/MediX-R1-30B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
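If `models.json` already exists, pasting the snippet over it would discard other providers. The sketch below merges the `llama-cpp` entry into an existing file instead. It assumes the file is plain JSON with a top-level `providers` object as shown above; the function names are hypothetical, and any other keys Pi may support are left untouched.

```python
import copy
import json
from pathlib import Path


def add_llama_cpp_provider(
    config: dict, model_id: str, base_url: str = "http://localhost:8080/v1"
) -> dict:
    """Return a copy of `config` with the llama-cpp provider entry set."""
    updated = copy.deepcopy(config)
    providers = updated.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    return updated


def update_models_file(path: Path, model_id: str) -> None:
    """Merge the provider entry into the JSON file at `path`, creating it if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = add_llama_cpp_provider(existing, model_id)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

For example, `update_models_file(Path.home() / ".pi" / "agent" / "models.json", "MBZUAI/MediX-R1-30B-GGUF:Q4_K_M")` would write the same entry as the snippet above while keeping existing providers.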

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use MBZUAI/MediX-R1-30B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use MBZUAI/MediX-R1-30B-GGUF with Docker Model Runner:
```
docker model run hf.co/MBZUAI/MediX-R1-30B-GGUF:Q4_K_M
```

Lemonade

How to use MBZUAI/MediX-R1-30B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MBZUAI/MediX-R1-30B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.MediX-R1-30B-GGUF-Q4_K_M

List all available models

lemonade list

MediX-R1: Open-Ended Medical Reinforcement Learning

Sahal Shaji Mullappilly*, Mohammed Irfan K*, Omair Mohamed, Mohamed Zidan, Fahad Khan, Salman Khan, Rao Muhammad Anwer, and Hisham Cholakkal

*Equally contributing first authors

Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE

Overview

MediX-R1 is an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes vision-language backbones with Group-Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward, a medical embedding-based semantic reward, and lightweight format and modality rewards that enforce interpretable reasoning.

Despite using only ~50K instruction examples, MediX-R1 outperforms strong open-source baselines across standard medical LLM and VLM benchmarks.

Highlights:

Our 8B model achieves an overall average of 68.8%, outperforming the much larger 27B MedGemma (68.4%).
Our 30B model achieves the best overall score of 73.6%, demonstrating the effectiveness of our composite reward design.

Contributions

We introduce an open-ended RL framework for medical MLLMs that produces clinically grounded, free-form answers beyond MCQ formats.
We design a composite reward combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition, providing stable and informative feedback where traditional verifiable or MCQ-only rewards fall short.
We propose a unified evaluation framework for both text-only and image+text tasks using a reference-based LLM-as-judge, capturing semantic correctness, reasoning, and contextual alignment.
Despite using only ~50K instruction examples, MediX-R1 achieves state-of-the-art results across diverse medical LLM and VLM benchmarks, with particularly large gains on open-ended clinical tasks.

Architecture

[Figure: MediX-R1 architecture]

Composite Reward Design

MediX-R1 uses a multi-signal reward combining LLM-based accuracy, embedding-based semantic similarity, format adherence, and modality recognition. This stabilizes training and prevents reward hacking compared to single-signal approaches.
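A minimal sketch of how such a multi-signal reward could be combined. The weights, the [0, 1] score ranges, and all names here are assumptions for illustration; the text above does not state how the four signals are weighted. In the actual pipeline the accuracy score would come from an LLM judge and the vectors from a medical embedding model, both of which are stubbed out as plain inputs.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; not taken from the MediX-R1 paper or code.
    accuracy: float = 0.5
    semantic: float = 0.3
    format: float = 0.1
    modality: float = 0.1


def composite_reward(
    accuracy: float,
    semantic: float,
    format_ok: bool,
    modality_ok: bool,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the four reward signals."""
    semantic = max(0.0, semantic)  # clip negative cosine similarity to zero
    return (
        weights.accuracy * accuracy
        + weights.semantic * semantic
        + weights.format * float(format_ok)
        + weights.modality * float(modality_ok)
    )
```

Because a response has to satisfy several checks at once, a single degenerate strategy (for instance, emitting the right format with a wrong answer) earns only a small share of the total under this kind of weighting.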

[Figure: Reward design]

Qualitative Examples

[Figures: microscopy example; X-ray example]

Training

We provide training configs for all model sizes using GRPO and DAPO algorithms. The training pipeline uses a vLLM-based reward server for LLM-as-judge scoring during RL training.

cd training
pip install -e .
bash vllm_serve.sh       # Step 1: Start the reward server
bash run_train.sh        # Step 2: Launch RL training
bash merge_model.sh      # Step 3: Merge FSDP checkpoints

Training data: MBZUAI/medix-rl-data (~51K train, ~2.5K test samples)

See training/README.md for detailed setup, configuration options, and per-model scripts.
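For readers unfamiliar with GRPO, its core idea is to sample a group of responses per prompt and score each response relative to the rest of the group. The sketch below shows only that normalization step in plain Python. It is not taken from the MediX-R1 training code; the use of the population standard deviation and the `eps` stabilizer are assumptions, and DAPO's modifications are not covered.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its sampling group."""
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one; when all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.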

Evaluation

We propose a unified evaluation framework for both text-only (LLM) and image+text (VLM) tasks using a reference-based LLM-as-judge across 17 medical benchmarks.

cd eval
pip install uv && uv pip install -r requirements.txt
bash eval.sh             # Run all phases: generate, evaluate, score

The evaluation pipeline supports self-hosted judge models via vLLM, with OpenRouter as a remote alternative. Results can be submitted to the MediX Leaderboard.

See eval/README.md for task selection, CLI reference, and MMMU-Medical evaluation.
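To illustrate what reference-based LLM-as-judge scoring involves, the sketch below builds a judge prompt from a question, a reference answer, and a model prediction, then parses a verdict from the judge's reply. The prompt wording, the `Verdict:` output convention, and the function names are all hypothetical; the prompts and scoring rules actually used by the evaluation scripts are documented in the repository, not here.

```python
import re
from typing import Iterable, Optional

# Hypothetical judge prompt; not the template used by the MediX-R1 eval code.
JUDGE_TEMPLATE = (
    "You are grading a medical answer against a reference.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model answer: {prediction}\n"
    "Explain briefly, then end with 'Verdict: CORRECT' or 'Verdict: INCORRECT'."
)

_VERDICT_RE = re.compile(r"verdict:\s*(correct|incorrect)\b", re.IGNORECASE)


def build_judge_prompt(question: str, reference: str, prediction: str) -> str:
    """Fill the judge template for one evaluation sample."""
    return JUDGE_TEMPLATE.format(
        question=question, reference=reference, prediction=prediction
    )


def parse_verdict(judge_output: str) -> Optional[bool]:
    """Return True/False for a parsed verdict, or None if none is found."""
    match = _VERDICT_RE.search(judge_output)
    if match is None:
        return None
    return match.group(1).lower() == "correct"


def accuracy(verdicts: Iterable[Optional[bool]]) -> float:
    """Fraction of correct verdicts, ignoring samples the judge failed to score."""
    scored = [v for v in verdicts if v is not None]
    return sum(scored) / len(scored) if scored else 0.0
```

The judge's reply would come from whichever judge model is configured; only the prompt construction and verdict parsing are shown.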

Model Zoo

| Model | HuggingFace |
|---|---|
| MediX-R1-2B | MBZUAI/MediX-R1-2B |
| MediX-R1-8B | MBZUAI/MediX-R1-8B |
| MediX-R1-30B | MBZUAI/MediX-R1-30B |

Citation

If you use MediX-R1 in your research, please cite our work as follows:

@misc{mullappilly2026medixr1openendedmedical,
      title={MediX-R1: Open Ended Medical Reinforcement Learning}, 
      author={Sahal Shaji Mullappilly and Mohammed Irfan Kurpath and Omair Mohamed and Mohamed Zidan and Fahad Khan and Salman Khan and Rao Anwer and Hisham Cholakkal},
      year={2026},
      eprint={2602.23363},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.23363}, 
}

License

This project is released for research purposes only under the CC-BY-NC-SA 4.0 license. It is not intended for clinical or commercial use.

Users are urged to employ MediX-R1 responsibly, especially when applying its outputs in real-world medical scenarios. It is imperative to verify the model's advice with qualified healthcare professionals and not rely on it for medical diagnoses or treatment decisions.

Acknowledgements

We are thankful to EasyR1 (a fork of veRL) for their open-source RL training framework.

This work was partially supported by the NVIDIA Academic Grant 2025 and the MBZUAI-IITD Research Collaboration Seed Grant.

We are grateful to MBZUAI for compute and support.

Downloads last month: 135

Format: GGUF
Model size: 31B params
Architecture: qwen3vlmoe

Hardware compatibility (quantization bit widths): 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
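The bit widths listed above translate into a rough weight size of parameters times bits divided by eight. The sketch below applies that arithmetic to the 31B parameter count. It is a lower-bound estimate only: GGUF quantization types typically average somewhat more than their nominal bit width, and the KV cache, vision projector, and runtime overhead come on top. The function names are illustrative.

```python
from typing import Optional, Sequence


def estimate_weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes (params * bits / 8 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def largest_fitting_bit_width(
    num_params: float,
    budget_gb: float,
    bit_widths: Sequence[int] = (2, 3, 4, 5, 6, 8),
) -> Optional[int]:
    """Pick the highest bit width whose estimated weights fit in the budget."""
    fitting = [
        b for b in bit_widths if estimate_weight_size_gb(num_params, b) <= budget_gb
    ]
    return max(fitting) if fitting else None
```

For instance, `estimate_weight_size_gb(31e9, 4)` gives 15.5, so a 4-bit quantization needs at least roughly 15.5 GB for the weights alone.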

Model tree for MBZUAI/MediX-R1-30B-GGUF

Base model: Qwen/Qwen3-VL-30B-A3B-Instruct
Finetuned: MBZUAI/MediX-R1-30B
Quantized (3): this model

Collection including MBZUAI/MediX-R1-30B-GGUF: MediX-R1 (Open Ended Medical Reinforcement Learning), 9 items, updated Mar 25

Paper for MBZUAI/MediX-R1-30B-GGUF: MediX-R1: Open Ended Medical Reinforcement Learning (arXiv 2602.23363, published Feb 26)