aujcy committed · commit 49637ff (verified) · 1 parent: 692b7d9

Update README.md

Files changed (1): README.md (+421 −54)

README.md CHANGED
@@ -1,81 +1,448 @@
  ---
  license: mit
  ---

- # PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
-
- > Official repository for **"PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis"**, accepted as an **Oral** paper at **AAAI 2026**.
-
  <p align="center">
- <img src="https://img.shields.io/badge/Conference-AAAI%202026-blue.svg" />
- <img src="https://img.shields.io/badge/Type-Oral-green.svg" />
- <img src="https://img.shields.io/badge/Domain-Multi--Modal%20Medicine-orange.svg" />
  </p>

- <p align="center">
- <b>Datasets, models, and benchmarks for PulseMind.</b>
- </p>

- ---
- GitHub repository: https://github.com/AQ-MedAI/PulseMind

- ## 🌐 Overview

- This repository provides the official **codebase and evaluation scripts** for the PulseMind project, together with:

- - 🧪 **MediScope**: a large-scale multimodal medical dataset.
-   In this release, we provide a curated subset of **~1,000 cases** (JSON + images). The full dataset is larger and will be gradually released.
- - 🧠 **Models**:
-   - `PulseMind-72B`
- - 📊 **Benchmarks**:
-   - `MedDiagnose` – 237-sample test set (JSON + images)
-   - `CMtMedQA-test` – 1,000-sample test set (JSON)
-   - `MedDiagnose-plus` – 937-sample extended test set (JSON + images)

- > ⚠️ Due to size and privacy considerations, **all datasets and model checkpoints are hosted externally** and are **not** stored in this GitHub repository.
- > This repo mainly contains **evaluation code**.

  ---

- ### 🔗 Dataset Download Link

- - **MediScope (curated ~1k subset)**
- - **MedDiagnose (237 samples)**
- - **CMtMedQA-test (1,000 samples)**
- - **MedDiagnose-plus (937 samples)**

- [Download link](https://huggingface.co/datasets/AQ-MedAI/PulseMind)

- ### 🧠 Model Checkpoint Links

- - **PulseMind-72B checkpoint**: [Download link](https://huggingface.co/AQ-MedAI/PulseMind-72B/tree/main)

- > After downloading, please follow the recommended directory layout
- > (e.g., place raw data under `data/`, benchmark test sets under `Benchmark/`,
- > and model checkpoints under `model/`), so that the provided evaluation scripts
- > can run out of the box.

  ---

- ## 📁 Repository Structure (Code Only)

- The GitHub repository mainly contains evaluation code and auxiliary configs:

- ```bash
- .
- ├── data/                 # (empty by default) place downloaded datasets here
- ├── Benchmark/
- │   ├── CMtMedQA-test/    # Folder for CMtMedQA-test data (JSON, etc.)
- │   ├── MedDiagnose/      # Folder for MedDiagnose data (JSON + images)
- │   ├── MedDiagnose-plus/ # Folder for MedDiagnose-plus data (JSON + images)
- │   └── Eval/             # Optional: extra evaluation utilities / configs
- ├── model/                # Place downloaded model checkpoints here
- └── README.md
- ```

- ---
- license: mit
- ---
  ---
  license: mit
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - medical
+ - multimodal
+ - clinical-diagnosis
+ - clinical-reasoning
+ - multi-turn
+ - consultation
+ - medical-images
+ - report-generation
+ - reinforcement-learning
+ - preference-optimization
  ---

  <p align="center">
+ <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">🤖 PulseMind-72B Model</a>
+ &nbsp;&nbsp;
+ <a href="https://github.com/AQ-MedAI/PulseMind" target="_blank" rel="noopener">Code & Eval</a>
+ &nbsp;&nbsp;
+ <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">Technical Report</a>
  </p>

+ # *PulseMind-72B*: A Multimodal Large Language Model for Real-World Clinical Diagnosis

+ # <strong style="color: red">BIG NEWS: <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> is released for real-world multi-turn clinical diagnosis, with state-of-the-art performance on diagnostic consultation benchmarks.</strong>

+ This repository contains the **PulseMind-72B** model from the paper
+ <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener"><i>PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</i></a>.

+ ---

+ ## Highlights

+ * **Real-world clinical diagnosis focus**: designed for **multi-turn diagnostic consultation**, where the model must integrate **medical images and textual clinical context** while maintaining an evolving patient–physician interaction.
+ * **PulseMind Benchmark**: evaluated on a **multi-turn diagnostic consultation benchmark**.
+ * **MediScope dataset**: trained and studied on **MediScope**, a large-scale multimodal clinical diagnostic dataset of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties.
+ * **Strong overall performance**: competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see the paper for full results).

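A quick back-of-envelope on the MediScope figures above (the per-consultation ratio is our own arithmetic, not a number reported in the paper):

```python
# MediScope scale, as stated in the Highlights.
consultations = 98_000
images = 601_500

# On average, each multi-turn consultation carries about six medical images.
print(f"{images / consultations:.1f} images per consultation")  # → 6.1 images per consultation
```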
  ---

+ ## Release

+ - **Technical report**:
+   - <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</a>
+ - **Model weights**:
+   - <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a>

+ > **Note on data & checkpoints**:
+ > Due to size and privacy considerations, datasets and some checkpoints may be hosted externally.
+ > Please refer to the Hugging Face model card / GitHub repository for official download instructions and evaluation scripts.

  ---

+ ## Disclaimer

+ > Although the weights, code, and demos are released openly (as with other pre-trained models), and despite best efforts at safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**.
+ > It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment.
+ > Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations.
+ > In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, code, or demos.

+ ## Evaluation

+ ### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark)
+
+ <p align="center">
+ <img src="PulseMind_show.png" width="800" />
+ </p>
+
+ ### Medical Multimodal VQA
+
+ <table>
+ <thead>
+ <tr><th>Models</th><th>MMMU-Med</th><th>VQA-RAD</th><th>PMC-VQA</th><th>SLAKE</th><th>PathVQA</th><th>DermaVQA</th><th>MedXpertQA</th><th>Avg.</th></tr>
+ </thead>
+ <tbody>
+ <tr><td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td></tr>
+ <tr><td>GPT-4o</td><td>57.3</td><td>71.2</td><td>55.2</td><td>67.4</td><td>55.5</td><td>35.0</td><td>22.3</td><td>52.0</td></tr>
+ <tr><td>o1</td><td>57.8</td><td>63.0</td><td>54.5</td><td>69.9</td><td>57.3</td><td>43.0</td><td>49.7</td><td>56.5</td></tr>
+ <tr><td>Gemini-2.5-Pro</td><td>49.3</td><td>70.5</td><td>55.5</td><td>75.8</td><td>55.4</td><td>39.0</td><td>39.5</td><td>55.0</td></tr>
+ <tr><td colspan="9" style="text-align:center;"><strong>Open-source Models (&asymp;72B)</strong></td></tr>
+ <tr><td>InternVL3-78B</td><td><u>69.1</u></td><td>73.6</td><td>56.6</td><td>77.4</td><td><u>51.0</u></td><td><u>37.0</u></td><td>27.4</td><td><u>56.1</u></td></tr>
+ <tr><td>Qwen2.5VL-72B</td><td>66.4</td><td><u>80.3</u></td><td><u>59.3</u></td><td><u>78.3</u></td><td>42.3</td><td>34.0</td><td><u>27.6</u></td><td>55.5</td></tr>
+ <tr><td><strong>PulseMind-72B</strong></td><td><strong>69.4</strong></td><td><strong>87.1</strong></td><td><strong>70.3</strong></td><td><strong>85.6</strong></td><td><strong>64.9</strong></td><td><strong>42.0</strong></td><td><strong>36.7</strong></td><td><strong>65.1</strong></td></tr>
+ <tr><td colspan="9" style="text-align:center;"><strong>Open-source Models (&asymp;32B)</strong></td></tr>
+ <tr><td>InternVL3-38B</td><td><strong>65.2</strong></td><td>65.4</td><td>56.6</td><td>72.7</td><td>51.0</td><td><u>31.0</u></td><td>25.2</td><td>52.4</td></tr>
+ <tr><td>Qwen2.5VL-32B</td><td>62.8</td><td>73.8</td><td>54.5</td><td>71.2</td><td>41.9</td><td>25.0</td><td>25.2</td><td>50.6</td></tr>
+ <tr><td>LLAVA-med-34B</td><td>48.9</td><td>58.6</td><td>44.4</td><td>67.3</td><td>48.8</td><td>13.0</td><td>16.4</td><td>42.5</td></tr>
+ <tr><td>HuatuoGPT-vision-34B</td><td>54.3</td><td>61.4</td><td>56.6</td><td>69.5</td><td>44.4</td><td>21.0</td><td>17.3</td><td>46.4</td></tr>
+ <tr><td>Lingshu-32B</td><td>62.3</td><td><u>76.5</u></td><td><u>57.9</u></td><td><strong>89.2</strong></td><td><strong>65.9</strong></td><td>17.0</td><td><strong>30.9</strong></td><td><u>57.1</u></td></tr>
+ <tr><td><strong>PulseMind-32B</strong></td><td><u>64.6</u></td><td><strong>83.2</strong></td><td><strong>68.1</strong></td><td><u>81.5</u></td><td><u>62.0</u></td><td><strong>32.0</strong></td><td><u>29.6</u></td><td><strong>60.1</strong></td></tr>
+ </tbody>
+ </table>
+
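As a sanity check on the table above, the Avg. column appears to be the unweighted mean of the seven per-benchmark scores. Verifying the PulseMind-72B row:

```python
# Per-benchmark VQA scores for PulseMind-72B, copied from the table above:
# MMMU-Med, VQA-RAD, PMC-VQA, SLAKE, PathVQA, DermaVQA, MedXpertQA.
scores = [69.4, 87.1, 70.3, 85.6, 64.9, 42.0, 36.7]

# Unweighted mean, rounded to one decimal as in the table.
average = sum(scores) / len(scores)
print(f"{average:.1f}")  # → 65.1, matching the Avg. column
```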
+ ### Medical Textual QA
+
+ <table>
+ <thead>
+ <tr><th>Models</th><th>MMLU-Med</th><th>MedMCQA</th><th>MedQA</th><th>MedXpertQA</th><th>Avg.</th></tr>
+ </thead>
+ <tbody>
+ <tr><td colspan="6" style="text-align:center;"><strong>Proprietary Models</strong></td></tr>
+ <tr><td>GPT-4o</td><td>88.7</td><td>73.5</td><td>55.7</td><td>22.5</td><td>60.1</td></tr>
+ <tr><td>o1</td><td>91.6</td><td>82.7</td><td>86.6</td><td>48.9</td><td>77.5</td></tr>
+ <tr><td>Gemini-2.5-Pro</td><td>89.8</td><td>68.6</td><td>85.6</td><td>24.3</td><td>67.1</td></tr>
+ <tr><td colspan="6" style="text-align:center;"><strong>Open-source Models (&asymp;72B)</strong></td></tr>
+ <tr><td>InternVL3-78B</td><td>83.0</td><td>66.1</td><td><u>93.3</u></td><td><u>18.5</u></td><td>65.2</td></tr>
+ <tr><td>Qwen2.5VL-72B</td><td><u>88.3</u></td><td><u>67.2</u></td><td>91.3</td><td>16.1</td><td><u>65.7</u></td></tr>
+ <tr><td><strong>PulseMind-72B</strong></td><td><strong>88.7</strong></td><td><strong>71.3</strong></td><td><strong>94.8</strong></td><td><strong>29.8</strong></td><td><strong>71.2</strong></td></tr>
+ <tr><td colspan="6" style="text-align:center;"><strong>Open-source Models (&asymp;32B)</strong></td></tr>
+ <tr><td>InternVL3-38B</td><td>82.8</td><td>64.9</td><td>73.5</td><td>16.0</td><td>59.3</td></tr>
+ <tr><td>Qwen2.5VL-32B</td><td>83.2</td><td>63.0</td><td>71.6</td><td>15.6</td><td>58.4</td></tr>
+ <tr><td>LLAVA-med-34B</td><td>74.7</td><td>52.2</td><td>63.5</td><td>14.1</td><td>51.1</td></tr>
+ <tr><td>HuatuoGPT-vision-34B</td><td>80.8</td><td>63.6</td><td>57.4</td><td>16.0</td><td>54.5</td></tr>
+ <tr><td>Lingshu-32B</td><td><u>84.7</u></td><td><u>66.1</u></td><td><u>74.7</u></td><td><strong>22.7</strong></td><td><u>62.1</u></td></tr>
+ <tr><td><strong>PulseMind-32B</strong></td><td><strong>85.6</strong></td><td><strong>66.4</strong></td><td><strong>92.9</strong></td><td><u>21.5</u></td><td><strong>66.6</strong></td></tr>
+ </tbody>
+ </table>
+
+ ### Usage
+
+ ```python
+ from vllm import LLM, SamplingParams
+ from transformers import AutoProcessor
+ from qwen_vl_utils import process_vision_info
+ import PIL.Image as Image
+
+ MODEL_ID = "AQ-MedAI/PulseMind-72B"
+
+ # Load the processor (tokenizer + image preprocessing).
+ processor = AutoProcessor.from_pretrained(MODEL_ID)
+
+ # Load the vLLM engine.
+ llm = LLM(
+     model=MODEL_ID,
+     limit_mm_per_prompt={"image": 4},
+     tensor_parallel_size=2,
+     enforce_eager=True,
+     trust_remote_code=True,
+ )
+
+ sampling_params = SamplingParams(
+     temperature=0.1,
+     top_k=1,
+     top_p=0.001,
+     repetition_penalty=1.05,
+     max_tokens=2048,
+     stop_token_ids=[],
+ )
+
+ # Example input
+ image = Image.open("example.png")
+ text = "Describe the image and provide relevant clinical observations."
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": text},
+         ],
+     }
+ ]
+
+ # Build the text prompt & multimodal inputs.
+ prompt = processor.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+
+ mm_data = {}
+ if image_inputs is not None:
+     mm_data["image"] = image_inputs
+ if video_inputs is not None:
+     mm_data["video"] = video_inputs
+
+ outputs = llm.generate(
+     [{"prompt": prompt, "multi_modal_data": mm_data}],
+     sampling_params=sampling_params,
+ )
+
+ print(outputs[0].outputs[0].text)
+ ```
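The snippet above is single-turn, while the model targets multi-turn consultation. The sketch below shows one way to carry the dialogue history forward between `llm.generate` calls; the assistant replies are stubbed here, and the exact turn format is our assumption, mirroring the Qwen2.5-VL-style messages structure used above.

```python
# Sketch: carrying a multi-turn consultation through the chat format used above.
# In practice each stubbed assistant reply would come from llm.generate(...).

def add_user_turn(messages, text, image_path=None):
    """Append a user turn; attach an image only when one is provided."""
    content = []
    if image_path is not None:
        content.append({"type": "image", "image": image_path})
    content.append({"type": "text", "text": text})
    messages.append({"role": "user", "content": content})


def add_assistant_turn(messages, reply_text):
    """Append the model's reply so the next turn sees the full history."""
    messages.append({"role": "assistant", "content": reply_text})


# Turn 1: patient describes symptoms with an image.
messages = []
add_user_turn(messages, "I have had this rash for two weeks.", image_path="example.png")
add_assistant_turn(messages, "Does the rash itch, and has it spread?")  # stubbed reply

# Turn 2: patient answers the follow-up question; no new image.
add_user_turn(messages, "Yes, it itches at night and has spread to my arm.")

print(len(messages), [m["role"] for m in messages])  # → 3 ['user', 'assistant', 'user']
```

After each turn, the full `messages` list is passed back through `processor.apply_chat_template` and `process_vision_info` exactly as in the Usage example.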
+ ### Evaluation Scripts (Full Paths)
+
+ For complete evaluation pipelines, please refer to:
+
+ <p align="left">
+ <a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-CMtMedQA.py" target="_blank" rel="noopener">test-CMtMedQA</a>
+ &nbsp;&nbsp;
+ <a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-MedDiagnose.py" target="_blank" rel="noopener">test-MedDiagnose</a>
+ </p>
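For intuition, a benchmark run of this kind typically reduces to comparing model outputs against an answer key. The sketch below is purely illustrative; the real pipelines live in the linked test-CMtMedQA / test-MedDiagnose scripts, and the record fields (`prediction`, `answer`) are our hypothetical stand-ins, not those scripts' actual schema.

```python
# Hypothetical sketch of scoring model outputs against a benchmark answer key.
records = [  # stand-in for a parsed benchmark results file
    {"prediction": "B", "answer": "B"},
    {"prediction": "A", "answer": "C"},
    {"prediction": "D", "answer": "D"},
]

correct = sum(r["prediction"] == r["answer"] for r in records)
accuracy = 100.0 * correct / len(records)
print(f"accuracy: {accuracy:.1f}%")  # → accuracy: 66.7%
```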
+ ## Citation
+
+ If you find our project useful, please star the repo and cite our work as follows:
+
+ ```bibtex
+ @article{xu2026pulsemind,
+   title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis},
+   author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others},
+   journal={arXiv preprint arXiv:2601.07344},
+   year={2026}
+ }
+ ```