|
|
--- |
|
|
license: mit |
|
|
library_name: transformers |
|
|
pipeline_tag: image-text-to-text |
|
|
tags: |
|
|
- medical |
|
|
- multimodal |
|
|
- clinical-diagnosis |
|
|
- clinical-reasoning |
|
|
- multi-turn |
|
|
- consultation |
|
|
- medical-images |
|
|
- report-generation |
|
|
- reinforcement-learning |
|
|
- preference-optimization |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">🤖 PulseMind-72B Model</a> |
|
|
|
|
|
<a href="https://github.com/AQ-MedAI/PulseMind" target="_blank" rel="noopener">Code & Eval</a> |
|
|
|
|
|
<a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">Technical Report</a> |
|
|
</p> |
|
|
|
|
|
# *PulseMind-72B* - Multimodal Large Language Model for Real-World Clinical Diagnosis |
|
|
|
|
|
# <strong style="color: red">BIG NEWS: <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> is released for real-world multi-turn clinical diagnosis with state-of-the-art performance on diagnostic consultation benchmarks.</strong> |
|
|
|
|
|
This repository contains the **PulseMind-72B** model from the paper |
|
|
<a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener"><i>PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</i></a>. |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Highlights |
|
|
|
|
|
* **Real-world clinical diagnosis focus**: Designed for **multi-turn diagnostic consultation**, where the model must integrate **medical images and textual clinical context** while tracking an evolving patient–physician interaction. |
|
|
* **PulseMind Benchmark**: Evaluated on a purpose-built **multi-turn diagnostic consultation benchmark** (results reported under Evaluation below). |
|
|
* **MediScope dataset**: Trained and studied with **MediScope**, a large-scale multimodal clinical diagnostic dataset consisting of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties. |
|
|
* **Strong overall performance**: Demonstrates competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see paper for full results). |
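The multi-turn consultation setting above can be illustrated with the standard chat-messages structure used by Qwen-VL-style processors. The sketch below is illustrative only: the message schema follows the Qwen-VL chat convention shown in the Usage section, and `add_turn` is a hypothetical helper, not part of the released PulseMind API.

```python
# Sketch: accumulating a multi-turn diagnostic consultation as chat messages.
# The {"role": ..., "content": [...]} schema follows the Qwen-VL chat
# convention; add_turn is an illustrative helper, not a PulseMind API.

def add_turn(messages, role, text, image_path=None):
    """Append one consultation turn; images ride along as content items."""
    content = []
    if image_path is not None:
        content.append({"type": "image", "image": image_path})
    content.append({"type": "text", "text": text})
    messages.append({"role": role, "content": content})
    return messages

messages = []
add_turn(messages, "user", "I have had this rash for two weeks.", "rash.png")
add_turn(messages, "assistant", "Is the rash itchy, and has it spread?")
add_turn(messages, "user", "Yes, it itches and has spread to my arm.")

# Every earlier turn is preserved, so the model sees the evolving dialogue.
print(len(messages))  # 3
```

The full message list is passed to `processor.apply_chat_template` on each turn, so the model conditions on the complete consultation history rather than a single question.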
|
|
|
|
|
--- |
|
|
|
|
|
## Release |
|
|
|
|
|
- **Technical report**: |
|
|
- <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</a> |
|
|
- **Model weights**: |
|
|
- <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> **Note on data & checkpoints**: |
|
|
> Due to size and privacy considerations, datasets and some checkpoints may be hosted externally. |
|
|
> Please refer to the Hugging Face model card / GitHub repository for official download instructions and evaluation scripts. |
|
|
|
|
|
--- |
|
|
|
|
|
## Disclaimer |
|
|
|
|
|
> **Disclaimer**: |
|
|
> Even though the weights, codes, and demos are released openly (similar to other pre-trained models), and despite best efforts in safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**. |
|
|
> It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment. |
|
|
> Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations. |
|
|
> In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos. |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark) |
|
|
<p align="center"> |
|
|
<img src="PulseMind_show.png" width="800" /> |
|
|
</p> |
|
|
|
|
|
### Medical Multimodal VQA |
|
|
|
|
|
<table> |
|
|
<thead> |
|
|
<tr> |
|
|
<th>Models</th> |
|
|
<th>MMMU-Med</th> |
|
|
<th>VQA-RAD</th> |
|
|
<th>PMC-VQA</th> |
|
|
<th>SLAKE</th> |
|
|
<th>PathVQA</th> |
|
|
<th>DermaVQA</th> |
|
|
<th>MedXpertQA</th> |
|
|
<th>Avg.</th> |
|
|
</tr> |
|
|
</thead> |
|
|
<tbody> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>GPT-4o</td> |
|
|
<td>57.3</td> |
|
|
<td>71.2</td> |
|
|
<td>55.2</td> |
|
|
<td>67.4</td> |
|
|
<td>55.5</td> |
|
|
<td>35.0</td> |
|
|
<td>22.3</td> |
|
|
<td>52.0</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>o1</td> |
|
|
<td>57.8</td> |
|
|
<td>63.0</td> |
|
|
<td>54.5</td> |
|
|
<td>69.9</td> |
|
|
<td>57.3</td> |
|
|
<td>43.0</td> |
|
|
<td>49.7</td> |
|
|
<td>56.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Gemini-2.5-Pro</td> |
|
|
<td>49.3</td> |
|
|
<td>70.5</td> |
|
|
<td>55.5</td> |
|
|
<td>75.8</td> |
|
|
<td>55.4</td> |
|
|
<td>39.0</td> |
|
|
<td>39.5</td> |
|
|
<td>55.0</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (≈72B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-78B</td> |
|
|
<td><u>69.1</u></td> |
|
|
<td>73.6</td> |
|
|
<td>56.6</td> |
|
|
<td>77.4</td> |
|
|
<td><u>51.0</u></td> |
|
|
<td><u>37.0</u></td> |
|
|
<td>27.4</td> |
|
|
<td><u>56.1</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-72B</td> |
|
|
<td>66.4</td> |
|
|
<td><u>80.3</u></td> |
|
|
<td><u>59.3</u></td> |
|
|
<td><u>78.3</u></td> |
|
|
<td>42.3</td> |
|
|
<td>34.0</td> |
|
|
<td><u>27.6</u></td> |
|
|
<td>55.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-72B</strong></td> |
|
|
<td><strong>69.4</strong></td> |
|
|
<td><strong>87.1</strong></td> |
|
|
<td><strong>70.3</strong></td> |
|
|
<td><strong>85.6</strong></td> |
|
|
<td><strong>64.9</strong></td> |
|
|
<td><strong>42.0</strong></td> |
|
|
<td><strong>36.7</strong></td> |
|
|
<td><strong>65.1</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (≈32B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-38B</td> |
|
|
<td><strong>65.2</strong></td> |
|
|
<td>65.4</td> |
|
|
<td>56.6</td> |
|
|
<td>72.7</td> |
|
|
<td>51.0</td> |
|
|
<td><u>31.0</u></td> |
|
|
<td>25.2</td> |
|
|
<td>52.4</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-32B</td> |
|
|
<td>62.8</td> |
|
|
<td>73.8</td> |
|
|
<td>54.5</td> |
|
|
<td>71.2</td> |
|
|
<td>41.9</td> |
|
|
<td>25.0</td> |
|
|
<td>25.2</td> |
|
|
<td>50.6</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>LLaVA-Med-34B</td> |
|
|
<td>48.9</td> |
|
|
<td>58.6</td> |
|
|
<td>44.4</td> |
|
|
<td>67.3</td> |
|
|
<td>48.8</td> |
|
|
<td>13.0</td> |
|
|
<td>16.4</td> |
|
|
<td>42.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>HuatuoGPT-Vision-34B</td> |
|
|
<td>54.3</td> |
|
|
<td>61.4</td> |
|
|
<td>56.6</td> |
|
|
<td>69.5</td> |
|
|
<td>44.4</td> |
|
|
<td>21.0</td> |
|
|
<td>17.3</td> |
|
|
<td>46.4</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Lingshu-32B</td> |
|
|
<td>62.3</td> |
|
|
<td><u>76.5</u></td> |
|
|
<td><u>57.9</u></td> |
|
|
<td><strong>89.2</strong></td> |
|
|
<td><strong>65.9</strong></td> |
|
|
<td>17.0</td> |
|
|
<td><strong>30.9</strong></td> |
|
|
<td><u>57.1</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-32B</strong></td> |
|
|
<td><u>64.6</u></td> |
|
|
<td><strong>83.2</strong></td> |
|
|
<td><strong>68.1</strong></td> |
|
|
<td><u>81.5</u></td> |
|
|
<td><u>62.0</u></td> |
|
|
<td><strong>32.0</strong></td> |
|
|
<td><u>29.6</u></td> |
|
|
<td><strong>60.1</strong></td> |
|
|
</tr> |
|
|
</tbody> |
|
|
</table> |
|
|
|
|
|
|
|
|
### Medical Textual QA |
|
|
|
|
|
<table> |
|
|
<thead> |
|
|
<tr> |
|
|
<th>Models</th> |
|
|
<th>MMLU-Med</th> |
|
|
<th>MedMCQA</th> |
|
|
<th>MedQA</th> |
|
|
<th>MedXpertQA</th> |
|
|
<th>Avg.</th> |
|
|
</tr> |
|
|
</thead> |
|
|
<tbody> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>GPT-4o</td> |
|
|
<td>88.7</td> |
|
|
<td>73.5</td> |
|
|
<td>55.7</td> |
|
|
<td>22.5</td> |
|
|
<td>60.1</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>o1</td> |
|
|
<td>91.6</td> |
|
|
<td>82.7</td> |
|
|
<td>86.6</td> |
|
|
<td>48.9</td> |
|
|
<td>77.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Gemini-2.5-Pro</td> |
|
|
<td>89.8</td> |
|
|
<td>68.6</td> |
|
|
<td>85.6</td> |
|
|
<td>24.3</td> |
|
|
<td>67.1</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (≈72B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-78B</td> |
|
|
<td>83.0</td> |
|
|
<td>66.1</td> |
|
|
<td><u>93.3</u></td> |
|
|
<td><u>18.5</u></td> |
|
|
<td>65.2</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-72B</td> |
|
|
<td><u>88.3</u></td> |
|
|
<td><u>67.2</u></td> |
|
|
<td>91.3</td> |
|
|
<td>16.1</td> |
|
|
<td><u>65.7</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-72B</strong></td> |
|
|
<td><strong>88.7</strong></td> |
|
|
<td><strong>71.3</strong></td> |
|
|
<td><strong>94.8</strong></td> |
|
|
<td><strong>29.8</strong></td> |
|
|
<td><strong>71.2</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (>10B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-38B</td> |
|
|
<td>82.8</td> |
|
|
<td>64.9</td> |
|
|
<td>73.5</td> |
|
|
<td>16.0</td> |
|
|
<td>59.3</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-32B</td> |
|
|
<td>83.2</td> |
|
|
<td>63.0</td> |
|
|
<td>71.6</td> |
|
|
<td>15.6</td> |
|
|
<td>58.4</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>LLaVA-Med-34B</td> |
|
|
<td>74.7</td> |
|
|
<td>52.2</td> |
|
|
<td>63.5</td> |
|
|
<td>14.1</td> |
|
|
<td>51.1</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>HuatuoGPT-Vision-34B</td> |
|
|
<td>80.8</td> |
|
|
<td>63.6</td> |
|
|
<td>57.4</td> |
|
|
<td>16.0</td> |
|
|
<td>54.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Lingshu-32B</td> |
|
|
<td><u>84.7</u></td> |
|
|
<td><u>66.1</u></td> |
|
|
<td><u>74.7</u></td> |
|
|
<td><strong>22.7</strong></td> |
|
|
<td><u>62.1</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-32B</strong></td> |
|
|
<td><strong>85.6</strong></td> |
|
|
<td><strong>66.4</strong></td> |
|
|
<td><strong>92.9</strong></td> |
|
|
<td><u>21.5</u></td> |
|
|
<td><strong>66.6</strong></td> |
|
|
</tr> |
|
|
</tbody> |
|
|
</table> |
|
|
|
|
|
|
|
|
### Usage |
|
|
```python |
|
|
from vllm import LLM, SamplingParams |
|
|
from transformers import AutoProcessor |
|
|
from qwen_vl_utils import process_vision_info |
|
|
import PIL.Image as Image |
|
|
|
|
|
MODEL_ID = "AQ-MedAI/PulseMind-72B" |
|
|
|
|
|
# Load processor |
|
|
processor = AutoProcessor.from_pretrained(MODEL_ID) |
|
|
|
|
|
# Load vLLM engine |
|
|
llm = LLM( |
|
|
model=MODEL_ID, |
|
|
limit_mm_per_prompt={"image": 4}, |
|
|
    tensor_parallel_size=2,  # set to the number of available GPUs |
|
|
enforce_eager=True, |
|
|
trust_remote_code=True, |
|
|
) |
|
|
|
|
|
sampling_params = SamplingParams( |
|
|
temperature=0.1, |
|
|
top_k=1, |
|
|
top_p=0.001, |
|
|
repetition_penalty=1.05, |
|
|
max_tokens=2048, |
|
|
stop_token_ids=[], |
|
|
) |
|
|
|
|
|
# Example input |
|
|
image = Image.open("example.png") |
|
|
text = "Describe the image and provide relevant clinical observations." |
|
|
|
|
|
messages = [ |
|
|
{ |
|
|
"role": "user", |
|
|
"content": [ |
|
|
{"type": "image", "image": image}, |
|
|
{"type": "text", "text": text}, |
|
|
], |
|
|
} |
|
|
] |
|
|
|
|
|
# Build prompt & multimodal inputs |
|
|
prompt = processor.apply_chat_template( |
|
|
messages, |
|
|
tokenize=False, |
|
|
add_generation_prompt=True, |
|
|
) |
|
|
image_inputs, video_inputs = process_vision_info(messages) |
|
|
|
|
|
mm_data = {} |
|
|
if image_inputs is not None: |
|
|
mm_data["image"] = image_inputs |
|
|
if video_inputs is not None: |
|
|
mm_data["video"] = video_inputs |
|
|
|
|
|
outputs = llm.generate( |
|
|
[{"prompt": prompt, "multi_modal_data": mm_data}], |
|
|
sampling_params=sampling_params, |
|
|
) |
|
|
|
|
|
print(outputs[0].outputs[0].text) |
|
|
``` |
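To carry the consultation into a second turn, the generated reply and the patient's next message are appended to `messages` before re-applying the chat template and calling `llm.generate` again. The sketch below covers only the message bookkeeping; the `reply` string is a hypothetical stand-in for `outputs[0].outputs[0].text` from the generation above.

```python
# Sketch: extending the dialogue for a follow-up consultation turn.
# `reply` stands in for outputs[0].outputs[0].text from the generation above.
reply = "The opacity in the right lower lobe suggests possible consolidation."

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "example.png"},
        {"type": "text", "text": "Describe the image and provide relevant clinical observations."},
    ]},
]

# Append the model's answer, then the follow-up question, and rerun the same
# apply_chat_template / process_vision_info / llm.generate pipeline as above.
messages.append({"role": "assistant",
                 "content": [{"type": "text", "text": reply}]})
messages.append({"role": "user", "content": [
    {"type": "text", "text": "Given these findings, what follow-up tests would you suggest?"},
]})

print(len(messages))  # 3
```

Because the full history is re-encoded each turn, keep `max_tokens` and the image budget (`limit_mm_per_prompt`) in mind as consultations grow longer.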
|
|
### Evaluation Scripts |
|
|
For complete evaluation pipelines, please refer to: |
|
|
<p align="left"> |
|
|
<a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-CMtMedQA.py" target="_blank" rel="noopener">test-CMtMedQA</a> |
|
|
|
|
|
<a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-MedDiagnose.py" target="_blank" rel="noopener">test-MedDiagnose</a> |
|
|
|
|
|
</p> |
|
|
|
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find our project useful, please consider starring the repo and citing our work: |
|
|
|
|
|
```bibtex |
|
|
@article{xu2026pulsemind, |
|
|
title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis}, |
|
|
author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others}, |
|
|
journal={arXiv preprint arXiv:2601.07344}, |
|
|
year={2026} |
|
|
} |
|
|
``` |