---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- medical
- multimodal
- clinical-diagnosis
- clinical-reasoning
- multi-turn
- consultation
- medical-images
- report-generation
- reinforcement-learning
- preference-optimization
---
<p align="center">
<a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">🤖 PulseMind-72B Model</a>
&nbsp;&nbsp;
<a href="https://github.com/AQ-MedAI/PulseMind" target="_blank" rel="noopener">Code & Eval</a>
&nbsp;&nbsp;
<a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">Technical Report</a>
</p>
# *PulseMind-72B* - Multimodal Large Language Model for Real-World Clinical Diagnosis
# <strong style="color: red">BIG NEWS: <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> is released for real-world multi-turn clinical diagnosis with state-of-the-art performance on diagnostic consultation benchmarks.</strong>
This repository contains the **PulseMind-72B** model from the paper
<a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener"><i>PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</i></a>.
---
## Highlights
* **Real-world clinical diagnosis focus**: Designed for **multi-turn diagnostic consultation**, where models must integrate **medical images + textual clinical context** and reason over an evolving patient–physician interaction.
* **PulseMind Benchmark**: Evaluated using a **multi-turn diagnostic consultation benchmark**.
* **MediScope dataset**: Trained and studied with **MediScope**, a large-scale multimodal clinical diagnostic dataset consisting of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties.
* **Strong overall performance**: Demonstrates competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see paper for full results).
---
## Release
- **Technical report**:
- <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</a>
- **Model weights**:
- <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a>
> **Note on data & checkpoints**:
> Due to size and privacy considerations, datasets and some checkpoints may be hosted externally.
> Please refer to the HuggingFace model card / GitHub repository for official download instructions and evaluation scripts.
---
## Disclaimer
> **Disclaimer**:
> Although the weights, code, and demos are released openly (as with other pre-trained models), and despite best efforts in safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**.
> It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment.
> Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations.
> In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, code, or demos.
## Evaluation
### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark)
<p align="center">
<img src="PulseMind_show.png" width="800" />
</p>
### Medical Multimodal VQA
<table>
<thead>
<tr>
<th>Models</th>
<th>MMMU-Med</th>
<th>VQA-RAD</th>
<th>PMC-VQA</th>
<th>SLAKE</th>
<th>PathVQA</th>
<th>DermaVQA</th>
<th>MedXpertQA</th>
<th>Avg.</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td>
</tr>
<tr>
<td>GPT-4o</td>
<td>57.3</td>
<td>71.2</td>
<td>55.2</td>
<td>67.4</td>
<td>55.5</td>
<td>35.0</td>
<td>22.3</td>
<td>52.0</td>
</tr>
<tr>
<td>o1</td>
<td>57.8</td>
<td>63.0</td>
<td>54.5</td>
<td>69.9</td>
<td>57.3</td>
<td>43.0</td>
<td>49.7</td>
<td>56.5</td>
</tr>
<tr>
<td>Gemini-2.5-Pro</td>
<td>49.3</td>
<td>70.5</td>
<td>55.5</td>
<td>75.8</td>
<td>55.4</td>
<td>39.0</td>
<td>39.5</td>
<td>55.0</td>
</tr>
<tr>
<td colspan="9" style="text-align:center;"><strong>Open-source Models (&asymp;72B)</strong></td>
</tr>
<tr>
<td>InternVL3-78B</td>
<td><u>69.1</u></td>
<td>73.6</td>
<td>56.6</td>
<td>77.4</td>
<td><u>51.0</u></td>
<td><u>37.0</u></td>
<td>27.4</td>
<td><u>56.1</u></td>
</tr>
<tr>
<td>Qwen2.5VL-72B</td>
<td>66.4</td>
<td><u>80.3</u></td>
<td><u>59.3</u></td>
<td><u>78.3</u></td>
<td>42.3</td>
<td>34.0</td>
<td><u>27.6</u></td>
<td>55.5</td>
</tr>
<tr>
<td><strong>PulseMind-72B</strong></td>
<td><strong>69.4</strong></td>
<td><strong>87.1</strong></td>
<td><strong>70.3</strong></td>
<td><strong>85.6</strong></td>
<td><strong>64.9</strong></td>
<td><strong>42.0</strong></td>
<td><strong>36.7</strong></td>
<td><strong>65.1</strong></td>
</tr>
<tr>
<td colspan="9" style="text-align:center;"><strong>Open-source Models (&asymp;32B)</strong></td>
</tr>
<tr>
<td>InternVL3-38B</td>
<td><strong>65.2</strong></td>
<td>65.4</td>
<td>56.6</td>
<td>72.7</td>
<td>51.0</td>
<td><u>31.0</u></td>
<td>25.2</td>
<td>52.4</td>
</tr>
<tr>
<td>Qwen2.5VL-32B</td>
<td>62.8</td>
<td>73.8</td>
<td>54.5</td>
<td>71.2</td>
<td>41.9</td>
<td>25.0</td>
<td>25.2</td>
<td>50.6</td>
</tr>
<tr>
<td>LLAVA-med-34B</td>
<td>48.9</td>
<td>58.6</td>
<td>44.4</td>
<td>67.3</td>
<td>48.8</td>
<td>13.0</td>
<td>16.4</td>
<td>42.5</td>
</tr>
<tr>
<td>HuatuoGPT-vision-34B</td>
<td>54.3</td>
<td>61.4</td>
<td>56.6</td>
<td>69.5</td>
<td>44.4</td>
<td>21.0</td>
<td>17.3</td>
<td>46.4</td>
</tr>
<tr>
<td>Lingshu-32B</td>
<td>62.3</td>
<td><u>76.5</u></td>
<td><u>57.9</u></td>
<td><strong>89.2</strong></td>
<td><strong>65.9</strong></td>
<td>17.0</td>
<td><strong>30.9</strong></td>
<td><u>57.1</u></td>
</tr>
<tr>
<td><strong>PulseMind-32B</strong></td>
<td><u>64.6</u></td>
<td><strong>83.2</strong></td>
<td><strong>68.1</strong></td>
<td><u>81.5</u></td>
<td><u>62.0</u></td>
<td><strong>32.0</strong></td>
<td><u>29.6</u></td>
<td><strong>60.1</strong></td>
</tr>
</tbody>
</table>
### Medical Textual QA
<table>
<thead>
<tr>
<th>Models</th>
<th>MMLU-Med</th>
<th>MedMCQA</th>
<th>MedQA</th>
<th>MedXpertQA</th>
<th>Avg.</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="5" style="text-align:center;"><strong>Proprietary Models</strong></td>
</tr>
<tr>
<td>GPT-4o</td>
<td>88.7</td>
<td>73.5</td>
<td>55.7</td>
<td>22.5</td>
<td>60.1</td>
</tr>
<tr>
<td>o1</td>
<td>91.6</td>
<td>82.7</td>
<td>86.6</td>
<td>48.9</td>
<td>77.5</td>
</tr>
<tr>
<td>Gemini-2.5-Pro</td>
<td>89.8</td>
<td>68.6</td>
<td>85.6</td>
<td>24.3</td>
<td>67.1</td>
</tr>
<tr>
<td colspan="5" style="text-align:center;"><strong>Open-source Models (&asymp;72B)</strong></td>
</tr>
<tr>
<td>InternVL3-78B</td>
<td>83.0</td>
<td>66.1</td>
<td><u>93.3</u></td>
<td><u>18.5</u></td>
<td>65.2</td>
</tr>
<tr>
<td>Qwen2.5VL-72B</td>
<td><u>88.3</u></td>
<td><u>67.2</u></td>
<td>91.3</td>
<td>16.1</td>
<td><u>65.7</u></td>
</tr>
<tr>
<td><strong>PulseMind-72B</strong></td>
<td><strong>88.7</strong></td>
<td><strong>71.3</strong></td>
<td><strong>94.8</strong></td>
<td><strong>29.8</strong></td>
<td><strong>71.2</strong></td>
</tr>
<tr>
<td colspan="5" style="text-align:center;"><strong>Open-source Models (&asymp;32B)</strong></td>
</tr>
<tr>
<td>InternVL3-38B</td>
<td>82.8</td>
<td>64.9</td>
<td>73.5</td>
<td>16.0</td>
<td>59.3</td>
</tr>
<tr>
<td>Qwen2.5VL-32B</td>
<td>83.2</td>
<td>63.0</td>
<td>71.6</td>
<td>15.6</td>
<td>58.4</td>
</tr>
<tr>
<td>LLAVA-med-34B</td>
<td>74.7</td>
<td>52.2</td>
<td>63.5</td>
<td>14.1</td>
<td>51.1</td>
</tr>
<tr>
<td>HuatuoGPT-vision-34B</td>
<td>80.8</td>
<td>63.6</td>
<td>57.4</td>
<td>16.0</td>
<td>54.5</td>
</tr>
<tr>
<td>Lingshu-32B</td>
<td><u>84.7</u></td>
<td><u>66.1</u></td>
<td><u>74.7</u></td>
<td><strong>22.7</strong></td>
<td><u>62.1</u></td>
</tr>
<tr>
<td><strong>PulseMind-32B</strong></td>
<td><strong>85.6</strong></td>
<td><strong>66.4</strong></td>
<td><strong>92.9</strong></td>
<td><u>21.5</u></td>
<td><strong>66.6</strong></td>
</tr>
</tbody>
</table>
### Usage
```python
from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from qwen_vl_utils import process_vision_info
import PIL.Image as Image

MODEL_ID = "AQ-MedAI/PulseMind-72B"

# Load processor
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load vLLM engine
llm = LLM(
    model=MODEL_ID,
    limit_mm_per_prompt={"image": 4},
    tensor_parallel_size=2,
    enforce_eager=True,
    trust_remote_code=True,
)

sampling_params = SamplingParams(
    temperature=0.1,
    top_k=1,
    top_p=0.001,
    repetition_penalty=1.05,
    max_tokens=2048,
    stop_token_ids=[],
)

# Example input
image = Image.open("example.png")
text = "Describe the image and provide relevant clinical observations."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text},
        ],
    }
]

# Build prompt & multimodal inputs
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(messages)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": mm_data}],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```
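Since PulseMind targets multi-turn consultation, the conversation history can be extended between generations and fed back through the same `apply_chat_template` / `generate` loop. Below is a minimal sketch of one way to append turns; the `add_turn` helper is illustrative only (not part of the official PulseMind or vLLM API) and assumes the same message schema as the example above.

```python
# Illustrative helper (not part of the official API): extend a chat history
# with the model's reply and the patient's next message, so the prompt-building
# and generation steps from the example above can simply be rerun.
def add_turn(messages, assistant_reply, user_text, user_image=None):
    """Append an assistant turn and the next user turn to the history."""
    messages = list(messages)  # copy; do not mutate the caller's history
    messages.append({
        "role": "assistant",
        "content": [{"type": "text", "text": assistant_reply}],
    })
    content = []
    if user_image is not None:
        # Optional follow-up image, using the same content schema as above
        content.append({"type": "image", "image": user_image})
    content.append({"type": "text", "text": user_text})
    messages.append({"role": "user", "content": content})
    return messages


history = [
    {"role": "user", "content": [{"type": "text", "text": "I have a rash on my arm."}]},
]
history = add_turn(
    history,
    assistant_reply="How long has the rash been present?",
    user_text="About two weeks, and it itches at night.",
)
print([m["role"] for m in history])  # → ['user', 'assistant', 'user']
```

The updated `history` can then replace `messages` in the snippet above to produce the next consultation turn.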
### Evaluation Scripts
For complete evaluation pipelines, please refer to:
<p align="left">
<a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-CMtMedQA.py" target="_blank" rel="noopener">test-CMtMedQA</a>
&nbsp;&nbsp;
<a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-MedDiagnose.py" target="_blank" rel="noopener">test-MedDiagnose</a>
</p>
## Citation
If you find our project useful, please star our repo and cite our work as follows:
```bibtex
@article{xu2026pulsemind,
title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis},
author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others},
journal={arXiv preprint arXiv:2601.07344},
year={2026}
}
```