|
|
--- |
|
|
license: mit |
|
|
library_name: transformers |
|
|
pipeline_tag: image-text-to-text |
|
|
tags: |
|
|
- medical |
|
|
- multimodal |
|
|
- clinical-diagnosis |
|
|
- clinical-reasoning |
|
|
- multi-turn |
|
|
- consultation |
|
|
- medical-images |
|
|
- report-generation |
|
|
- reinforcement-learning |
|
|
- preference-optimization |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">🤖 PulseMind-72B Model</a> |
|
|
|
|
|
<a href="https://github.com/AQ-MedAI/PulseMind" target="_blank" rel="noopener">Code & Eval</a> |
|
|
|
|
|
<a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">Technical Report</a> |
|
|
</p> |
|
|
|
|
|
# *PulseMind-72B* - Multimodal Large Language Model for Real-World Clinical Diagnosis |
|
|
|
|
|
# <strong style="color: red">BIG NEWS: <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> is released for real-world multi-turn clinical diagnosis with state-of-the-art performance on diagnostic consultation benchmarks.</strong> |
|
|
|
|
|
This repository contains the **PulseMind-72B** model from the paper |
|
|
<a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener"><i>PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</i></a>. |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Highlights |
|
|
|
|
|
* **Real-world clinical diagnosis focus**: Designed for **multi-turn diagnostic consultation**, where the model must integrate **medical images and textual clinical context** while tracking an evolving patient–physician interaction. |
|
|
* **PulseMind Benchmark**: Evaluated on a purpose-built **multi-turn diagnostic consultation benchmark** (results reported under Evaluation below). |
|
|
* **MediScope dataset**: Trained and studied with **MediScope**, a large-scale multimodal clinical diagnostic dataset consisting of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties. |
|
|
* **Strong overall performance**: Demonstrates competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see paper for full results). |
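The multi-turn consultation setting above can be illustrated with the standard chat-messages structure used by Qwen-VL-style processors. The sketch below is illustrative only: the message schema follows the Qwen-VL chat convention shown in the Usage section, and `add_turn` is a hypothetical helper, not part of the released PulseMind API.

```python
# Sketch: accumulating a multi-turn diagnostic consultation as chat messages.
# The {"role": ..., "content": [...]} schema follows the Qwen-VL chat
# convention; add_turn is an illustrative helper, not a PulseMind API.

def add_turn(messages, role, text, image_path=None):
    """Append one consultation turn; images ride along as content items."""
    content = []
    if image_path is not None:
        content.append({"type": "image", "image": image_path})
    content.append({"type": "text", "text": text})
    messages.append({"role": role, "content": content})
    return messages

messages = []
add_turn(messages, "user", "I have had this rash for two weeks.", "rash.png")
add_turn(messages, "assistant", "Is the rash itchy, and has it spread?")
add_turn(messages, "user", "Yes, it itches and has spread to my arm.")

# Every earlier turn is preserved, so the model sees the evolving dialogue.
print(len(messages))  # 3
```

The full message list is passed to `processor.apply_chat_template` on each turn, so the model conditions on the complete consultation history rather than a single question.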
|
|
|
|
|
--- |
|
|
|
|
|
## Release |
|
|
|
|
|
- **Technical report**: |
|
|
- <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</a> |
|
|
- **Model weights**: |
|
|
- <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> **Note on data & checkpoints**: |
|
|
> Due to size and privacy considerations, datasets and some checkpoints may be hosted externally. |
|
|
> Please refer to the Hugging Face model card / GitHub repository for official download instructions and evaluation scripts. |
|
|
|
|
|
--- |
|
|
|
|
|
## Disclaimer |
|
|
|
|
|
> **Disclaimer**: |
|
|
> Even though the weights, codes, and demos are released openly (similar to other pre-trained models), and despite best efforts in safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**. |
|
|
> It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment. |
|
|
> Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations. |
|
|
> In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos. |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark) |
|
|
<p align="center"> |
|
|
<img src="PulseMind_show.png" width="800" /> |
|
|
</p> |
|
|
|
|
|
### Medical Multimodal VQA |
|
|
|
|
|
<table> |
|
|
<thead> |
|
|
<tr> |
|
|
<th>Models</th> |
|
|
<th>MMMU-Med</th> |
|
|
<th>VQA-RAD</th> |
|
|
<th>PMC-VQA</th> |
|
|
<th>SLAKE</th> |
|
|
<th>PathVQA</th> |
|
|
<th>DermaVQA</th> |
|
|
<th>MedXpertQA</th> |
|
|
<th>Avg.</th> |
|
|
</tr> |
|
|
</thead> |
|
|
<tbody> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>GPT-4o</td> |
|
|
<td>57.3</td> |
|
|
<td>71.2</td> |
|
|
<td>55.2</td> |
|
|
<td>67.4</td> |
|
|
<td>55.5</td> |
|
|
<td>35.0</td> |
|
|
<td>22.3</td> |
|
|
<td>52.0</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>o1</td> |
|
|
<td>57.8</td> |
|
|
<td>63.0</td> |
|
|
<td>54.5</td> |
|
|
<td>69.9</td> |
|
|
<td>57.3</td> |
|
|
<td>43.0</td> |
|
|
<td>49.7</td> |
|
|
<td>56.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Gemini-2.5-Pro</td> |
|
|
<td>49.3</td> |
|
|
<td>70.5</td> |
|
|
<td>55.5</td> |
|
|
<td>75.8</td> |
|
|
<td>55.4</td> |
|
|
<td>39.0</td> |
|
|
<td>39.5</td> |
|
|
<td>55.0</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (≈72B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-78B</td> |
|
|
<td><u>69.1</u></td> |
|
|
<td>73.6</td> |
|
|
<td>56.6</td> |
|
|
<td>77.4</td> |
|
|
<td><u>51.0</u></td> |
|
|
<td><u>37.0</u></td> |
|
|
<td>27.4</td> |
|
|
<td><u>56.1</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-72B</td> |
|
|
<td>66.4</td> |
|
|
<td><u>80.3</u></td> |
|
|
<td><u>59.3</u></td> |
|
|
<td><u>78.3</u></td> |
|
|
<td>42.3</td> |
|
|
<td>34.0</td> |
|
|
<td><u>27.6</u></td> |
|
|
<td>55.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-72B</strong></td> |
|
|
<td><strong>69.4</strong></td> |
|
|
<td><strong>87.1</strong></td> |
|
|
<td><strong>70.3</strong></td> |
|
|
<td><strong>85.6</strong></td> |
|
|
<td><strong>64.9</strong></td> |
|
|
<td><strong>42.0</strong></td> |
|
|
<td><strong>36.7</strong></td> |
|
|
<td><strong>65.1</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (≈32B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-38B</td> |
|
|
<td><strong>65.2</strong></td> |
|
|
<td>65.4</td> |
|
|
<td>56.6</td> |
|
|
<td>72.7</td> |
|
|
<td>51.0</td> |
|
|
<td><u>31.0</u></td> |
|
|
<td>25.2</td> |
|
|
<td>52.4</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-32B</td> |
|
|
<td>62.8</td> |
|
|
<td>73.8</td> |
|
|
<td>54.5</td> |
|
|
<td>71.2</td> |
|
|
<td>41.9</td> |
|
|
<td>25.0</td> |
|
|
<td>25.2</td> |
|
|
<td>50.6</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>LLaVA-Med-34B</td> |
|
|
<td>48.9</td> |
|
|
<td>58.6</td> |
|
|
<td>44.4</td> |
|
|
<td>67.3</td> |
|
|
<td>48.8</td> |
|
|
<td>13.0</td> |
|
|
<td>16.4</td> |
|
|
<td>42.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>HuatuoGPT-Vision-34B</td> |
|
|
<td>54.3</td> |
|
|
<td>61.4</td> |
|
|
<td>56.6</td> |
|
|
<td>69.5</td> |
|
|
<td>44.4</td> |
|
|
<td>21.0</td> |
|
|
<td>17.3</td> |
|
|
<td>46.4</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Lingshu-32B</td> |
|
|
<td>62.3</td> |
|
|
<td><u>76.5</u></td> |
|
|
<td><u>57.9</u></td> |
|
|
<td><strong>89.2</strong></td> |
|
|
<td><strong>65.9</strong></td> |
|
|
<td>17.0</td> |
|
|
<td><strong>30.9</strong></td> |
|
|
<td><u>57.1</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-32B</strong></td> |
|
|
<td><u>64.6</u></td> |
|
|
<td><strong>83.2</strong></td> |
|
|
<td><strong>68.1</strong></td> |
|
|
<td><u>81.5</u></td> |
|
|
<td><u>62.0</u></td> |
|
|
<td><strong>32.0</strong></td> |
|
|
<td><u>29.6</u></td> |
|
|
<td><strong>60.1</strong></td> |
|
|
</tr> |
|
|
</tbody> |
|
|
</table> |
|
|
|
|
|
|
|
|
### Medical Textual QA |
|
|
|
|
|
<table> |
|
|
<thead> |
|
|
<tr> |
|
|
<th>Models</th> |
|
|
<th>MMLU-Med</th> |
|
|
<th>MedMCQA</th> |
|
|
<th>MedQA</th> |
|
|
<th>MedXpertQA</th> |
|
|
<th>Avg.</th> |
|
|
</tr> |
|
|
</thead> |
|
|
<tbody> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>GPT-4o</td> |
|
|
<td>88.7</td> |
|
|
<td>73.5</td> |
|
|
<td>55.7</td> |
|
|
<td>22.5</td> |
|
|
<td>60.1</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>o1</td> |
|
|
<td>91.6</td> |
|
|
<td>82.7</td> |
|
|
<td>86.6</td> |
|
|
<td>48.9</td> |
|
|
<td>77.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Gemini-2.5-Pro</td> |
|
|
<td>89.8</td> |
|
|
<td>68.6</td> |
|
|
<td>85.6</td> |
|
|
<td>24.3</td> |
|
|
<td>67.1</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (≈72B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-78B</td> |
|
|
<td>83.0</td> |
|
|
<td>66.1</td> |
|
|
<td><u>93.3</u></td> |
|
|
<td><u>18.5</u></td> |
|
|
<td>65.2</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-72B</td> |
|
|
<td><u>88.3</u></td> |
|
|
<td><u>67.2</u></td> |
|
|
<td>91.3</td> |
|
|
<td>16.1</td> |
|
|
<td><u>65.7</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-72B</strong></td> |
|
|
<td><strong>88.7</strong></td> |
|
|
<td><strong>71.3</strong></td> |
|
|
<td><strong>94.8</strong></td> |
|
|
<td><strong>29.8</strong></td> |
|
|
<td><strong>71.2</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td colspan="9" style="text-align:center;"><strong>Open-source Models (>10B)</strong></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>InternVL3-38B</td> |
|
|
<td>82.8</td> |
|
|
<td>64.9</td> |
|
|
<td>73.5</td> |
|
|
<td>16.0</td> |
|
|
<td>59.3</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Qwen2.5VL-32B</td> |
|
|
<td>83.2</td> |
|
|
<td>63.0</td> |
|
|
<td>71.6</td> |
|
|
<td>15.6</td> |
|
|
<td>58.4</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>LLaVA-Med-34B</td> |
|
|
<td>74.7</td> |
|
|
<td>52.2</td> |
|
|
<td>63.5</td> |
|
|
<td>14.1</td> |
|
|
<td>51.1</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>HuatuoGPT-Vision-34B</td> |
|
|
<td>80.8</td> |
|
|
<td>63.6</td> |
|
|
<td>57.4</td> |
|
|
<td>16.0</td> |
|
|
<td>54.5</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td>Lingshu-32B</td> |
|
|
<td><u>84.7</u></td> |
|
|
<td><u>66.1</u></td> |
|
|
<td><u>74.7</u></td> |
|
|
<td><strong>22.7</strong></td> |
|
|
<td><u>62.1</u></td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td><strong>PulseMind-32B</strong></td> |
|
|
<td><strong>85.6</strong></td> |
|
|
<td><strong>66.4</strong></td> |
|
|
<td><strong>92.9</strong></td> |
|
|
<td><u>21.5</u></td> |
|
|
<td><strong>66.6</strong></td> |
|
|
</tr> |
|
|
</tbody> |
|
|
</table> |
|
|
|
|
|
|
|
|
### Usage |
|
|
```python |
|
|
from vllm import LLM, SamplingParams |
|
|
from transformers import AutoProcessor |
|
|
from qwen_vl_utils import process_vision_info |
|
|
import PIL.Image as Image |
|
|
|
|
|
MODEL_ID = "AQ-MedAI/PulseMind-72B" |
|
|
|
|
|
# Load processor |
|
|
processor = AutoProcessor.from_pretrained(MODEL_ID) |
|
|
|
|
|
# Load vLLM engine |
|
|
llm = LLM( |
|
|
model=MODEL_ID, |
|
|
limit_mm_per_prompt={"image": 4}, |
|
|
    tensor_parallel_size=2,  # set to the number of available GPUs |
|
|
enforce_eager=True, |
|
|
trust_remote_code=True, |
|
|
) |
|
|
|
|
|
sampling_params = SamplingParams( |
|
|
temperature=0.1, |
|
|
top_k=1, |
|
|
top_p=0.001, |
|
|
repetition_penalty=1.05, |
|
|
max_tokens=2048, |
|
|
stop_token_ids=[], |
|
|
) |
|
|
|
|
|
# Example input |
|
|
image = Image.open("example.png") |
|
|
text = "Describe the image and provide relevant clinical observations." |
|
|
|
|
|
messages = [ |
|
|
{ |
|
|
"role": "user", |
|
|
"content": [ |
|
|
{"type": "image", "image": image}, |
|
|
{"type": "text", "text": text}, |
|
|
], |
|
|
} |
|
|
] |
|
|
|
|
|
# Build prompt & multimodal inputs |
|
|
prompt = processor.apply_chat_template( |
|
|
messages, |
|
|
tokenize=False, |
|
|
add_generation_prompt=True, |
|
|
) |
|
|
image_inputs, video_inputs = process_vision_info(messages) |
|
|
|
|
|
mm_data = {} |
|
|
if image_inputs is not None: |
|
|
mm_data["image"] = image_inputs |
|
|
if video_inputs is not None: |
|
|
mm_data["video"] = video_inputs |
|
|
|
|
|
outputs = llm.generate( |
|
|
[{"prompt": prompt, "multi_modal_data": mm_data}], |
|
|
sampling_params=sampling_params, |
|
|
) |
|
|
|
|
|
print(outputs[0].outputs[0].text) |
|
|
``` |
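To carry the consultation into a second turn, the generated reply and the patient's next message are appended to `messages` before re-applying the chat template and calling `llm.generate` again. The sketch below covers only the message bookkeeping; the `reply` string is a hypothetical stand-in for `outputs[0].outputs[0].text` from the generation above.

```python
# Sketch: extending the dialogue for a follow-up consultation turn.
# `reply` stands in for outputs[0].outputs[0].text from the generation above.
reply = "The opacity in the right lower lobe suggests possible consolidation."

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "example.png"},
        {"type": "text", "text": "Describe the image and provide relevant clinical observations."},
    ]},
]

# Append the model's answer, then the follow-up question, and rerun the same
# apply_chat_template / process_vision_info / llm.generate pipeline as above.
messages.append({"role": "assistant",
                 "content": [{"type": "text", "text": reply}]})
messages.append({"role": "user", "content": [
    {"type": "text", "text": "Given these findings, what follow-up tests would you suggest?"},
]})

print(len(messages))  # 3
```

Because the full history is re-encoded each turn, keep `max_tokens` and the image budget (`limit_mm_per_prompt`) in mind as consultations grow longer.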
|
|
### Evaluation Scripts |
|
|
For complete evaluation pipelines, please refer to: |
|
|
<p align="left"> |
|
|
<a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-CMtMedQA.py" target="_blank" rel="noopener">test-CMtMedQA</a> |
|
|
|
|
|
<a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-MedDiagnose.py" target="_blank" rel="noopener">test-MedDiagnose</a> |
|
|
|
|
|
</p> |
|
|
|
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
If you find our project useful, please consider starring the repo and citing our work: |
|
|
|
|
|
```bibtex |
|
|
@article{xu2026pulsemind, |
|
|
title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis}, |
|
|
author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others}, |
|
|
journal={arXiv preprint arXiv:2601.07344}, |
|
|
year={2026} |
|
|
} |
|
|
``` |