---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- medical
- multimodal
- clinical-diagnosis
- clinical-reasoning
- multi-turn
- consultation
- medical-images
- report-generation
- reinforcement-learning
- preference-optimization
---

🤖 PulseMind-72B Model Code & Eval Technical Report

# *PulseMind-72B* - Multimodal Large Language Model for Real-World Clinical Diagnosis

# BIG NEWS: PulseMind-72B is released for real-world multi-turn clinical diagnosis with state-of-the-art performance on diagnostic consultation benchmarks.

This repository contains the **PulseMind-72B** model from the paper PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis.

---

## Highlights

* **Real-world clinical diagnosis focus**: Designed for **multi-turn diagnostic consultation**, where the model must integrate **medical images + textual clinical context** and keep track of an evolving patient–physician interaction.
* **PulseMind Benchmark**: Evaluated using a **multi-turn diagnostic consultation benchmark**.
* **MediScope dataset**: Trained and studied with **MediScope**, a large-scale multimodal clinical diagnostic dataset consisting of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties.
* **Strong overall performance**: Demonstrates competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see the paper for full results).

---

## Release

- **Technical report**:
  - arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
- **Model weights**:
  - PulseMind-72B

> **Note on data & checkpoints**:
> Due to size and privacy considerations, datasets and some checkpoints may be hosted externally.
> Please refer to the HuggingFace model card / GitHub repository for official download instructions and evaluation scripts.

---

## Disclaimer

> **Disclaimer**:
> Even though the weights, code, and demos are released openly (similar to other pre-trained models), and despite best efforts in safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**.
> It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment.
> Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations.
> In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, code, or demos.

## Evaluation

### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark)

### Medical Multimodal VQA

| Models | MMMU-Med | VQA-RAD | PMC-VQA | SLAKE | PathVQA | DermaVQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| **Proprietary Models** | | | | | | | | |
| GPT-4o | 57.3 | 71.2 | 55.2 | 67.4 | 55.5 | 35.0 | 22.3 | 52.0 |
| o1 | 57.8 | 63.0 | 54.5 | 69.9 | 57.3 | 43.0 | 49.7 | 56.5 |
| Gemini-2.5-Pro | 49.3 | 70.5 | 55.5 | 75.8 | 55.4 | 39.0 | 39.5 | 55.0 |
| **Open-source Models (≈72B)** | | | | | | | | |
| InternVL3-78B | 69.1 | 73.6 | 56.6 | 77.4 | 51.0 | 37.0 | 27.4 | 56.1 |
| Qwen2.5VL-72B | 66.4 | 80.3 | 59.3 | 78.3 | 42.3 | 34.0 | 27.6 | 55.5 |
| PulseMind-72B | 69.4 | 87.1 | 70.3 | 85.6 | 64.9 | 42.0 | 36.7 | 65.1 |
| **Open-source Models (≈32B)** | | | | | | | | |
| InternVL3-38B | 65.2 | 65.4 | 56.6 | 72.7 | 51.0 | 31.0 | 25.2 | 52.4 |
| Qwen2.5VL-32B | 62.8 | 73.8 | 54.5 | 71.2 | 41.9 | 25.0 | 25.2 | 50.6 |
| LLAVA-med-34B | 48.9 | 58.6 | 44.4 | 67.3 | 48.8 | 13.0 | 16.4 | 42.5 |
| HuatuoGPT-vision-34B | 54.3 | 61.4 | 56.6 | 69.5 | 44.4 | 21.0 | 17.3 | 46.4 |
| Lingshu-32B | 62.3 | 76.5 | 57.9 | 89.2 | 65.9 | 17.0 | 30.9 | 57.1 |
| PulseMind-32B | 64.6 | 83.2 | 68.1 | 81.5 | 62.0 | 32.0 | 29.6 | 60.1 |
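
The model card does not state how the Avg. column is computed. The rows checked here are consistent with an unweighted mean over the seven benchmarks, rounded to one decimal place. The snippet below is a minimal sketch under that assumption, using the PulseMind-72B scores copied from the table above; the variable names are illustrative only.

```python
# Per-benchmark scores copied from the PulseMind-72B row of the VQA table.
scores = {
    "MMMU-Med": 69.4,
    "VQA-RAD": 87.1,
    "PMC-VQA": 70.3,
    "SLAKE": 85.6,
    "PathVQA": 64.9,
    "DermaVQA": 42.0,
    "MedXpertQA": 36.7,
}

# Assumption: Avg. is the unweighted mean, rounded to one decimal place.
average = round(sum(scores.values()) / len(scores), 1)
print(average)  # 65.1, matching the Avg. column for this row
```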

### Medical Textual QA

| Models | MMLU-Med | MedMCQA | MedQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|
| **Proprietary Models** | | | | | |
| GPT-4o | 88.7 | 73.5 | 55.7 | 22.5 | 60.1 |
| o1 | 91.6 | 82.7 | 86.6 | 48.9 | 77.5 |
| Gemini-2.5-Pro | 89.8 | 68.6 | 85.6 | 24.3 | 67.1 |
| **Open-source Models (≈72B)** | | | | | |
| InternVL3-78B | 83.0 | 66.1 | 93.3 | 18.5 | 65.2 |
| Qwen2.5VL-72B | 88.3 | 67.2 | 91.3 | 16.1 | 65.7 |
| PulseMind-72B | 88.7 | 71.3 | 94.8 | 29.8 | 71.2 |
| **Open-source Models (>10B)** | | | | | |
| InternVL3-38B | 82.8 | 64.9 | 73.5 | 16.0 | 59.3 |
| Qwen2.5VL-32B | 83.2 | 63.0 | 71.6 | 15.6 | 58.4 |
| LLAVA-med-34B | 74.7 | 52.2 | 63.5 | 14.1 | 51.1 |
| HuatuoGPT-vision-34B | 80.8 | 63.6 | 57.4 | 16.0 | 54.5 |
| Lingshu-32B | 84.7 | 66.1 | 74.7 | 22.7 | 62.1 |
| PulseMind-32B | 85.6 | 66.4 | 92.9 | 21.5 | 66.6 |

### Usage

```python
from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from qwen_vl_utils import process_vision_info
import PIL.Image as Image

MODEL_ID = "AQ-MedAI/PulseMind-72B"

# Load processor
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load vLLM engine
llm = LLM(
    model=MODEL_ID,
    limit_mm_per_prompt={"image": 4},  # at most 4 images per prompt
    tensor_parallel_size=2,            # shard the model across 2 GPUs
    enforce_eager=True,
    trust_remote_code=True,
)

# Near-greedy decoding settings
sampling_params = SamplingParams(
    temperature=0.1,
    top_k=1,
    top_p=0.001,
    repetition_penalty=1.05,
    max_tokens=2048,
    stop_token_ids=[],
)

# Example input
image = Image.open("example.png")
text = "Describe the image and provide relevant clinical observations."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text},
        ],
    }
]

# Build prompt & multimodal inputs
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(messages)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": mm_data}],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```

### Evaluation Scripts (Full Paths)

For complete evaluation pipelines, please refer to:

- test-CMtMedQA
- test-MedDiagnose
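
The Usage example is single-turn, while the model is described as targeting multi-turn consultation. The sketch below shows one way a multi-turn message list could be assembled in the same format as the Usage example. The helpers `add_turn` and `count_images`, the constant `MAX_IMAGES_PER_PROMPT`, the file names, and the dialogue text are all hypothetical and not part of the PulseMind release; the image cap mirrors the `limit_mm_per_prompt={"image": 4}` setting from Usage.

```python
# Hypothetical helpers (not part of the PulseMind release) for building a
# multi-turn message list in the same format as the single-turn Usage example.

MAX_IMAGES_PER_PROMPT = 4  # mirrors limit_mm_per_prompt={"image": 4} in Usage


def add_turn(messages, role, text, images=None):
    """Append one turn: image entries first, then the text entry."""
    content = [{"type": "image", "image": img} for img in (images or [])]
    content.append({"type": "text", "text": text})
    messages.append({"role": role, "content": content})
    return messages


def count_images(messages):
    """Count image entries across every turn of the conversation."""
    return sum(
        1
        for turn in messages
        for item in turn["content"]
        if item["type"] == "image"
    )


# Made-up dialogue and placeholder file names, for illustration only.
conversation = []
add_turn(conversation, "user", "I have had a cough for two weeks.", ["chest_xray.png"])
add_turn(conversation, "assistant", "Do you also have a fever or chest pain?")
add_turn(conversation, "user", "A mild fever. Here is a follow-up scan.", ["followup.png"])

# Stay within the per-prompt image limit configured for the vLLM engine.
assert count_images(conversation) <= MAX_IMAGES_PER_PROMPT
```

The resulting `conversation` list could then be passed to `processor.apply_chat_template` and `process_vision_info` in place of `messages` in the Usage example, assuming the model's chat template accepts assistant turns in this list form.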

## Citation

If you find our project useful, please consider starring our repo and citing our work as follows:

```
@article{xu2026pulsemind,
  title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis},
  author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others},
  journal={arXiv preprint arXiv:2601.07344},
  year={2026}
}
```