---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- medical
- multimodal
- clinical-diagnosis
- clinical-reasoning
- multi-turn
- consultation
- medical-images
- report-generation
- reinforcement-learning
- preference-optimization
---

🤖 PulseMind-72B Model | Code & Eval | Technical Report

# *PulseMind-72B* - Multimodal Large Language Model for Real-World Clinical Diagnosis

# BIG NEWS: PulseMind-72B is released for real-world multi-turn clinical diagnosis with state-of-the-art performance on diagnostic consultation benchmarks.

This repository contains the **PulseMind-72B** model from the paper PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis.

---

## Highlights

* **Real-world clinical diagnosis focus**: Designed for **multi-turn diagnostic consultation**, where the model must integrate **medical images + textual clinical context** and keep track of an evolving patient–physician interaction.
* **PulseMind Benchmark**: Evaluated on a **multi-turn diagnostic consultation benchmark**.
* **MediScope dataset**: Trained and studied with **MediScope**, a large-scale multimodal clinical diagnostic dataset consisting of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties.
* **Strong overall performance**: Demonstrates competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see the paper for full results).

---

## Release

- **Technical report**:
  - arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
- **Model weights**:
  - PulseMind-72B

> **Note on data & checkpoints**:
> Due to size and privacy considerations, datasets and some checkpoints may be hosted externally.
> Please refer to the Hugging Face model card / GitHub repository for official download instructions and evaluation scripts.

---

## Disclaimer

> **Disclaimer**:
> Even though the weights, code, and demos are released openly (similar to other pre-trained models), and despite best efforts in safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**.
> It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment.
> Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations.
> In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, code, or demos.

## Evaluation

### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark)

### Medical Multimodal VQA

| Models | MMMU-Med | VQA-RAD | PMC-VQA | SLAKE | PathVQA | DermaVQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| **Proprietary Models** | | | | | | | | |
| GPT-4o | 57.3 | 71.2 | 55.2 | 67.4 | 55.5 | 35.0 | 22.3 | 52.0 |
| o1 | 57.8 | 63.0 | 54.5 | 69.9 | 57.3 | 43.0 | 49.7 | 56.5 |
| Gemini-2.5-Pro | 49.3 | 70.5 | 55.5 | 75.8 | 55.4 | 39.0 | 39.5 | 55.0 |
| **Open-source Models (≈72B)** | | | | | | | | |
| InternVL3-78B | 69.1 | 73.6 | 56.6 | 77.4 | 51.0 | 37.0 | 27.4 | 56.1 |
| Qwen2.5VL-72B | 66.4 | 80.3 | 59.3 | 78.3 | 42.3 | 34.0 | 27.6 | 55.5 |
| PulseMind-72B | 69.4 | 87.1 | 70.3 | 85.6 | 64.9 | 42.0 | 36.7 | 65.1 |
| **Open-source Models (≈32B)** | | | | | | | | |
| InternVL3-38B | 65.2 | 65.4 | 56.6 | 72.7 | 51.0 | 31.0 | 25.2 | 52.4 |
| Qwen2.5VL-32B | 62.8 | 73.8 | 54.5 | 71.2 | 41.9 | 25.0 | 25.2 | 50.6 |
| LLAVA-med-34B | 48.9 | 58.6 | 44.4 | 67.3 | 48.8 | 13.0 | 16.4 | 42.5 |
| HuatuoGPT-vision-34B | 54.3 | 61.4 | 56.6 | 69.5 | 44.4 | 21.0 | 17.3 | 46.4 |
| Lingshu-32B | 62.3 | 76.5 | 57.9 | 89.2 | 65.9 | 17.0 | 30.9 | 57.1 |
| PulseMind-32B | 64.6 | 83.2 | 68.1 | 81.5 | 62.0 | 32.0 | 29.6 | 60.1 |
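
The `Avg.` column appears to be the unweighted mean of the per-benchmark scores in each row, rounded to one decimal place. This is an inference from the reported numbers, not something the model card states. A minimal sketch that checks this reading against two rows copied from the table (the helper name `macro_average` is hypothetical):

```python
from statistics import mean

# Scores copied from the Medical Multimodal VQA table, in column order:
# MMMU-Med, VQA-RAD, PMC-VQA, SLAKE, PathVQA, DermaVQA, MedXpertQA.
pulsemind_72b = [69.4, 87.1, 70.3, 85.6, 64.9, 42.0, 36.7]
gpt_4o = [57.3, 71.2, 55.2, 67.4, 55.5, 35.0, 22.3]


def macro_average(scores):
    """Unweighted mean over benchmarks, rounded to one decimal place.

    Assumption: this mirrors how the reported Avg. column was produced.
    """
    return round(mean(scores), 1)


print(macro_average(pulsemind_72b))  # 65.1, matching the reported Avg.
print(macro_average(gpt_4o))         # 52.0, matching the reported Avg.
```

Because every benchmark carries equal weight under this reading, a low score on a hard benchmark such as MedXpertQA pulls the average down as much as a low score on any other column.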
### Medical Textual QA

| Models | MMLU-Med | MedMCQA | MedQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|
| **Proprietary Models** | | | | | |
| GPT-4o | 88.7 | 73.5 | 55.7 | 22.5 | 60.1 |
| o1 | 91.6 | 82.7 | 86.6 | 48.9 | 77.5 |
| Gemini-2.5-Pro | 89.8 | 68.6 | 85.6 | 24.3 | 67.1 |
| **Open-source Models (≈72B)** | | | | | |
| InternVL3-78B | 83.0 | 66.1 | 93.3 | 18.5 | 65.2 |
| Qwen2.5VL-72B | 88.3 | 67.2 | 91.3 | 16.1 | 65.7 |
| PulseMind-72B | 88.7 | 71.3 | 94.8 | 29.8 | 71.2 |
| **Open-source Models (>10B)** | | | | | |
| InternVL3-38B | 82.8 | 64.9 | 73.5 | 16.0 | 59.3 |
| Qwen2.5VL-32B | 83.2 | 63.0 | 71.6 | 15.6 | 58.4 |
| LLAVA-med-34B | 74.7 | 52.2 | 63.5 | 14.1 | 51.1 |
| HuatuoGPT-vision-34B | 80.8 | 63.6 | 57.4 | 16.0 | 54.5 |
| Lingshu-32B | 84.7 | 66.1 | 74.7 | 22.7 | 62.1 |
| PulseMind-32B | 85.6 | 66.4 | 92.9 | 21.5 | 66.6 |

### Usage

```python
from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image

MODEL_ID = "AQ-MedAI/PulseMind-72B"

# Load processor
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load vLLM engine
llm = LLM(
    model=MODEL_ID,
    limit_mm_per_prompt={"image": 4},
    tensor_parallel_size=2,
    enforce_eager=True,
    trust_remote_code=True,
)

sampling_params = SamplingParams(
    temperature=0.1,
    top_k=1,
    top_p=0.001,
    repetition_penalty=1.05,
    max_tokens=2048,
    stop_token_ids=[],
)

# Example input
image = Image.open("example.png")
text = "Describe the image and provide relevant clinical observations."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text},
        ],
    }
]

# Build prompt & multimodal inputs
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(messages)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

outputs = llm.generate(
    [{"prompt": prompt, "multi_modal_data": mm_data}],
    sampling_params=sampling_params,
)
print(outputs[0].outputs[0].text)
```

### Evaluation Scripts (Full Paths)

For complete evaluation pipelines, please refer to:

- test-CMtMedQA
- test-MedDiagnose
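
The usage example sends a single user turn, while the model targets multi-turn consultation. The sketch below shows one way to accumulate a consultation history in the same `messages` structure before passing it to `processor.apply_chat_template` and `process_vision_info`. It assumes the chat template accepts alternating `user` / `assistant` messages with typed content parts. The helper names (`user_turn`, `assistant_turn`, `count_images`), the dialogue text, and the image path are hypothetical and not part of the released code.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]

# Mirrors limit_mm_per_prompt={"image": 4} from the usage example.
MAX_IMAGES_PER_PROMPT = 4


def user_turn(text: str, images: Optional[List[Any]] = None) -> Message:
    """Build one user message: any images first, then the text."""
    content: List[Dict[str, Any]] = [
        {"type": "image", "image": img} for img in (images or [])
    ]
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


def assistant_turn(text: str) -> Message:
    """Wrap a previous model reply so it can be replayed as history."""
    return {"role": "assistant", "content": [{"type": "text", "text": text}]}


def count_images(messages: List[Message]) -> int:
    """Count image parts across the whole history."""
    return sum(
        part["type"] == "image" for msg in messages for part in msg["content"]
    )


# Hypothetical two-round consultation; the image entry is a placeholder
# for a PIL image or a local path.
history = [
    user_turn("I have had a dry cough for two weeks.", images=["chest_xray.png"]),
    assistant_turn("Do you also have a fever or shortness of breath?"),
    user_turn("Mild fever in the evenings, no shortness of breath."),
]

# The image limit applies to the full prompt, not to a single turn.
assert count_images(history) <= MAX_IMAGES_PER_PROMPT
```

After each generation, appending `assistant_turn(outputs[0].outputs[0].text)` followed by the next `user_turn(...)` would extend the history for the following round.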

## Citation

If you find our project useful, we hope you would kindly star our repo and cite our work as follows:

```
@article{xu2026pulsemind,
  title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis},
  author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others},
  journal={arXiv preprint arXiv:2601.07344},
  year={2026}
}
```