--- license: mit library_name: transformers pipeline_tag: image-text-to-text tags: - medical - multimodal - clinical-diagnosis - clinical-reasoning - multi-turn - consultation - medical-images - report-generation - reinforcement-learning - preference-optimization ---
đŸ¤– PulseMind-72B Model Code & Eval Technical Report
# *PulseMind-72B* - Multimodal Large Language Model for Real-World Clinical Diagnosis # BIG NEWS: PulseMind-72B is released for real-world multi-turn clinical diagnosis with state-of-the-art performance on diagnostic consultation benchmarks. This repository contains the **PulseMind-72B** model from the paper PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis. --- ## Highlights * **Real-world clinical diagnosis focus**: Designed for **multi-turn diagnostic consultation** where models must integrate **medical images + textual clinical context** and maintain evolving patient–physician interaction. * **PulseMind Benchmark**: Evaluated using a **multi-turn diagnostic consultation benchmark**. * **MediScope dataset**: Trained and studied with **MediScope**, a large-scale multimodal clinical diagnostic dataset consisting of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties. * **Strong overall performance**: Demonstrates competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see paper for full results). --- ## Release - **Technical report**: - arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis - **Model weights**: - PulseMind-72B > **Note on data & checkpoints**: > Due to size and privacy considerations, datasets and some checkpoints may be hosted externally. > Please refer to the HuggingFace model card / GitHub repository for official download instructions and evaluation scripts. --- ## Disclaimer > **Disclaimer**: > Even though the weights, codes, and demos are released openly (similar to other pre-trained models), and despite best efforts in safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**. > It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment. > Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations. > In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos. ## Evaluation ### Clinical Consultation Dialogue Benchmark(PulseMind Benchmark)
| Models | MMMU-Med | VQA-RAD | PMC-VQA | SLAKE | PathVQA | DermaVQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| Proprietary Models | ||||||||
| GPT-4o | 57.3 | 71.2 | 55.2 | 67.4 | 55.5 | 35.0 | 22.3 | 52.0 |
| o1 | 57.8 | 63.0 | 54.5 | 69.9 | 57.3 | 43.0 | 49.7 | 56.5 |
| Gemini-2.5-Pro | 49.3 | 70.5 | 55.5 | 75.8 | 55.4 | 39.0 | 39.5 | 55.0 |
| Open-source Models (≈72B) | ||||||||
| InternVL3-78B | 69.1 | 73.6 | 56.6 | 77.4 | 51.0 | 37.0 | 27.4 | 56.1 |
| Qwen2.5VL-72B | 66.4 | 80.3 | 59.3 | 78.3 | 42.3 | 34.0 | 27.6 | 55.5 |
| PulseMind-72B | 69.4 | 87.1 | 70.3 | 85.6 | 64.9 | 42.0 | 36.7 | 65.1 |
| Open-source Models (≈32B) | ||||||||
| InternVL3-38B | 65.2 | 65.4 | 56.6 | 72.7 | 51.0 | 31.0 | 25.2 | 52.4 |
| Qwen2.5VL-32B | 62.8 | 73.8 | 54.5 | 71.2 | 41.9 | 25.0 | 25.2 | 50.6 |
| LLAVA-med-34B | 48.9 | 58.6 | 44.4 | 67.3 | 48.8 | 13.0 | 16.4 | 42.5 |
| HuatuoGPT-vision-34B | 54.3 | 61.4 | 56.6 | 69.5 | 44.4 | 21.0 | 17.3 | 46.4 |
| Lingshu-32B | 62.3 | 76.5 | 57.9 | 89.2 | 65.9 | 17.0 | 30.9 | 57.1 |
| PulseMind-32B | 64.6 | 83.2 | 68.1 | 81.5 | 62.0 | 32.0 | 29.6 | 60.1 |
| Models | MMLU-Med | MedMCQA | MedQA | MedXpertQA | Avg. | |||
|---|---|---|---|---|---|---|---|---|
| Proprietary Models | ||||||||
| GPT-4o | 88.7 | 73.5 | 55.7 | 22.5 | 60.1 | |||
| o1 | 91.6 | 82.7 | 86.6 | 48.9 | 77.5 | |||
| Gemini-2.5-Pro | 89.8 | 68.6 | 85.6 | 24.3 | 67.1 | |||
| Open-source Models (≈72B) | ||||||||
| InternVL3-78B | 83.0 | 66.1 | 93.3 | 18.5 | 65.2 | |||
| Qwen2.5VL-72B | 88.3 | 67.2 | 91.3 | 16.1 | 65.7 | |||
| PulseMind-72B | 88.7 | 71.3 | 94.8 | 29.8 | 71.2 | |||
| Open-source Models (>10B) | ||||||||
| InternVL3-38B | 82.8 | 64.9 | 73.5 | 16.0 | 59.3 | |||
| Qwen2.5VL-32B | 83.2 | 63.0 | 71.6 | 15.6 | 58.4 | |||
| LLAVA-med-34B | 74.7 | 52.2 | 63.5 | 14.1 | 51.1 | |||
| HuatuoGPT-vision-34B | 80.8 | 63.6 | 57.4 | 16.0 | 54.5 | |||
| Lingshu-32B | 84.7 | 66.1 | 74.7 | 22.7 | 62.1 | |||
| PulseMind-32B | 85.6 | 66.4 | 92.9 | 21.5 | 66.6 | |||