aujcy committed · commit 49637ff (verified) · 1 parent: 692b7d9

Update README.md

Files changed (1): README.md (+421 −54)

README.md CHANGED
@@ -1,81 +1,448 @@
  ---
  license: mit
  ---

- # PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
-
- > Official repository for **"PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis"**, accepted as an **Oral** paper at **AAAI 2026**.
-
  <p align="center">
- <img src="https://img.shields.io/badge/Conference-AAAI%202026-blue.svg" />
- <img src="https://img.shields.io/badge/Type-Oral-green.svg" />
- <img src="https://img.shields.io/badge/Domain-Multi--Modal%20Medicine-orange.svg" />
  </p>

- <p align="center">
- <b>Datasets, models, and benchmarks for PulseMind.</b>
- </p>

- ---
- GitHub repository: https://github.com/AQ-MedAI/PulseMind

- ## 🌐 Overview

- This repository provides the official **codebase and evaluation scripts** for the PulseMind project, together with:

- - 🧪 **MediScope**: a large-scale multimodal medical dataset.
-   In this release, we provide a curated subset of **~1,000 cases** (JSON + images). The full dataset is larger and will be gradually released.
- - 🧠 **Models**:
-   - `PulseMind-72B`
- - 📊 **Benchmarks**:
-   - `MedDiagnose` – 237-sample test set (JSON + images)
-   - `CMtMedQA-test` – 1,000-sample test set (JSON)
-   - `MedDiagnose-plus` – 937-sample extended test set (JSON + images)

- > ⚠️ Due to size and privacy considerations, **all datasets and model checkpoints are hosted externally** and are **not** stored in this GitHub repository.
- > This repo mainly contains **evaluation code**.

  ---

- ### 🔗 Dataset Download Link

- - **MediScope (curated ~1k subset)**
- - **MedDiagnose (237 samples)**
- - **CMtMedQA-test (1,000 samples)**
- - **MedDiagnose-plus (937 samples)**

- [Download link](https://huggingface.co/datasets/AQ-MedAI/PulseMind)

- ### 🧠 Model Checkpoint Links

- - **PulseMind-72B checkpoint**: [Download link](https://huggingface.co/AQ-MedAI/PulseMind-72B/tree/main)

- > After downloading, please follow the recommended directory layout
- > (e.g., place raw data under `data/`, benchmark test sets under `Benchmark/`,
- > and model checkpoints under `model/`), so that the provided evaluation scripts
- > can run out of the box.

  ---

- ## 📁 Repository Structure (Code Only)

- The GitHub repository mainly contains evaluation code and auxiliary configs:

- ```bash
- .
- ├── data/                 # (empty by default) place downloaded datasets here
- ├── Benchmark/
- │   ├── CMtMedQA-test/    # Folder for CMtMedQA-test data (JSON, etc.)
- │   ├── MedDiagnose/      # Folder for MedDiagnose data (JSON + images)
- │   ├── MedDiagnose-plus/ # Folder for MedDiagnose-plus data (JSON + images)
- │   └── Eval/             # Optional: extra evaluation utilities / configs
- ├── model/                # Place downloaded model checkpoints here
- └── README.md
- ```

- ---
- license: mit
- ---
  ---
  license: mit
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - medical
+ - multimodal
+ - clinical-diagnosis
+ - clinical-reasoning
+ - multi-turn
+ - consultation
+ - medical-images
+ - report-generation
+ - reinforcement-learning
+ - preference-optimization
  ---

  <p align="center">
+ <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">🤖 PulseMind-72B Model</a>
+ &nbsp;&nbsp;
+ <a href="https://github.com/AQ-MedAI/PulseMind" target="_blank" rel="noopener">Code & Eval</a>
+ &nbsp;&nbsp;
+ <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">Technical Report</a>
  </p>

+ # *PulseMind-72B*: A Multimodal Large Language Model for Real-World Clinical Diagnosis

+ # <strong style="color: red">BIG NEWS: <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a> is released for real-world multi-turn clinical diagnosis, with state-of-the-art performance on diagnostic consultation benchmarks.</strong>

+ This repository contains the **PulseMind-72B** model from the paper
+ <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener"><i>PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</i></a>.

+ ---

+ ## Highlights

+ * **Real-world clinical diagnosis focus**: designed for **multi-turn diagnostic consultation**, where the model must integrate **medical images and textual clinical context** while maintaining an evolving patient–physician interaction.
+ * **PulseMind Benchmark**: evaluated on a **multi-turn diagnostic consultation benchmark**.
+ * **MediScope dataset**: trained and studied on **MediScope**, a large-scale multimodal clinical diagnostic dataset of **98,000** real-world multi-turn consultations and **601,500** medical images spanning **10+** major clinical departments and **200+** sub-specialties.
+ * **Strong overall performance**: competitive results on both the diagnostic consultation benchmark and public medical benchmarks (see the paper for full results).

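A quick back-of-envelope on the MediScope figures above (the per-consultation ratio is our own arithmetic, not a number reported in the paper):

```python
# MediScope scale, as stated in the Highlights.
consultations = 98_000
images = 601_500

# On average, each multi-turn consultation carries about six medical images.
print(f"{images / consultations:.1f} images per consultation")  # → 6.1 images per consultation
```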
  ---

+ ## Release

+ - **Technical report**:
+   - <a href="https://arxiv.org/abs/2601.07344" target="_blank" rel="noopener">arXiv: PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis</a>
+ - **Model weights**:
+   - <a href="https://huggingface.co/AQ-MedAI/PulseMind-72B" target="_blank" rel="noopener">PulseMind-72B</a>

+ > **Note on data & checkpoints**:
+ > Due to size and privacy considerations, datasets and some checkpoints may be hosted externally.
+ > Please refer to the Hugging Face model card / GitHub repository for official download instructions and evaluation scripts.

  ---

+ ## Disclaimer

+ > Although the weights, code, and demos are released openly (as with other pre-trained models), and despite best efforts at safety evaluation and alignment, **PulseMind-72B may generate inaccurate, misleading, or potentially harmful medical content**.
+ > It is intended for **research and assistive use** only and must **not** be used as a substitute for professional medical advice, diagnosis, or treatment.
+ > Developers and stakeholders should conduct their own red-teaming, deploy appropriate safeguards, and comply with all applicable laws and regulations.
+ > In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, code, or demos.

+ ## Evaluation

+ ### Clinical Consultation Dialogue Benchmark (PulseMind Benchmark)
+
+ <p align="center">
+ <img src="PulseMind_show.png" width="800" />
+ </p>
+
+ ### Medical Multimodal VQA
+
+ <table>
+ <thead>
+ <tr><th>Models</th><th>MMMU-Med</th><th>VQA-RAD</th><th>PMC-VQA</th><th>SLAKE</th><th>PathVQA</th><th>DermaVQA</th><th>MedXpertQA</th><th>Avg.</th></tr>
+ </thead>
+ <tbody>
+ <tr><td colspan="9" style="text-align:center;"><strong>Proprietary Models</strong></td></tr>
+ <tr><td>GPT-4o</td><td>57.3</td><td>71.2</td><td>55.2</td><td>67.4</td><td>55.5</td><td>35.0</td><td>22.3</td><td>52.0</td></tr>
+ <tr><td>o1</td><td>57.8</td><td>63.0</td><td>54.5</td><td>69.9</td><td>57.3</td><td>43.0</td><td>49.7</td><td>56.5</td></tr>
+ <tr><td>Gemini-2.5-Pro</td><td>49.3</td><td>70.5</td><td>55.5</td><td>75.8</td><td>55.4</td><td>39.0</td><td>39.5</td><td>55.0</td></tr>
+ <tr><td colspan="9" style="text-align:center;"><strong>Open-source Models (&asymp;72B)</strong></td></tr>
+ <tr><td>InternVL3-78B</td><td><u>69.1</u></td><td>73.6</td><td>56.6</td><td>77.4</td><td><u>51.0</u></td><td><u>37.0</u></td><td>27.4</td><td><u>56.1</u></td></tr>
+ <tr><td>Qwen2.5VL-72B</td><td>66.4</td><td><u>80.3</u></td><td><u>59.3</u></td><td><u>78.3</u></td><td>42.3</td><td>34.0</td><td><u>27.6</u></td><td>55.5</td></tr>
+ <tr><td><strong>PulseMind-72B</strong></td><td><strong>69.4</strong></td><td><strong>87.1</strong></td><td><strong>70.3</strong></td><td><strong>85.6</strong></td><td><strong>64.9</strong></td><td><strong>42.0</strong></td><td><strong>36.7</strong></td><td><strong>65.1</strong></td></tr>
+ <tr><td colspan="9" style="text-align:center;"><strong>Open-source Models (&asymp;32B)</strong></td></tr>
+ <tr><td>InternVL3-38B</td><td><strong>65.2</strong></td><td>65.4</td><td>56.6</td><td>72.7</td><td>51.0</td><td><u>31.0</u></td><td>25.2</td><td>52.4</td></tr>
+ <tr><td>Qwen2.5VL-32B</td><td>62.8</td><td>73.8</td><td>54.5</td><td>71.2</td><td>41.9</td><td>25.0</td><td>25.2</td><td>50.6</td></tr>
+ <tr><td>LLAVA-med-34B</td><td>48.9</td><td>58.6</td><td>44.4</td><td>67.3</td><td>48.8</td><td>13.0</td><td>16.4</td><td>42.5</td></tr>
+ <tr><td>HuatuoGPT-vision-34B</td><td>54.3</td><td>61.4</td><td>56.6</td><td>69.5</td><td>44.4</td><td>21.0</td><td>17.3</td><td>46.4</td></tr>
+ <tr><td>Lingshu-32B</td><td>62.3</td><td><u>76.5</u></td><td><u>57.9</u></td><td><strong>89.2</strong></td><td><strong>65.9</strong></td><td>17.0</td><td><strong>30.9</strong></td><td><u>57.1</u></td></tr>
+ <tr><td><strong>PulseMind-32B</strong></td><td><u>64.6</u></td><td><strong>83.2</strong></td><td><strong>68.1</strong></td><td><u>81.5</u></td><td><u>62.0</u></td><td><strong>32.0</strong></td><td><u>29.6</u></td><td><strong>60.1</strong></td></tr>
+ </tbody>
+ </table>
+
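As a sanity check on the table above, the Avg. column appears to be the unweighted mean of the seven per-benchmark scores. Verifying the PulseMind-72B row:

```python
# Per-benchmark VQA scores for PulseMind-72B, copied from the table above:
# MMMU-Med, VQA-RAD, PMC-VQA, SLAKE, PathVQA, DermaVQA, MedXpertQA.
scores = [69.4, 87.1, 70.3, 85.6, 64.9, 42.0, 36.7]

# Unweighted mean, rounded to one decimal as in the table.
average = sum(scores) / len(scores)
print(f"{average:.1f}")  # → 65.1, matching the Avg. column
```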
+ ### Medical Textual QA
+
+ <table>
+ <thead>
+ <tr><th>Models</th><th>MMLU-Med</th><th>MedMCQA</th><th>MedQA</th><th>MedXpertQA</th><th>Avg.</th></tr>
+ </thead>
+ <tbody>
+ <tr><td colspan="6" style="text-align:center;"><strong>Proprietary Models</strong></td></tr>
+ <tr><td>GPT-4o</td><td>88.7</td><td>73.5</td><td>55.7</td><td>22.5</td><td>60.1</td></tr>
+ <tr><td>o1</td><td>91.6</td><td>82.7</td><td>86.6</td><td>48.9</td><td>77.5</td></tr>
+ <tr><td>Gemini-2.5-Pro</td><td>89.8</td><td>68.6</td><td>85.6</td><td>24.3</td><td>67.1</td></tr>
+ <tr><td colspan="6" style="text-align:center;"><strong>Open-source Models (&asymp;72B)</strong></td></tr>
+ <tr><td>InternVL3-78B</td><td>83.0</td><td>66.1</td><td><u>93.3</u></td><td><u>18.5</u></td><td>65.2</td></tr>
+ <tr><td>Qwen2.5VL-72B</td><td><u>88.3</u></td><td><u>67.2</u></td><td>91.3</td><td>16.1</td><td><u>65.7</u></td></tr>
+ <tr><td><strong>PulseMind-72B</strong></td><td><strong>88.7</strong></td><td><strong>71.3</strong></td><td><strong>94.8</strong></td><td><strong>29.8</strong></td><td><strong>71.2</strong></td></tr>
+ <tr><td colspan="6" style="text-align:center;"><strong>Open-source Models (&asymp;32B)</strong></td></tr>
+ <tr><td>InternVL3-38B</td><td>82.8</td><td>64.9</td><td>73.5</td><td>16.0</td><td>59.3</td></tr>
+ <tr><td>Qwen2.5VL-32B</td><td>83.2</td><td>63.0</td><td>71.6</td><td>15.6</td><td>58.4</td></tr>
+ <tr><td>LLAVA-med-34B</td><td>74.7</td><td>52.2</td><td>63.5</td><td>14.1</td><td>51.1</td></tr>
+ <tr><td>HuatuoGPT-vision-34B</td><td>80.8</td><td>63.6</td><td>57.4</td><td>16.0</td><td>54.5</td></tr>
+ <tr><td>Lingshu-32B</td><td><u>84.7</u></td><td><u>66.1</u></td><td><u>74.7</u></td><td><strong>22.7</strong></td><td><u>62.1</u></td></tr>
+ <tr><td><strong>PulseMind-32B</strong></td><td><strong>85.6</strong></td><td><strong>66.4</strong></td><td><strong>92.9</strong></td><td><u>21.5</u></td><td><strong>66.6</strong></td></tr>
+ </tbody>
+ </table>
+
+ ### Usage
+
+ ```python
+ from vllm import LLM, SamplingParams
+ from transformers import AutoProcessor
+ from qwen_vl_utils import process_vision_info
+ import PIL.Image as Image
+
+ MODEL_ID = "AQ-MedAI/PulseMind-72B"
+
+ # Load the processor (tokenizer + image preprocessing).
+ processor = AutoProcessor.from_pretrained(MODEL_ID)
+
+ # Load the vLLM engine.
+ llm = LLM(
+     model=MODEL_ID,
+     limit_mm_per_prompt={"image": 4},
+     tensor_parallel_size=2,
+     enforce_eager=True,
+     trust_remote_code=True,
+ )
+
+ sampling_params = SamplingParams(
+     temperature=0.1,
+     top_k=1,
+     top_p=0.001,
+     repetition_penalty=1.05,
+     max_tokens=2048,
+     stop_token_ids=[],
+ )
+
+ # Example input
+ image = Image.open("example.png")
+ text = "Describe the image and provide relevant clinical observations."
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": text},
+         ],
+     }
+ ]
+
+ # Build the text prompt & multimodal inputs.
+ prompt = processor.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+
+ mm_data = {}
+ if image_inputs is not None:
+     mm_data["image"] = image_inputs
+ if video_inputs is not None:
+     mm_data["video"] = video_inputs
+
+ outputs = llm.generate(
+     [{"prompt": prompt, "multi_modal_data": mm_data}],
+     sampling_params=sampling_params,
+ )
+
+ print(outputs[0].outputs[0].text)
+ ```
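The snippet above is single-turn, while the model targets multi-turn consultation. The sketch below shows one way to carry the dialogue history forward between `llm.generate` calls; the assistant replies are stubbed here, and the exact turn format is our assumption, mirroring the Qwen2.5-VL-style messages structure used above.

```python
# Sketch: carrying a multi-turn consultation through the chat format used above.
# In practice each stubbed assistant reply would come from llm.generate(...).

def add_user_turn(messages, text, image_path=None):
    """Append a user turn; attach an image only when one is provided."""
    content = []
    if image_path is not None:
        content.append({"type": "image", "image": image_path})
    content.append({"type": "text", "text": text})
    messages.append({"role": "user", "content": content})


def add_assistant_turn(messages, reply_text):
    """Append the model's reply so the next turn sees the full history."""
    messages.append({"role": "assistant", "content": reply_text})


# Turn 1: patient describes symptoms with an image.
messages = []
add_user_turn(messages, "I have had this rash for two weeks.", image_path="example.png")
add_assistant_turn(messages, "Does the rash itch, and has it spread?")  # stubbed reply

# Turn 2: patient answers the follow-up question; no new image.
add_user_turn(messages, "Yes, it itches at night and has spread to my arm.")

print(len(messages), [m["role"] for m in messages])  # → 3 ['user', 'assistant', 'user']
```

After each turn, the full `messages` list is passed back through `processor.apply_chat_template` and `process_vision_info` exactly as in the Usage example.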
+ ### Evaluation Scripts (Full Paths)
+
+ For complete evaluation pipelines, please refer to:
+
+ <p align="left">
+ <a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-CMtMedQA.py" target="_blank" rel="noopener">test-CMtMedQA</a>
+ &nbsp;&nbsp;
+ <a href="https://github.com/AQ-MedAI/PulseMind/blob/main/Benchmark/Eval/test-MedDiagnose.py" target="_blank" rel="noopener">test-MedDiagnose</a>
+ </p>
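For intuition, a benchmark run of this kind typically reduces to comparing model outputs against an answer key. The sketch below is purely illustrative; the real pipelines live in the linked test-CMtMedQA / test-MedDiagnose scripts, and the record fields (`prediction`, `answer`) are our hypothetical stand-ins, not those scripts' actual schema.

```python
# Hypothetical sketch of scoring model outputs against a benchmark answer key.
records = [  # stand-in for a parsed benchmark results file
    {"prediction": "B", "answer": "B"},
    {"prediction": "A", "answer": "C"},
    {"prediction": "D", "answer": "D"},
]

correct = sum(r["prediction"] == r["answer"] for r in records)
accuracy = 100.0 * correct / len(records)
print(f"accuracy: {accuracy:.1f}%")  # → accuracy: 66.7%
```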
+ ## Citation
+
+ If you find our project useful, please star the repo and cite our work as follows:
+
+ ```bibtex
+ @article{xu2026pulsemind,
+   title={PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis},
+   author={Xu, Jiao and Liu, Junwei and Lao, Jiangwei and Zhu, Qi and Zhao, Yunpeng and Jin, Congyun and Liu, Shinan and Lu, Zhihong and Zhang, Lihe and Chen, Xin and others},
+   journal={arXiv preprint arXiv:2601.07344},
+   year={2026}
+ }
+ ```