Improve model card: Add pipeline tag, library name, and direct links

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +101 -44
README.md CHANGED
@@ -1,16 +1,21 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  base_model:
6
  - Qwen/Qwen2.5-7B-Instruct
 
 
 
7
  tags:
8
  - medical
9
  - diagnosis
10
  - RL
 
 
11
  ---
 
12
  # DiagAgent-7B: RL-Optimized Diagnostic Agent
13
 
 
 
14
  <div align="center">
15
  <img src="https://raw.githubusercontent.com/MAGIC-AI4Med/DiagGym/main/assets/logo.png" width="150"/>
16
  <div align="center"></div>
@@ -24,7 +29,7 @@ DiagAgent‑7B is a reinforcement learning‑optimized large language model for
24
 
25
  DiagAgent‑7B is trained end‑to‑end inside the `DiagGym` virtual clinical environment with multi‑turn RL (GRPO), enabling safe, closed‑loop learning without real‑world risk.
26
 
27
- Details can be found in our paper https://arxiv.org/abs/2510.24654
28
 
29
  ## Quickstart
30
 
@@ -58,27 +63,45 @@ def chat(messages, max_new_tokens=1024, temperature=0.0):
58
 
59
  SYSTEM_PROMPT = (
60
  "You are a medical AI assistant. Analyze patient information, suggest relevant tests, "
61
- "and provide a final diagnosis when sufficient information is available.\n\n"
62
- "RESPONSE FORMAT:\n"
63
- "If more information is needed:\n"
64
- "```\n"
65
- "Current diagnosis: <your current best diagnosis>\n"
66
- "Based on the patient's initial presentation, the following investigation(s) should be performed: <one additional test>\n"
67
- "Reason: <reason for the test>\n"
68
- "```\n"
69
- "If sufficient information exists for diagnosis:\n"
70
- "```\n"
71
- "The available information is sufficient to make a diagnosis.\n"
72
- "Diagnosis: <final diagnosis>\n"
73
- "Reason: <brief justification>\n"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
74
  "```"
75
  )
76
 
77
  initial_inquiry = (
78
- "- Patient Information: ___ y/o F\n"
79
- "- Chief Complaint: Early satiety, weight loss, abdominal pain\n"
80
- "- HPI: 1-month weight loss (10 lbs), early satiety, fatigue; prior emesis; reduced intake; denies fever/chills.\n"
81
- "- PMH: Asthma, hyperlipidemia, HTN, osteoarthritis, polymyalgia rheumatica, CAD (NSTEMI), osteoporosis, H. pylori, s/p TAH/USO.\n"
 
 
 
 
82
  "- Allergy: Lisinopril."
83
  )
84
 
@@ -118,11 +141,9 @@ resp = client.chat.completions.create(
118
  print(resp.choices[0].message.content)
119
  ```
120
 
121
-
122
-
123
  ## Evaluation Results
124
 
125
- The following tables are taken directly from the project evaluation. For evaluation details and scripts, see the paper and the GitHub repository.
126
 
127
  **Single‑Turn Evaluation**
128
 
@@ -191,37 +212,73 @@ The following tables are taken directly from the project evaluation. For evaluat
191
  | MDAgent | - | 2024.10| 21.64 |
192
  | **Our Method** | | | |
193
  | DiagAgent-14B | 14B | - | 32.86 |
 
194
 
195
- ## Training Details
 
 
 
196
 
197
- DiagAgent‑7B is optimized with multi‑turn RL (GRPO) inside `DiagGym`.
 
198
 
199
- - Trajectory construction:
200
- - Initial inquiry (structured patient history without final diagnosis)
201
- - Iterative steps: preliminary diagnosis → recommended exam + rationale → exam result
202
- - Final diagnosis focused on a single primary condition
203
- - Reward design:
204
- - Diagnosis Accuracy (exact match)
205
- - Examination Recommendation F1 (overlap with reference EHR exams)
206
- - Turn Penalty (discourages >12 interaction turns)
207
 
208
- For implementation and scripts, see `DiagAgent/train/rl/` in the GitHub.
209
 
210
- ## Citation
211
- ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
212
  @misc{qiu2025evolvingdiagnosticagentsvirtual,
213
- title={Evolving Diagnostic Agents in a Virtual Clinical Environment},
214
  author={Pengcheng Qiu and Chaoyi Wu and Junwei Liu and Qiaoyu Zheng and Yusheng Liao and Haowen Wang and Yun Yue and Qianrui Fan and Shuai Zhen and Jian Wang and Jinjie Gu and Yanfeng Wang and Ya Zhang and Weidi Xie},
215
  year={2025},
216
  eprint={2510.24654},
217
  archivePrefix={arXiv},
218
  primaryClass={cs.CL},
219
- url={https://arxiv.org/abs/2510.24654},
220
  }
221
  ```
222
 
223
-
224
- ## Contact
225
-
226
- - Email: henrychur@sjtu.edu.cn
227
- - GitHub: https://github.com/MAGIC-AI4Med/DiagGym
 
1
  ---
 
 
 
2
  base_model:
3
  - Qwen/Qwen2.5-7B-Instruct
4
+ language:
5
+ - en
6
+ license: apache-2.0
7
  tags:
8
  - medical
9
  - diagnosis
10
  - RL
11
+ pipeline_tag: text-generation
12
+ library_name: transformers
13
  ---
14
+
15
  # DiagAgent-7B: RL-Optimized Diagnostic Agent
16
 
17
+ 📄 [Paper](https://huggingface.co/papers/2510.24654) - 🌐 [Project Page](https://arxiv.org/html/2510.24654v1) - 💻 [Code](https://github.com/MAGIC-AI4Med/DiagGym)
18
+
19
  <div align="center">
20
  <img src="https://raw.githubusercontent.com/MAGIC-AI4Med/DiagGym/main/assets/logo.png" width="150"/>
21
  <div align="center"></div>
 
29
 
30
  DiagAgent‑7B is trained end‑to‑end inside the `DiagGym` virtual clinical environment with multi‑turn RL (GRPO), enabling safe, closed‑loop learning without real‑world risk.
31
 
32
+ Details can be found in our paper [Evolving Diagnostic Agents in a Virtual Clinical Environment](https://huggingface.co/papers/2510.24654).
33
 
34
  ## Quickstart
35
 
 
63
 
64
  SYSTEM_PROMPT = (
65
  "You are a medical AI assistant. Analyze patient information, suggest relevant tests, "
66
+ "and provide a final diagnosis when sufficient information is available.
67
+
68
+ "
69
+ "RESPONSE FORMAT:
70
+ "
71
+ "If more information is needed:
72
+ "
73
+ "```
74
+ "
75
+ "Current diagnosis: <your current best diagnosis>
76
+ "
77
+ "Based on the patient's initial presentation, the following investigation(s) should be performed: <one additional test>
78
+ "
79
+ "Reason: <reason for the test>
80
+ "
81
+ "```
82
+ "
83
+ "If sufficient information exists for diagnosis:
84
+ "
85
+ "```
86
+ "
87
+ "The available information is sufficient to make a diagnosis.
88
+ "
89
+ "Diagnosis: <final diagnosis>
90
+ "
91
+ "Reason: <brief justification>
92
+ "
93
  "```"
94
  )
95
 
96
  initial_inquiry = (
97
+ "- Patient Information: ___ y/o F
98
+ "
99
+ "- Chief Complaint: Early satiety, weight loss, abdominal pain
100
+ "
101
+ "- HPI: 1-month weight loss (10 lbs), early satiety, fatigue; prior emesis; reduced intake; denies fever/chills.
102
+ "
103
+ "- PMH: Asthma, hyperlipidemia, HTN, osteoarthritis, polymyalgia rheumatica, CAD (NSTEMI), osteoporosis, H. pylori, s/p TAH/USO.\
104
+ "\
105
  "- Allergy: Lisinopril."
106
  )
107
 
 
141
  print(resp.choices[0].message.content)
142
  ```
143
 
 
 
144
  ## Evaluation Results
145
 
146
+ The following tables are taken directly from the project evaluation. For evaluation details and scripts, see the paper and the [GitHub repository](https://github.com/MAGIC-AI4Med/DiagGym).
147
 
148
  **Single‑Turn Evaluation**
149
 
 
212
  | MDAgent | - | 2024.10| 21.64 |
213
  | **Our Method** | | | |
214
  | DiagAgent-14B | 14B | - | 32.86 |
215
+ ---
216
 
217
+ ## Model Training
218
+ ### 🏥 DiagGym — Virtual Clinical Environment
219
+ #### 📂 Data Construction
220
+ We build **DiagGym Training Dataset** from the MIMIC‑IV EHR dataset by reorganizing each patient record into:
221
 
222
+ - **Patient profile** extracted from discharge notes (physical exam, chief complaint, history, allergies, family/social history, discharge diagnosis)
223
+ - **Time‑ordered examination set** — chronologically sorted exams (lab, microbiology, radiology) linked with their results.
224
 
225
+ The pipeline includes filtering (removing cases without physical exams or with pre‑established diagnoses), standardizing exam names, filling missing labels, and restricting to exams performed within one day before admission to ensure diagnostic relevance.
 
 
 
 
 
 
 
226
 
227
+ <img src="https://raw.githubusercontent.com/MAGIC-AI4Med/DiagGym/main/assets/DiagGym_data_construction.png"/>
228
 
229
+ Following the pipeline above, we obtain 118,478 patient EHRs, covering 4,897 distinct diseases.
230
+ On average, each case contains 29 examinations (26 laboratory, 2 microbiology, 1 radiology).
231
+
232
+ > **Note on Data Availability**: The data source for this work is MIMIC-IV. Due to licensing restrictions, we are unable to directly open-source the processed dataset. However, we are actively communicating with the relevant parties regarding the possibility of making the dataset publicly available on [PhysioNet](https://physionet.org/).
233
+
234
+ #### ⚙️ Training Details
235
+ **DiagGym** is trained as a conditional generative "EHR world model" that, given a patient profile and past examinations, generates the result of the next requested examination.
236
+ We treat all exam results (textual or numeric) as free text and train with a standard token‑wise autoregressive loss.
237
+
238
+ <img src="https://raw.githubusercontent.com/MAGIC-AI4Med/DiagGym/main/assets/DiagGym_training.png"/>
239
+
240
+ For full training details and implementation code, see our [paper](https://huggingface.co/papers/2510.24654) and [training scripts](https://github.com/MAGIC-AI4Med/DiagGym/tree/main/DiagGym/train/).
241
+
242
+
243
+ ### 🤖 DiagAgent — RL‑Trained Diagnostic Agent
244
+ #### 📂 Data Construction
245
+ As shown in the figure below, we reformat DiagGym cases into **multi‑turn diagnostic trajectories** containing:
246
+ - An **initial inquiry** (structured patient history without the final diagnosis)
247
+ - Iterative steps of *preliminary diagnosis → recommended examination + rationale → exam result*
248
+ - A **final diagnosis** focused on a single primary condition
249
+
250
+ All trajectories are generated with DeepSeek‑v3 and filtered to prevent diagnosis leakage.
251
+
252
+ <img src="https://raw.githubusercontent.com/MAGIC-AI4Med/DiagGym/main/assets/DiagAgent_data_construction.png"/>
253
+
254
+ Following this pipeline, we obtain 16,270 interactive diagnostic trajectories
255
+
256
+ #### ⚙️ Training Details
257
+ DiagAgent is optimized with **end‑to‑end multi‑turn reinforcement learning (GRPO)** inside the DiagGym environment.
258
+ In each rollout, the agent starts from an initial inquiry, interacts with DiagGym by recommending examinations and receiving simulated results, and decides when to make the final diagnosis.
259
+
260
+ The reward combines three components:
261
+ - **Diagnosis Accuracy** — 1 if the predicted diagnosis matches the ground truth, else 0
262
+ - **Examination Recommendation F1** — overlap between recommended and reference exams from real EHRs
263
+ - **Turn Penalty** — discourages excessive interaction turns beyond the set limit (12)
264
+
265
+ <img src="https://raw.githubusercontent.com/MAGIC-AI4Med/DiagGym/main/assets/DiagAgent_training.png"/>
266
+
267
+ For full training details and implementation code, see our [paper](https://huggingface.co/papers/2510.24654) and [training scripts](https://github.com/MAGIC-AI4Med/DiagGym/tree/main/DiagAgent/train/rl/).
268
+
269
+
270
+ ## 📝 Citation & Contact
271
+
272
+ ```bibtex
273
  @misc{qiu2025evolvingdiagnosticagentsvirtual,
274
+ title={Evolving Diagnostic Agents in a Virtual Clinical Environment},
275
  author={Pengcheng Qiu and Chaoyi Wu and Junwei Liu and Qiaoyu Zheng and Yusheng Liao and Haowen Wang and Yun Yue and Qianrui Fan and Shuai Zhen and Jian Wang and Jinjie Gu and Yanfeng Wang and Ya Zhang and Weidi Xie},
276
  year={2025},
277
  eprint={2510.24654},
278
  archivePrefix={arXiv},
279
  primaryClass={cs.CL},
280
+ url={https://arxiv.org/abs/2510.24654},
281
  }
282
  ```
283
 
284
+ For any inquiries or feedback, don’t hesitate to contact henrychur@sjtu.edu.cn.