Update README.md
---
license: mit
language: en
pipeline_tag: text-generation
tags:
- conversational
- text-generation
- medical
- diagnosis
- agent
- reinforcement-learning
base_model: Qwen3-8B
datasets:
- HealthBench
- MAQuE
- MedQA
- MMLU
paper: 2510.04284
model_name: Doctor-R1
---

# Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning

**Doctor-R1** is an AI doctor agent trained to conduct strategic, multi-turn patient inquiries that guide its diagnostic decision-making. Unlike traditional models that excel at static medical QA, Doctor-R1 is designed to master the complete, dynamic consultation process, unifying the two core skills of a human physician: communication and decision-making.

This model is an 8B-parameter agent built upon **Qwen3-8B** and fine-tuned using a novel **Experiential Agentic Reinforcement Learning** framework.

## ✨ Key Features

* **Unified Clinical Skills:** The first agent framework to holistically integrate two core clinical skills, **strategic patient inquiry** and **accurate medical decision-making**, within a single model.
* **Experiential Reinforcement Learning:** A novel closed-loop framework where the agent learns and improves from an accumulating repository of its own high-quality experiences.
* **Dual-Competency Reward System:** A two-tiered reward architecture that separately optimizes conversational quality (soft skills) and diagnostic accuracy (hard skills), with a "safety-first" veto system.
* **State-of-the-Art Performance:** Outperforms leading open-source models on challenging dynamic benchmarks such as HealthBench and MAQuE with high parameter efficiency (8B).
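
The dual-competency reward with a safety veto can be pictured with a small sketch. This is only an illustration of the idea described in the feature list, not the reward used to train Doctor-R1: the `TurnScores` fields, the equal weighting, and the `-1.0` veto penalty are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TurnScores:
    """Hypothetical per-response scores, each expected in [0, 1]."""
    communication: float  # soft skill: conversational quality
    accuracy: float       # hard skill: diagnostic accuracy
    safe: bool            # False if a safety check flagged the response


def dual_competency_reward(
    scores: TurnScores,
    soft_weight: float = 0.5,    # assumed weighting, not taken from the paper
    veto_penalty: float = -1.0,  # assumed penalty value, not taken from the paper
) -> float:
    """Blend soft- and hard-skill scores; a safety failure overrides both."""
    if not scores.safe:
        # "Safety-first" veto: an unsafe response gets the penalty
        # no matter how well it scores on the other two axes.
        return veto_penalty
    hard_weight = 1.0 - soft_weight
    return soft_weight * scores.communication + hard_weight * scores.accuracy


if __name__ == "__main__":
    print(dual_competency_reward(TurnScores(0.8, 0.6, safe=True)))
    print(dual_competency_reward(TurnScores(1.0, 1.0, safe=False)))
```

The point of the veto branch is that a high communication or accuracy score can never compensate for an unsafe response.
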

## 🏆 Leaderboards

Doctor-R1 achieves state-of-the-art performance among open-source models and surpasses several powerful proprietary models on HealthBench. It shows superior performance on dynamic benchmarks and strong foundational knowledge on static QA tasks.

| Benchmark          | Key Metric | Doctor-R1 | Best Open-Source (>=32B) |
| :----------------- | :--------- | :-------: | :----------------------: |
| **HealthBench**    | Avg. Score | **36.29** |          33.16           |
| **MAQuE**          | Accuracy   | **60.00** |          57.00           |
| **MedQA**          | Accuracy   | **83.50** |          81.50           |
| **MMLU (Medical)** | Accuracy   | **85.00** |          84.00           |

The detailed breakdown of **HealthBench Main (Dynamic Consultation)** is shown below:

| Model                     | Avg. Score | Accuracy  | Comm. Quality | Context Aware. |
| :------------------------ | :--------: | :-------: | :-----------: | :------------: |
| **GPT-o3** (Proprietary)  |   38.91    |   40.31   |     64.78     |     48.09      |
| **Doctor-R1 (8B)**        | **36.29**  | **37.84** |   **64.15**   |   **49.24**    |
| Baichuan-M2-32B           |   33.16    |   33.95   |     58.01     |     46.80      |
| Grok-4 (Proprietary)      |   33.03    |   37.95   |     61.35     |     45.62      |
| GPT-4.1 (Proprietary)     |   31.18    |   34.78   |     60.65     |     44.81      |
| UltraMedical-8B           |   22.19    |   25.50   |     57.40     |     40.26      |
| **Base Model (Qwen3-8B)** |   25.13    |   28.57   |     49.35     |     43.00      |
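
To make the size of the improvement concrete, the short script below computes absolute and relative gains in HealthBench Avg. Score. The three scores are copied from the table; the helper name `gain` and the choice of rounding are just for this example.

```python
# HealthBench Main "Avg. Score" values copied from the table.
avg_score = {
    "Doctor-R1 (8B)": 36.29,
    "Baichuan-M2-32B": 33.16,
    "Base Model (Qwen3-8B)": 25.13,
}


def gain(model: str, baseline: str) -> tuple:
    """Return (absolute gain in points, relative gain in percent)."""
    abs_gain = avg_score[model] - avg_score[baseline]
    rel_gain = 100.0 * abs_gain / avg_score[baseline]
    return round(abs_gain, 2), round(rel_gain, 1)


if __name__ == "__main__":
    print(gain("Doctor-R1 (8B)", "Base Model (Qwen3-8B)"))  # (11.16, 44.4)
    print(gain("Doctor-R1 (8B)", "Baichuan-M2-32B"))        # (3.13, 9.4)
```
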

## 👥 Human Evaluation

To validate that our quantitative results align with user experience, we conducted a pairwise human preference evaluation against other leading models. The results show a decisive preference for Doctor-R1, especially in patient-centric metrics.

## 🔬 Ablation Studies

Our ablation studies validate the critical contributions of our framework's key components.

***Impact of Experience Retrieval Mechanism.*** The results show that our full retrieval mechanism, with reward and novelty filtering, provides a significant performance boost over both a no-experience baseline and standard similarity-based retrieval, especially in communication skills.
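
One way to picture retrieval with reward and novelty filtering is the sketch below. It is a minimal illustration under stated assumptions, not the paper's mechanism: the `Experience` record, the thresholds `min_reward` and `max_overlap`, and the reading of "novelty" as "not a near-duplicate of an already selected experience" are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Experience:
    """Hypothetical stored experience: text, embedding vector, and reward."""
    text: str
    embedding: Sequence[float]
    reward: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: Sequence[float],
    pool: List[Experience],
    k: int = 2,
    min_reward: float = 0.7,    # assumed reward threshold
    max_overlap: float = 0.95,  # assumed near-duplicate threshold
) -> List[Experience]:
    # Reward filter: keep only high-reward experiences.
    candidates = [e for e in pool if e.reward >= min_reward]
    # Rank the survivors by similarity to the current query.
    candidates.sort(key=lambda e: cosine(query, e.embedding), reverse=True)
    selected: List[Experience] = []
    for exp in candidates:
        # Novelty filter: skip near-duplicates of what was already picked.
        if all(cosine(exp.embedding, s.embedding) < max_overlap for s in selected):
            selected.append(exp)
        if len(selected) == k:
            break
    return selected


if __name__ == "__main__":
    pool = [
        Experience("a", [1.0, 0.0], 0.9),
        Experience("a-dup", [1.0, 0.01], 0.8),
        Experience("b", [0.6, 0.8], 0.9),
        Experience("low-reward", [1.0, 0.0], 0.1),
    ]
    print([e.text for e in retrieve([1.0, 0.0], pool)])
```

Plain similarity-based retrieval would correspond to dropping both filters and returning the top `k` candidates by cosine similarity alone.
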

<p align="center">
<img src="assets/radar_exp.jpg" style="width:60%;" />
</p>

***Impact of Patient Agent Scaling.*** We observe a strong, positive correlation between the number of simulated patient interactions during training and the agent's final performance. This validates that our agentic framework effectively learns and improves from a large volume of diverse experiences.

## 📜 Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@misc{lai2025doctorr1masteringclinicalinquiry,
  title={Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning},
  author={Yunghwei Lai and Kaiming Liu and Ziyue Wang and Weizhi Ma and Yang Liu},
  year={2025},
  eprint={2510.04284},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.04284},
}
```

## 💬 Contact & Questions

For collaborations or inquiries, please contact [**laiyunghwei@gmail.com**](mailto:laiyunghwei@gmail.com). You’re also welcome to open an issue or join the discussion in this repository; we value your insights and contributions to **Doctor-R1**.

Stay tuned and join our community as we push the boundaries of intelligent healthcare. Together, let’s make medical AI safer, smarter, and more human. 🤝