Update README.md
Browse files
README.md
CHANGED
|
@@ -8,11 +8,12 @@ MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed T
|
|
| 8 |
|
| 9 |
> Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu
|
| 10 |
|
|
|
|
|
|
|
| 11 |
> Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities spanning multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. While general-purpose deep research agents have shown impressive capabilities, they struggle significantly with medical domain challenges—the MedBrowseComp benchmark reveals even GPT-o3 deep research, the leading proprietary deep research system, achieves only 25.5% accuracy on complex medical queries. The key limitations are: (1) insufficient dense medical knowledge for clinical reasoning, and (2) lack of medical-specific retrieval tools. We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting longest chains from subgraphs around rare medical entities to generate complex multi-hop QA pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2,100 diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions. Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our open-source 32B model achieves competitive performance on general benchmarks (GAIA: 53.4, xBench: 54), comparable to GPT-4o-mini, while outperforming significantly larger proprietary models. More importantly, we establish new state-of-the-art on MedBrowseComp with 27.5% accuracy, surpassing leading closed-source deep research systems including O3 deepresearch, substantially advancing medical deep research capabilities. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains. Code and datasets will be released to facilitate further research.
|
| 12 |
|
| 13 |
✍️ Citation
|
| 14 |
-
|
| 15 |
-
@article{2025,
|
| 16 |
title={MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework},
|
| 17 |
author={Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu},
|
| 18 |
journal={arXiv preprint},
|
|
@@ -21,4 +22,4 @@ MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed T
|
|
| 21 |
```
|
| 22 |
|
| 23 |
📜 License
|
| 24 |
-
|
|
|
|
| 8 |
|
| 9 |
> Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu
|
| 10 |
|
| 11 |
+
For more details: comming soon...
|
| 12 |
+
|
| 13 |
> Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities spanning multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. While general-purpose deep research agents have shown impressive capabilities, they struggle significantly with medical domain challenges—the MedBrowseComp benchmark reveals even GPT-o3 deep research, the leading proprietary deep research system, achieves only 25.5% accuracy on complex medical queries. The key limitations are: (1) insufficient dense medical knowledge for clinical reasoning, and (2) lack of medical-specific retrieval tools. We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting longest chains from subgraphs around rare medical entities to generate complex multi-hop QA pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2,100 diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions. Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our open-source 32B model achieves competitive performance on general benchmarks (GAIA: 53.4, xBench: 54), comparable to GPT-4o-mini, while outperforming significantly larger proprietary models. More importantly, we establish new state-of-the-art on MedBrowseComp with 27.5% accuracy, surpassing leading closed-source deep research systems including O3 deepresearch, substantially advancing medical deep research capabilities. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains. Code and datasets will be released to facilitate further research.
|
| 14 |
|
| 15 |
✍️ Citation
|
| 16 |
+
```@article{2025,
|
|
|
|
| 17 |
title={MedReseacher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework},
|
| 18 |
author={Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu},
|
| 19 |
journal={arXiv preprint},
|
|
|
|
| 22 |
```
|
| 23 |
|
| 24 |
📜 License
|
| 25 |
+
MedReseacher-R1 is licensed under the Apache 2.0.
|