1120JJ committed
Commit a1d4c77 · verified · 1 parent: e2a9570

Update README.md

Files changed (1): README.md (+4 −2)
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license: apache-2.0
+license: mit
 base_model:
 - Qwen/Qwen2.5-32B-Instruct
 ---
@@ -8,6 +8,8 @@ this model is related to following work:
 ## MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework
 
 [![arXiv](https://img.shields.io/badge/arxiv-2508.14880-blue)](https://arxiv.org/abs/2508.14880)
+[![github](https://img.shields.io/badge/github-MedResearcher-orange)](https://github.com/AQ-MedAI/MedResearcher-R1)
+[![license](https://img.shields.io/badge/license-mit-white)](https://github.com/AQ-MedAI/MedResearcher-R1/blob/main/LICENSE)
 
 ### author list
 > Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu
@@ -16,7 +18,7 @@ this model is related to following work:
 > Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities spanning multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. While general-purpose deep research agents have shown impressive capabilities, they struggle significantly with medical domain challenges—the MedBrowseComp benchmark reveals even GPT-o3 deep research, the leading proprietary deep research system, achieves only 25.5% accuracy on complex medical queries. The key limitations are: (1) insufficient dense medical knowledge for clinical reasoning, and (2) lack of medical-specific retrieval tools. We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting longest chains from subgraphs around rare medical entities to generate complex multi-hop QA pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2,100 diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions. Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our open-source 32B model achieves competitive performance on general benchmarks (GAIA: 53.4, xBench: 54), comparable to GPT-4o-mini, while outperforming significantly larger proprietary models. More importantly, we establish new state-of-the-art on MedBrowseComp with 27.5% accuracy, surpassing leading closed-source deep research systems including O3 deepresearch, substantially advancing medical deep research capabilities. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains. Code and datasets will be released to facilitate further research.
 
 ## Run Evaluation
-
+> If you would like to use our model for inference and evaluation, please refer to our GitHub repo [![github](https://img.shields.io/badge/github-MedResearcher-orange)](https://github.com/AQ-MedAI/MedResearcher-R1). We provide complete evaluation tools and code in the EvaluationPipeline so that you can verify performance on common benchmarks (such as gaia-103-text) or on your own datasets.
 
 
 ## ✍️Citation
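The abstract mentions extracting longest chains from knowledge-graph subgraphs around rare medical entities to seed multi-hop QA pairs. The idea can be sketched as below; this is a hypothetical illustration, not the authors' released code, and the `longest_chain` helper plus the example triples (Erdheim-Chester disease, BRAF, vemurafenib) are invented for the demo:

```python
# Hypothetical sketch: given a small knowledge-graph subgraph around a rare
# entity, find the longest simple relation chain starting from that entity.
# Each hop along the chain can then become one step of a multi-hop QA pair.
from collections import defaultdict

def longest_chain(edges, start):
    """Depth-first search for the longest simple path from `start`.

    `edges` is a list of (head, relation, tail) triples; the returned chain
    alternates entities and relations: [e0, r0, e1, r1, e2, ...].
    """
    graph = defaultdict(list)
    for head, rel, tail in edges:
        graph[head].append((rel, tail))

    best = [start]
    def dfs(node, path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for rel, tail in graph[node]:
            if tail not in visited:
                dfs(tail, path + [rel, tail], visited | {tail})
    dfs(start, [start], {start})
    return best

# Toy subgraph around a rare disease entity (illustrative triples only).
edges = [
    ("Erdheim-Chester disease", "associated_gene", "BRAF"),
    ("BRAF", "targeted_by", "vemurafenib"),
    ("vemurafenib", "approved_for", "melanoma"),
    ("Erdheim-Chester disease", "symptom", "bone pain"),
]
chain = longest_chain(edges, "Erdheim-Chester disease")
print(" -> ".join(chain))
```

A question writer would then phrase the chain end-to-end ("Which indication is treated by the drug targeting the gene associated with ...?") so that answering requires every intermediate hop.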