Jarvis1111 and nielsr (HF Staff) committed
Commit 9502d49 · verified · 1 Parent(s): 21b1c9e

Improve model card: Update pipeline tag, add `transformers` library, and enhance content with paper/code links (#1)

- Improve model card: Update pipeline tag, add `transformers` library, and enhance content with paper/code links (5576f621a151b209fd0345a5230c29c37de17108)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+123 −5)
README.md CHANGED
@@ -1,10 +1,128 @@
  ---
- license: apache-2.0
- language:
- - en
  base_model:
  - Qwen/Qwen2.5-7B-Instruct
- pipeline_tag: question-answering
  tags:
  - medical
- ---
  ---
  base_model:
  - Qwen/Qwen2.5-7B-Instruct
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-generation
  tags:
  - medical
+ library_name: transformers
+ paper: "2505.19630"
+ ---
+
+ # DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2505.19630-b31b1b.svg)](https://huggingface.co/papers/2505.19630) [![GitHub](https://img.shields.io/badge/GitHub-Code-blue.svg?logo=github)](https://github.com/JarvisUSTC/DoctorAgent-RL) [![Hugging Face Collection](https://img.shields.io/badge/Hugging%20Face%20Collection-doctoragent--rl-blue)](https://huggingface.co/collections/Jarvis1111/doctoragent-rl-684ffbcade52305ba0e3e97f)
+
+ <div align="center">
+ <img width="1231" alt="DoctorAgent-RL Overview" src="https://github.com/user-attachments/assets/bd9f676e-01f9-406c-881d-c2b9f45e62f3" />
+ </div>
+
+ DoctorAgent-RL is a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. It addresses core challenges LLMs face in real-world clinical consultations, such as vague diagnoses from single-round systems and the inflexibility of traditional multi-turn dialogue models constrained by static supervised learning.
+
+ In DoctorAgent-RL, a doctor agent continuously optimizes its questioning strategy within an RL framework through multi-turn interactions with a patient agent. This dynamic adjustment of information-gathering paths is guided by comprehensive rewards from a Consultation Evaluator. The RL fine-tuning mechanism enables LLMs to autonomously develop interaction strategies aligned with clinical reasoning logic, moving beyond superficial imitation of patterns in existing dialogue data. The work also introduces MTMedDialog, the first English multi-turn medical consultation dataset capable of simulating patient interactions.
+
+ Experiments demonstrate that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance, with practical value in reducing misdiagnosis risks and optimizing medical resource allocation.
+
+ ## Key Features
+
+ * **Multi-Agent Collaboration**: Features distinct Doctor and Patient agents with specific roles and objectives.
+ * **Dynamic Strategy Optimization**: Leverages reinforcement learning for continuous policy updates and adaptive dialogue behavior.
+ * **Comprehensive Reward Design**: Guides optimal strategies through multi-dimensional consultation evaluation metrics.
+ * **Medical Knowledge Integration**: Embeds clinical reasoning logic directly into decision-making processes.
+ * **MTMedDialog Dataset**: Introduces the first English multi-turn medical consultation dataset designed for simulation capabilities.
+
+ ## Methodology
+
+ <div align="center">
+ <img src="https://github.com/JarvisUSTC/DoctorAgent-RL/blob/main/Figures/framework.png?raw=true" alt="System Architecture" width="600">
+ </div>
+
+ The DoctorAgent-RL framework comprises three core interacting components: a **Doctor Agent** for diagnostic reasoning and question formulation, a **Patient Agent** simulating patient responses, and a **Consultation Evaluator** providing multi-dimensional reward signals to assess consultation quality. This continuous learning loop refines interaction strategies through repeated dialogue rollouts and policy updates.
+
+ ## How to Use
+
+ This model is built on the `Qwen/Qwen2.5-7B-Instruct` base model and is compatible with the Hugging Face `transformers` library.
+
+ To use the DoctorAgent-RL model for multi-turn clinical dialogue, load it as follows:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Load the model and tokenizer
+ model_name = "Jarvis1111/DoctorAgent-RL"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,  # Use an appropriate dtype (e.g., torch.float16 or torch.float32)
+     device_map="auto"  # Automatically map the model to available devices (e.g., GPU)
+ )
+
+ # Generate a doctor response from the conversation history
+ def get_doctor_response(conversation_history):
+     # Apply the chat template to format the conversation
+     text = tokenizer.apply_chat_template(
+         conversation_history,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+     inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+     # Generate the response
+     generated_ids = model.generate(
+         **inputs,
+         max_new_tokens=512,  # Maximum length of the generated response
+         do_sample=True,
+         temperature=0.7,  # Controls creativity (higher = more creative)
+         top_k=20,  # Consider only the top-k most likely next tokens
+         top_p=0.8,  # Filter tokens by cumulative probability
+         pad_token_id=tokenizer.pad_token_id,  # Use the tokenizer's pad token id (151643 for <|endoftext|>)
+         eos_token_id=[tokenizer.eos_token_id, tokenizer.pad_token_id]  # Both <|im_end|> (151645) and <|endoftext|> (151643)
+     )
+
+     # Decode the generated tokens, dropping the input tokens to keep only the new response
+     generated_ids = generated_ids[0, inputs.input_ids.shape[1]:]
+     response = tokenizer.decode(generated_ids, skip_special_tokens=True)
+     return response
+
+ # Example multi-turn clinical dialogue
+ conversation = []
+
+ # Turn 1: Patient describes symptoms
+ patient_input_1 = "I have a persistent cough and a sore throat. It started about three days ago."
+ conversation.append({"role": "user", "content": patient_input_1})
+ print(f"Patient: {patient_input_1}")
+
+ doctor_response_1 = get_doctor_response(conversation)
+ conversation.append({"role": "assistant", "content": doctor_response_1})
+ print(f"Doctor: {doctor_response_1}")
+
+ # Turn 2: Patient responds to the doctor's follow-up
+ patient_input_2 = "Yes, I also feel quite fatigued and have a mild headache, especially behind my eyes."
+ conversation.append({"role": "user", "content": patient_input_2})
+ print(f"Patient: {patient_input_2}")
+
+ doctor_response_2 = get_doctor_response(conversation)
+ conversation.append({"role": "assistant", "content": doctor_response_2})
+ print(f"Doctor: {doctor_response_2}")
+
+ # Continue the conversation as needed to reach a diagnosis or provide advice.
+ ```
+
+ For more detailed setup instructions, training scripts, and experimentation, please refer to the [official GitHub repository](https://github.com/JarvisUSTC/DoctorAgent-RL).
+
+ ## Citation
+
+ If DoctorAgent-RL contributes to your research, please consider citing our work:
+
+ ```bibtex
+ @article{feng2025doctoragent,
+   title={DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue},
+   author={Feng, Yichun and Wang, Jiawei and Zhou, Lu and Li, Yixue},
+   journal={arXiv preprint arXiv:2505.19630},
+   year={2025}
+ }
+ ```
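The usage example added in this commit interleaves turn bookkeeping with the model call. As a minimal sketch (not part of the released code), the turn-handling loop can be factored into a helper that takes any doctor function; `run_consultation` and `fake_doctor` below are illustrative names, with the model call stubbed out so the logic runs without loading the 7B checkpoint.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def run_consultation(
    patient_turns: List[str],
    doctor_fn: Callable[[List[Message]], str],
) -> List[Message]:
    """Alternate patient/doctor turns, mirroring the loop in the model card.

    `doctor_fn` stands in for the card's `get_doctor_response`; in practice it
    would wrap `model.generate` on the formatted chat history.
    """
    conversation: List[Message] = []
    for patient_text in patient_turns:
        conversation.append({"role": "user", "content": patient_text})
        reply = doctor_fn(conversation)
        conversation.append({"role": "assistant", "content": reply})
    return conversation

# Stub doctor: asks a numbered follow-up question (illustrative only).
def fake_doctor(history: List[Message]) -> str:
    n_patient_turns = sum(1 for m in history if m["role"] == "user")
    return f"Follow-up question #{n_patient_turns}: can you tell me more?"

dialogue = run_consultation(
    ["I have a persistent cough.", "I also feel quite fatigued."],
    fake_doctor,
)
for msg in dialogue:
    print(f"{msg['role']}: {msg['content']}")
```

Swapping `fake_doctor` for the card's `get_doctor_response` yields the same two-turn dialogue flow shown in the README, with the history threading handled in one place.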