QwenQKing commited on
Commit
44f45bf
·
verified ·
1 Parent(s): 995e875

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +144 -1
README.md CHANGED
@@ -1 +1,144 @@
1
- Coming soon
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Prompt-R1: Enhancing LLM interaction on behalf of humans
2
+
3
+ <div align="center">
4
+
5
+ [![arXiv](https://img.shields.io/badge/arXiv-2511.01016-b31b1b.svg)](https://arxiv.org/abs/2511.01016)
6
+ [![Homepage](https://img.shields.io/badge/Homepage-Prompt--R1-blue.svg)](https://qwenqking.github.io/Prompt-R1/)
7
+ [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717.svg?logo=github)](https://github.com/QwenQKing/Prompt-R1)
8
+ [![Dataset](https://img.shields.io/badge/Dataset-HuggingFace-orange.svg)](https://huggingface.co/datasets/QwenQKing/Prompt-R1)
9
+
10
+
11
+ ### **Prompt-R1**: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning
12
+
13
+ [📄 Paper](https://arxiv.org/abs/2511.01016) | [🚀 Quick Start](#quick-start-prompt-r1) | [💬 Contact](mailto:wenjinliu23@outlook.com)
14
+
15
+ </div>
16
+
17
+ ---
18
+
19
+ ## Overview
20
+
21
+ <div align="center">
22
+ <img src="figs/fig1.png" width="80%"/>
23
+ </div>
24
+
25
+ **Prompt-R1** has addressed a critical challenge in interacting with large language models (LLMs)—the inability of users to provide accurate and effective interaction prompts for complex tasks. **Prompt-R1** is an **end-to-end reinforcement learning (RL)** framework that enhances the performance of LLMs by facilitating **collaborative automatic prompting** between a small-scale LLM and a large-scale LLM. **Prompt-R1**, through **multi-turn prompt interaction**, significantly improves the generation quality and reasoning accuracy of large-scale LLMs, enabling better task-solving performance without requiring user expertise in prompt formulation.
26
+
27
+
28
+
29
+ <div align="center">
30
+ <img src="static/images/1-overview.png" width="90%"/>
31
+ </div>
32
+
33
+ By integrating **collaborative prompting** and **reinforcement learning**, **Prompt-R1** offers a **plug-and-play framework** that supports both **inference** and **training** with **various large-scale LLMs** as the environment.
34
+
35
+ ## Experimental Results
36
+ **Results of Different Large language models:**
37
+ <div align="center">
38
+ <img src="figs/fig3.png" width="100%"/>
39
+ </div>
40
+
41
+
42
+
43
+ ## Prompt-R1 Implementation
44
+
45
+ ### Install Environment
46
+ ```bash
47
+ conda create -n promptr1 python==3.12 -y
48
+ conda activate promptr1
49
+ cd verl
50
+ pip3 install -e .
51
+ pip3 install vllm==0.8.3
52
+ pip3 install flash-attn==2.7.4.post1 # Download: https://github.com/Dao-AILab/flash-attention/releases
53
+ pip3 install FlagEmbedding faiss-cpu
54
+ pip3 install debugpy==1.8.0 "ray[default]" debugpy
55
+ ```
56
+
57
+ ### Dataset Preparation
58
+ >Our datasets are in:
59
+ ```bash
60
+ Training Dataset: dataset\train_data
61
+ Evaluation Dataset: dataset\eval_data
62
+ ```
63
+
64
+ ### Quick Start: Prompt-R1
65
+
66
+
67
+ ### 1. To use closed source LLM, modify promptr1_agent\tool\tools\LLM-toolpy:
68
+ ```bash
69
+ API_KEY = "your_api_key"
70
+ MODEL = "model_name"
71
+ BASE_URL = "url"
72
+ ```
73
+ >Run:
74
+ ```bash
75
+ nohup bash run_prompt-R1.sh > Prompt-R1_training.out &
76
+ ```
77
+
78
+
79
+ ### 2. Deploy an Open-Source Model Locally
80
+ #### 1. Install vLLM and dependencies
81
+ ```bash
82
+ # Create environment
83
+ conda create -n vllmapi python=3.12 -y
84
+ conda activate vllmapi
85
+ # Install dependencies
86
+ pip3 install transformers accelerate huggingface_hub
87
+ pip3 install vllm
88
+ ```
89
+
90
+ #### 2. Start the OpenAI-compatible server:
91
+ ```bash
92
+ nohup bash vllm_api.sh > api.out 2>&1 &
93
+ ```
94
+ #### 3. To use closed source LLM, modify promptr1_agent\tool\tools\LLM-toolpy to call your local API:
95
+ >Edit agent_r1/tool/tools/search_tool.py and set the local API endpoint and model name
96
+ ```bash
97
+ base_url = "http://<SERVER_IP>:8006/v1"
98
+ ```
99
+
100
+ ### Evaluation
101
+ #### 1.Edit model_merge.sh and set the paths:
102
+ ```bash
103
+ export CHECKPOINT_DIR='checkpoints/Prompt-R1/Prompt-R1-qwen3-4b-gpt-4o-mini/global_step_320/actor'
104
+ export HF_MODEL_PATH='./Qwen/Qwen3-4B'
105
+ export TARGET_DIR='./merge_model/Prompt-R1_Qwen3-4B'
106
+ ```
107
+
108
+ #### 2.Edit vllm_serve.sh:
109
+ ```bash
110
+ export MODEL_NAME='./merge_model/Prompt-R1_Qwen3-4B'
111
+ ```
112
+
113
+ #### 3.Inference
114
+ ```bash
115
+ python inference.py
116
+ ```
117
+
118
+ #### 4.Batch inference & Evaluation
119
+ ```bash
120
+ python batch_inference.py
121
+ python eval_scores.py
122
+ ```
123
+
124
+ ## BibTex
125
+
126
+ If you find this work is helpful for your research, please cite:
127
+
128
+ ```bibtex
129
+ @misc{liu2025promptr1collaborativeautomaticprompting,
130
+ title={Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning},
131
+ author={Wenjin Liu and Haoran Luo and Xueyuan Lin and Haoming Liu and Tiesunlong Shen and Jiapu Wang and Rui Mao and Erik Cambria},
132
+ year={2025},
133
+ eprint={2511.01016},
134
+ archivePrefix={arXiv},
135
+ primaryClass={cs.CL},
136
+ url={https://arxiv.org/abs/2511.01016},
137
+ }
138
+ ```
139
+
140
+ For further questions, please contact: wenjinliu23@outlook.com.
141
+
142
+ ## Acknowledgement
143
+
144
+ This repo benefits from [Agent-R1](https://github.com/0russwest0/Agent-R1), [R1-Searcher](https://github.com/RUCAIBox/R1-Searcher), [Graph-R1](https://github.com/LHRLAB/Graph-R1), and [Search-R1](https://github.com/RUCAIBox/R1-Searcher). Thanks for their wonderful works.