Update README.md
Browse files
README.md
CHANGED
|
@@ -1 +1,144 @@
|
|
| 1 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Prompt-R1: Enhancing LLM interaction on behalf of humans
|
| 2 |
+
|
| 3 |
+
<div align="center">
|
| 4 |
+
|
| 5 |
+
[](https://arxiv.org/abs/2511.01016)
|
| 6 |
+
[](https://qwenqking.github.io/Prompt-R1/)
|
| 7 |
+
[](https://github.com/QwenQKing/Prompt-R1)
|
| 8 |
+
[](https://huggingface.co/datasets/QwenQKing/Prompt-R1)
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
### **Prompt-R1**: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning
|
| 12 |
+
|
| 13 |
+
[📄 Paper](https://arxiv.org/abs/2511.01016) | [🚀 Quick Start](#quick-start-prompt-r1) | [💬 Contact](mailto:wenjinliu23@outlook.com)
|
| 14 |
+
|
| 15 |
+
</div>
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## Overview
|
| 20 |
+
|
| 21 |
+
<div align="center">
|
| 22 |
+
<img src="figs/fig1.png" width="80%"/>
|
| 23 |
+
</div>
|
| 24 |
+
|
| 25 |
+
**Prompt-R1** has addressed a critical challenge in interacting with large language models (LLMs)—the inability of users to provide accurate and effective interaction prompts for complex tasks. **Prompt-R1** is an **end-to-end reinforcement learning (RL)** framework that enhances the performance of LLMs by facilitating **collaborative automatic prompting** between a small-scale LLM and a large-scale LLM. **Prompt-R1**, through **multi-turn prompt interaction**, significantly improves the generation quality and reasoning accuracy of large-scale LLMs, enabling better task-solving performance without requiring user expertise in prompt formulation.
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
<div align="center">
|
| 30 |
+
<img src="static/images/1-overview.png" width="90%"/>
|
| 31 |
+
</div>
|
| 32 |
+
|
| 33 |
+
By integrating **collaborative prompting** and **reinforcement learning**, **Prompt-R1** offers a **plug-and-play framework** that supports both **inference** and **training** with **various large-scale LLMs** as the environment.
|
| 34 |
+
|
| 35 |
+
## Experimental Results
|
| 36 |
+
**Results of Different Large language models:**
|
| 37 |
+
<div align="center">
|
| 38 |
+
<img src="figs/fig3.png" width="100%"/>
|
| 39 |
+
</div>
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
## Prompt-R1 Implementation
|
| 44 |
+
|
| 45 |
+
### Install Environment
|
| 46 |
+
```bash
|
| 47 |
+
conda create -n promptr1 python==3.12 -y
|
| 48 |
+
conda activate promptr1
|
| 49 |
+
cd verl
|
| 50 |
+
pip3 install -e .
|
| 51 |
+
pip3 install vllm==0.8.3
|
| 52 |
+
pip3 install flash-attn==2.7.4.post1 # Download: https://github.com/Dao-AILab/flash-attention/releases
|
| 53 |
+
pip3 install FlagEmbedding faiss-cpu
|
| 54 |
+
pip3 install debugpy==1.8.0 "ray[default]" debugpy
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
### Dataset Preparation
|
| 58 |
+
>Our datasets are in:
|
| 59 |
+
```bash
|
| 60 |
+
Training Dataset: dataset\train_data
|
| 61 |
+
Evaluation Dataset: dataset\eval_data
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
### Quick Start: Prompt-R1
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
### 1. To use closed source LLM, modify promptr1_agent\tool\tools\LLM-toolpy:
|
| 68 |
+
```bash
|
| 69 |
+
API_KEY = "your_api_key"
|
| 70 |
+
MODEL = "model_name"
|
| 71 |
+
BASE_URL = "url"
|
| 72 |
+
```
|
| 73 |
+
>Run:
|
| 74 |
+
```bash
|
| 75 |
+
nohup bash run_prompt-R1.sh > Prompt-R1_training.out &
|
| 76 |
+
```
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
### 2. Deploy an Open-Source Model Locally
|
| 80 |
+
#### 1. Install vLLM and dependencies
|
| 81 |
+
```bash
|
| 82 |
+
# Create environment
|
| 83 |
+
conda create -n vllmapi python=3.12 -y
|
| 84 |
+
conda activate vllmapi
|
| 85 |
+
# Install dependencies
|
| 86 |
+
pip3 install transformers accelerate huggingface_hub
|
| 87 |
+
pip3 install vllm
|
| 88 |
+
```
|
| 89 |
+
|
| 90 |
+
#### 2. Start the OpenAI-compatible server:
|
| 91 |
+
```bash
|
| 92 |
+
nohup bash vllm_api.sh > api.out 2>&1 &
|
| 93 |
+
```
|
| 94 |
+
#### 3. To use closed source LLM, modify promptr1_agent\tool\tools\LLM-toolpy to call your local API:
|
| 95 |
+
>Edit agent_r1/tool/tools/search_tool.py and set the local API endpoint and model name
|
| 96 |
+
```bash
|
| 97 |
+
base_url = "http://<SERVER_IP>:8006/v1"
|
| 98 |
+
```
|
| 99 |
+
|
| 100 |
+
### Evaluation
|
| 101 |
+
#### 1.Edit model_merge.sh and set the paths:
|
| 102 |
+
```bash
|
| 103 |
+
export CHECKPOINT_DIR='checkpoints/Prompt-R1/Prompt-R1-qwen3-4b-gpt-4o-mini/global_step_320/actor'
|
| 104 |
+
export HF_MODEL_PATH='./Qwen/Qwen3-4B'
|
| 105 |
+
export TARGET_DIR='./merge_model/Prompt-R1_Qwen3-4B'
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
#### 2.Edit vllm_serve.sh:
|
| 109 |
+
```bash
|
| 110 |
+
export MODEL_NAME='./merge_model/Prompt-R1_Qwen3-4B'
|
| 111 |
+
```
|
| 112 |
+
|
| 113 |
+
#### 3.Inference
|
| 114 |
+
```bash
|
| 115 |
+
python inference.py
|
| 116 |
+
```
|
| 117 |
+
|
| 118 |
+
#### 4.Batch inference & Evaluation
|
| 119 |
+
```bash
|
| 120 |
+
python batch_inference.py
|
| 121 |
+
python eval_scores.py
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
## BibTex
|
| 125 |
+
|
| 126 |
+
If you find this work is helpful for your research, please cite:
|
| 127 |
+
|
| 128 |
+
```bibtex
|
| 129 |
+
@misc{liu2025promptr1collaborativeautomaticprompting,
|
| 130 |
+
title={Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning},
|
| 131 |
+
author={Wenjin Liu and Haoran Luo and Xueyuan Lin and Haoming Liu and Tiesunlong Shen and Jiapu Wang and Rui Mao and Erik Cambria},
|
| 132 |
+
year={2025},
|
| 133 |
+
eprint={2511.01016},
|
| 134 |
+
archivePrefix={arXiv},
|
| 135 |
+
primaryClass={cs.CL},
|
| 136 |
+
url={https://arxiv.org/abs/2511.01016},
|
| 137 |
+
}
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
For further questions, please contact: wenjinliu23@outlook.com.
|
| 141 |
+
|
| 142 |
+
## Acknowledgement
|
| 143 |
+
|
| 144 |
+
This repo benefits from [Agent-R1](https://github.com/0russwest0/Agent-R1), [R1-Searcher](https://github.com/RUCAIBox/R1-Searcher), [Graph-R1](https://github.com/LHRLAB/Graph-R1), and [Search-R1](https://github.com/RUCAIBox/R1-Searcher). Thanks for their wonderful works.
|