Prompt-R1 / README.md

Update README.md

bd75068 verified 27 days ago

5.14 kB

	---
	# YAML 元数据块 (Model Card Header)
	language:
	- en
	license: apache-2.0
	model_name: Prompt-R1 Model
	tags:
	- text-generation
	- reinforcement-learning
	- nlp
	- transformers
	- safetensors
	# 关联的数据集
	datasets:
	- QwenQKing/Prompt-R1
	# 决定网页右侧的推理小组件类型
	pipeline_tag: text-generation
	---
	# Prompt-R1: Enhancing LLM interaction on behalf of humans

	<div align="center">

	[![arXiv](https://img.shields.io/badge/arXiv-2511.01016-b31b1b.svg)](https://arxiv.org/abs/2511.01016)
	[![Homepage](https://img.shields.io/badge/Homepage-Prompt--R1-blue.svg)](https://qwenqking.github.io/Prompt-R1/)
	[![GitHub](https://img.shields.io/badge/GitHub-Prompt--R1-181717.svg?logo=github)](https://github.com/QwenQKing/Prompt-R1)
	[![Dataset](https://img.shields.io/badge/Dataset-HuggingFace-orange.svg?logo=huggingface)](https://huggingface.co/datasets/QwenQKing/Prompt-R1)


	### Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

	[📄 Paper](https://arxiv.org/abs/2511.01016) \| [🚀 Quick Start](#quick-start-prompt-r1) \| [💬 Contact](mailto:wenjinliu23@outlook.com)

	</div>

	---

	## Overview

	<div align="center">
	<img src="image/2-QA.png" width="80%"/>
	</div>

	Prompt-R1 has addressed a critical challenge in interacting with large language models (LLMs)—the inability of users to provide accurate and effective interaction prompts for complex tasks. Prompt-R1 is an end-to-end reinforcement learning (RL) framework that enhances the performance of LLMs by facilitating collaborative automatic prompting between a small-scale LLM and a large-scale LLM. Prompt-R1, through multi-turn prompt interaction, significantly improves the generation quality and reasoning accuracy of large-scale LLMs, enabling better task-solving performance without requiring user expertise in prompt formulation.



	<div align="center">
	<img src="image/1-overview.png" width="90%"/>
	</div>

	By integrating collaborative prompting and reinforcement learning, Prompt-R1 offers a plug-and-play framework that supports both inference and training with various large-scale LLMs as the environment.

	## Experimental Results
	Results of Different Large language models:
	<div align="center">
	<img src="image/6-radar.png" width="100%"/>
	</div>



	## Prompt-R1 Implementation

	### Install Environment
	```bash
	conda create -n promptr1 python==3.12 -y
	conda activate promptr1
	cd verl
	pip3 install -e .
	pip3 install vllm==0.8.3
	pip3 install flash-attn==2.7.4.post1 # Download: https://github.com/Dao-AILab/flash-attention/releases
	pip3 install FlagEmbedding faiss-cpu
	pip3 install debugpy==1.8.0 "ray[default]" debugpy
	```

	### Dataset Preparation
	>Our datasets are in:
	```bash
	Training Dataset: dataset\train_data
	Evaluation Dataset: dataset\eval_data
	```

	### Quick Start: Prompt-R1


	### 1. To use closed source LLM, modify promptr1_agent\tool\tools\LLM-toolpy:
	```bash
	API_KEY = "your_api_key"
	MODEL = "model_name"
	BASE_URL = "url"
	```
	>Run:
	```bash
	nohup bash run_prompt-R1.sh > Prompt-R1_training.out &
	```


	### 2. Deploy an Open-Source Model Locally
	#### 1. Install vLLM and dependencies
	```bash
	# Create environment
	conda create -n vllmapi python=3.12 -y
	conda activate vllmapi
	# Install dependencies
	pip3 install transformers accelerate huggingface_hub
	pip3 install vllm
	```

	#### 2. Start the OpenAI-compatible server:
	```bash
	nohup bash vllm_api.sh > api.out 2>&1 &
	```
	#### 3. To use closed source LLM, modify promptr1_agent\tool\tools\LLM-toolpy to call your local API:
	>Edit agent_r1/tool/tools/search_tool.py and set the local API endpoint and model name
	```bash
	base_url = "http://<SERVER_IP>:8006/v1"
	```

	### Evaluation
	#### 1.Edit model_merge.sh and set the paths:
	```bash
	export CHECKPOINT_DIR='checkpoints/Prompt-R1/Prompt-R1-qwen3-4b-gpt-4o-mini/global_step_320/actor'
	export HF_MODEL_PATH='./Qwen/Qwen3-4B'
	export TARGET_DIR='./merge_model/Prompt-R1_Qwen3-4B'
	```

	#### 2.Edit vllm_serve.sh:
	```bash
	export MODEL_NAME='./merge_model/Prompt-R1_Qwen3-4B'
	```

	#### 3.Inference
	```bash
	python inference.py
	```

	#### 4.Batch inference & Evaluation
	```bash
	python batch_inference.py
	python eval_scores.py
	```

	## BibTex

	If you find this work is helpful for your research, please cite:

	```bibtex
	@misc{liu2025promptr1collaborativeautomaticprompting,
	title={Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning},
	author={Wenjin Liu and Haoran Luo and Xueyuan Lin and Haoming Liu and Tiesunlong Shen and Jiapu Wang and Rui Mao and Erik Cambria},
	year={2025},
	eprint={2511.01016},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2511.01016},
	}
	```

	For further questions, please contact: wenjinliu23@outlook.com.

	## Acknowledgement

	This repo benefits from [Agent-R1](https://github.com/0russwest0/Agent-R1), [R1-Searcher](https://github.com/RUCAIBox/R1-Searcher), [Graph-R1](https://github.com/LHRLAB/Graph-R1), and [Search-R1](https://github.com/RUCAIBox/R1-Searcher). Thanks for their wonderful works.