Update pipeline tag, add library name, and usage information
#1
by
nielsr
HF Staff
- opened
README.md
CHANGED
|
@@ -1,10 +1,11 @@
|
|
| 1 |
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
language:
|
| 4 |
-
- en
|
| 5 |
base_model:
|
| 6 |
- Qwen/Qwen3-4B-Instruct-2507
|
| 7 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
tags:
|
| 9 |
- agent
|
| 10 |
- reinforcement-learning
|
|
@@ -20,6 +21,7 @@ tags:
|
|
| 20 |
[[Paper](https://arxiv.org/abs/2602.05327)] [[Code](https://github.com/GreatX3/ProAct)]
|
| 21 |
[[Project Page](https://github.com/GreatX3/ProAct)]
|
| 22 |
</div>
|
|
|
|
| 23 |
## 📖 Introduction
|
| 24 |
|
| 25 |
This repository contains the official model weights for the paper **"ProAct: Agentic Lookahead in Interactive Environments"**.
|
|
@@ -41,3 +43,34 @@ This repository contains model weights for different tasks (2048, Sokoban) and t
|
|
| 41 |
| **`2048_rl`** | 2048 | RL (Stage 2) | Model further fine-tuned using RL with **MC-Critic**, initialized from the SFT checkpoint. |
|
| 42 |
| **`sokoban_sft`** | Sokoban | SFT (Stage 1) | GLAD SFT model for the Sokoban task. |
|
| 43 |
| **`sokoban_rl`** | Sokoban | RL (Stage 2) | MC-Critic RL model for the Sokoban task. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
|
|
|
|
|
|
|
|
|
| 2 |
base_model:
|
| 3 |
- Qwen/Qwen3-4B-Instruct-2507
|
| 4 |
+
language:
|
| 5 |
+
- en
|
| 6 |
+
license: apache-2.0
|
| 7 |
+
pipeline_tag: text-generation
|
| 8 |
+
library_name: transformers
|
| 9 |
tags:
|
| 10 |
- agent
|
| 11 |
- reinforcement-learning
|
|
|
|
| 21 |
[[Paper](https://arxiv.org/abs/2602.05327)] [[Code](https://github.com/GreatX3/ProAct)]
|
| 22 |
[[Project Page](https://github.com/GreatX3/ProAct)]
|
| 23 |
</div>
|
| 24 |
+
|
| 25 |
## 📖 Introduction
|
| 26 |
|
| 27 |
This repository contains the official model weights for the paper **"ProAct: Agentic Lookahead in Interactive Environments"**.
|
|
|
|
| 43 |
| **`2048_rl`** | 2048 | RL (Stage 2) | Model further fine-tuned using RL with **MC-Critic**, initialized from the SFT checkpoint. |
|
| 44 |
| **`sokoban_sft`** | Sokoban | SFT (Stage 1) | GLAD SFT model for the Sokoban task. |
|
| 45 |
| **`sokoban_rl`** | Sokoban | RL (Stage 2) | MC-Critic RL model for the Sokoban task. |
|
| 46 |
+
|
| 47 |
+
## 🚀 Sample Usage
|
| 48 |
+
|
| 49 |
+
You can deploy the model weights using [vLLM](https://github.com/vllm-project/vllm). For example, to serve the `2048_rl` checkpoint:
|
| 50 |
+
|
| 51 |
+
```bash
|
| 52 |
+
# Start the vLLM server
|
| 53 |
+
vllm serve biang889/ProAct --subfolder 2048_rl \
|
| 54 |
+
--served-model-name ProAct \
|
| 55 |
+
--host 0.0.0.0 \
|
| 56 |
+
--port 8080 \
|
| 57 |
+
--tensor-parallel-size 1
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
Once served, you can interact with the model via an OpenAI-compatible API.
|
| 61 |
+
|
| 62 |
+
## 📜 Citation
|
| 63 |
+
|
| 64 |
+
If you find this project useful in your research, please cite our paper:
|
| 65 |
+
|
| 66 |
+
```bibtex
|
| 67 |
+
@misc{yu2026proactagenticlookaheadinteractive,
|
| 68 |
+
title={ProAct: Agentic Lookahead in Interactive Environments},
|
| 69 |
+
author={Yangbin Yu and Mingyu Yang and Junyou Li and Yiming Gao and Feiyu Liu and Yijun Yang and Zichuan Lin and Jiafei Lyu and Yicheng Liu and Zhicong Lu and Deheng Ye and Jie Jiang},
|
| 70 |
+
year={2026},
|
| 71 |
+
eprint={2602.05327},
|
| 72 |
+
archivePrefix={arXiv},
|
| 73 |
+
primaryClass={cs.AI},
|
| 74 |
+
url={https://arxiv.org/abs/2602.05327},
|
| 75 |
+
}
|
| 76 |
+
```
|