Update pipeline tag, add library name, and usage information
Hi! I'm Niels from the Hugging Face community science team. I noticed that this model card could benefit from a few updates to improve its visibility and usability on the Hub.
I've made the following changes:
- Updated the `pipeline_tag` from `question-answering` to `text-generation` to better reflect the model's architecture and use case as an agent.
- Added `library_name: transformers` to the metadata, as the repository contains the necessary configuration files for compatibility.
- Added a "Sample Usage" section based on the vLLM deployment instructions found in your official GitHub repository.
- Added the BibTeX citation for your paper.
These changes will help users discover and interact with your work more effectively. Feel free to merge if everything looks correct!
The full diff of the model card (`README.md`):

````diff
@@ -1,10 +1,11 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - Qwen/Qwen3-4B-Instruct-2507
-pipeline_tag: question-answering
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - agent
 - reinforcement-learning
@@ -20,6 +21,7 @@ tags:
 [[Paper](https://arxiv.org/abs/2602.05327)] [[Code](https://github.com/GreatX3/ProAct)]
 [[Project Page](https://github.com/GreatX3/ProAct)]
 </div>
+
 ## 📖 Introduction
 
 This repository contains the official model weights for the paper **"ProAct: Agentic Lookahead in Interactive Environments"**.
@@ -41,3 +43,34 @@ This repository contains model weights for different tasks (2048, Sokoban) and t
 | **`2048_rl`** | 2048 | RL (Stage 2) | Model further fine-tuned using RL with **MC-Critic**, initialized from the SFT checkpoint. |
 | **`sokoban_sft`** | Sokoban | SFT (Stage 1) | GLAD SFT model for the Sokoban task. |
 | **`sokoban_rl`** | Sokoban | RL (Stage 2) | MC-Critic RL model for the Sokoban task. |
+
+## 🚀 Sample Usage
+
+You can deploy the model weights using [vLLM](https://github.com/vllm-project/vllm). For example, to serve the `2048_rl` checkpoint:
+
+```bash
+# Start the vLLM server
+vllm serve biang889/ProAct --subfolder 2048_rl \
+    --served-model-name ProAct \
+    --host 0.0.0.0 \
+    --port 8080 \
+    --tensor-parallel-size 1
+```
+
+Once served, you can interact with the model via an OpenAI-compatible API.
+
+## 📜 Citation
+
+If you find this project useful in your research, please cite our paper:
+
+```bibtex
+@misc{yu2026proactagenticlookaheadinteractive,
+      title={ProAct: Agentic Lookahead in Interactive Environments},
+      author={Yangbin Yu and Mingyu Yang and Junyou Li and Yiming Gao and Feiyu Liu and Yijun Yang and Zichuan Lin and Jiafei Lyu and Yicheng Liu and Zhicong Lu and Deheng Ye and Jie Jiang},
+      year={2026},
+      eprint={2602.05327},
+      archivePrefix={arXiv},
+      primaryClass={cs.AI},
+      url={https://arxiv.org/abs/2602.05327},
+}
+```
````