Update pipeline tag, add library name, and usage information

#1
by nielsr HF Staff - opened
Files changed (1) hide show
  1. README.md +37 -4
README.md CHANGED
@@ -1,10 +1,11 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  base_model:
6
  - Qwen/Qwen3-4B-Instruct-2507
7
- pipeline_tag: question-answering
 
 
 
 
8
  tags:
9
  - agent
10
  - reinforcement-learning
@@ -20,6 +21,7 @@ tags:
20
  [[Paper](https://arxiv.org/abs/2602.05327)] [[Code](https://github.com/GreatX3/ProAct)]
21
  [[Project Page](https://github.com/GreatX3/ProAct)]
22
  </div>
 
23
  ## 📖 Introduction
24
 
25
  This repository contains the official model weights for the paper **"ProAct: Agentic Lookahead in Interactive Environments"**.
@@ -41,3 +43,34 @@ This repository contains model weights for different tasks (2048, Sokoban) and t
41
  | **`2048_rl`** | 2048 | RL (Stage 2) | Model further fine-tuned using RL with **MC-Critic**, initialized from the SFT checkpoint. |
42
  | **`sokoban_sft`** | Sokoban | SFT (Stage 1) | GLAD SFT model for the Sokoban task. |
43
  | **`sokoban_rl`** | Sokoban | RL (Stage 2) | MC-Critic RL model for the Sokoban task. |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
 
 
2
  base_model:
3
  - Qwen/Qwen3-4B-Instruct-2507
4
+ language:
5
+ - en
6
+ license: apache-2.0
7
+ pipeline_tag: text-generation
8
+ library_name: transformers
9
  tags:
10
  - agent
11
  - reinforcement-learning
 
21
  [[Paper](https://arxiv.org/abs/2602.05327)] [[Code](https://github.com/GreatX3/ProAct)]
22
  [[Project Page](https://github.com/GreatX3/ProAct)]
23
  </div>
24
+
25
  ## 📖 Introduction
26
 
27
  This repository contains the official model weights for the paper **"ProAct: Agentic Lookahead in Interactive Environments"**.
 
43
  | **`2048_rl`** | 2048 | RL (Stage 2) | Model further fine-tuned using RL with **MC-Critic**, initialized from the SFT checkpoint. |
44
  | **`sokoban_sft`** | Sokoban | SFT (Stage 1) | GLAD SFT model for the Sokoban task. |
45
  | **`sokoban_rl`** | Sokoban | RL (Stage 2) | MC-Critic RL model for the Sokoban task. |
46
+
47
+ ## 🚀 Sample Usage
48
+
49
+ You can deploy the model weights using [vLLM](https://github.com/vllm-project/vllm). For example, to serve the `2048_rl` checkpoint:
50
+
51
+ ```bash
52
+ # Start the vLLM server
53
+ vllm serve biang889/ProAct --subfolder 2048_rl \
54
+ --served-model-name ProAct \
55
+ --host 0.0.0.0 \
56
+ --port 8080 \
57
+ --tensor-parallel-size 1
58
+ ```
59
+
60
+ Once served, you can interact with the model via an OpenAI-compatible API.
61
+
62
+ ## 📜 Citation
63
+
64
+ If you find this project useful in your research, please cite our paper:
65
+
66
+ ```bibtex
67
+ @misc{yu2026proactagenticlookaheadinteractive,
68
+ title={ProAct: Agentic Lookahead in Interactive Environments},
69
+ author={Yangbin Yu and Mingyu Yang and Junyou Li and Yiming Gao and Feiyu Liu and Yijun Yang and Zichuan Lin and Jiafei Lyu and Yicheng Liu and Zhicong Lu and Deheng Ye and Jie Jiang},
70
+ year={2026},
71
+ eprint={2602.05327},
72
+ archivePrefix={arXiv},
73
+ primaryClass={cs.AI},
74
+ url={https://arxiv.org/abs/2602.05327},
75
+ }
76
+ ```