langfeng01 committed on
Commit
f1cbb12
·
verified ·
1 Parent(s): b6b803b

Update README.md

Files changed (1)
  1. README.md +34 -1
README.md CHANGED
@@ -4,7 +4,40 @@ base_model:
4
  - Qwen/Qwen2.5-7B-Instruct
5
  ---
6
 
7
- To use this model, please refer to [verl-agent](https://github.com/langfengQ/verl-agent).
8
 
9
  `GiGPO-Qwen2.5-7B-Instruct-WebShop` is trained using [GiGPO](https://huggingface.co/papers/2505.10978) and the following prompt:
10
 
 
4
  - Qwen/Qwen2.5-7B-Instruct
5
  ---
6
 
7
+ <p align="center">
8
+ <img src="./logo-verl-agent.png" alt="logo" width="55%">
9
+ </p>
10
+
11
+
12
+ <p align="center">
13
+ <a href="https://arxiv.org/abs/2505.10978">
14
+ <img src="https://img.shields.io/badge/arXiv-Paper-red?style=flat-square&logo=arxiv" alt="arXiv Paper"></a>
15
+ &nbsp;
16
+ <a href="https://github.com/langfengQ/verl-agent">
17
+ <img src="https://img.shields.io/badge/GitHub-Project-181717?style=flat-square&logo=github" alt="GitHub Project"></a>
18
+ &nbsp;
19
+ <a href="https://huggingface.co/collections/langfeng01/verl-agent-684970e8f51babe2a6d98554">
20
+ <img src="https://img.shields.io/badge/HuggingFace-Models-yellow?style=flat-square&logo=huggingface" alt="HuggingFace Models"></a>
21
+ &nbsp;
22
+ <a href="https://x.com/langfengq/status/1930848580505620677">
23
+ <img src="https://img.shields.io/badge/Twitter-Channel-000000?style=flat-square&logo=x" alt="X Channel"></a>
24
+ </p>
25
+
26
+
27
+ ## Quick Start
28
+
29
+ To use this model, follow these three steps:
30
+
31
+ 1. Clone [verl-agent](https://github.com/langfengQ/verl-agent).
32
+ 2. Set [`actor_rollout_ref.model.path`](https://github.com/langfengQ/verl-agent/blob/35b3da38293993f9bf4f7873dfb3262a361e956c/examples/gigpo_trainer/run_webshop.sh#L30) to your local path, e.g. `your/own/path/GiGPO-Qwen2.5-7B-Instruct-WebShop`.
33
+ 3. Ensure [`trainer.val_before_train=True`](https://github.com/langfengQ/verl-agent/blob/35b3da38293993f9bf4f7873dfb3262a361e956c/examples/gigpo_trainer/run_webshop.sh#L72), so evaluation runs before training.
34
+
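As a rough sketch of the three steps above, the helper below checks that a launch script carries both overrides before it is run. `check_overrides` is a hypothetical name and is not part of verl-agent. The script path in the usage comment is taken from the links in the steps, and the check assumes the overrides appear literally in the script as `key=value` arguments.

```shell
# Hypothetical helper (not part of verl-agent): succeed only if the given
# launch script sets a model path and enables validation before training.
check_overrides() {
    script="$1"
    grep -q 'actor_rollout_ref\.model\.path=' "$script" &&
        grep -q 'trainer\.val_before_train=True' "$script"
}

# Possible use, after cloning https://github.com/langfengQ/verl-agent and
# editing the model path in the script:
#   check_overrides verl-agent/examples/gigpo_trainer/run_webshop.sh &&
#       bash verl-agent/examples/gigpo_trainer/run_webshop.sh
```

The check only looks for the two setting names. It does not verify that the path points to a downloaded copy of the model.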
35
+
36
+ For more details, please refer to the [verl-agent](https://github.com/langfengQ/verl-agent) repository.
37
+
38
+ ---
39
+
40
+ ## Notes
41
 
42
  `GiGPO-Qwen2.5-7B-Instruct-WebShop` is trained using [GiGPO](https://huggingface.co/papers/2505.10978) and the following prompt:
43