Committed by nielsr (HF Staff)
Commit 8cfd0e3 · verified · 1 Parent(s): d2851df

Improve model card and add metadata

This PR improves the model card for RLAnything.
- Updates the title to reflect the actual paper name.
- Adds metadata for `library_name: transformers` and `pipeline_tag: text-generation`.
- Links the Hugging Face paper page.
- Provides a summary of the framework's key features.
- Formats the citation as a BibTeX code block.
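
As a rough illustration of the metadata change, the sketch below reads the front matter of the updated README and prints the resulting keys. The `read_front_matter` helper is hypothetical and is not part of `transformers` or `huggingface_hub`. It assumes flat `key: value` pairs between two `---` lines, which holds for this README but not for arbitrary YAML.

```python
# Minimal front-matter reader for a model card README (standard library only).
# Assumption: the front matter is flat "key: value" pairs between two '---'
# lines. This is a hypothetical helper, not a general YAML parser.

def read_front_matter(readme_text: str) -> dict[str, str]:
    lines = readme_text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no front matter


# Head of README.md as it stands after this commit.
README_AFTER = """---
license: mit
library_name: transformers
pipeline_tag: text-generation
---

# RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
"""

metadata = read_front_matter(README_AFTER)
print(metadata)
```

Before this commit the same check would have returned only the `license` key.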

Files changed (1)
  1. README.md +13 -14
README.md CHANGED
@@ -1,41 +1,40 @@
  ---
  license: mit
+ library_name: transformers
+ pipeline_tag: text-generation
  ---
 
+ # RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
 
- # Introduction to TraDo
+ [Paper](https://huggingface.co/papers/2602.02488) | [Code](https://github.com/Gen-Verse/Open-AgentRL) | [Blog](https://yinjjiew.github.io/projects/rlanything/) | [Project Page](https://huggingface.co/collections/Gen-Verse/open-agentrl)
 
- [Paper](https://arxiv.org/abs/2602.02488) | [Code](https://github.com/Gen-Verse/Open-AgentRL) | [Blog](https://yinjjiew.github.io/projects/rlanything/)
+ **RLAnything** is a reinforcement learning framework that dynamically forges environment, policy, and reward models through closed-loop optimization, amplifying learning signals and strengthening the overall RL system for any LLM or agentic scenarios.
 
- We introduce **RLAnything**, a reinforcement learning framework forges environment, policy and reward model in a completely dynamic system to enhance the training signals and improve the whole system.
+ Specifically, the policy is trained with integrated feedback from step-wise and outcome signals, while the reward model is jointly optimized via consistency feedback, which in turn further improves policy training. Moreover, theory-motivated automatic environment adaptation improves training for both the reward and policy models by leveraging critic feedback from each, enabling learning from experience.
 
+ ## Key Features
  * **Integrated Feedback for Policy:** The policy is trained with integrated outcome and step-wise signals from reward model.
  * **Consistency Feedback for Reward Model:** The Reward model is jointly optimized by consistency feedback, further improves policy training.
- * **Critic Feedback for Environment:** Our theory-motivated automatic environment adaptation improves training for both the reward and policy models by leveraging critic feedback from each.
-
-
+ * **Critic Feedback for Environment:** Theory-motivated automatic environment adaptation improves training for both the reward and policy models by leveraging critic feedback from each.
 
  <p align="center">
  <img src="https://github.com/yinjjiew/Data/raw/main/rlanything/rlanythingoverview.png" width="100%"/>
  </p>
 
+ ## Results
+ Empirically, each added component consistently improves the overall system, and RLAnything yields substantial gains across various representative LLM and agentic tasks, boosting Qwen3-VL-8B-Thinking by 9.1% on OSWorld and Qwen2.5-7B-Instruct by 18.7% and 11.9% on AlfWorld and LiveBench, respectively.
 
  <p align="center">
  <img src="https://github.com/yinjjiew/Data/raw/main/rlanything/rlanythingmaintable.png" width="100%"/>
  </p>
 
+ ## Citation
 
-
-
- # Citation
-
- ```
+ ```bibtex
  @article{wang2026rlanything,
  title={RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System},
  author={Wang, Yinjie and Xie, Tianbao and Shen, Ke and Wang, Mengdi and Yang, Ling},
  journal={arXiv preprint arXiv:2602.02488},
  year={2026}
  }
- ```
-
-
+ ```