nielsr HF Staff committed on
Commit
ff526ce
·
verified ·
1 Parent(s): 1514d93

Add metadata and improve model card

Hi! I'm Niels from the Hugging Face community science team.

I noticed this model card could use some additional metadata and documentation to help users discover and use it. This PR:
- Adds the `pipeline_tag: image-text-to-text` and `library_name: transformers` metadata.
- Updates the title and description to align with the official paper: **RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System**.
- Ensures links to the paper, code repository, and project blog are correctly maintained.
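
As a minimal sketch (not part of this PR), the added metadata can be checked by reading the YAML front matter at the top of `README.md`. The helper name `read_front_matter` is hypothetical, and it only handles flat `key: value` pairs like the ones in this card; nested fields would need a real YAML parser.

```python
def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading `---` ... `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the card does not start with front matter
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no front matter


# Front matter as it appears after this change.
card = """---
license: mit
library_name: transformers
pipeline_tag: image-text-to-text
---

# RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
"""

meta = read_front_matter(card)
print(meta["library_name"], meta["pipeline_tag"])
```

These two keys are what the Hub reads to label the model's library and task.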

Please let me know if you have any questions!

Files changed (1)
  1. README.md +7 -11
README.md CHANGED
@@ -1,20 +1,19 @@
  ---
  license: mit
+ library_name: transformers
+ pipeline_tag: image-text-to-text
  ---
 
-
- # Introduction to TraDo
+ # RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
 
  [Paper](https://arxiv.org/abs/2602.02488) | [Code](https://github.com/Gen-Verse/Open-AgentRL) | [Blog](https://yinjjiew.github.io/projects/rlanything/)
 
- We introduce **RLAnything**, a reinforcement learning framework forges environment, policy and reward model in a completely dynamic system to enhance the training signals and improve the whole system.
+ We introduce **RLAnything**, a reinforcement learning framework that forges environment, policy, and reward models in a completely dynamic system to enhance the training signals and improve the whole system.
 
  * **Integrated Feedback for Policy:** The policy is trained with integrated outcome and step-wise signals from reward model.
  * **Consistency Feedback for Reward Model:** The Reward model is jointly optimized by consistency feedback, further improves policy training.
  * **Critic Feedback for Environment:** Our theory-motivated automatic environment adaptation improves training for both the reward and policy models by leveraging critic feedback from each.
 
-
-
  <p align="center">
  <img src="https://github.com/yinjjiew/Data/raw/main/rlanything/rlanythingoverview.png" width="100%"/>
  </p>
@@ -30,16 +29,13 @@ We introduce **RLAnything**, a reinforcement learning framework forges environme
  </p>
 
 
-
  # Citation
 
- ```
+ ```bibtex
  @article{wang2026rlanything,
  title={RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System},
- author={Wang, Yinjie and Xie, Tianbao and Shen, Ke and Wang, Mengdi and Yang, Ling},
+ author={Wang, Yinjie and Xie, Tianbao Clerk and Shen, Ke and Wang, Mengdi and Yang, Ling},
  journal={arXiv preprint arXiv:2602.02488},
  year={2026}
  }
- ```
-
-
+ ```