---
license: mit
---

# Introduction to TraDo
[Paper](https://arxiv.org/abs/2509.06949) | [Code](https://github.com/Gen-Verse/Open-AgentRL) | [Blog](https://yinjjiew.github.io/projects/rlanything/)
We introduce **RLAnything**, a reinforcement learning framework that forges the environment, policy, and reward model into a fully dynamic system, strengthening training signals and improving the system as a whole.
* **Integrated Feedback for Policy:** The policy is trained with integrated outcome and step-wise signals from the reward model.
* **Consistency Feedback for Reward Model:** The reward model is jointly optimized with consistency feedback, which further improves policy training.
* **Critic Feedback for Environment:** Our theory-motivated automatic environment adaptation improves training for both the reward and policy models by leveraging critic feedback from each (see the sketch below).
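To make the wiring of these three loops concrete, here is a deliberately toy sketch (Python) of one training iteration. Every name in it (`Environment`, `PolicyModel`, `RewardModel`, `adapt_environment`) and every update rule is a hypothetical stand-in chosen for illustration, not the actual RLAnything API.

```python
import random

# Hedged sketch: all classes, methods, and update rules below are
# hypothetical stand-ins for illustration, not the RLAnything codebase.


class Environment:
    """Toy environment whose difficulty is adapted via critic feedback."""

    def __init__(self, difficulty: float = 0.5):
        self.difficulty = difficulty

    def run_episode(self, policy) -> dict:
        steps = [policy.act(t) for t in range(4)]
        # Episode succeeds when average action quality beats the difficulty.
        outcome = 1.0 if sum(steps) / len(steps) > self.difficulty else 0.0
        return {"steps": steps, "outcome": outcome}


class PolicyModel:
    """Toy policy: a single scalar bias toward the 'good' action."""

    def __init__(self):
        self.bias = 0.5

    def act(self, state) -> float:
        return 1.0 if random.random() < self.bias else 0.0

    def update(self, rollout, step_rewards) -> float:
        # Integrated feedback: blend the outcome signal with the
        # step-wise signals produced by the reward model.
        step_signal = sum(step_rewards) / len(step_rewards)
        signal = 0.5 * rollout["outcome"] + 0.5 * step_signal
        self.bias = min(1.0, max(0.0, self.bias + 0.1 * (signal - 0.5)))
        return signal  # doubles as the policy's critic view of the environment


class RewardModel:
    """Toy reward model: scores each step with a learned scale."""

    def __init__(self):
        self.scale = 1.0

    def score_steps(self, rollout) -> list:
        return [self.scale * action for action in rollout["steps"]]

    def update(self, step_rewards, outcome) -> float:
        # Consistency feedback: shrink the gap between aggregated
        # step rewards and the observed outcome.
        gap = sum(step_rewards) / len(step_rewards) - outcome
        self.scale -= 0.1 * gap
        return abs(gap)  # doubles as the reward model's critic signal


def adapt_environment(env, policy_signal, reward_gap):
    # Critic feedback: make episodes harder when the policy finds them
    # easy, and easier while the reward model is still inconsistent.
    delta = 0.05 * (policy_signal - 0.5) - 0.05 * reward_gap
    env.difficulty = min(0.9, max(0.1, env.difficulty + delta))


env, policy, rm = Environment(), PolicyModel(), RewardModel()
for _ in range(200):
    rollout = env.run_episode(policy)
    step_rewards = rm.score_steps(rollout)
    policy_signal = policy.update(rollout, step_rewards)      # integrated feedback
    reward_gap = rm.update(step_rewards, rollout["outcome"])  # consistency feedback
    adapt_environment(env, policy_signal, reward_gap)         # critic feedback

print(f"policy bias: {policy.bias:.2f}, env difficulty: {env.difficulty:.2f}")
```

The point of the sketch is the closed loop: each component both consumes feedback from and emits feedback to the others, so improving one strengthens the training signal for the rest.
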
<p align="center">
<img src="https://github.com/yinjjiew/Data/raw/main/rlanything/rlanythingoverview.png" width="100%"/>
</p>
|
|
|
# Citation
```
```