yinjiewang committed on
Commit dd4f04a · verified · 1 Parent(s): 2c8d933

Update README.md

Files changed (1)
  1. README.md +36 -3
README.md CHANGED
@@ -1,3 +1,36 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+
+ # Introduction to TraDo
+
+ [Paper](https://arxiv.org/abs/2509.06949) | [Code](https://github.com/Gen-Verse/Open-AgentRL) | [Blog](https://yinjjiew.github.io/projects/rlanything/)
+
+ We introduce **RLAnything**, a reinforcement learning framework that forges the environment, policy, and reward model into a completely dynamic system, strengthening the training signals and improving the system as a whole.
+
+ * **Integrated Feedback for Policy:** The policy is trained with integrated outcome and step-wise signals from the reward model.
+ * **Consistency Feedback for Reward Model:** The reward model is jointly optimized with consistency feedback, which in turn further improves policy training.
+ * **Critic Feedback for Environment:** Our theory-motivated automatic environment adaptation improves training for both the reward and policy models by leveraging critic feedback from each.
+
+
+
+ <p align="center">
+ <img src="https://github.com/yinjjiew/Data/raw/main/rlanything/rlanythingoverview.png" width="100%"/>
+ </p>
+
+
+ <p align="center">
+ <img src="https://github.com/yinjjiew/Data/raw/main/rlanything/rlanythingmaintable.png" width="100%"/>
+ </p>
+
+
+
+
+ # Citation
+
+ ```
+
+ ```
+
+