Update README.md
README.md
CHANGED
@@ -7,6 +7,7 @@ tags:
 - deep-reinforcement-learning
 - gym
 - lunar-lander
+- arxiv:2604.13517
 license: cc-by-4.0
 library_name: pytorch
 ---
@@ -15,11 +16,18 @@ library_name: pytorch
 
 [](https://arxiv.org/abs/2604.13517)
 [](https://github.com/ben-dlwlrma/Representation-Over-Routing)
+[](https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo)
 
 This repository hosts the **pre-trained PyTorch model weights** for the 4-stage ablation study presented in the paper: *"Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO"*.
 
 Our work identifies severe optimization pathologies in multi-timescale RL (**Surrogate Objective Hacking** and **the Paradox of Temporal Uncertainty**) and introduces **Target Decoupling** to align agents with true long-term objectives without collapsing into short-term behavioral traps.
 
+## Related Links
+
+* **Paper:** https://arxiv.org/abs/2604.13517
+* **Interactive Demo Space:** https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo
+* **Official GitHub Repository:** https://github.com/ben-dlwlrma/Representation-Over-Routing
+
 ## Model Weights Overview
 
 We provide four standalone `.pth` weight files, corresponding to the isolated stages of our ablation study on the `LunarLander-v2` environment:
@@ -81,6 +89,8 @@ while not done:
     done = terminated or truncated
 ```
 
+The paper experiments were conducted on `LunarLander-v2`. The hosted Space may use `LunarLander-v3` for compatibility with current Gymnasium releases, while keeping the same actor architecture and pretrained weights.
+
 ## Citation
 
 If you find this code or our insights useful in your research, please consider citing our work:
@@ -97,4 +107,4 @@ If you find this code or our insights useful in your research, please consider c
 copyright = {Creative Commons Attribution 4.0 International},
 keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
 }
-```
+```
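The README change notes that the paper used `LunarLander-v2` while the hosted Space may use `LunarLander-v3`. One way a loading script could cope with both is to pick whichever id the installed library registers. The sketch below is only an illustration of that idea: the helper name `pick_env_id` and the preference order are assumptions and are not taken from the repository.

```python
from typing import Iterable, Sequence

# Assumed preference order: the newer id first, the paper's id as fallback.
PREFERRED_ENV_IDS = ("LunarLander-v3", "LunarLander-v2")


def pick_env_id(
    registered_ids: Iterable[str],
    preferred: Sequence[str] = PREFERRED_ENV_IDS,
) -> str:
    """Return the first id in `preferred` that appears in `registered_ids`."""
    registered = set(registered_ids)
    for env_id in preferred:
        if env_id in registered:
            return env_id
    raise LookupError(f"none of {list(preferred)} is registered")


if __name__ == "__main__":
    # Hypothetical registry contents standing in for an installed library.
    print(pick_env_id(["CartPole-v1", "LunarLander-v2"]))
```

With Gymnasium installed, the registered ids could be taken from `gymnasium.registry.keys()` and the result passed to `gymnasium.make`. Whether the pretrained actors behave identically under both versions is the README's claim, not something this helper checks.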