ben-dlwlrma/Representation-Over-Routing

@@ -7,6 +7,7 @@ tags:
 - deep-reinforcement-learning
 - gym
 - lunar-lander
+- arxiv:2604.13517
 license: cc-by-4.0
 library_name: pytorch
 ---
@@ -15,11 +16,18 @@ library_name: pytorch
 [![arXiv](https://img.shields.io/badge/arXiv-2604.13517-b31b1b.svg)](https://arxiv.org/abs/2604.13517)
 [![GitHub](https://img.shields.io/badge/GitHub-Codebase-blue?logo=github)](https://github.com/ben-dlwlrma/Representation-Over-Routing)
+[![Demo](https://img.shields.io/badge/Hugging%20Face-Space-yellow?logo=huggingface)](https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo)
 This repository hosts the **pre-trained PyTorch model weights** for the 4-stage ablation study presented in the paper: *"Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO"*.
 Our work identifies severe optimization pathologies in multi-timescale RL (**Surrogate Objective Hacking** and **the Paradox of Temporal Uncertainty**) and introduces **Target Decoupling** to align agents with true long-term objectives without collapsing into short-term behavioral traps.
+## Related Links
+* **Paper:** https://arxiv.org/abs/2604.13517
+* **Interactive Demo Space:** https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo
+* **Official GitHub Repository:** https://github.com/ben-dlwlrma/Representation-Over-Routing
 ## Model Weights Overview
 We provide four standalone `.pth` weight files, corresponding to the isolated stages of our ablation study on the `LunarLander-v2` environment:
@@ -81,6 +89,8 @@ while not done:
     done = terminated or truncated
 ```
+The paper experiments were conducted on `LunarLander-v2`. The hosted Space may use `LunarLander-v3` for compatibility with current Gymnasium releases, while keeping the same actor architecture and pretrained weights.
 ## Citation
 If you find this code or our insights useful in your research, please consider citing our work:
@@ -97,4 +107,4 @@ If you find this code or our insights useful in your research, please consider c
   copyright = {Creative Commons Attribution 4.0 International},
   keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
 }
-```
+```