Commit 967bf28 (verified) · committed by ben-dlwlrma · 1 parent: 3ae4f81

Update README.md

Files changed (1)
  1. README.md +11 -1
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
  - deep-reinforcement-learning
  - gym
  - lunar-lander
+ - arxiv:2604.13517
  license: cc-by-4.0
  library_name: pytorch
  ---
@@ -15,11 +16,18 @@ library_name: pytorch
 
  [![arXiv](https://img.shields.io/badge/arXiv-2604.13517-b31b1b.svg)](https://arxiv.org/abs/2604.13517)
  [![GitHub](https://img.shields.io/badge/GitHub-Codebase-blue?logo=github)](https://github.com/ben-dlwlrma/Representation-Over-Routing)
+ [![Demo](https://img.shields.io/badge/Hugging%20Face-Space-yellow?logo=huggingface)](https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo)
 
  This repository hosts the **pre-trained PyTorch model weights** for the 4-stage ablation study presented in the paper: *"Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO"*.
 
  Our work identifies severe optimization pathologies in multi-timescale RL (**Surrogate Objective Hacking** and **the Paradox of Temporal Uncertainty**) and introduces **Target Decoupling** to align agents with true long-term objectives without collapsing into short-term behavioral traps.
 
+ ## Related Links
+
+ * **Paper:** https://arxiv.org/abs/2604.13517
+ * **Interactive Demo Space:** https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo
+ * **Official GitHub Repository:** https://github.com/ben-dlwlrma/Representation-Over-Routing
+
  ## Model Weights Overview
 
  We provide four standalone `.pth` weight files, corresponding to the isolated stages of our ablation study on the `LunarLander-v2` environment:
@@ -81,6 +89,8 @@ while not done:
  done = terminated or truncated
  ```
 
+ The paper experiments were conducted on `LunarLander-v2`. The hosted Space may use `LunarLander-v3` for compatibility with current Gymnasium releases, while keeping the same actor architecture and pretrained weights.
+
  ## Citation
 
  If you find this code or our insights useful in your research, please consider citing our work:
@@ -97,4 +107,4 @@ If you find this code or our insights useful in your research, please consider c
  copyright = {Creative Commons Attribution 4.0 International},
  keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
  }
- ```
+ ```
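
The compatibility note added in this commit says the paper used `LunarLander-v2` while the hosted Space may run `LunarLander-v3`. One way a user could handle both cases is to try the environment IDs in order and keep the first one that can be built. The sketch below is only an illustration: the helper name `make_first_available` and its parameters are hypothetical and do not come from the repository, and the broad `except Exception` is an assumption made to keep the example independent of any particular Gymnasium version.

```python
from typing import Any, Callable, Iterable, Tuple


def make_first_available(
    make: Callable[[str], Any], env_ids: Iterable[str]
) -> Tuple[str, Any]:
    """Return (env_id, env) for the first ID that `make` can construct.

    `make` is any factory taking an environment ID, e.g. `gymnasium.make`.
    This helper is hypothetical and not part of the repository.
    """
    failures = []
    for env_id in env_ids:
        try:
            return env_id, make(env_id)
        except Exception as exc:  # assumption: catch-all; narrow this in real code
            failures.append(f"{env_id}: {exc}")
    raise RuntimeError("no environment could be created: " + "; ".join(failures))


# Possible use with Gymnasium (requires the Box2D dependencies to be installed):
#   import gymnasium as gym
#   env_id, env = make_first_available(
#       gym.make, ["LunarLander-v3", "LunarLander-v2"]
#   )
```

Passing the factory in as an argument keeps the helper free of a hard dependency on Gymnasium, so the selection logic can be checked on its own. Reported returns may differ between environment versions, so results obtained on `LunarLander-v3` should not be assumed to match the paper's `LunarLander-v2` numbers.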