---
tags:
- reinforcement-learning
- pytorch
- custom-implementation
- ppo
- deep-reinforcement-learning
- gym
- lunar-lander
- arxiv:2604.13517
license: cc-by-4.0
library_name: pytorch
---

# Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO

[![arXiv](https://img.shields.io/badge/arXiv-2604.13517-b31b1b.svg)](https://arxiv.org/abs/2604.13517)
[![GitHub](https://img.shields.io/badge/GitHub-Codebase-blue?logo=github)](https://github.com/ben-dlwlrma/Representation-Over-Routing)
[![Demo](https://img.shields.io/badge/Hugging%20Face-Space-yellow?logo=huggingface)](https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo)

This repository hosts the **pre-trained PyTorch model weights** for the 4-stage ablation study presented in the paper: *"Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO"*.

Our work identifies severe optimization pathologies in multi-timescale RL (**Surrogate Objective Hacking** and **the Paradox of Temporal Uncertainty**) and introduces **Target Decoupling** to align agents with true long-term objectives without collapsing into short-term behavioral traps.

## Related Links

* **Paper:** https://arxiv.org/abs/2604.13517
* **Interactive Demo Space:** https://huggingface.co/spaces/ben-dlwlrma/Representation-Over-Routing-Demo
* **Official GitHub Repository:** https://github.com/ben-dlwlrma/Representation-Over-Routing

## Model Weights Overview

We provide four standalone `.pth` weight files, corresponding to the isolated stages of our ablation study on the `LunarLander-v2` environment:

* **`1_baseline.pth` (Baseline)**: Gets stuck in a hovering local optimum. It burns fuel to keep collecting small centering rewards rather than risk a crash.
* **`2_surrogate_hacking_attention.pth` (Surrogate Hacking)**: Demonstrates multi-timescale collapse. The policy artificially minimizes the surrogate loss by manipulating attention weights instead of improving physical control.
* **`3_temporal_paradox_variance.pth` (Temporal Paradox)**: Exhibits aimless wandering caused by the inability to confidently attribute credit over long horizons.
* **`4_target_decoupling_final.pth` (Target Decoupling)**: **Our proposed solution.** The agent executes a fuel-efficient, safe landing by optimizing for the long-horizon objective ($\gamma = 0.999$).
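
For intuition about why $\gamma = 0.999$ matters, a common rule of thumb treats a discount factor $\gamma$ as giving an effective horizon of about $1/(1-\gamma)$ steps, since a reward $t$ steps ahead is weighted by $\gamma^t$. The sketch below compares $\gamma = 0.999$ with $\gamma = 0.99$ (a widely used PPO default, included only for comparison and not a setting reported in this card). The former gives a horizon of roughly 1,000 steps, about the length of a full LunarLander episode under its default 1,000-step time limit, while the latter gives roughly 100 steps. The helper names are illustrative.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma) of a discounted return."""
    return 1.0 / (1.0 - gamma)


def discounted_weight(gamma: float, t: int) -> float:
    """Weight gamma**t that a reward t steps in the future receives."""
    return gamma ** t


# Compare a common PPO default (0.99) with the long-horizon setting (0.999).
for gamma in (0.99, 0.999):
    print(
        f"gamma={gamma}: horizon ~ {effective_horizon(gamma):.0f} steps, "
        f"weight of a reward 500 steps ahead = {discounted_weight(gamma, 500):.3f}"
    )
```

With $\gamma = 0.99$, a reward 500 steps away (for example, the landing bonus late in an episode) is almost invisible to the return, whereas with $\gamma = 0.999$ it still keeps most of its weight.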

## Usage & Inference

To fully reproduce the training process or run the visual evaluations (GIFs), please refer to the [official GitHub repository](https://github.com/ben-dlwlrma/Representation-Over-Routing).

Because each published weight file contains only the actor network's parameters, inference is lightweight: you do not need to import the full training architecture. You can load the weights directly into a standard PyTorch `nn.Sequential` module using the following minimal snippet:

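```python
import numpy as np
import torch
import torch.nn as nn
import gymnasium as gym  # LunarLander also needs Box2D: pip install "gymnasium[box2d]"
from huggingface_hub import hf_hub_download

# 1. Download a specific stage's weights from the Hugging Face Hub
weight_path = hf_hub_download(
    repo_id="ben-dlwlrma/Representation-Over-Routing",
    filename="4_target_decoupling_final.pth",
)

# 2. Define the exact Actor network architecture
def layer_init(layer, std=np.sqrt(2), bias_const=0.0):
    # Orthogonal init only matters for training; the loaded weights overwrite it.
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, bias_const)
    return layer

actor = nn.Sequential(
    layer_init(nn.Linear(8, 64)),   # 8-dimensional LunarLander observation
    nn.Tanh(),
    layer_init(nn.Linear(64, 64)),
    nn.Tanh(),
    layer_init(nn.Linear(64, 4), std=0.01),  # logits for the 4 discrete actions
)

# 3. Load weights (map_location lets CPU-only machines load GPU-saved checkpoints)
actor.load_state_dict(torch.load(weight_path, map_location="cpu", weights_only=True))
actor.eval()

# 4. Run one greedy episode in the environment.
# The paper used LunarLander-v2; newer Gymnasium releases (1.0+) provide
# LunarLander-v3 instead, so switch the ID if v2 is unavailable.
env = gym.make("LunarLander-v2")
state, _ = env.reset()
done = False
episode_return = 0.0

while not done:
    state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        logits = actor(state_tensor)
        action = torch.argmax(logits, dim=1).item()  # greedy action

    state, reward, terminated, truncated, _ = env.step(action)
    episode_return += reward
    done = terminated or truncated

env.close()
print(f"Episode return: {episode_return:.1f}")
```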
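
The inference snippet acts greedily by taking the `argmax` of the actor's logits. Assuming these logits parameterize a categorical policy, as in standard discrete-action PPO, the agent was trained by sampling actions instead, so greedy evaluation can behave slightly differently from training-time rollouts. The NumPy sketch below contrasts the two modes on a hypothetical logit vector (not an output of the released weights); `softmax` and `select_action` are illustrative helpers, not part of the repository.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def select_action(logits: np.ndarray, greedy: bool, rng: np.random.Generator) -> int:
    """Pick a LunarLander action (0-3) from one row of actor logits.

    greedy=True mirrors the argmax in the inference snippet;
    greedy=False samples from the categorical policy, as PPO does in training.
    """
    probs = softmax(logits)
    if greedy:
        return int(np.argmax(probs))
    return int(rng.choice(len(probs), p=probs))


# Hypothetical logits for a single state (not produced by the released weights).
logits = np.array([0.1, 2.0, 0.3, -1.0])
rng = np.random.default_rng(0)

print("greedy action:", select_action(logits, greedy=True, rng=rng))
samples = [select_action(logits, greedy=False, rng=rng) for _ in range(1000)]
print("sampled action frequencies:", np.bincount(samples, minlength=4) / len(samples))
```

In PyTorch, the sampled variant corresponds to `torch.distributions.Categorical(logits=logits).sample()`.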

The paper's experiments were conducted on `LunarLander-v2`. The hosted Space may use `LunarLander-v3` for compatibility with current Gymnasium releases, while keeping the same actor architecture and pretrained weights.
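
Because the Space reuses the same actor architecture and weights, every checkpoint should share one parameter layout. As a plain-Python sanity check (an illustration, not code from the repository), the sketch below derives the `state_dict` keys and shapes that the `nn.Sequential` actor from the inference snippet expects, assuming the `.pth` files were saved from a module with exactly that layout. It also counts the parameters, which shows why inference is cheap: about 5,000 float32 values, roughly 20 KB.

```python
import math

# Layer sizes of the actor in the inference snippet: 8 observations -> 64 -> 64 -> 4 actions.
LAYER_SIZES = [8, 64, 64, 4]


def expected_state_dict_shapes(sizes):
    """Map nn.Sequential state_dict keys to tensor shapes.

    Linear layers sit at indices 0, 2, 4 because each of the first two is
    followed by a parameter-free Tanh (indices 1, 3). nn.Linear stores its
    weight as (out_features, in_features).
    """
    shapes = {}
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        idx = 2 * i
        shapes[f"{idx}.weight"] = (n_out, n_in)
        shapes[f"{idx}.bias"] = (n_out,)
    return shapes


shapes = expected_state_dict_shapes(LAYER_SIZES)
n_params = sum(math.prod(shape) for shape in shapes.values())

for key, shape in shapes.items():
    print(key, shape)
print("total parameters:", n_params)
print("approx. float32 size (KiB):", n_params * 4 / 1024)
```

Since `load_state_dict` is strict by default, a checkpoint whose keys or shapes differ from this table raises a `RuntimeError` instead of loading silently.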

## Citation

If you find this code or our insights useful in your research, please consider citing our work:

```bibtex
@misc{sunRepresentationRoutingOvercoming2026b,
  title = {Representation over {{Routing}}: {{Overcoming Surrogate Hacking}} in {{Multi-Timescale PPO}}},
  shorttitle = {Representation over {{Routing}}},
  author = {Sun, Jing},
  year = 2026,
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2604.13517},
  urldate = {2026-04-16},
  copyright = {Creative Commons Attribution 4.0 International},
  keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
}
```