Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Pretrained weights for the **Disco103** meta-network from [*Discovering State-of
 
 A small LSTM neural network (754,778 parameters) that **generates loss targets** for RL agents. Instead of hand-crafted loss functions like PPO or GRPO, Disco103 observes an agent's rollout — policy logits, rewards, advantages, auxiliary predictions — and outputs target distributions the agent should match.
 
-Meta-trained by DeepMind across 103 complex environments (Atari, ProcGen, DMLab-30).
+Meta-trained by DeepMind across 103 complex environments (Atari, ProcGen, DMLab-30). Originally in JAX, this is a PyTorch port.
 
 ## Usage
 