Asystemoffields
/

disco-103


Asystemoffields committed on Mar 8

Commit ca7c6c8 · verified · 1 Parent(s): 1979176

Update README.md

Files changed (1)

README.md +1 -1

@@ -19,7 +19,7 @@ Pretrained weights for the **Disco103** meta-network from [*Discovering State-of
 A small LSTM neural network (754,778 parameters) that **generates loss targets** for RL agents. Instead of hand-crafted loss functions like PPO or GRPO, Disco103 observes an agent's rollout — policy logits, rewards, advantages, auxiliary predictions — and outputs target distributions the agent should match.
-Meta-trained by DeepMind across 103 complex environments (Atari, ProcGen, DMLab-30).
+Meta-trained by DeepMind across 103 complex environments (Atari, ProcGen, DMLab-30).  Originally in JAX, this is a PyTorch port.
 ## Usage