Asystemoffields committed on
Commit ca7c6c8 · verified · 1 Parent(s): 1979176

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -19,7 +19,7 @@ Pretrained weights for the **Disco103** meta-network from [*Discovering State-of

  A small LSTM neural network (754,778 parameters) that **generates loss targets** for RL agents. Instead of hand-crafted loss functions like PPO or GRPO, Disco103 observes an agent's rollout — policy logits, rewards, advantages, auxiliary predictions — and outputs target distributions the agent should match.

- Meta-trained by DeepMind across 103 complex environments (Atari, ProcGen, DMLab-30).
+ Meta-trained by DeepMind across 103 complex environments (Atari, ProcGen, DMLab-30). Originally in JAX, this is a PyTorch port.

  ## Usage