Project: RL1/RL2 (obsolete)
Collection
Older models that are no longer useful for anything in RL1 or RL2, or that are now unused because experimentation was discontinued. • 16 items
These models were originally used for RL1, but they were trained with previous action and with the variable learning rate bug. Do not use them. The code they were trained with has since been removed.
Result of a wandb sweep with sweeps/alpha_1_seed_sweep.py.
program: /root/timaeus/projects/rl/main_train.py
method: grid
project: jaxgmg_al_1e0
entity: devinterp
command:
  - /root/timaeus/.venv/bin/python
  - ${program}
  - ${args}
  - --use-wandb
  - --use-hf
parameters:
  mask-type:
    value: first_episode
  eval-schedule:
    value: "0:1,250:2,500:5,1000:10,2000:20"
  num-total-env-steps:
    value: 5000000000
  discount-rate:
    value: 0.99
  alpha:
    value: 1.0
  seed:
    values: [200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220]
  cheese-loc:
    value: "any"
  num-levels:
    value: 9600
  grad-acc-per-chunk:
    value: 5
  wandb-project:
    value: jaxgmg_al_1e0
  ckpt-dir:
    value: jaxgmg_al_1e0
  f-str-ckpt:
    value: "al_{alpha}_g_{discount_rate}_seed_{seed}"
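To help match runs to checkpoints, the sketch below expands the grid from the config above and fills in the `f-str-ckpt` template for each run. It is an illustration only. The original training code has been removed, so it is an assumption that the template was filled with Python's `str.format` and that `alpha` and `discount_rate` were rendered as `1.0` and `0.99`. The names `expand_grid`, `parameters`, and `F_STR_CKPT` are hypothetical helpers, not part of the original code.

```python
from itertools import product

# Swept and fixed values copied from the sweep config above.
# Only the parameters that appear in the checkpoint template are listed.
parameters = {
    "alpha": [1.0],
    "discount_rate": [0.99],
    "seed": list(range(200, 221)),  # 200..220 inclusive, as in the config
}

# Template copied from the f-str-ckpt entry.
# Assumption: the (removed) training code filled it with str.format.
F_STR_CKPT = "al_{alpha}_g_{discount_rate}_seed_{seed}"


def expand_grid(params):
    """Return one dict per point in the Cartesian product of the value lists."""
    keys = list(params)
    combos = product(*(params[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


runs = expand_grid(parameters)
ckpt_names = [F_STR_CKPT.format(**run) for run in runs]

print(f"{len(runs)} runs in the grid")
for name in ckpt_names[:3]:
    print(name)
```

Because `seed` is the only parameter with more than one value, the grid has one run per seed. The actual checkpoint names in the repository may be formatted differently, so check them against the file listing before relying on this pattern.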