JAXGMG checkpoints
Trained goal misgeneralisation models.
Trained with jaxgmg; see the david branch of jaxgmg for more details.
matt-ckpt
Path pattern:
checkpoints/dr-*/7168
Trained on a 15x15 grid. File names include the alpha parameter, e.g. dr-3eneg5 means alpha=3e-5. Not much is known about these models, but they are useful for debugging.
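The naming convention can be decoded mechanically. The sketch below is not part of jaxgmg: the helper name alpha_from_name is hypothetical, and the rule that "neg" stands for a minus sign in the exponent is inferred from the single example above, so other file names may not follow it.

```python
def alpha_from_name(name: str) -> float:
    """Recover alpha from a name such as 'dr-3eneg5' (hypothetical helper).

    Assumes the prefix is 'dr-' and that 'neg' encodes a minus sign,
    as in the one documented example.
    """
    prefix = "dr-"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {name!r}")
    # 'dr-3eneg5' -> '3eneg5' -> '3e-5'
    return float(name[len(prefix):].replace("neg", "-"))


print(alpha_from_name("dr-3eneg5"))  # 3e-05
```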
alpha-blocks-13x13-sweep
Path pattern:
checkpoints/run-alpha_${alpha}-steps_200M/files/checkpoints/${checkpoint_number}
All models were trained with the blocks environment layout and a world size of 13x13.
#!/bin/bash
for alpha in 1e-0 1e-1 1e-2 1e-3 1e-4 3.3e-1 3.3e-2 3.3e-3 3.3e-4; do
  python -m jaxgmg train corner \
    --num-total-env-steps 200_000_000 --keep-all-checkpoints \
    --num-cycles-per-checkpoint 64 \
    --wandb-project jaxgmg2 --wandb-name "alpha:${alpha}-steps:200M-theta:0" \
    --prob-shift "${alpha}" --env-size 13 --env-layout blocks
done
Theta specifies the reward function: reward = proxy_goal * theta + true_goal * (1 - theta)
All these models were trained with theta=0, i.e. the true goal of getting the cheese.
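As a quick illustration of the formula, here is a minimal sketch of the reward mix. The function name mixed_reward is made up for this example and is not taken from jaxgmg; only the formula itself comes from the description above.

```python
def mixed_reward(proxy_goal: float, true_goal: float, theta: float) -> float:
    """Blend the proxy-goal and true-goal rewards (illustrative helper)."""
    return proxy_goal * theta + true_goal * (1.0 - theta)


# With theta=0, the setting used for these checkpoints, only the true goal
# (reaching the cheese) contributes to the reward.
print(mixed_reward(proxy_goal=1.0, true_goal=0.0, theta=0.0))  # 0.0
print(mixed_reward(proxy_goal=0.0, true_goal=1.0, theta=0.0))  # 1.0
```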
The alpha parameter (--prob-shift) controls the fraction of distinguishing vs. undistinguishing environments.
E.g. alpha=1 means the agent always sees distinguishing environments (ones where the cheese is not in the corner),
and alpha=0 means the agent always sees undistinguishing environments (ones where the cheese is always in the corner).
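One way to picture alpha is as a per-environment probability. The sketch below assumes each environment is drawn independently and is distinguishing with probability alpha. That sampling scheme and the function names are assumptions made for illustration, not a description of how jaxgmg generates levels.

```python
import random


def sample_is_distinguishing(alpha: float, rng: random.Random) -> bool:
    """Return True with probability alpha (assumed independent per environment)."""
    # rng.random() lies in [0, 1), so alpha=0 never fires and alpha=1 always does.
    return rng.random() < alpha


def estimate_fraction(alpha: float, n: int = 10_000, seed: int = 0) -> float:
    """Empirical fraction of distinguishing environments over n samples."""
    rng = random.Random(seed)
    return sum(sample_is_distinguishing(alpha, rng) for _ in range(n)) / n


for alpha in (0.0, 0.1, 1.0):
    print(alpha, estimate_fraction(alpha))
```

Under this reading, small values such as alpha=1e-4 mean the agent almost never sees an environment where the cheese is away from the corner.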
Checkpoints are taken every 64 cycles (~512 env steps per cycle, ~32k env steps per checkpoint?).
E.g. the path run-alpha_1e-1-steps_200M/files/checkpoints/128 corresponds to training with alpha=1e-1, after 128 cycles (the second checkpoint).
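The path pattern and the step arithmetic can be sketched as follows. The helper names are hypothetical. The alpha string is assumed to be spelled as in the sweep script (e.g. "1e-1"), and the 512 steps-per-cycle figure is the approximate value quoted above, not a verified constant.

```python
CYCLES_PER_CHECKPOINT = 64    # from --num-cycles-per-checkpoint in the sweep script
APPROX_STEPS_PER_CYCLE = 512  # approximate figure quoted above, not verified


def checkpoint_path(alpha: str, checkpoint_number: int) -> str:
    """Fill in the documented path pattern (illustrative helper)."""
    if checkpoint_number % CYCLES_PER_CHECKPOINT != 0:
        raise ValueError("checkpoints are taken every 64 cycles")
    return (
        f"checkpoints/run-alpha_{alpha}-steps_200M"
        f"/files/checkpoints/{checkpoint_number}"
    )


def approx_env_steps(checkpoint_number: int) -> int:
    """Rough env-step count, treating the checkpoint number as a cycle count."""
    return checkpoint_number * APPROX_STEPS_PER_CYCLE


print(checkpoint_path("1e-1", 128))
print(approx_env_steps(128))  # 65536, i.e. about 2 x 32k
```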
alpha-tree-13x13-sweep
Ditto, but with --env-layout tree