| ### OBSOLETE |
|
|
| Models trained with Matt's original training code while we were getting to grips with the setup. |
| We have since moved to our own training loop in PyTorch, and these checkpoints are no longer useful. |
|
|
|
|
| # JAXGMG checkpoints |
|
|
| Goal misgeneralisation models trained with jaxgmg. |
| See [jaxgmg](https://github.com/timaeus-research/jaxgmg), branch `david`, for more details. |
|
|
|
|
| ## matt-ckpt |
|
|
| Path pattern: |
| ``` |
| checkpoints/dr-*/7168 |
| ``` |
|
|
| Trained on a 15x15 grid. |
| The directory name includes the alpha parameter, e.g. `dr-3eneg5` means alpha=3e-5. |
| Not much is known about these checkpoints, but they are useful for debugging. |
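The naming convention above can be decoded programmatically. The following is a minimal sketch, assuming every directory follows the same shape as the one documented example (`dr-3eneg5`), with `neg` marking a negative exponent. The function name `alpha_from_dir_name` is hypothetical and not part of jaxgmg.

```python
import re


def alpha_from_dir_name(name: str) -> float:
    """Decode alpha from a directory name such as 'dr-3eneg5' (-> 3e-5).

    Assumption: names look like 'dr-<mantissa>e[neg]<exponent>'. Only
    'dr-3eneg5' is confirmed by the README; other names may differ.
    """
    match = re.fullmatch(r"dr-(\d+(?:\.\d+)?)e(neg)?(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint directory name: {name!r}")
    mantissa, neg, exponent = match.groups()
    sign = "-" if neg else ""
    # Rebuild standard scientific notation and let float() parse it.
    return float(f"{mantissa}e{sign}{exponent}")
```

Names that do not match the assumed pattern raise a `ValueError` rather than silently returning a wrong alpha.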
|
|
|
|
| ## alpha-blocks-13x13-sweep |
|
|
| Path pattern: |
| ``` |
| checkpoints/run-alpha_${alpha}-steps_200M/files/checkpoints/${checkpoint_number} |
| ``` |
|
|
| All models trained with `blocks` environment generation, world size 13x13. |
|
|
| ``` |
| #!/bin/bash |
| for alpha in 1e-0 1e-1 1e-2 1e-3 1e-4 3.3e-1 3.3e-2 3.3e-3 3.3e-4; do |
| python -m jaxgmg train corner --num-total-env-steps 200_000_000 --keep-all-checkpoints --num-cycles-per-checkpoint 64 --wandb-project jaxgmg2 --wandb-name alpha:${alpha}-steps:200M-theta:0 --prob-shift ${alpha} --env-size 13 --env-layout blocks |
| done |
| ``` |
|
|
| Theta specifies the reward function: |
| `reward = proxy_goal * theta + true_goal * (1 - theta)` |
|
|
| All these models were trained with theta=0, i.e. the true goal of getting the cheese. |
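As a sanity check on the formula, here is a minimal sketch of the reward mixture in plain Python. The function name `mixed_reward` is hypothetical; it restates the formula above and is not the jaxgmg implementation.

```python
def mixed_reward(proxy_goal: float, true_goal: float, theta: float) -> float:
    """Interpolate between the true-goal reward (theta=0) and the proxy-goal reward (theta=1)."""
    return proxy_goal * theta + true_goal * (1.0 - theta)


# With theta=0, as in this sweep, only the true goal (the cheese) is rewarded.
reward_true_only = mixed_reward(proxy_goal=1.0, true_goal=0.0, theta=0.0)
```

With `theta=0` the proxy term vanishes, so reaching only the proxy goal yields zero reward.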
|
|
| The alpha parameter (`--prob-shift`) controls the fraction of distinguishing vs. undistinguishing environments. |
| E.g. alpha=1 means the agent always sees distinguishing environments (ones where the cheese is not in the corner), |
| and alpha=0 means the agent always sees undistinguishing environments (ones where the cheese is always in the corner). |
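To make the role of alpha concrete, the sketch below draws a batch of environment kinds where each one is distinguishing with probability alpha. This only illustrates the mixture described above: the function `sample_env_kinds` is hypothetical, and how jaxgmg actually samples levels is not shown here.

```python
import random


def sample_env_kinds(alpha: float, num_envs: int, seed: int = 0) -> list[str]:
    """Label each environment 'distinguishing' with probability alpha, else 'undistinguishing'."""
    rng = random.Random(seed)
    return [
        # rng.random() is in [0, 1), so alpha=1 always passes and alpha=0 never does.
        "distinguishing" if rng.random() < alpha else "undistinguishing"
        for _ in range(num_envs)
    ]


kinds = sample_env_kinds(alpha=3.3e-2, num_envs=1000)
fraction_distinguishing = kinds.count("distinguishing") / len(kinds)
```

For small alpha, such as the `3.3e-4` run in the sweep, most batches of this size would contain no distinguishing environments at all.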
|
|
| Checkpoints are taken every 64 cycles (~512 env steps per cycle, ~32k env steps per checkpoint?). |
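The arithmetic behind these estimates can be written out as follows. This is a rough sketch that takes the approximate figure of 512 env steps per cycle at face value and assumes the checkpoint number equals the number of cycles completed. The constant and function names are hypothetical.

```python
CYCLES_PER_CHECKPOINT = 64      # from --num-cycles-per-checkpoint
ENV_STEPS_PER_CYCLE = 512       # approximate figure quoted above
TOTAL_ENV_STEPS = 200_000_000   # from --num-total-env-steps

# Env steps between consecutive checkpoints: 64 * 512 = 32768 (~32k).
env_steps_per_checkpoint = CYCLES_PER_CHECKPOINT * ENV_STEPS_PER_CYCLE


def env_steps_at_checkpoint(checkpoint_number: int) -> int:
    """Approximate env steps seen by the time a given checkpoint was saved.

    Assumption: the checkpoint number counts completed cycles.
    """
    return checkpoint_number * ENV_STEPS_PER_CYCLE


# Rough number of checkpoints in a full 200M-step run.
approx_num_checkpoints = TOTAL_ENV_STEPS // env_steps_per_checkpoint
```

Under these assumptions a full run yields roughly six thousand checkpoints per alpha value.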
|
|
| E.g. the path `run-alpha_1e1-steps_200M/files/checkpoints/128` corresponds to training with alpha=1e-1, after 128 cycles (the second checkpoint). |
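To see which checkpoints exist for a run, the directory can be listed and sorted numerically (a plain string sort would put `1024` before `128`). This is a minimal sketch using only the standard library. It assumes each checkpoint is an entry named by its cycle count under `files/checkpoints`, as in the path pattern above. The helper name `list_checkpoint_numbers` is hypothetical.

```python
from pathlib import Path


def list_checkpoint_numbers(run_dir: str) -> list[int]:
    """Return the checkpoint numbers found under <run_dir>/files/checkpoints, ascending.

    Assumption: every checkpoint entry is named with digits only; other entries are skipped.
    """
    ckpt_root = Path(run_dir) / "files" / "checkpoints"
    return sorted(int(p.name) for p in ckpt_root.iterdir() if p.name.isdigit())
```

For example, `list_checkpoint_numbers("checkpoints/run-alpha_1e1-steps_200M")` would list the checkpoints for the run named in the example above, if that directory exists locally.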
|
|
| ## alpha-tree-13x13-sweep |
|
|
| Same setup as alpha-blocks-13x13-sweep, but trained with `--env-layout tree`. |
|
|
|
|
|
|
|
|
|
|
|
|