JAXGMG checkpoints

Goal misgeneralisation models trained with jaxgmg; see jaxgmg (branch david) for more details.

matt-ckpt

Path pattern:

checkpoints/dr-*/7168

Trained on a 15x15 grid. File names include the alpha parameter, e.g. dr-3eneg5 means alpha=3e-5. Not much is known about these checkpoints, but they are useful for debugging.
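A minimal sketch of recovering alpha from such a file name. It assumes the naming scheme generalises from the single example given here (dr-3eneg5), with "neg" standing in for a minus sign in the exponent; the function name parse_alpha is hypothetical and not part of jaxgmg.

```python
import re


def parse_alpha(run_name: str) -> float:
    """Parse alpha from a run name such as 'dr-3eneg5' (assumed to mean 3e-5).

    Assumption: 'neg' replaces the minus sign of the exponent. This is
    inferred from one example and may not cover every file name.
    """
    match = re.fullmatch(r"dr-(\d+(?:\.\d+)?)e(neg)?(\d+)", run_name)
    if match is None:
        raise ValueError(f"unrecognised run name: {run_name!r}")
    mantissa, neg, exponent = match.groups()
    sign = "-" if neg else ""
    return float(f"{mantissa}e{sign}{exponent}")
```

For example, parse_alpha("dr-3eneg5") gives 3e-5 under this assumed scheme.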

alpha-blocks-13x13-sweep

Path pattern:

checkpoints/run-alpha_${alpha}-steps_200M/files/checkpoints/${checkpoint_number}

All models trained with blocks environment generation, world size 13x13.

#!/bin/bash
for alpha in 1e-0 1e-1 1e-2 1e-3 1e-4 3.3e-1 3.3e-2 3.3e-3 3.3e-4; do
    python -m jaxgmg train corner --num-total-env-steps 200_000_000 --keep-all-checkpoints --num-cycles-per-checkpoint 64 --wandb-project jaxgmg2 --wandb-name alpha:${alpha}-steps:200M-theta:0 --prob-shift ${alpha} --env-size 13 --env-layout blocks
done

Theta specifies the reward function: reward = proxy_goal * theta + true_goal * (1 - theta)

All these models were trained with theta=0, i.e. the true goal of getting the cheese.
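The reward formula above can be written out as a short function. This is an illustrative sketch only, not jaxgmg's implementation; the name blended_reward and the use of plain floats are assumptions.

```python
def blended_reward(proxy_goal: float, true_goal: float, theta: float) -> float:
    """reward = proxy_goal * theta + true_goal * (1 - theta)."""
    return proxy_goal * theta + true_goal * (1.0 - theta)


# With theta=0 (as in this sweep) only the true goal contributes.
reward_at_theta_zero = blended_reward(proxy_goal=1.0, true_goal=0.0, theta=0.0)
```

With theta=0 the proxy term vanishes, so reaching the corner without the cheese earns nothing, which matches the statement that these models were trained on the true goal.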

The alpha parameter (--prob-shift) controls the fraction of distinguishing vs. undistinguishing environments. For example, alpha=1 means the agent always sees distinguishing environments (ones where the cheese is not in the corner), and alpha=0 means the agent always sees undistinguishing environments (ones where the cheese is always in the corner).
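One way to picture the role of alpha is as the probability that any given training environment is a distinguishing one. The sketch below assumes independent per-environment sampling, which is an illustration and not necessarily how jaxgmg generates levels; is_distinguishing is a hypothetical helper.

```python
import random


def is_distinguishing(alpha: float, rng: random.Random) -> bool:
    """Return True with probability alpha (cheese placed away from the corner)."""
    # rng.random() lies in [0, 1), so alpha=1 is always True and alpha=0 always False.
    return rng.random() < alpha


rng = random.Random(0)
samples = [is_distinguishing(3.3e-2, rng) for _ in range(10_000)]
fraction = sum(samples) / len(samples)  # expected to be close to 0.033
```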

Checkpoints are taken every 64 cycles. Each cycle is roughly 512 env steps, so each checkpoint covers roughly 32k env steps (64 x 512 = 32,768); these step counts are approximate and unconfirmed.
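The arithmetic behind these figures, as a small sketch. The 512 steps per cycle is the approximate value quoted above and is treated here as an assumption; the helper names are hypothetical.

```python
CYCLES_PER_CHECKPOINT = 64        # from --num-cycles-per-checkpoint 64
APPROX_ENV_STEPS_PER_CYCLE = 512  # approximate figure, not verified


def checkpoint_index(checkpoint_number: int) -> int:
    """1-based position of a checkpoint, e.g. 64 -> 1, 128 -> 2."""
    if checkpoint_number <= 0 or checkpoint_number % CYCLES_PER_CHECKPOINT != 0:
        raise ValueError("checkpoint number must be a positive multiple of 64")
    return checkpoint_number // CYCLES_PER_CHECKPOINT


def approx_env_steps(checkpoint_number: int) -> int:
    """Approximate env steps elapsed when this checkpoint was saved."""
    return checkpoint_number * APPROX_ENV_STEPS_PER_CYCLE
```

Under these assumptions, checkpoint 64 corresponds to about 32,768 env steps and checkpoint 128 to about 65,536.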

For example, the path run-alpha_1e-1-steps_200M/files/checkpoints/128 corresponds to training with alpha=1e-1 after 128 cycles (the second checkpoint).
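A sketch of building checkpoint paths from the path pattern above. It assumes alpha appears in the directory name exactly as written in the training loop (for example 1e-1 or 3.3e-2), which is why alpha is kept as a string; checkpoint_path is a hypothetical helper, not a jaxgmg function.

```python
# Alpha values as they appear in the training loop, kept as strings
# so that the formatting in the path is preserved.
ALPHAS = ["1e-0", "1e-1", "1e-2", "1e-3", "1e-4",
          "3.3e-1", "3.3e-2", "3.3e-3", "3.3e-4"]


def checkpoint_path(alpha: str, checkpoint_number: int) -> str:
    """Fill in the documented pattern for one alpha and checkpoint number."""
    return (
        f"checkpoints/run-alpha_{alpha}-steps_200M"
        f"/files/checkpoints/{checkpoint_number}"
    )


# Path of the second checkpoint (128 cycles) for every alpha in the sweep.
second_checkpoints = [checkpoint_path(alpha, 128) for alpha in ALPHAS]
```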

alpha-tree-13x13-sweep

Same as alpha-blocks-13x13-sweep, but trained with --env-layout tree.


Collection including davidquarel/jaxgmg_checkpoints