
W&B logs: https://wandb.ai/devinterp/jaxgmg_al_sweep

Shared training config

num_rollout_steps=64
lr=5e-05
discount_rate=0.99
eff_horizon=None
eval_every=1
use_wandb=True
num_total_env_steps=5000000000
render_sixel=False
seed=42
mask_type=first_episode
penalize_time=False
optim=adam
live_monitor=False
checkpoint_schedule=0:1,250:2,500:5,1000:10,2000:20
grad_acc_per_chunk=4
num_rollout_chunks=1
env_layout=open
env_size=13
num_levels=9600
env_steps_per_loop=None
total_loops=None
wandb_project=jaxgmg_al_sweep
ckpt_dir=jaxgmg_al_sweep
duplication_factor=1

Training config that differs between runs

cheese_loc in {row, any}, sampled uniformly
alpha in [1e-3, 1], sampled log-uniformly
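The per-run sampling described above could be sketched as follows. This is a minimal illustration using only the Python standard library, not the sweep's actual code; the function name `sample_run_config` and the returned dictionary layout are assumptions made for the example.

```python
import math
import random


def sample_run_config(rng):
    """Draw one (alpha, cheese_loc) pair for a hypothetical sweep run."""
    # cheese_loc: uniform over the two options listed in the card.
    cheese_loc = rng.choice(["row", "any"])
    # alpha: log-uniform on [1e-3, 1], i.e. uniform in log-space.
    log_alpha = rng.uniform(math.log(1e-3), math.log(1.0))
    alpha = math.exp(log_alpha)
    return {"alpha": alpha, "cheese_loc": cheese_loc}


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(sample_run_config(rng))
```

Sampling in log-space gives each decade (1e-3 to 1e-2, 1e-2 to 1e-1, 1e-1 to 1) roughly equal weight, which is the usual reason to prefer log-uniform sampling for a parameter spanning several orders of magnitude.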

Models saved as al_{alpha:.1e}_{cheese_loc}
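As an illustration of the naming pattern, the format string can be evaluated directly in Python. The helper name `model_name` is hypothetical; only the pattern `al_{alpha:.1e}_{cheese_loc}` comes from the card.

```python
def model_name(alpha, cheese_loc):
    """Format a checkpoint name using the pattern given in the card."""
    # .1e renders alpha in scientific notation with one decimal digit.
    return f"al_{alpha:.1e}_{cheese_loc}"


if __name__ == "__main__":
    print(model_name(0.001, "row"))  # al_1.0e-03_row
    print(model_name(0.5, "any"))    # al_5.0e-01_any
```

Because alpha is rounded to two significant figures in the name, two runs with nearly identical alpha values would map to the same name.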

Levels are sampled from the mixture distribution level ~ (1 - alpha) * cheese_in_corner + alpha * cheese_elsewhere
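One way to read this mixture is as a per-level coin flip: with probability alpha the level comes from cheese_elsewhere, otherwise from cheese_in_corner. The sketch below assumes that reading; `sample_level_kind` is a hypothetical helper and not part of jaxgmg.

```python
import random


def sample_level_kind(alpha, rng):
    """Pick which component of the mixture a single level is drawn from."""
    # rng.random() is uniform on [0, 1), so this branch fires with probability alpha.
    if rng.random() < alpha:
        return "cheese_elsewhere"
    return "cheese_in_corner"


if __name__ == "__main__":
    rng = random.Random(0)
    kinds = [sample_level_kind(0.1, rng) for _ in range(9600)]
    print(kinds.count("cheese_elsewhere") / len(kinds))  # close to 0.1
```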

All levels are 13x13 grids with a one-cell-wide wall along every edge, leaving an 11x11 navigable interior. Evaluations are performed on all valid environment configurations: 11^2 * (11^2 - 1) = 14520.
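The count can be checked by brute force. The sketch below assumes that a configuration is an ordered pair of two distinct interior cells (for example, one for the agent and one for the cheese); the card gives only the formula, so this interpretation is an assumption.

```python
from itertools import product


def count_eval_configs(env_size=13, wall=1):
    """Count ordered pairs of distinct interior cells in a walled square grid."""
    inner = env_size - 2 * wall  # 13 - 2 = 11 navigable cells per side
    cells = list(product(range(inner), repeat=2))  # 11^2 = 121 interior cells
    # Ordered pairs (first, second) with first != second.
    return sum(1 for a, b in product(cells, cells) if a != b)


if __name__ == "__main__":
    print(count_eval_configs())  # 14520
```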

We log the average regret over

  • regret/any: all cells,
  • regret/corner: the top-left corner,
  • regret/row: the top row,
  • regret/bot: the bottom row,
  • regret/dist: expected on-distribution regret = (1 - alpha) * corner_regret + alpha * cheese_loc_regret
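The regret/dist formula is a weighted average of two of the other metrics. A minimal sketch, assuming `cheese_loc_regret` means regret/row for runs with cheese_loc=row and regret/any for runs with cheese_loc=any (the function name and the example values are illustrative only):

```python
def on_distribution_regret(alpha, corner_regret, cheese_loc_regret):
    """Mix the corner regret and the cheese_loc regret with weights (1 - alpha) and alpha."""
    return (1 - alpha) * corner_regret + alpha * cheese_loc_regret


if __name__ == "__main__":
    # Made-up regret values purely to show the arithmetic.
    print(on_distribution_regret(0.5, 2.0, 4.0))  # 3.0
```

At alpha = 0 the metric reduces to the corner regret, and at alpha = 1 it reduces to the cheese_loc regret, matching the two extremes of the training distribution.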

cheese_in_corner: The cheese always spawns in the top-left corner.

  • alpha: The mixing parameter for the environment distribution.
  • Lower values of alpha mean more levels with the cheese in the corner, so the model is under more pressure to goal-misgeneralize and learn to desire the top-left corner itself.
  • cheese_loc: Controls the behaviour of the cheese_elsewhere environment
    • row: The cheese is placed uniformly at random somewhere along the top row
    • any: The cheese is placed uniformly at random anywhere in the grid
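The two cheese_loc options could be sketched as below. The coordinate convention (0-indexed `(row, col)` within the 11x11 interior, with row 0 at the top) and the helper name `sample_cheese_position` are assumptions; the card also does not say whether the agent's own cell is excluded, so this sketch ignores that.

```python
import random


def sample_cheese_position(cheese_loc, rng, inner=11):
    """Sample a cheese cell as (row, col) inside an inner x inner interior."""
    if cheese_loc == "row":
        # Top row only: row index fixed at 0, column uniform.
        return (0, rng.randrange(inner))
    if cheese_loc == "any":
        # Any interior cell: row and column both uniform.
        return (rng.randrange(inner), rng.randrange(inner))
    raise ValueError(f"unknown cheese_loc: {cheese_loc!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_cheese_position("row", rng))
    print(sample_cheese_position("any", rng))
```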