Jaxgmg
Collection
A collection of all the models trained for the Jaxgmg project. The architecture is a standard IMPALA convolutional network.
14 items
Shared training config
num_rollout_steps=64
lr=5e-05
discount_rate=0.99
eff_horizon=None
eval_every=1
use_wandb=True
num_total_env_steps=5000000000
render_sixel=False
seed=42
mask_type=first_episode
penalize_time=False
optim=adam
live_monitor=False
checkpoint_schedule=0:1,250:2,500:5,1000:10,2000:20
grad_acc_per_chunk=4
num_rollout_chunks=1
env_layout=open
env_size=13
num_levels=9600
env_steps_per_loop=None
total_loops=None
wandb_project=jaxgmg_al_sweep
ckpt_dir=jaxgmg_al_sweep
duplication_factor=1
Training config that differs between runs
cheese_loc in [row, any] sampled uniform
alpha in [1e-3, 1] sampled log_uniform
Models saved as al_{alpha:.1e}_{cheese_loc}
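As a rough illustration of how one run's varying parameters and saved-model name fit together, the sketch below draws `cheese_loc` uniformly and `alpha` log-uniformly, then formats the name. The helper names `sample_run_config` and `model_name` are hypothetical, and drawing the exponent uniformly in base 10 is an assumption about what `log_uniform` means here; the actual sweep tooling may differ.

```python
import math
import random


def sample_run_config(rng):
    """Draw one (alpha, cheese_loc) pair for a run (hypothetical helper)."""
    # cheese_loc is sampled uniformly from the two options.
    cheese_loc = rng.choice(["row", "any"])
    # Log-uniform on [1e-3, 1]: sample the base-10 exponent uniformly.
    exponent = rng.uniform(math.log10(1e-3), math.log10(1.0))
    alpha = 10.0 ** exponent
    return alpha, cheese_loc


def model_name(alpha, cheese_loc):
    """Format the saved-model name as al_{alpha:.1e}_{cheese_loc}."""
    return f"al_{alpha:.1e}_{cheese_loc}"


if __name__ == "__main__":
    rng = random.Random(42)
    alpha, cheese_loc = sample_run_config(rng)
    print(model_name(alpha, cheese_loc))
    print(model_name(1e-3, "row"))  # al_1.0e-03_row
```

Because the name keeps only two significant digits of `alpha`, two runs with very close `alpha` values and the same `cheese_loc` would map to the same name.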
Levels are sampled from the mixture distribution level ~ (1 - alpha) * cheese_in_corner + alpha * cheese_elsewhere
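The mixture can be sketched as a two-stage draw: with probability alpha use the cheese_elsewhere rule, otherwise put the cheese in the corner. This is a minimal sketch, not the project's level generator. The function name `sample_cheese_position`, the (row, col) indexing with interior indices 1 to 11, and the choice not to exclude the corner cell from cheese_elsewhere are all assumptions made for illustration; mouse placement is ignored.

```python
import random

INTERIOR = range(1, 12)  # interior indices 1..11 of a 13x13 grid (assumed layout)
CORNER = (1, 1)          # top-left interior cell as (row, col) (assumed indexing)


def sample_cheese_position(alpha, cheese_loc, rng):
    """Sample a cheese cell from (1 - alpha) * corner + alpha * elsewhere."""
    if rng.random() < alpha:
        # cheese_elsewhere branch, controlled by cheese_loc.
        if cheese_loc == "row":
            # Uniform along the top interior row.
            return (1, rng.choice(INTERIOR))
        if cheese_loc == "any":
            # Uniform over the whole interior.
            return (rng.choice(INTERIOR), rng.choice(INTERIOR))
        raise ValueError(f"unknown cheese_loc: {cheese_loc!r}")
    # cheese_in_corner branch.
    return CORNER


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_cheese_position(0.1, "any", rng) for _ in range(1000)]
    frac_corner = sum(s == CORNER for s in samples) / len(samples)
    print(f"fraction of corner levels: {frac_corner:.2f}")
```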
All levels are 13x13 grids with a one-cell-wide wall bordering all edges (leaving an 11x11 navigable space). Evaluations are performed on all valid environment configurations: 11^2 * (11^2 - 1) = 14520.
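The count above can be checked by brute force. The sketch below assumes, consistent with the formula, that a configuration is an ordered pair of a mouse cell and a distinct cheese cell in the interior; the function name `count_eval_configs` is hypothetical.

```python
from itertools import product


def count_eval_configs(env_size=13):
    """Count (mouse, cheese) placements on distinct interior cells."""
    # Interior indices exclude the one-cell wall on each edge.
    interior = range(1, env_size - 1)
    cells = list(product(interior, repeat=2))
    # Ordered pairs of distinct cells: mouse first, cheese second.
    return sum(1 for mouse in cells for cheese in cells if mouse != cheese)


if __name__ == "__main__":
    print(count_eval_configs())  # 14520
```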
We log the average regret over several sets of cheese positions:
regret/any : all cells
regret/corner : the top-left corner
regret/row : the top row
regret/bot : the bottom row
regret/dist : expected on-distribution regret = (1 - alpha) * corner_regret + alpha * cheese_loc_regret

cheese_in_corner : Cheese always spawns in the top-left corner.
alpha : The mixing parameter for the environment distribution. Lower alpha means more levels where the cheese is in the corner, so the model is under more pressure to goal-misgeneralize and learn to desire the top-left corner.
cheese_loc : Controls the behaviour of the cheese_elsewhere environment.
  row : The cheese is placed uniformly at random somewhere along the top row.
  any : The cheese is placed uniformly at random anywhere in the grid.
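The on-distribution regret is a weighted average of two of the logged quantities. A minimal sketch of that arithmetic follows; the function name `dist_regret` and the example numbers are made up for illustration and are not results from the runs.

```python
def dist_regret(alpha, corner_regret, cheese_loc_regret):
    """Expected on-distribution regret for mixing parameter alpha.

    cheese_loc_regret is the average regret on the cheese_elsewhere
    levels, i.e. regret/row or regret/any depending on cheese_loc.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * corner_regret + alpha * cheese_loc_regret


if __name__ == "__main__":
    # Illustrative values only: zero regret in the corner, 4.0 elsewhere.
    print(dist_regret(0.25, 0.0, 4.0))  # 1.0
```

With small alpha, this quantity is dominated by the corner regret, so a model can score well on regret/dist while still having high regret/any.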