### UNUSED, BUT NOT USEFUL

A rerun of the models

```
al_0.75_g_0.97_seed_122_pa_1
al_0.75_g_0.97_seed_131_pa_1
al_0.75_g_0.97_seed_200_pa_1
```

because previous runs with the same seeds had chaotic loss/regret curves. The chaotic behaviour did not replicate when rerunning with the same seeds, so I attribute the bad runs to faulty hardware or something else I can't control.

Wandb: https://wandb.ai/devinterp/jaxgmg2_cursed

Hyperparams (shown for seed 122):

```
rl_action=train
model_type=impala
lr=5e-05
discount_rate=0.97
num_rollout_steps=64
grad_acc_per_chunk=4
num_rollout_chunks=1
cheese_loc=any
env_layout=open
alpha=0.75
env_size=13
num_levels=9600
compile=True
use_prev_action=False
weight_restrictions=None
weight_restrictions_invert=False
use_bf16=False
use_wandb=True
seed=122
mask_type=first_episode
ckpt_dir=jaxgmg2_cursed
vis_average_state=False
trim_episodes=False
num_total_env_steps=9999974400
eval_every=1
eff_horizon=None
optim=adam
env_rule=None
env_rule_mixture=None
hf_user=davidquarel
hf_collection=davidquarel/jaxgmg
use_hf=True
num_hf_uploads=1
use_log=True
log_optimizer_state=False
resume=None
resume_id=None
resume_optim=False
checkpoint=al_0.75_g_0.97_seed_122_pa_1
wandb_project=jaxgmg2_cursed
eval_schedule=0:1,250:2,500:5,2000:10
render_sixel=False
sixel_idx=60
live_monitor=False
run_id=0
seed_formula=None
deterministic=True
penalize_time=False
f_str_ckpt=al_0.75_g_0.97_seed_122_pa_1
duplication_factor=-1
smoke=False
ntfy=david_jaxgmg
num_chains=6
num_draws=3000
num_steps_bw_draws=1
on_policy=True
llc_nbeta=3000
localization=10
exact_solver_each_draw=False
llc_optimizer=sgld
iw_clip_eps=None
rmsprop_burnin_steps=20
llc_data_file=llc_scan_open_reinforce.pkl
llc_checkpoint_index=None
llc_checkpoint_number=None
sink=None
repo_id=davidquarel/jaxgmg_ckpt_zip
use_shuffled_checkpoints=False
force_re_download=False
off_distribution_data=False
evaluate_every_position=False
num_prev_actions=1
eff_acc_steps=4
chunk_size=9600
env_steps_per_microbatch=153600
ckpt_path=jaxgmg2_cursed/al_0.75_g_0.97_seed_122_pa_1
env_steps_per_loop=614400
total_loops=16276
```
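
Sanity check on the derived values at the end of the dump (`eff_acc_steps`, `chunk_size`, `env_steps_per_microbatch`, `env_steps_per_loop`, `total_loops`): the sketch below recomputes them from the base settings. The formulas are an assumption, inferred only from the numbers lining up, not read from the training code, so treat it as a consistency check and not as a description of how jaxgmg computes them.

```python
# Base settings, copied from the hyperparameter dump.
num_levels = 9600
num_rollout_steps = 64
num_rollout_chunks = 1
grad_acc_per_chunk = 4
num_total_env_steps = 9_999_974_400

# ASSUMED relationships (inferred from the logged values, not from the code).
chunk_size = num_levels // num_rollout_chunks            # levels per rollout chunk
eff_acc_steps = grad_acc_per_chunk * num_rollout_chunks  # gradient accumulation steps per loop
env_steps_per_loop = num_levels * num_rollout_steps      # env steps collected per training loop
env_steps_per_microbatch = env_steps_per_loop // eff_acc_steps
total_loops = num_total_env_steps // env_steps_per_loop

# Compare against the values logged in the dump.
logged = {
    "chunk_size": 9600,
    "eff_acc_steps": 4,
    "env_steps_per_loop": 614400,
    "env_steps_per_microbatch": 153600,
    "total_loops": 16276,
}
computed = {
    "chunk_size": chunk_size,
    "eff_acc_steps": eff_acc_steps,
    "env_steps_per_loop": env_steps_per_loop,
    "env_steps_per_microbatch": env_steps_per_microbatch,
    "total_loops": total_loops,
}
for name, value in logged.items():
    status = "ok" if computed[name] == value else "MISMATCH"
    print(f"{name}: logged={value} computed={computed[name]} [{status}]")
```

Under these assumed formulas `num_total_env_steps` divides evenly by `env_steps_per_loop`, which would explain the odd-looking value 9999974400: it is the largest multiple of 614400 not exceeding 10^10.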