# jaxgmg2_3phase_optim_state_alpha1

> **Note:** Einar trained these models, and the description below is uncertain.

A single RL agent checkpoint from `jaxgmg2_3phase_optim_state`, resumed with `alpha` changed from 0.6 to 1.0
to explore how changing the value function loss weight mid-training affects learning dynamics.

**WandB:** https://wandb.ai/devinterp/jaxgmg2_patt

## Hyperparams

```
rl_action=train
alpha=1.0
discount_rate=0.98
lr=5e-05
num_total_env_steps=7372800000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
env_size=13
mask_type=first_episode
use_prev_action=False
grad_acc_per_chunk=4
log_optimizer_state=True
resume=jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617
resume_id=3810
resume_optim=True
deterministic=True
seed=42
checkpoint=al_0.6_g_0.98_id_17_seed_980617_resume_alpha1
ckpt_dir=jaxgmg2_3phase_optim_state_alpha1
wandb_project=jaxgmg2_patt
use_wandb=True
use_hf=True
```
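
The overrides above are plain `key=value` pairs, so they are easy to load into a dict when comparing this run against its source run. The following is a minimal sketch, not part of the training code: `parse_overrides` and `coerce` are hypothetical helpers, the coercion rules (booleans, then integers, then floats, else strings) are an assumption, and only an excerpt of the overrides is embedded.

```python
# Excerpt of the overrides listed in this README.
OVERRIDES = """\
alpha=1.0
discount_rate=0.98
lr=5e-05
num_total_env_steps=7372800000
cheese_loc=any
resume_optim=True
seed=42
"""


def coerce(raw):
    """Convert a raw string to bool, int, or float where possible (assumed rules)."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # leave anything else (names, paths) as a string


def parse_overrides(text):
    """Parse `key=value` lines into a dict, skipping blank or malformed lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, raw = line.split("=", 1)
        params[key.strip()] = coerce(raw.strip())
    return params


params = parse_overrides(OVERRIDES)
```

With the values typed, checks such as `params["alpha"] == 1.0` or `params["resume_optim"] is True` can be written directly instead of comparing strings.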

## Naming Schema

Checkpoint: `al_0.6_g_0.98_id_17_seed_980617_resume_alpha1`
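
The schema itself is not documented here, so the field meanings below are inferred: `al` appears to be the source run's `alpha` (0.6), `g` matches `discount_rate` (0.98), and `id` and `seed` look like the source run's identifiers, with `resume_alpha1` as a suffix for this resumed run. Under those assumptions, a name can be split into fields with a regular expression. `parse_checkpoint_name` is a hypothetical helper, not something provided by the repository.

```python
import re

# Assumed pattern: al_<alpha>_g_<gamma>_id_<run_id>_seed_<seed>[_<suffix>]
CHECKPOINT_RE = re.compile(
    r"^al_(?P<alpha>[0-9.]+)"
    r"_g_(?P<gamma>[0-9.]+)"
    r"_id_(?P<run_id>\d+)"
    r"_seed_(?P<seed>\d+)"
    r"(?:_(?P<suffix>.+))?$"
)


def parse_checkpoint_name(name):
    """Split a checkpoint name into its (inferred) fields."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    return {
        "alpha": float(match["alpha"]),
        "gamma": float(match["gamma"]),
        "run_id": int(match["run_id"]),
        "seed": int(match["seed"]),
        "suffix": match["suffix"],  # None when the name has no suffix
    }


info = parse_checkpoint_name("al_0.6_g_0.98_id_17_seed_980617_resume_alpha1")
```

Note that, if this reading is right, the name keeps the source run's `alpha` and `seed`, while the resumed run itself uses `alpha=1.0` and `seed=42`.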

## Reproduced with

See [`train.yaml`](./train.yaml) in this repository. Run from the
[timaeus monorepo](https://github.com/timaeus-research/timaeus).