# jaxgmg2_3phase_optim_state_alpha1

> **Note:** Einar trained these models and the description below is uncertain.

Single RL agent checkpoint from jaxgmg2_3phase_optim_state resumed with alpha changed from 0.6 to 1.0, exploring how changing the value function loss weight mid-training affects learning dynamics.

**WandB:** https://wandb.ai/devinterp/jaxgmg2_patt

## Hyperparams

```
rl_action=train
alpha=1.0
discount_rate=0.98
lr=5e-05
num_total_env_steps=7372800000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
env_size=13
mask_type=first_episode
use_prev_action=False
grad_acc_per_chunk=4
log_optimizer_state=True
resume=jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617
resume_id=3810
resume_optim=True
deterministic=True
seed=42
checkpoint=al_0.6_g_0.98_id_17_seed_980617_resume_alpha1
ckpt_dir=jaxgmg2_3phase_optim_state_alpha1
wandb_project=jaxgmg2_patt
use_wandb=True
use_hf=True
```

## Naming Schema

Checkpoint: `al_0.6_g_0.98_id_17_seed_980617_resume_alpha1`

## Reproduced with

See [`train.yaml`](./train.yaml) in this repository. Run from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).
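
The checkpoint name listed under the naming schema appears to encode several fields of the source run. The sketch below shows one way to split such a name into typed fields. It is not part of the training code: the function `parse_checkpoint_name`, the regular expression, and the reading of `al` as alpha and `g` as the discount rate are assumptions inferred from the single name and the hyperparameters in this README.

```python
import re

# Hypothetical pattern, inferred from one checkpoint name in this README.
# Assumed meanings: al = alpha, g = discount rate, id = run id, seed = seed.
_CKPT_RE = re.compile(
    r"^al_(?P<alpha>[0-9.]+)"
    r"_g_(?P<discount_rate>[0-9.]+)"
    r"_id_(?P<run_id>\d+)"
    r"_seed_(?P<seed>\d+)"
    r"(?:_(?P<suffix>.+))?$"
)


def parse_checkpoint_name(name: str) -> dict:
    """Split a checkpoint name into typed fields (illustrative helper)."""
    match = _CKPT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised checkpoint name: {name!r}")
    return {
        "alpha": float(match["alpha"]),
        "discount_rate": float(match["discount_rate"]),
        "run_id": int(match["run_id"]),
        "seed": int(match["seed"]),
        # Anything after the seed, e.g. "resume_alpha1"; None if absent.
        "suffix": match["suffix"],
    }


fields = parse_checkpoint_name("al_0.6_g_0.98_id_17_seed_980617_resume_alpha1")
print(fields)
```

Under this reading, the leading fields describe the run that was resumed (`alpha=0.6`, seed `980617`), and only the suffix marks the change; the values actually used for this run are the ones in the hyperparameter list (`alpha=1.0`, `seed=42`).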