	# jaxgmg2_3phase_optim_state_alpha1

	> Note: Einar trained these models and the description below is uncertain.

	A single RL agent checkpoint resumed from `jaxgmg2_3phase_optim_state`, with `alpha` changed from 0.6 to 1.0.
	The run explores how changing the value-function loss weight (`alpha`) mid-training affects learning dynamics.
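
To make the role of `alpha` concrete, here is a minimal sketch. It assumes `alpha` simply multiplies the value-function term in a combined actor-critic loss. The function name `combined_loss` and the loss values are hypothetical, and the actual loss in the training code may be formulated differently.

```python
def combined_loss(policy_loss: float, value_loss: float, alpha: float) -> float:
    """Combine policy and value losses.

    ASSUMPTION: alpha is a plain multiplicative weight on the value loss.
    The real training code may use a different formulation.
    """
    return policy_loss + alpha * value_loss


# Hypothetical loss values, chosen only for illustration.
policy_loss, value_loss = 0.5, 2.0

loss_before = combined_loss(policy_loss, value_loss, alpha=0.6)  # original run
loss_after = combined_loss(policy_loss, value_loss, alpha=1.0)   # this resumed run

# With the same underlying losses, the value term contributes more after the switch.
print(loss_before, loss_after)
```

Under this assumption, raising `alpha` from 0.6 to 1.0 at resume time increases the gradient contribution of the value function relative to the policy term, while the restored optimizer state (`resume_optim=True`) still reflects statistics gathered under the old weighting.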

	WandB: https://wandb.ai/devinterp/jaxgmg2_patt

	## Hyperparams

	```
	rl_action=train
	alpha=1.0
	discount_rate=0.98
	lr=5e-05
	num_total_env_steps=7372800000
	num_rollout_steps=64
	num_levels=9600
	cheese_loc=any
	env_layout=open
	env_size=13
	mask_type=first_episode
	use_prev_action=False
	grad_acc_per_chunk=4
	log_optimizer_state=True
	resume=jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617
	resume_id=3810
	resume_optim=True
	deterministic=True
	seed=42
	checkpoint=al_0.6_g_0.98_id_17_seed_980617_resume_alpha1
	ckpt_dir=jaxgmg2_3phase_optim_state_alpha1
	wandb_project=jaxgmg2_patt
	use_wandb=True
	use_hf=True
	```
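
As a sanity check on the step budget, the sketch below parses a few of the `key=value` lines above and computes how many rollout iterations they imply. It assumes that each iteration collects `num_rollout_steps` steps from each of `num_levels` parallel environments; that interpretation is not confirmed by this card, and `parse_hyperparams` is a hypothetical helper.

```python
# Values copied from the Hyperparams block above.
HYPERPARAMS = """
num_total_env_steps=7372800000
num_rollout_steps=64
num_levels=9600
"""


def parse_hyperparams(text: str) -> dict[str, str]:
    """Parse key=value lines into a dict of raw strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        params[key] = value
    return params


params = parse_hyperparams(HYPERPARAMS)

# ASSUMPTION: one rollout = num_rollout_steps steps in each of num_levels environments.
steps_per_rollout = int(params["num_rollout_steps"]) * int(params["num_levels"])
num_rollouts, remainder = divmod(int(params["num_total_env_steps"]), steps_per_rollout)

print(steps_per_rollout, num_rollouts, remainder)  # 614400 12000 0
```

Under that assumption the total step budget divides evenly into 12,000 rollout iterations of 614,400 environment steps each.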

	## Naming Schema

	Checkpoint: `al_0.6_g_0.98_id_17_seed_980617_resume_alpha1`
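
The card does not spell the schema out. A plausible reading, sketched below, is `al_<alpha>_g_<discount_rate>_id_<run id>_seed_<seed>` followed by an optional suffix. The field meanings, the regular expression, and the helper `parse_checkpoint_name` are assumptions inferred from the hyperparameters above, not taken from the training code.

```python
import re

# ASSUMED schema: al_<alpha>_g_<discount_rate>_id_<run id>_seed_<seed>[_<suffix>]
NAME_PATTERN = re.compile(
    r"^al_(?P<alpha>[0-9.]+)"
    r"_g_(?P<discount_rate>[0-9.]+)"
    r"_id_(?P<run_id>\d+)"
    r"_seed_(?P<seed>\d+)"
    r"(?:_(?P<suffix>.+))?$"
)


def parse_checkpoint_name(name: str) -> dict:
    """Split a checkpoint name into its assumed fields."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed schema: {name!r}")
    fields = match.groupdict()
    return {
        "alpha": float(fields["alpha"]),
        "discount_rate": float(fields["discount_rate"]),
        "run_id": int(fields["run_id"]),
        "seed": int(fields["seed"]),
        "suffix": fields["suffix"],  # None when the name has no suffix
    }


parsed = parse_checkpoint_name("al_0.6_g_0.98_id_17_seed_980617_resume_alpha1")
print(parsed)
```

Note that the prefix matches the `resume` source run (alpha 0.6, seed 980617) rather than this run's `alpha=1.0` and `seed=42`; only the `resume_alpha1` suffix marks the change.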

	## Reproduced with

	See [`train.yaml`](./train.yaml) in this repository. Run from the
	[timaeus monorepo](https://github.com/timaeus-research/timaeus).