timaeus/jaxgmg2_shared_init

David Quarel committed on Apr 8

Commit

11d7617

1 Parent(s): c5fd14a

README: remove unicode, add WandB URL, use config.cfg style hyperparams

Files changed (1)

README.md CHANGED

@@ -1,20 +1,43 @@
 # jaxgmg2_shared_init
-A collection of RL agent checkpoints studying the effect of shared initialization. Two base models (run IDs 19 and 27 from `jaxgmg2_3phase_optim_state`) are each used as a shared starting point, then independently continued from checkpoint 0 (fresh optimizer state) with α=1.0 across 10 different random seeds each.
-## Training Configuration
-- **Environment**: JaxGMG open maze, cheese at any location, 9600 levels
-- **Algorithm**: REINFORCE with value function baseline
-- **Alpha (α)**: 1.0
-- **Discount rate (γ)**: 0.98
-- **Learning rate**: 5e-5
-- **Total env steps**: 1,351,680,000 (~1.35B, 21k gradient steps)
-- **Rollout steps**: 64
-- **Base models**: `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019` and `...id_27_seed_981027`
-- **Resume optimizer**: No (fresh optimizer at checkpoint 0)
-- **Seeds per base model**: 30–39
-- **Optimizer state saved**: Yes
 ## Naming Schema
@@ -29,7 +52,3 @@ make run projects/rl/experiments/shared_init/jobs/train.yaml
 ```
 from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).
-## WandB
-Project: `jaxgmg2_shared_init`

 # jaxgmg2_shared_init
+20 RL agent checkpoints studying the effect of shared initialization. Two base models (run_ids 19 and 27
+from jaxgmg2_3phase_optim_state) are each used as a shared starting point, then independently continued
+from checkpoint 0 (fresh optimizer state) with alpha=1.0 across 10 different random seeds each.
+**WandB:** https://wandb.ai/devinterp/jaxgmg2_shared_init
+## Sweep
+2 base models x 10 seeds (30-39) = 20 total runs.
+Base models resumed:
+- `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019`
+- `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_27_seed_981027`
+## Shared Hyperparams
+```
+rl_action=train
+alpha=1.0
+discount_rate=0.98
+lr=5e-05
+num_total_env_steps=1351680000
+num_rollout_steps=64
+num_levels=9600
+cheese_loc=any
+env_layout=open
+env_size=13
+resume_id=0
+resume_optim=False
+grad_acc_per_chunk=4
+log_optimizer_state=True
+eval_schedule=0:1,250:2,500:5,2000:10
+f_str_ckpt=al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}
+ckpt_dir=jaxgmg2_shared_init
+wandb_project=jaxgmg2_shared_init
+use_wandb=True
+use_hf=True
+```
 ## Naming Schema
 ```
 from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).
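The sweep in the updated README (2 base models x 10 seeds = 20 runs) can be reproduced on paper by expanding the `f_str_ckpt` template over the run_ids and seeds it lists. The following is a minimal sketch, not code from the timaeus monorepo: `parse_cfg` and `checkpoint_names` are hypothetical helper names, and it assumes that `{run_id}` takes the base-model ids 19 and 27 and that both base models use the same seeds 30-39.

```python
from itertools import product

# Values copied from the README's "Shared Hyperparams" block.
F_STR_CKPT = "al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}"
RUN_IDS = (19, 27)      # assumption: {run_id} is the base-model run id
SEEDS = range(30, 40)   # seeds 30-39 inclusive


def parse_cfg(text):
    """Parse config.cfg-style `key=value` lines into a dict of strings.

    Hypothetical helper: values are kept as strings, with no type coercion.
    """
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def checkpoint_names(template, run_ids, seeds):
    """Expand the checkpoint template over every (run_id, seed) pair."""
    return [template.format(run_id=r, seed=s) for r, s in product(run_ids, seeds)]


names = checkpoint_names(F_STR_CKPT, RUN_IDS, SEEDS)
cfg = parse_cfg("alpha=1.0\nlr=5e-05\nresume_optim=False")
```

Under these assumptions `names` holds one entry per run, so its length gives a quick check against the "20 total runs" stated in the README.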