Commit 11d7617 by David Quarel · 1 Parent(s): c5fd14a

README: remove unicode, add WandB URL, use config.cfg style hyperparams

Files changed (1)
  1. README.md +38 -19
README.md CHANGED
@@ -1,20 +1,43 @@
  # jaxgmg2_shared_init
 
- A collection of RL agent checkpoints studying the effect of shared initialization. Two base models (run IDs 19 and 27 from `jaxgmg2_3phase_optim_state`) are each used as a shared starting point, then independently continued from checkpoint 0 (fresh optimizer state) with α=1.0 across 10 different random seeds each.
-
- ## Training Configuration
-
- - **Environment**: JaxGMG open maze, cheese at any location, 9600 levels
- - **Algorithm**: REINFORCE with value function baseline
- - **Alpha (α)**: 1.0
- - **Discount rate (γ)**: 0.98
- - **Learning rate**: 5e-5
- - **Total env steps**: 1,351,680,000 (~1.35B, 21k gradient steps)
- - **Rollout steps**: 64
- - **Base models**: `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019` and `...id_27_seed_981027`
- - **Resume optimizer**: No (fresh optimizer at checkpoint 0)
- - **Seeds per base model**: 30–39
- - **Optimizer state saved**: Yes
 
  ## Naming Schema
 
@@ -29,7 +52,3 @@ make run projects/rl/experiments/shared_init/jobs/train.yaml
  ```
 
  from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).
-
- ## WandB
-
- Project: `jaxgmg2_shared_init`
 
  # jaxgmg2_shared_init
 
+ 20 RL agent checkpoints studying the effect of shared initialization. Two base models (run_ids 19 and 27
+ from jaxgmg2_3phase_optim_state) are each used as a shared starting point, then independently continued
+ from checkpoint 0 (fresh optimizer state) with alpha=1.0 across 10 different random seeds each.
+
+ **WandB:** https://wandb.ai/devinterp/jaxgmg2_shared_init
+
+ ## Sweep
+
+ 2 base models x 10 seeds (30-39) = 20 total runs.
+
+ Base models resumed:
+ - `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_19_seed_981019`
+ - `jaxgmg2_3phase_optim_state/al_1.0_g_0.98_id_27_seed_981027`
+
+ ## Shared Hyperparams
+
+ ```
+ rl_action=train
+ alpha=1.0
+ discount_rate=0.98
+ lr=5e-05
+ num_total_env_steps=1351680000
+ num_rollout_steps=64
+ num_levels=9600
+ cheese_loc=any
+ env_layout=open
+ env_size=13
+ resume_id=0
+ resume_optim=False
+ grad_acc_per_chunk=4
+ log_optimizer_state=True
+ eval_schedule=0:1,250:2,500:5,2000:10
+ f_str_ckpt=al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}
+ ckpt_dir=jaxgmg2_shared_init
+ wandb_project=jaxgmg2_shared_init
+ use_wandb=True
+ use_hf=True
+ ```
 
  ## Naming Schema
 
 
  ```
 
  from the [timaeus monorepo](https://github.com/timaeus-research/timaeus).
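
As a sanity check on the new "Sweep" section, the 20 checkpoint names can be enumerated from the `f_str_ckpt` template in the hyperparameter block. This is only a sketch. It assumes that `{run_id}` takes the base-model IDs (19 and 27), that `{seed}` takes the seeds 30-39, and that the template is filled with ordinary `str.format`. The diff elides the README's "Naming Schema" section, so none of this is confirmed there, and `checkpoint_names` is a hypothetical helper, not part of the timaeus codebase.

```python
from itertools import product

# Values copied from the README diff; how they combine is an assumption.
RUN_IDS = (19, 27)      # base-model run IDs from jaxgmg2_3phase_optim_state
SEEDS = range(30, 40)   # seeds 30-39 inclusive
F_STR_CKPT = "al_1.0_g_0.98_id_{run_id}_shared_init_seed_{seed}"


def checkpoint_names(run_ids=RUN_IDS, seeds=SEEDS, template=F_STR_CKPT):
    """Hypothetical helper: fill the template for every (run_id, seed) pair."""
    return [
        template.format(run_id=run_id, seed=seed)
        for run_id, seed in product(run_ids, seeds)
    ]


names = checkpoint_names()
print(len(names))  # 2 base models x 10 seeds
for name in names[:3]:
    print(name)
```

If the assumptions hold, the list contains one entry per run in the sweep, which makes it easy to spot a missing or duplicated checkpoint in the repository.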