| ### Not strictly obsolete, but reduced precision was found to add noise to the training dynamics, so these runs were discontinued. |
|
|
| A series of training runs with alpha=0.47 and gamma (discount_rate) = 0.99, |
| trained in bfloat16 reduced precision. We found that these training |
| runs were considerably less stable under reduced precision [(wandb here)](https://wandb.ai/devinterp/jaxgmg_3phase_bf16), so |
| this path was abandoned. The models are kept for posterity. |
| |
| Hyperparams: |
| ``` |
| rl_action=train |
| num_rollout_steps=64 |
| lr=5e-05 |
| discount_rate=0.99 |
| eff_horizon=None |
| eval_every=1 |
| use_wandb=True |
| use_hf=True |
| use_log=True |
| num_total_env_steps=5000000000 |
| checkpoint=al_0.47_g_0.99_100_bf16 |
| render_sixel=True |
| sixel_loc=(7, 7) |
| seed=100 |
| mask_type=first_episode |
| penalize_time=False |
| optim=adam |
| live_monitor=False |
| use_bf16=True |
| checkpoint_schedule=0:8 |
| grad_acc_per_chunk=16 |
| num_rollout_chunks=1 |
| cheese_loc=any |
| env_layout=open |
| alpha=0.47 |
| env_size=13 |
| num_levels=9600 |
| f_str_ckpt=al_{alpha}_g_{discount_rate}_{seed}_bf16 |
| wandb_project=jaxgmg_3phase_bf16 |
| ckpt_dir=jaxgmg_3phase_bf16 |
| duplication_factor=-1 |
| smoke=False |
| num_chains=6 |
| num_draws=3000 |
| on_policy=True |
| nbeta=3000 |
| localization=10 |
| exact_solver_each_draw=False |
| llc_optimizer=sgld |
| iw_clip_eps=None |
| rmsprop_burnin=20 |
| llc_data_file=llc_scan_open_reinforce.pkl |
| llc_checkpoint_index=0 |
| repo_id=davidquarel/jaxgmg_ckpt_zip |
| use_shuffled_checkpoints=0 |
| force_re_download=False |
| ``` |
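
The `checkpoint` value above appears to be the `f_str_ckpt` template filled in with `alpha`, `discount_rate`, and `seed`. The sketch below checks that reading on a subset of the listed values. It assumes the template uses Python `str.format` semantics, which the listing does not confirm. The helpers `parse_hyperparams` and `checkpoint_name` are hypothetical names introduced for illustration and are not part of the training code.

```python
# Subset of the hyperparameters listed above, copied verbatim as key=value lines.
HYPERPARAMS = """\
alpha=0.47
discount_rate=0.99
seed=100
checkpoint=al_0.47_g_0.99_100_bf16
f_str_ckpt=al_{alpha}_g_{discount_rate}_{seed}_bf16
"""


def parse_hyperparams(text):
    """Hypothetical helper: split each non-empty key=value line into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        params[key] = value
    return params


def checkpoint_name(params):
    """Hypothetical helper: fill the f_str_ckpt template from the other fields.

    Assumes str.format semantics for the template (an assumption, not confirmed
    by the listing). Extra keys in params are ignored by str.format.
    """
    return params["f_str_ckpt"].format(**params)


params = parse_hyperparams(HYPERPARAMS)
name = checkpoint_name(params)
print(name)
```

If the assumption holds, the printed name matches the `checkpoint` field, which would make it possible to derive the checkpoint names for other seeds from the same template.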