jaxgmg_ckpt_df / README.md

OBSOLETE

A series of models trained while varying duplication_factor, with the cheese always in the corner, so the environment had only 120 possible states. These checkpoints are no longer relevant because we now train with $\alpha$, using the environment distribution $\Lambda_{\alpha} = \alpha \Lambda_{1} + (1 - \alpha) \Lambda_{0}$, where

  • $\Lambda_1$: uniform distribution over all cheese/mouse positions
  • $\Lambda_0$: cheese always in corner, mouse uniform over all positions
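
As a rough illustration of the mixture above, the sketch below samples a (cheese, mouse) pair from $\Lambda_{\alpha}$ and counts states. It is not the jaxgmg implementation. The 11x11 grid of free cells, the `(0, 0)` corner, the rule that mouse and cheese occupy different cells, and the names `sample_state` and `count_states` are all assumptions made for this example; the grid size was chosen only because it reproduces the 120 and ~14k state counts quoted in this README.

```python
import random


def sample_state(alpha, size=11, corner=(0, 0), rng=random):
    """Sample (cheese, mouse) from alpha * Lambda_1 + (1 - alpha) * Lambda_0.

    Hypothetical sketch: grid size, corner cell, and the "distinct cells"
    rule are assumptions, not taken from the original training code.
    """
    cells = [(r, c) for r in range(size) for c in range(size)]
    if rng.random() < alpha:
        cheese = rng.choice(cells)  # Lambda_1: cheese uniform over all cells
    else:
        cheese = corner             # Lambda_0: cheese always in the corner
    # In both components the mouse is uniform over the remaining cells.
    mouse = rng.choice([cell for cell in cells if cell != cheese])
    return cheese, mouse


def count_states(size=11):
    """Return (#states under Lambda_0, #states under Lambda_1)."""
    n = size * size
    return n - 1, n * (n - 1)


print(count_states())  # (120, 14520)
```

With `alpha = 0` every sample comes from $\Lambda_0$ and with `alpha = 1` every sample comes from $\Lambda_1$; intermediate values interpolate between the two.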

duplication_factor has since been removed, as there are now roughly 14k possible states instead of 120.