Update best_model.pt to v18b (obs_dim=96, 512-512-256, reward=0.584, loss penalty) 2e8eda4 verified JoshuaFreeman committed on Apr 5
Update best_model.pt to v17 (obs_dim=80, best_reward=0.535) 9d3b760 verified JoshuaFreeman committed on Apr 5
v13b (update 1550): normalized elim + winner bonus, vf=0.5, best generalization 2296e2d verified JoshuaFreeman committed on Apr 3
v12a: 100% win rate on Easy/2, normalized elimination reward 2d620cc verified JoshuaFreeman committed on Apr 3