Model trained for 10 million timesteps with mean_reward=286.17 f923604 sam133 committed on Dec 14, 2022