Further 3 million steps. PPO LunarLander-v2 trained agent 56fe05b verified sighmon committed on Jan 20, 2025
Reverting to v2, but trained on 2 million steps. PPO LunarLander-v2 trained agent 7867e54 verified sighmon committed on Jan 19, 2025
Higher rollout length, more epochs, lower gamma. PPO LunarLander-v2 trained agent 9526858 verified sighmon committed on Jan 19, 2025