Final PPO + Curiosity model trained for 10M steps using the best Optuna parameters
b0313cf (verified), Adi070204 committed on Jan 8