PPO Agent Playing LunarLander-v3
This is a trained model of a PPO agent playing LunarLander-v3.
Hyperparameters
timesteps=2e6,
steps_before_update=1000,
mini_batch_size=64,
epochs=3,
lr=3e-4,
gamma=0.99,
gae_lambda=0.95,
clip_coef=0.2,
norm_adv=True,
vf_coef=0.5,
ent_coef=0.05,
max_grad_norm=1.0,
target_kl=0.015
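
The training code is not part of this card, so the following is only a minimal sketch of how clip_coef, norm_adv, vf_coef, ent_coef and target_kl typically enter a PPO minibatch update. It assumes a standard clipped-surrogate loss with an unclipped value loss; the function name `ppo_minibatch_loss` and its arguments are hypothetical and are not taken from the author's implementation.

```python
import math

# Values copied from the hyperparameter list above.
CLIP_COEF = 0.2
NORM_ADV = True
VF_COEF = 0.5
ENT_COEF = 0.05
TARGET_KL = 0.015


def ppo_minibatch_loss(new_logp, old_logp, advantages, values, returns, entropies):
    """Return (total_loss, approx_kl) for one minibatch (hypothetical helper).

    All arguments are equal-length lists of floats, one entry per transition.
    """
    n = len(advantages)

    # norm_adv=True: standardize advantages within the minibatch.
    if NORM_ADV:
        mean = sum(advantages) / n
        std = math.sqrt(sum((a - mean) ** 2 for a in advantages) / n)
        advantages = [(a - mean) / (std + 1e-8) for a in advantages]

    policy_terms, kl_terms = [], []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        log_ratio = lp_new - lp_old
        ratio = math.exp(log_ratio)
        # clip_coef bounds how far the new policy may move from the old one.
        clipped = min(max(ratio, 1.0 - CLIP_COEF), 1.0 + CLIP_COEF)
        policy_terms.append(max(-adv * ratio, -adv * clipped))
        # Low-variance KL estimate: (ratio - 1) - log(ratio).
        kl_terms.append((ratio - 1.0) - log_ratio)

    policy_loss = sum(policy_terms) / n
    value_loss = 0.5 * sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    entropy = sum(entropies) / n

    # vf_coef weights the critic loss; ent_coef rewards exploration.
    total = policy_loss + VF_COEF * value_loss - ENT_COEF * entropy
    return total, sum(kl_terms) / n


if __name__ == "__main__":
    loss, approx_kl = ppo_minibatch_loss(
        new_logp=[-0.5, -1.0], old_logp=[-0.5, -1.0],
        advantages=[1.0, -1.0], values=[0.0, 0.0],
        returns=[1.0, 1.0], entropies=[1.0, 1.0],
    )
    print(f"loss={loss:.4f} approx_kl={approx_kl:.4f}")
    # A common convention (assumed here) is to stop the remaining epochs
    # for this rollout once approx_kl exceeds target_kl.
    print("early stop:", approx_kl > TARGET_KL)
```

In a full training loop, the gradient of this loss would normally be clipped to max_grad_norm before the optimizer step, and the advantages would come from GAE using gamma and gae_lambda; both steps are omitted from the sketch.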
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): 142.41 +/- 54.45
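
The card does not say how many evaluation episodes were run or how the spread was computed. As a sketch only, a "mean +/- spread" figure of this kind is often the mean and standard deviation of per-episode returns; the helper `summarize_returns` and the sample returns below are hypothetical and are not the author's data.

```python
import statistics


def summarize_returns(episode_returns):
    """Format per-episode returns as 'mean +/- std' (population std assumed)."""
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


if __name__ == "__main__":
    # Made-up returns, for illustration only.
    print(summarize_returns([100.0, 200.0]))
```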