PPO Agent Playing LunarLander-v3

This is a trained model of a PPO agent playing LunarLander-v3.

Hyperparameters

timesteps=2e6,
steps_before_update=1000,
mini_batch_size=64,
epochs=3,
lr=3e-4,
gamma=0.99,
gae_lambda=0.95,
clip_coef=0.2,
norm_adv=True,
vf_coef=0.5,
ent_coef=0.05,
max_grad_norm=1.0,
target_kl=0.015
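
To make the interaction between these values concrete, here is a minimal sketch that stores them in a config object and derives the update schedule they imply. The class name `PPOConfig` and the helpers `update_schedule` and `clipped_surrogate` are hypothetical and are not taken from the training code behind this model. The sketch assumes that `steps_before_update` is the number of environment steps collected per policy update and that a final, smaller minibatch is kept rather than dropped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    # Values copied from the hyperparameter list; field names mirror it.
    timesteps: int = 2_000_000
    steps_before_update: int = 1000
    mini_batch_size: int = 64
    epochs: int = 3
    lr: float = 3e-4
    gamma: float = 0.99
    gae_lambda: float = 0.95
    clip_coef: float = 0.2
    norm_adv: bool = True
    vf_coef: float = 0.5
    ent_coef: float = 0.05
    max_grad_norm: float = 1.0
    target_kl: float = 0.015


def update_schedule(cfg):
    # Assumption: one policy update per `steps_before_update` env steps.
    num_updates = cfg.timesteps // cfg.steps_before_update
    # Assumption: the last, partial minibatch is kept (hence the ceil).
    minibatches_per_epoch = math.ceil(
        cfg.steps_before_update / cfg.mini_batch_size
    )
    # Upper bound only: a target_kl check may end an update early.
    max_grad_steps = cfg.epochs * minibatches_per_epoch
    return num_updates, minibatches_per_epoch, max_grad_steps


def clipped_surrogate(ratio, advantage, clip_coef):
    # Standard PPO clipped objective for one sample (to be maximized).
    clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
    return min(ratio * advantage, clipped_ratio * advantage)


cfg = PPOConfig()
print(update_schedule(cfg))
print(clipped_surrogate(1.5, 2.0, cfg.clip_coef))
```

Under these assumptions the run performs 2000 policy updates, each with 16 minibatches per epoch and at most 48 gradient steps. With clip_coef=0.2, a probability ratio of 1.5 on a positive advantage is capped at 1.2 before it contributes to the objective.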



Reinforcement Learning

Evaluation results

mean_reward on LunarLander-v2 (self-reported): 142.41 +/- 54.45
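
A figure in this "mean +/- std" form is commonly produced by averaging per-episode returns over a batch of evaluation episodes. The sketch below shows that arithmetic only. The function name `summarize_returns` and the sample returns are hypothetical, and the use of the population standard deviation is an assumption, not a description of how the number above was computed.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    # Mean and population standard deviation of per-episode returns.
    mean = fmean(episode_returns)
    std = pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns for illustration; not data from this model.
hypothetical_returns = [100.0, 200.0, 150.0, 150.0]
print(summarize_returns(hypothetical_returns))
```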