PPO Agent - LunarLander-v2

Trained with a CleanRL-style single-file PPO implementation as part of the Hugging Face Deep RL Course — Unit 8.

Results

Metric          Value
Mean reward     133.84 ± 73.67
Eval episodes   10
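
The reward figure above presumably reads as mean ± standard deviation of the per-episode return over the 10 evaluation episodes. A minimal sketch of that summary follows, assuming the population standard deviation is used (this card does not say which variant was computed). The helper name `summarize_returns` and the example returns are hypothetical placeholders, not this agent's actual episodes.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, population std) of a list of per-episode returns."""
    return fmean(episode_returns), pstdev(episode_returns)


# Placeholder returns for illustration only; NOT this agent's real episodes.
example_returns = [100.0, 200.0, 300.0]
mean_reward, std_reward = summarize_returns(example_returns)
print(f"{mean_reward:.2f} ± {std_reward:.2f}")
```

With only 10 episodes, a spread this wide relative to the mean suggests individual episodes vary a lot, so more evaluation episodes would give a steadier estimate.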

Hyperparameters

Parameter         Value
total_timesteps   500000
learning_rate     0.00025
num_envs          4
num_steps         128
gamma             0.99
gae_lambda        0.95
clip_coef         0.2
ent_coef          0.01
vf_coef           0.5
update_epochs     4
num_minibatches   4
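
The hyperparameters above imply a few derived quantities that are not listed. The sketch below computes them under the assumption that the training script follows the usual CleanRL convention (rollout batch = `num_envs * num_steps`, split evenly into minibatches). The names `batch_size`, `minibatch_size`, `num_updates`, and `gradient_steps_per_update` are illustrative and not taken from this card.

```python
# Values copied from the hyperparameter table.
total_timesteps = 500_000
num_envs = 4
num_steps = 128
num_minibatches = 4
update_epochs = 4

# Assumed CleanRL-style derivations (not stated in this card).
batch_size = num_envs * num_steps                # transitions collected per rollout
minibatch_size = batch_size // num_minibatches   # transitions per gradient step
num_updates = total_timesteps // batch_size      # rollout/update iterations
gradient_steps_per_update = update_epochs * num_minibatches

print(batch_size, minibatch_size, num_updates, gradient_steps_per_update)
# prints: 512 128 976 16
```

Under these assumptions, training runs roughly 976 policy updates, each making 16 gradient steps on minibatches of 128 transitions.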