PPO Agent - LunarLander-v2
Trained with a CleanRL-style single-file PPO implementation as part of Unit 8 of the Hugging Face Deep RL Course.
Results
| Metric | Value |
|---|---|
| Mean reward ± std | 133.84 ± 73.67 |
| Eval episodes | 10 |
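The Results row reports a mean with a spread over the 10 evaluation episodes. A minimal sketch of how such a summary can be computed, assuming the ± value is a population standard deviation (the default of `numpy.std`); the function name `summarize_returns` and the episode returns below are hypothetical, not the data behind this card:

```python
import statistics
from typing import Sequence, Tuple


def summarize_returns(episode_returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population std) of per-episode returns.

    Population std (ddof=0) is assumed here to match numpy.std's default.
    """
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


# Hypothetical returns for 10 evaluation episodes (illustrative only).
returns = [250.0, 240.0, 10.0, 180.0, 120.0, 60.0, 230.0, 90.0, 200.0, 150.0]
mean, std = summarize_returns(returns)
print(f"{mean:.2f} +/- {std:.2f}")
```

With only 10 episodes, a large standard deviation like the one reported mostly reflects a few very low-scoring episodes (for example, crashes) mixed in with successful landings.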
Hyperparameters
| Parameter | Value |
|---|---|
| total_timesteps | 500000 |
| learning_rate | 0.00025 |
| num_envs | 4 |
| num_steps | 128 |
| gamma | 0.99 |
| gae_lambda | 0.95 |
| clip_coef | 0.2 |
| ent_coef | 0.01 |
| vf_coef | 0.5 |
| update_epochs | 4 |
| num_minibatches | 4 |
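Several quantities follow from the table rather than being set directly. The sketch below assumes the training script derives them the way CleanRL's `ppo.py` does (batch = `num_envs * num_steps`, minibatch = batch / `num_minibatches`, updates = `total_timesteps // batch`); the `PPOConfig` class is a hypothetical illustration:

```python
from dataclasses import dataclass


@dataclass
class PPOConfig:
    # Values copied from the hyperparameter table.
    total_timesteps: int = 500_000
    num_envs: int = 4
    num_steps: int = 128
    update_epochs: int = 4
    num_minibatches: int = 4

    @property
    def batch_size(self) -> int:
        # Transitions collected per rollout: num_steps from each parallel env.
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches

    @property
    def num_updates(self) -> int:
        # Integer division: timesteps that don't fill a whole batch are dropped.
        return self.total_timesteps // self.batch_size

    @property
    def gradient_steps(self) -> int:
        # One optimizer step per minibatch, repeated for every epoch of every update.
        return self.num_updates * self.update_epochs * self.num_minibatches


cfg = PPOConfig()
print(cfg.batch_size, cfg.minibatch_size, cfg.num_updates, cfg.gradient_steps)
```

Under these assumptions each rollout collects 512 transitions, split into four minibatches of 128, for 976 policy updates (499,712 environment steps, slightly under `total_timesteps` because of the integer division) and 15,616 gradient steps in total.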
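`gamma` and `gae_lambda` control the advantage estimate, while `clip_coef`, `vf_coef`, and `ent_coef` weight the terms of the PPO objective. A minimal pure-Python sketch of both pieces for a single environment, using the table's values as defaults; the function names are hypothetical, and the sketch deliberately omits the per-minibatch advantage normalization and value-loss clipping that CleanRL's `ppo.py` enables by default:

```python
import math
from typing import List, Sequence, Tuple

# Defaults copied from the hyperparameter table.
GAMMA = 0.99
GAE_LAMBDA = 0.95
CLIP_COEF = 0.2
VF_COEF = 0.5
ENT_COEF = 0.01


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[float],
    last_value: float,
    gamma: float = GAMMA,
    lam: float = GAE_LAMBDA,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation for one environment's rollout.

    dones[t] is 1.0 if the episode ended right after step t, else 0.0;
    last_value is the critic's estimate for the observation after the final step.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 1.0 - dones[t]  # stop bootstrapping across episode boundaries
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value targets: advantage plus the baseline it was measured against.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    new_values: Sequence[float],
    returns: Sequence[float],
    entropies: Sequence[float],
    clip_coef: float = CLIP_COEF,
    vf_coef: float = VF_COEF,
    ent_coef: float = ENT_COEF,
) -> float:
    """Clipped-surrogate PPO loss averaged over a minibatch (to be minimized)."""
    n = len(advantages)
    pg_loss = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # Pessimistic bound: keep the larger of the two negated surrogates.
        pg_loss += max(-adv * ratio, -adv * clipped)
    pg_loss /= n
    v_loss = 0.5 * sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n
    entropy = sum(entropies) / n
    # Entropy is subtracted so that a more exploratory policy lowers the loss.
    return pg_loss + vf_coef * v_loss - ent_coef * entropy


adv, ret = compute_gae(rewards=[1.0, 1.0], values=[0.0, 0.0], dones=[0.0, 0.0], last_value=0.0)
print(adv, ret)
```

Taking the `max` of the two negated surrogates is what makes the update pessimistic: once the probability ratio leaves [0.8, 1.2] in the direction that favors the advantage, the clipped term caps the gain and stops pushing the policy further.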
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): 133.84 ± 73.67