PPO Agent Playing LunarLander-v2
Trained with a from-scratch, CleanRL-style implementation of Proximal Policy Optimization (PPO) written in PyTorch.
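The core pieces of a CleanRL-style PPO update can be sketched in NumPy. These are generalized advantage estimation (driven by gamma and gae_lambda), the clipped surrogate policy loss (clip_coef), and the combined loss (ent_coef, vf_coef). This is a minimal illustration, not the code used to train this agent. The function names are hypothetical, and the sketch makes three simplifying assumptions: it handles a single environment, it treats dones[t] as "the episode ended after the action at step t" (CleanRL indexes dones slightly differently), and it uses an unclipped value loss.

```python
import numpy as np


def compute_gae(rewards, values, dones, next_value, gamma=0.99, gae_lambda=0.95):
    """Generalized advantage estimation for one environment's rollout.

    rewards, values, dones: sequences of length num_steps.
    dones[t] == 1.0 means the episode terminated after the action at step t
    (an assumed convention for this sketch).
    next_value: value estimate of the observation after the last step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    num_steps = len(rewards)
    advantages = np.zeros(num_steps)
    last_gae = 0.0
    for t in reversed(range(num_steps)):
        next_v = next_value if t == num_steps - 1 else values[t + 1]
        nonterminal = 1.0 - dones[t]  # do not bootstrap across an episode end
        delta = rewards[t] + gamma * next_v * nonterminal - values[t]
        last_gae = delta + gamma * gae_lambda * nonterminal * last_gae
        advantages[t] = last_gae
    returns = advantages + values  # regression targets for the value function
    return advantages, returns


def ppo_clip_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    """Clipped surrogate policy loss (to be minimized)."""
    advantages = np.asarray(advantages, dtype=np.float64)
    ratio = np.exp(np.asarray(new_logprob) - np.asarray(old_logprob))
    unclipped = -advantages * ratio
    clipped = -advantages * np.clip(ratio, 1.0 - clip_coef, 1.0 + clip_coef)
    # The max of the negated terms equals the pessimistic (min) bound on the objective.
    return np.maximum(unclipped, clipped).mean()


def value_loss(new_values, returns):
    """Unclipped squared-error value loss, scaled by 0.5."""
    diff = np.asarray(new_values, dtype=np.float64) - np.asarray(returns, dtype=np.float64)
    return 0.5 * np.mean(diff ** 2)


def ppo_total_loss(pg_loss, v_loss, entropy, ent_coef=0.01, vf_coef=0.5):
    """Combine the terms; the entropy bonus is subtracted to encourage exploration."""
    return pg_loss - ent_coef * entropy + vf_coef * v_loss
```

With gae_lambda = 1.0 the returned targets reduce to plain discounted returns, while smaller values trade some bias for lower variance. The clip keeps the probability ratio within [1 - clip_coef, 1 + clip_coef] when that would otherwise increase the objective, which limits how far one update can move the policy.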
Results
- Mean reward: 8.59
- Std reward: 73.34
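The card does not say how many evaluation episodes were run or which standard deviation was used. The sketch below is a minimal illustration, assuming the two figures are the mean and population standard deviation (ddof=0, NumPy's default) of per-episode returns. The helper names are hypothetical and the episode returns are made-up placeholders, not data for this agent.

```python
import statistics


def summarize_returns(episode_returns):
    """Mean and population standard deviation of per-episode returns.

    Using the population std (ddof=0) is an assumption of this sketch.
    """
    mean = statistics.fmean(episode_returns)
    std = statistics.pstdev(episode_returns)
    return mean, std


def format_result(mean, std):
    """Render a result in the 'mean +/- std' style used on this card."""
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns from four evaluation episodes (placeholder values only).
mean, std = summarize_returns([120.0, -40.0, 60.0, -100.0])
print(format_result(mean, std))
```

A standard deviation much larger than the mean, as here, indicates that individual landings vary widely between episodes.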
Hyperparameters
- exp_name: ppo_from_scratch
- seed: 1
- cuda: 1
- env_id: LunarLander-v2
- total_timesteps: 400000
- learning_rate: 0.00025
- num_envs: 8
- num_steps: 128
- anneal_lr: 1
- gamma: 0.99
- gae_lambda: 0.95
- num_minibatches: 4
- update_epochs: 4
- clip_coef: 0.2
- ent_coef: 0.01
- vf_coef: 0.5
- max_grad_norm: 0.5
- repo_id: CharithAnupama/ppo-LunarLander-v2
- batch_size: 1024
- minibatch_size: 256
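The last two entries are derived rather than independent: batch_size = num_envs × num_steps = 8 × 128 = 1024, and minibatch_size = batch_size / num_minibatches = 1024 / 4 = 256. The sketch below reproduces these and the linear learning-rate annealing enabled by anneal_lr. It assumes CleanRL's conventions (integer division for the number of updates, and a 1-indexed update counter that decays the rate linearly toward zero). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PPOConfig:
    # Values copied from the hyperparameter list on this card.
    total_timesteps: int = 400_000
    learning_rate: float = 2.5e-4
    num_envs: int = 8
    num_steps: int = 128
    num_minibatches: int = 4
    update_epochs: int = 4
    anneal_lr: bool = True

    @property
    def batch_size(self) -> int:
        # Transitions collected per rollout: one per environment per step.
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches

    @property
    def num_updates(self) -> int:
        # Assumed integer division (CleanRL-style); leftover timesteps are dropped.
        return self.total_timesteps // self.batch_size

    @property
    def grad_steps_per_update(self) -> int:
        # Each rollout is reused for update_epochs passes over num_minibatches minibatches.
        return self.update_epochs * self.num_minibatches


def learning_rate_at(cfg: PPOConfig, update: int) -> float:
    """Linearly annealed learning rate for a 1-indexed update number."""
    if not cfg.anneal_lr:
        return cfg.learning_rate
    frac = 1.0 - (update - 1.0) / cfg.num_updates
    return frac * cfg.learning_rate


cfg = PPOConfig()
print(cfg.batch_size, cfg.minibatch_size, cfg.num_updates)  # → 1024 256 390
```

Under this assumption the run performs 390 updates of 16 gradient steps each and collects 390 × 1024 = 399,360 environment steps, slightly fewer than total_timesteps. The learning rate starts at 0.00025 and ends just above zero on the final update.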
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): 8.59 +/- 73.34