PPO Agent Playing LunarLander-v2

This agent was trained with a from-scratch, CleanRL-style PPO implementation written in PyTorch.
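The two core computations in a PPO implementation of this kind are generalized advantage estimation (GAE) and the clipped surrogate loss. The sketch below is illustrative and is not the code used to train this agent. It uses plain Python lists instead of PyTorch tensors and handles a single environment. It follows CleanRL's loss structure (clipped policy term, 0.5-scaled value term, entropy bonus), but it makes several assumptions of its own: it uses an unclipped value loss, it skips advantage normalization, and `dones[t]` means "the episode ended after step t". The function names `compute_gae` and `ppo_loss` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """GAE for one rollout of one env (hypothetical helper; dones[t] = episode ended after step t)."""
    T = len(rewards)
    advantages = [0.0] * T
    last_gae = 0.0
    for t in reversed(range(T)):
        # Bootstrap from the value of the state after the rollout on the final step.
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        last_gae = delta + gamma * gae_lambda * not_done * last_gae
        advantages[t] = last_gae
    # Value targets are advantages plus the baseline values.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    new_values: Sequence[float],
    returns: Sequence[float],
    entropies: Sequence[float],
    clip_coef: float = 0.2,
    vf_coef: float = 0.5,
    ent_coef: float = 0.01,
) -> float:
    """Clipped policy loss + vf_coef * value loss - ent_coef * entropy, averaged over a minibatch."""
    n = len(advantages)
    pg_terms = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # Pessimistic bound: keep the larger of the two negated surrogates.
        pg_terms.append(max(-adv * ratio, -adv * clipped))
    pg_loss = sum(pg_terms) / n
    # Unclipped squared-error value loss, scaled by 0.5 (assumption: no value clipping).
    v_loss = 0.5 * sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n
    entropy = sum(entropies) / n
    return pg_loss - ent_coef * entropy + vf_coef * v_loss
```

The default arguments match the `gamma`, `gae_lambda`, `clip_coef`, `vf_coef`, and `ent_coef` values listed under Hyperparameters. If the probability ratio moves outside [0.8, 1.2] in the direction the advantage favors, the clipped term caps the objective, so there is no incentive to push the policy further.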

Results

  • Mean reward: 8.59
  • Std reward: 73.34
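The card does not say how many evaluation episodes were run or how the statistics were computed. The sketch below shows the usual interpretation: the mean and standard deviation of undiscounted per-episode returns. It assumes the population standard deviation, which is what `numpy.std` computes by default. The helper name `summarize_returns` is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def summarize_returns(episodes: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Mean and population std of undiscounted returns, one reward list per episode."""
    # The return of an episode is the plain sum of its rewards (no discounting).
    returns: List[float] = [sum(rewards) for rewards in episodes]
    return fmean(returns), pstdev(returns)
```

A large standard deviation relative to the mean, as reported here, means the individual episode returns were widely spread around that mean.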

Hyperparameters

  • exp_name: ppo_from_scratch
  • seed: 1
  • cuda: 1
  • env_id: LunarLander-v2
  • total_timesteps: 400000
  • learning_rate: 0.00025
  • num_envs: 8
  • num_steps: 128
  • anneal_lr: 1
  • gamma: 0.99
  • gae_lambda: 0.95
  • num_minibatches: 4
  • update_epochs: 4
  • clip_coef: 0.2
  • ent_coef: 0.01
  • vf_coef: 0.5
  • max_grad_norm: 0.5
  • repo_id: CharithAnupama/ppo-LunarLander-v2
  • batch_size: 1024
  • minibatch_size: 256
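The last two values are derived rather than set independently: `batch_size = num_envs * num_steps = 8 * 128 = 1024`, and `minibatch_size = batch_size / num_minibatches = 1024 / 4 = 256`. The sketch below shows these derivations alongside the iteration count and the linear learning-rate schedule enabled by `anneal_lr`. It assumes CleanRL's conventions: the number of iterations is `total_timesteps // batch_size`, and the rate decays linearly from `learning_rate` toward zero. The `Config` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Subset of the hyperparameters above (hypothetical container)."""
    total_timesteps: int = 400_000
    learning_rate: float = 2.5e-4
    num_envs: int = 8
    num_steps: int = 128
    num_minibatches: int = 4
    update_epochs: int = 4
    anneal_lr: bool = True

    @property
    def batch_size(self) -> int:
        # One rollout collects num_steps transitions from each of num_envs environments.
        return self.num_envs * self.num_steps

    @property
    def minibatch_size(self) -> int:
        return self.batch_size // self.num_minibatches

    @property
    def num_iterations(self) -> int:
        # Rollout-then-update cycles; any remainder of total_timesteps is dropped.
        return self.total_timesteps // self.batch_size

    @property
    def gradient_steps_per_iteration(self) -> int:
        return self.update_epochs * self.num_minibatches

    def lr_at(self, iteration: int) -> float:
        """Learning rate for a 1-indexed iteration (assumed CleanRL-style linear decay)."""
        if not self.anneal_lr:
            return self.learning_rate
        frac = 1.0 - (iteration - 1.0) / self.num_iterations
        return frac * self.learning_rate
```

With these values, training runs 400000 // 1024 = 390 iterations. That is 399,360 environment steps, slightly fewer than `total_timesteps`. Each iteration performs 4 × 4 = 16 minibatch gradient steps.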