CharithAnupama/ppo-LunarLander-v2

Tags: Reinforcement Learning, custom-implementation



PPO Agent Playing LunarLander-v2

Trained with a PPO implementation from scratch (CleanRL-style) in PyTorch.
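To illustrate roughly what a CleanRL-style PPO update computes, the sketch below implements its two core pieces in plain Python instead of PyTorch, so it runs without a GPU or extra packages. The first piece is generalized advantage estimation (GAE) using the card's gamma and gae_lambda. The second is the clipped surrogate loss using clip_coef, vf_coef and ent_coef. The function names (`compute_gae`, `ppo_loss`) and the done-flag convention are illustrative assumptions, not the author's code. CleanRL's reference script also offers advantage normalization and value-loss clipping. The card does not say whether either was enabled, so the sketch leaves both out.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one environment's rollout.

    Assumed convention (simpler than CleanRL's indexing): dones[t] == 1.0 means
    the episode ended right after the action taken at step t, so no value is
    bootstrapped across that boundary. last_value is V(s) for the state that
    follows the final step of the rollout.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        next_nonterminal = 1.0 - dones[t]
        # One-step TD error: r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_nonterminal - values[t]
        # Exponentially weighted sum of future TD errors, reset at episode ends
        gae = delta + gamma * gae_lambda * next_nonterminal * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]  # value-function targets
    return advantages, returns


def ppo_loss(new_logprobs, old_logprobs, advantages, new_values, returns, entropies,
             clip_coef=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped PPO loss for one minibatch, averaged over samples.

    Advantage normalization and value-loss clipping are intentionally omitted.
    """
    n = len(advantages)
    pg_loss = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # Pessimistic bound: the larger of the two negated surrogates
        pg_loss += max(-adv * ratio, -adv * clipped_ratio)
    pg_loss /= n
    v_loss = 0.5 * sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n
    entropy = sum(entropies) / n
    # The entropy bonus is subtracted, so a more random policy lowers the loss
    return pg_loss - ent_coef * entropy + vf_coef * v_loss
```

In a PyTorch version, the per-sample loops become tensor operations. The total loss would then be backpropagated with gradients clipped to max_grad_norm (0.5 here), for example via `torch.nn.utils.clip_grad_norm_`.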

Results

Mean reward: 8.59
Std reward: 73.34
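The standard deviation is large relative to the mean, which suggests that returns varied widely from one evaluation episode to the next. The card does not state how many episodes were evaluated or which standard-deviation convention was used. The sketch below assumes the population standard deviation (the `np.std` default). Its input returns are made up purely to show the calculation, and the helper names are hypothetical.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std) of per-episode evaluation returns.

    pstdev divides by N (like np.std's default); whether the card used
    N or N - 1 is not stated.
    """
    return statistics.fmean(episode_returns), statistics.pstdev(episode_returns)


def format_result(mean, std):
    """Render a result in the card's "mean +/- std" style."""
    return f"{mean:.2f} +/- {std:.2f}"


# Hypothetical returns from four evaluation episodes (not the card's data)
returns = [120.0, -80.0, 40.0, -45.0]
mean, std = summarize_returns(returns)
print(format_result(mean, std))
```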

Hyperparameters

exp_name: ppo_from_scratch
seed: 1
cuda: 1
env_id: LunarLander-v2
total_timesteps: 400000
learning_rate: 0.00025
num_envs: 8
num_steps: 128
anneal_lr: 1
gamma: 0.99
gae_lambda: 0.95
num_minibatches: 4
update_epochs: 4
clip_coef: 0.2
ent_coef: 0.01
vf_coef: 0.5
max_grad_norm: 0.5
repo_id: CharithAnupama/ppo-LunarLander-v2
batch_size: 1024
minibatch_size: 256
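The listed batch_size and minibatch_size are consistent with the other settings. The arithmetic below assumes the implementation derives them the same way CleanRL's ppo.py does. It also covers the number of rollout/update cycles and the linear learning-rate decay that anneal_lr enables. The helper `annealed_lr` is a hypothetical name used for illustration.

```python
# Values copied from the hyperparameter list above
total_timesteps = 400_000
num_envs = 8
num_steps = 128
num_minibatches = 4
update_epochs = 4
learning_rate = 2.5e-4

# Derived quantities (assumed CleanRL-style formulas)
batch_size = num_envs * num_steps                # 8 * 128 = 1024 transitions per rollout
minibatch_size = batch_size // num_minibatches   # 1024 // 4 = 256
num_updates = total_timesteps // batch_size      # 400000 // 1024 = 390 rollout/update cycles
env_steps_used = num_updates * batch_size        # 390 * 1024 = 399360 (remainder is dropped)
grad_steps = num_updates * update_epochs * num_minibatches  # 390 * 4 * 4 = 6240 optimizer steps


def annealed_lr(update, num_updates=num_updates, lr=learning_rate):
    """Learning rate for 1-indexed update `update` when anneal_lr is enabled.

    Decays linearly from `lr` at the first update towards 0 at the end of training.
    """
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * lr
```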


Evaluation results

mean_reward on LunarLander-v2 (self-reported): 8.59 +/- 73.34