PPO Agent playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.

Usage (with Stable-baselines3)

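import gym  # on recent Stable-Baselines3 releases, use: import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor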

# Train the agent

# Create a (vectorized) environment
# env = gym.make('LunarLander-v2')
env = make_vec_env('LunarLander-v2', n_envs=16)

# Define a PPO model with an MLP policy
model = PPO('MlpPolicy', env, verbose=1)

# Train it for 1,000,000 timesteps
model.learn(total_timesteps=1000000)

# Evaluate the agent on a new environment

# Create an evaluation environment
eval_env = Monitor(gym.make('LunarLander-v2'))

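# Evaluate the model with 10 evaluation episodes and deterministic=True
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")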
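
The training call above asks for 1,000,000 timesteps across 16 parallel environments. As a rough sketch of what that budget means in rollouts, the arithmetic below assumes n_steps=2048 per environment (the usage code does not set it, so this is taken to be the PPO default in Stable-Baselines3) and assumes that training proceeds in whole rollouts, so the final step count can slightly exceed the request. The helper name rollout_budget is hypothetical.

```python
import math


def rollout_budget(total_timesteps, n_envs, n_steps):
    """Return (timesteps per rollout, number of rollouts, timesteps actually collected).

    Assumes training stops only at a rollout boundary, so the collected
    total is rounded up to a whole number of rollouts.
    """
    steps_per_rollout = n_envs * n_steps
    n_rollouts = math.ceil(total_timesteps / steps_per_rollout)
    return steps_per_rollout, n_rollouts, n_rollouts * steps_per_rollout


# n_steps=2048 is an assumption (not set in the usage code above)
per_rollout, n_rollouts, collected = rollout_budget(1_000_000, n_envs=16, n_steps=2048)
print(f"{per_rollout} timesteps per rollout, {n_rollouts} rollouts, {collected} timesteps collected")
```

Under these assumptions each policy update sees 32,768 transitions, so changing n_envs also changes how much data feeds every update, not only the wall-clock speed.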

Evaluation results

mean_reward on LunarLander-v2 (self-reported): 232.36 +/- 62.87
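
To make the "mean +/- std" format concrete, here is a minimal sketch of how such a summary can be computed from per-episode returns. It assumes the spread is the population standard deviation (which is what NumPy's default np.std computes). The function names and the three sample returns are hypothetical and are not the episodes behind the reported score.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, population standard deviation) of per-episode returns."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return fmean(episode_returns), pstdev(episode_returns)


def format_result(mean_reward, std_reward):
    """Format a result in the 'mean +/- std' style used on this card."""
    return f"{mean_reward:.2f} +/- {std_reward:.2f}"


# Made-up returns for illustration only
sample_returns = [200.0, 250.0, 300.0]
print(format_result(*summarize_returns(sample_returns)))
```

With only 10 evaluation episodes, a standard deviation of this size means individual episodes can land well above or below the mean, so more episodes give a steadier estimate.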