PPO Agent playing LunarLander-v2
This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.
Usage (with Stable-baselines3)
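import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor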
# Train the agent
# Create a (vectorized) environment
# env = gym.make('LunarLander-v2')
env = make_vec_env('LunarLander-v2', n_envs=16)
# Define a PPO MlpPolicy architecture
model = PPO('MlpPolicy', env, verbose=1)
# Train it for 1,000,000 timesteps
model.learn(total_timesteps=1000000)
# Evaluate the agent on a new environment
# Create an evaluation environment
eval_env = Monitor(gym.make('LunarLander-v2'))
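# Evaluate the model over 10 episodes with deterministic actions
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
# Report the result as mean +/- standard deviation of the episode rewards
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")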
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): 232.36 +/- 62.87
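The reported figure has the form "mean +/- std" over the evaluation episodes. As a minimal sketch of how such a summary can be derived from per-episode rewards, the snippet below uses only the Python standard library. The helper names `summarize_rewards` and `format_result` and the example reward values are hypothetical and are not taken from this model's evaluation. It assumes the population standard deviation, which is what NumPy's `np.std` computes by default.

```python
from statistics import fmean, pstdev


def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode rewards.

    Uses the population standard deviation (divide by N), an assumption
    chosen to match the default behaviour of numpy.std.
    """
    if not episode_rewards:
        raise ValueError("need at least one episode reward")
    return fmean(episode_rewards), pstdev(episode_rewards)


def format_result(mean_reward, std_reward):
    """Format a result in the 'mean +/- std' style used above."""
    return f"{mean_reward:.2f} +/- {std_reward:.2f}"


# Hypothetical episode rewards, for illustration only (not this model's data)
example_rewards = [200.0, 250.0, 300.0]
mean_reward, std_reward = summarize_rewards(example_rewards)
print(format_result(mean_reward, std_reward))
```

With only a handful of episodes the standard deviation is itself a noisy estimate, so a spread as wide as the one reported here suggests that individual landings vary considerably in quality.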