PPO Agent playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.

The hyperparameters used for training were optimized with Optuna.

{
  "learning_rate": 0.00038779746460731866,
  "n_steps": 2048,
  "batch_size": 128,
  "n_epochs": 13,
  "gamma": 0.9927390555180292,
  "gae_lambda": 0.9353501463066322,
  "clip_range": clip_range,  # value not recorded in the card
  "ent_coef": 0.007068533587811773,
  "policy_kwargs": {
    "net_arch": {"pi": [512, 512], "vf": [512, 512]},
    "activation_fn": nn.Tanh,
  },
}

The learning rate above was used as the initial value for a linear scheduler during training. See this GitHub issue for more information.
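The card does not include the scheduler itself. stable-baselines3 accepts a callable for `learning_rate` that maps the remaining training progress (1.0 at the start, 0.0 at the end) to a rate, so a linear schedule can be sketched as follows; the `linear_schedule` helper name is illustrative, not taken from this repository:

```python
def linear_schedule(initial_value: float):
    """Return an SB3-compatible schedule that decays linearly.

    SB3 calls the returned function with progress_remaining,
    which goes from 1.0 (start of training) down to 0.0 (end),
    so the learning rate falls from initial_value to 0.
    """
    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value
    return schedule

# Would be passed to the constructor, e.g.:
# PPO("MlpPolicy", env, learning_rate=linear_schedule(0.00038779746460731866), ...)
```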

Usage

import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

env_id = "LunarLander-v2"

# Download the checkpoint from the Hugging Face Hub
model_fp = load_from_hub(
  repo_id="reeeemo/ppo-LunarLander-v2",
  filename="ppo-LunarLander-v2-optimized.zip",
)

model = PPO.load(model_fp, print_system_info=True)

# Wrap the env in a Monitor so episode statistics are recorded
eval_env = Monitor(gym.make(env_id))
mean_reward, std_reward = evaluate_policy(
  model, eval_env, n_eval_episodes=10, deterministic=True
)

# Conservative score: mean reward minus one standard deviation
print(f"Results: {mean_reward - std_reward:.2f}")
print(f"mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")
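Beyond aggregate evaluation, the policy can be rolled out one step at a time. A minimal episode loop against the Gymnasium step API, with a `predict` callable in the shape of SB3's `model.predict`; the `run_episode` helper is a sketch, not part of this repository:

```python
def run_episode(env, predict, deterministic=True):
    """Play one episode and return its total reward.

    env follows the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    predict mirrors SB3's model.predict: (obs, deterministic=...) -> (action, state).
    """
    obs, _info = env.reset()
    total = 0.0
    done = False
    while not done:
        action, _state = predict(obs, deterministic=deterministic)
        obs, reward, terminated, truncated, _info = env.step(action)
        total += reward
        # An episode ends when the env terminates (crash/landing)
        # or is truncated (time limit)
        done = terminated or truncated
    return total
```

With the objects from the snippet above, `run_episode(eval_env, model.predict)` would play a single episode and return its undiscounted return.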