---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 275.08 +/- 17.56
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **LunarLander-v2**
This is a trained model of a **PPO** agent playing **LunarLander-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
The script below trains it for 3,000,000 timesteps across 16 parallel environments
and evaluates it over 10 deterministic episodes.

## Usage (with Stable-Baselines3)

```python
import gymnasium as gym

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# LunarLander needs the Box2D extra: pip install "gymnasium[box2d]".
# "LunarLander-v2" is registered in Gymnasium 0.29; Gymnasium 1.0 replaced it with "LunarLander-v3".

# 16 parallel copies of the environment (make_vec_env wraps each one in a Monitor)
env = make_vec_env("LunarLander-v2", n_envs=16)

model = PPO(
    policy="MlpPolicy",
    env=env,
    learning_rate=3e-4,
    n_steps=2048,     # rollout length per environment (previously 1024)
    batch_size=64,    # minibatch size for each gradient step
    n_epochs=10,      # optimization passes over each rollout (previously 4)
    gamma=0.99,       # discount factor (previously 0.999)
    gae_lambda=0.98,  # GAE smoothing parameter
    ent_coef=0.01,    # entropy bonus that encourages exploration
    verbose=1,
)

# Train for 3,000,000 timesteps (counted across all 16 environments)
model.learn(total_timesteps=3_000_000)

# Save the model (written to "ppo-LunarLander-v2.zip")
model_name = "ppo-LunarLander-v2"
model.save(model_name)

# Create a separate, Monitor-wrapped environment for evaluation
eval_env = Monitor(gym.make("LunarLander-v2", render_mode="rgb_array"))

# Evaluate the model over 10 episodes with a deterministic policy
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

# Print the results
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```
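The hyperparameters above determine how much data each PPO update consumes. The sketch below works through that arithmetic in plain Python. It assumes SB3's on-policy loop, which collects `n_steps` transitions from each of the `n_envs` environments per iteration and keeps iterating while the timestep counter is below `total_timesteps`. The helpers `ppo_schedule` and `effective_horizon` are hypothetical names for illustration, and the `1 / (1 - gamma)` horizon is only a common rule of thumb.

```python
import math


def ppo_schedule(n_envs: int, n_steps: int, batch_size: int, n_epochs: int,
                 total_timesteps: int) -> dict:
    """Hypothetical helper: derive PPO rollout and update counts from hyperparameters."""
    # Transitions gathered per iteration (the size of one rollout buffer)
    buffer_size = n_envs * n_steps
    # Minibatches per optimization epoch over that buffer
    minibatches_per_epoch = buffer_size // batch_size
    # Gradient steps per iteration
    gradient_steps = minibatches_per_epoch * n_epochs
    # Iterations until the budget is met; the last rollout can overshoot it
    iterations = math.ceil(total_timesteps / buffer_size)
    return {
        "buffer_size": buffer_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_iteration": gradient_steps,
        "iterations": iterations,
        "timesteps_actually_collected": iterations * buffer_size,
    }


def effective_horizon(gamma: float) -> float:
    """Hypothetical helper: rule-of-thumb horizon ~ 1 / (1 - gamma) steps."""
    return 1.0 / (1.0 - gamma)


schedule = ppo_schedule(n_envs=16, n_steps=2048, batch_size=64, n_epochs=10,
                        total_timesteps=3_000_000)
for key, value in schedule.items():
    print(f"{key}: {value}")
print(f"effective horizon, gamma=0.99:  {effective_horizon(0.99):.0f} steps")
print(f"effective horizon, gamma=0.999: {effective_horizon(0.999):.0f} steps")
```

With these settings each iteration gathers 32,768 transitions and performs 5,120 gradient steps, and training stops after 92 iterations, i.e. 3,014,656 timesteps rather than exactly 3,000,000. Lowering `gamma` from 0.999 to 0.99 shortens the rule-of-thumb horizon from about 1000 to about 100 steps.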
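In the training script, `gamma` and `gae_lambda` together shape the advantage estimates that PPO optimizes. As a hedged illustration, the standard Generalized Advantage Estimation (GAE) recursion can be sketched in plain Python for a single environment. This is a simplified sketch, not SB3's implementation: `compute_gae` is a hypothetical name, and it treats every episode end as a true termination, whereas SB3 additionally bootstraps from the value of the final observation when an episode is cut off by a time limit.

```python
from typing import List, Tuple


def compute_gae(rewards: List[float], values: List[float], dones: List[bool],
                last_value: float, gamma: float = 0.99,
                gae_lambda: float = 0.98) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: GAE over one rollout from a single environment.

    dones[t] is True if the episode terminated after step t; last_value is the
    critic's estimate for the state that follows the final step of the rollout.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap term is masked at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode boundaries
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
    # Value targets for the critic: advantage plus the current value estimate
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae(rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
                                  dones=[False, False, False], last_value=0.0,
                                  gamma=1.0, gae_lambda=1.0)
print(advantages)  # [3.0, 2.0, 1.0]
```

Setting `gae_lambda=0` reduces each advantage to the one-step TD error, while `gae_lambda=1` gives the discounted return minus the value estimate; 0.98 sits close to the latter.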