# PPO Agent playing LunarLander-v3
This is a trained PPO agent that plays LunarLander-v3, built with the stable-baselines3 library as part of the Hugging Face Deep Reinforcement Learning Course.
The agent achieves a mean reward of 254.57 ± 20.32 over 10 evaluation episodes, clearing the 200-point threshold commonly used as the "solved" criterion for this environment.
## The Environment

LunarLander is a classic control task in which the agent must land a spacecraft on a designated pad between two flags. The observation is an 8-dimensional continuous vector containing position, velocity, angle, angular velocity, and left/right leg ground-contact flags. The action space is discrete with four options: do nothing, fire the left orientation engine, fire the main engine, or fire the right orientation engine. Firing engines costs fuel (negative reward), crashing is heavily penalized, and a soft landing on the pad is strongly rewarded.
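As a concrete illustration, the observation layout and action set described above can be unpacked like this. The observation values below are made up for illustration only; they are not taken from a real episode:

```python
# Hypothetical observation vector; values are illustrative, not from a real rollout.
obs = [0.01, 1.40, 0.05, -0.30, 0.002, 0.01, 0.0, 0.0]

# The 8 components, in the order LunarLander reports them:
(x, y,                      # lander position
 vx, vy,                    # linear velocities
 angle, angular_velocity,   # orientation and its rate of change
 left_contact, right_contact) = obs  # leg ground-contact flags (0.0 or 1.0)

# The 4 discrete actions:
ACTIONS = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}

print(len(obs), len(ACTIONS))  # 8-dimensional observation, 4 discrete actions
```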
## Usage with Stable-Baselines3
```python
import gymnasium as gym
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

repo_id = "NiseRoj/ppo-LunarLander-v3"
filename = "ppo-LunarLander-v3.zip"

# Download the trained model weights from the Hub
checkpoint = load_from_hub(repo_id=repo_id, filename=filename)
model = PPO.load(checkpoint)

# Roll out the policy in a rendered environment
env = gym.make("LunarLander-v3", render_mode="human")
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
## Evaluating the Agent
```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor
from huggingface_sb3 import load_from_hub

checkpoint = load_from_hub(
    repo_id="NiseRoj/ppo-LunarLander-v3",
    filename="ppo-LunarLander-v3.zip",
)
model = PPO.load(checkpoint)

# Wrap the env in a Monitor so episode returns are recorded for evaluation
eval_env = Monitor(gym.make("LunarLander-v3"))
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```
## Training Details
| Setting | Value |
|---|---|
| Algorithm | PPO |
| Policy | MlpPolicy |
| Environment | LunarLander-v3 |
| Parallel envs | 16 |
| Total timesteps | 1,000,000 |
| n_steps | 1024 |
| batch_size | 64 |
| n_epochs | 4 |
| gamma | 0.999 |
| gae_lambda | 0.98 |
| ent_coef | 0.01 |
(Replace any of the rows above with the actual values you used if they differ. `package_to_hub` does not write these for you, so the record is only as accurate as you make it.)
## Results
| Metric | Value |
|---|---|
| Mean reward | 254.57 |
| Std reward | 20.32 |
| Eval episodes | 10 |
| Solved (≥200) | ✅ |
## Framework Versions
This model was trained and exported with:
- `stable-baselines3`
- `gymnasium` (with the `box2d` extra for LunarLander)
- `huggingface_sb3`
See `requirements.txt` in this repository if you need the exact pinned versions.