---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      name: mean_reward
      value: 288.92 +/- 21.79
      verified: false
---
# 🚀 PPO Agent for LunarLander-v2
This is a trained **PPO agent** for the **LunarLander-v2** environment using Stable-Baselines3.
## Developer
**Vishand S (@Vishand03)**
## Frameworks
- Stable-Baselines3
- PyTorch
## Training Details
- Algorithm: PPO
- Timesteps: 2.5M
- Mean Reward: ~288.9
- Discount factor (γ): 0.99
- Learning rate: 3e-4
- Optimizer: Adam
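
A run with these settings would look roughly like the sketch below. This is a minimal reproduction using the hyperparameters listed above, not the original training script; the `MlpPolicy` choice and the save path are assumptions.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical training sketch matching the reported settings.
env = gym.make("LunarLander-v2")

model = PPO(
    "MlpPolicy",          # assumed policy network
    env,
    learning_rate=3e-4,   # Adam is SB3's default optimizer for PPO
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=2_500_000)  # 2.5M timesteps
model.save("model")                     # saved checkpoint (assumed name)
```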
---
## 🎥 Demo (Preview)

---
## 🎬 Full Demo Video
👉 [Watch the full video here](replay.mp4)
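
If you want to record a similar clip yourself, one option (a sketch, not necessarily how this replay was produced) is gymnasium's `RecordVideo` wrapper around the loaded policy; the `model.zip` path is assumed to be the checkpoint from this repo.

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo
from stable_baselines3 import PPO

# Sketch: record one episode to ./videos/*.mp4 (requires moviepy).
env = RecordVideo(gym.make("LunarLander-v2", render_mode="rgb_array"), video_folder="videos")
model = PPO.load("model.zip")

obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()  # finalizes and writes the video file
```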
---
## 🛠 Usage
```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy
from huggingface_hub import hf_hub_download
# -------------------------
# Environment Setup
# -------------------------
env = gym.make("LunarLander-v2", render_mode="human") # Human render
eval_env = Monitor(gym.make("LunarLander-v2")) # Evaluation (no render)
# -------------------------
# Load pretrained model
# -------------------------
model_path = hf_hub_download("Vishand03/lunarlander-ppo", "model.zip")
model = PPO.load(model_path)
# -------------------------
# Run one episode
# -------------------------
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()  # close the render window
# -------------------------
# Evaluate policy
# -------------------------
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```