Vishand03/lunarlander-ppo

@@ -1,52 +1,84 @@
-# 🚀 LunarLander PPO Agent
-This repository contains a **PPO (Proximal Policy Optimization)** agent trained on the **LunarLander-v2** environment using **Stable-Baselines3**.
 ---
-## 📌 Model Details
-- **Algorithm**: PPO (Proximal Policy Optimization)
-- **Environment**: LunarLander-v2 (Box2D)
-- **Framework**: Stable-Baselines3 + Gymnasium
-- **Reward Goal**: Successfully land the Lunar Module smoothly on the landing pad 🚀
 ---
-## 🎥 Demo Video
-Watch the trained agent in action:
-<video controls autoplay loop>
-  <source src="https://huggingface.co/Vishand03/lunarlander-ppo/resolve/main/replay.mp4" type="video/mp4">
-  Your browser does not support the video tag.
-</video>
----
-## 📂 Files in this Repo
-- `replay.mp4` → Video of the trained agent
-- `README.md` → This documentation
-- (Optional) Model weights and training logs can also be pushed
----
-## 🛠️ How to Use
 ```python
 import gymnasium as gym
 from stable_baselines3 import PPO
-# Load environment
-env = gym.make("LunarLander-v2")
-# Load model
-model = PPO.load("path_to_model.zip", env=env)
-# Run agent
 obs, _ = env.reset()
-for _ in range(1000):
-    action, _ = model.predict(obs)
-    obs, reward, done, truncated, info = env.step(action)
-    env.render()
-    if done or truncated:
-        obs, _ = env.reset()

 ---
+library_name: stable-baselines3
+tags:
+- LunarLander-v2
+- deep-reinforcement-learning
+- reinforcement-learning
+- stable-baselines3
+model-index:
+- name: PPO
+  results:
+  - task:
+      type: reinforcement-learning
+      name: reinforcement-learning
+    dataset:
+      name: LunarLander-v2
+      type: LunarLander-v2
+    metrics:
+    - type: mean_reward
+      name: mean_reward
+      value: 288.92 +/- 21.79
+      verified: false
 ---
+# 🚀 PPO Agent for LunarLander-v2
+This is a **PPO agent** trained on the **LunarLander-v2** environment with Stable-Baselines3.
+## Developer
+**Vishand S (@Vishand03)**
+## Frameworks
+- Stable-Baselines3
+- PyTorch
+## Training Details
+- Algorithm: PPO
+- Timesteps: 2.5M
+- Mean Reward: ~288.9
+- Discount factor (γ): 0.99
+- Learning rate: 3e-4
+- Optimizer: Adam
+## 🎥 Demo
+![LunarLander](lunarlander.gif)
+## 🛠 Usage
 ```python
 import gymnasium as gym
 from stable_baselines3 import PPO
+from stable_baselines3.common.monitor import Monitor
+from stable_baselines3.common.evaluation import evaluate_policy
+from huggingface_hub import hf_hub_download
+# -------------------------
+# Environment Setup
+# -------------------------
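+# Environment for human rendering (needs the Box2D extra: pip install "gymnasium[box2d]").
+# Note: Gymnasium 1.0 and later register this environment as "LunarLander-v3".
+env = gym.make("LunarLander-v2", render_mode="human")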
+# Environment for evaluation (no render)
+eval_env = Monitor(gym.make("LunarLander-v2"))
+# -------------------------
+# Load pretrained model from Hugging Face Hub
+# -------------------------
+model_path = hf_hub_download("Vishand03/lunarlander-ppo", "model.zip")
+model = PPO.load(model_path)
+# -------------------------
+# Run a single episode
+# -------------------------
 obs, _ = env.reset()
+done = False
+while not done:
+    action, _ = model.predict(obs, deterministic=True)
+    obs, reward, terminated, truncated, _ = env.step(action)
+    done = terminated or truncated
+env.close()  # close the render window once the episode ends
+# -------------------------
+# Evaluate policy
+# -------------------------
+mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
+print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
+eval_env.close()
+```
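
The model card reports its metric as `288.92 +/- 21.79`, and the usage snippet prints the same "mean +/- std" pair from `evaluate_policy`. As a rough illustration of how such a summary can be derived from per-episode returns, here is a minimal standard-library sketch. The episode returns in it are made up for the example, the helper names are hypothetical, and the 200-point "solved" threshold comes from the Gymnasium Lunar Lander documentation rather than from this README. It assumes a population standard deviation, which to my understanding matches the NumPy default that Stable-Baselines3 uses.

```python
from statistics import fmean, pstdev

# Assumption: threshold taken from the Gymnasium Lunar Lander docs, not from this README.
SOLVED_THRESHOLD = 200.0


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns, using the population std."""
    return fmean(episode_returns), pstdev(episode_returns)


def lower_bound(mean, std):
    """Conservative score: the mean return minus one standard deviation."""
    return mean - std


# Hypothetical returns from five evaluation episodes (illustrative values only).
example_returns = [260.0, 300.0, 280.0, 310.0, 250.0]
mean, std = summarize_returns(example_returns)
print(f"Mean Reward: {mean:.2f} +/- {std:.2f}")

# The figures reported in the model card metadata.
reported_mean, reported_std = 288.92, 21.79
reported_lower = lower_bound(reported_mean, reported_std)
print(f"Reported lower bound: {reported_lower:.2f}")
print(f"Above solved threshold: {reported_lower >= SOLVED_THRESHOLD}")
```

Subtracting one standard deviation from the mean gives a more conservative figure than the mean alone, which is useful when comparing agents evaluated over only a few episodes.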