---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      name: mean_reward
      value: 288.92 +/- 21.79
      verified: false
---

# 🚀 PPO Agent for LunarLander-v2

This is a **PPO agent** trained with Stable-Baselines3 on the **LunarLander-v2** environment.

## Developer

**Vishand S (@Vishand03)**

## Frameworks

- Stable-Baselines3
- PyTorch

## Training Details

- Algorithm: PPO
- Timesteps: 2.5M
- Mean Reward: ~288.9
- Discount factor (γ): 0.99
- Learning rate: 3e-4
- Optimizer: Adam

---

## 🎥 Demo (Preview)

![LunarLander](lunarlander.gif)

---

## 🎬 Full Demo Video

👉 [Watch the full video here](replay.mp4)

---

## 🛠 Usage

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy
from huggingface_hub import hf_hub_download

# -------------------------
# Environment Setup
# -------------------------
env = gym.make("LunarLander-v2", render_mode="human")  # Human render
eval_env = Monitor(gym.make("LunarLander-v2"))  # Evaluation (no render)

# -------------------------
# Load pretrained model
# -------------------------
model_path = hf_hub_download("Vishand03/lunarlander-ppo", "model.zip")
model = PPO.load(model_path)

# -------------------------
# Run one episode
# -------------------------
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()  # Close the render window

# -------------------------
# Evaluate policy
# -------------------------
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True
)
print(f"Mean Reward: {mean_reward:.2f} +/- {std_reward:.2f}")
eval_env.close()
```
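
For readers who want to reproduce a comparable agent, the values listed under Training Details can be passed to Stable-Baselines3 roughly as follows. This is a minimal sketch, not the author's training script: the `"MlpPolicy"` choice, the `train` helper, the `TRAIN_CONFIG` dict, and the `save_path` default are assumptions made for illustration, and every hyperparameter not listed in this card is left at the library default. The environment needs the Box2D extra (`pip install "gymnasium[box2d]"`), and newer Gymnasium releases may register the environment under a different version suffix, so the ID may need adjusting.

```python
# Values taken from the "Training Details" list in this card.
TRAIN_CONFIG = {
    "env_id": "LunarLander-v2",
    "total_timesteps": 2_500_000,  # "Timesteps: 2.5M"
    "gamma": 0.99,                 # discount factor
    "learning_rate": 3e-4,         # Adam is the default optimizer for SB3 PPO
}


def train(config=TRAIN_CONFIG, save_path="ppo-lunarlander"):
    """Train a PPO agent with the listed hyperparameters (hypothetical helper)."""
    # Imported lazily so the config can be inspected without Box2D installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(config["env_id"])
    model = PPO(
        "MlpPolicy",  # assumption: the card does not name the policy class
        env,
        gamma=config["gamma"],
        learning_rate=config["learning_rate"],
        verbose=1,
    )
    model.learn(total_timesteps=config["total_timesteps"])
    model.save(save_path)  # writes <save_path>.zip
    env.close()
    return model
```

Calling `train()` runs the full 2.5M timesteps. For a quick smoke test, pass a copy of the config with a much smaller budget, for example `train({**TRAIN_CONFIG, "total_timesteps": 10_000})`.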