---
tags:
- LunarLander-v3
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
library_name: stable-baselines3
---

# PPO Agent playing LunarLander-v3

This is a **PPO** agent trained on the **LunarLander-v3** environment (Hugging Face repository: `sam522/ppo-lunarlander-v3`).
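## Usage

The snippet below loads the saved weights and runs one evaluation episode. The checkpoint holds a state dict, not the model class, so the `Network` class used during training (and the `config` passed to it) must be defined in your code before loading.

```python
import gymnasium as gym  # LunarLander needs the Box2D extra: pip install "gymnasium[box2d]"
import torch

# Load the checkpoint saved during training (CPU works for inference)
checkpoint = torch.load("model.pth", map_location="cpu")

# `Network` and `config` are not included here: define them exactly as they
# were during training so the state dict keys and tensor shapes match.
network = Network(config)
network.load_state_dict(checkpoint["model_state_dict"])
network.eval()

# Run one evaluation episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0
while not done:
    # Convert the NumPy observation into a batched float tensor
    obs = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        # Returns (action, log_prob, entropy, value); only the action is needed
        action, _, _, _ = network.get_action_and_value(obs)
    # env.step expects a plain int for the Discrete(4) action space
    state, reward, terminated, truncated, _ = env.step(action.item())
    total_reward += reward
    done = terminated or truncated
env.close()
print(f"Total reward: {total_reward:.1f}")
```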
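The usage snippet leaves `Network` undefined. Below is a hypothetical sketch of an actor-critic network with shared features that provides the four-value `get_action_and_value` method the snippet calls. The hidden size (64), Tanh activations, and layer names are assumptions, not the trained architecture, so `load_state_dict` will only succeed if they happen to match the checkpoint; comparing against `checkpoint["model_state_dict"]` keys and tensor shapes is the way to adapt it. This sketch takes explicit sizes instead of a `config` object, so it would be constructed as `Network()`.

```python
from typing import Optional

import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature extractor.

    obs_dim=8 and n_actions=4 match LunarLander-v3's observation and action
    spaces; hidden=64 and the Tanh activations are assumptions.
    """

    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        # Shared trunk: both heads read the same features
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, hidden),
            nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # policy head: action logits
        self.critic = nn.Linear(hidden, 1)  # value head: V(s)

    def get_action_and_value(
        self, obs: torch.Tensor, action: Optional[torch.Tensor] = None
    ):
        """Return (action, log_prob, entropy, value) for a batch of observations."""
        h = self.features(obs)
        dist = Categorical(logits=self.actor(h))
        if action is None:
            # Sample during rollouts; pass stored actions during PPO updates
            action = dist.sample()
        value = self.critic(h).squeeze(-1)
        return action, dist.log_prob(action), dist.entropy(), value
```

Passing `action` lets the same method re-score stored actions under the current policy during an update, and the shared trunk means one forward pass serves both the policy and value heads.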
## Training Results

- **Environment**: LunarLander-v3
- **Training Episodes**: 3000
- **Final Performance**: 212.4 ± 113.1
- **Best Episode Reward**: 332.4
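The card does not say how Final Performance was measured. One common approach, assumed here, is to report the mean ± standard deviation of total reward over several evaluation episodes. The hypothetical `evaluate` helper below does this for any Gymnasium-style environment (`reset(seed=...)` returning `(obs, info)`, `step()` returning a five-tuple) and a `policy` callable mapping an observation to an action; the default episode count and per-episode seeding are assumptions, not the author's protocol.

```python
import statistics
from typing import Any, Callable, Tuple


def evaluate(
    env: Any, policy: Callable[[Any], int], n_episodes: int = 100, seed: int = 0
) -> Tuple[float, float, float]:
    """Run n_episodes and return (mean, std, best) total episode reward."""
    returns = []
    for i in range(n_episodes):
        # A distinct seed per episode gives varied but reproducible starts
        obs, _ = env.reset(seed=seed + i)
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    # Population standard deviation (ddof=0), like numpy.std's default
    return statistics.mean(returns), statistics.pstdev(returns), max(returns)
```

With the trained model, `env` would be `gym.make("LunarLander-v3")` and `policy` a small wrapper that converts the observation to a batched tensor and returns `action.item()` from `get_action_and_value`, as in the usage snippet.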
## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Network Architecture**: Actor-critic with a shared feature extractor
- **Learning Rate**: 0.0003
- **Clip Epsilon (ε)**: 0.2
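The clip epsilon sets the trust region in PPO's clipped surrogate objective, $L^{\text{CLIP}} = \mathbb{E}_t\left[\min\left(r_t A_t,\ \operatorname{clip}(r_t, 1-\epsilon, 1+\epsilon)\, A_t\right)\right]$ with $r_t = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\text{old}}(a_t \mid s_t)$, so with $\epsilon = 0.2$ the ratio is clipped to $[0.8, 1.2]$. The plain-Python sketch below computes only this policy term for a batch; the function name is hypothetical, and the value loss, entropy bonus, and advantage estimator used in training are not stated in this card and are omitted.

```python
import math
from typing import Sequence


def clipped_surrogate_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """PPO clipped policy loss: the negated surrogate objective, batch-averaged."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    # Minimizing this loss maximizes the clipped surrogate objective
    return -total / len(advantages)
```

For a positive advantage the gain from raising the ratio stops at 1.2, while for a negative advantage the unclipped (worse) term is kept, so the update cannot profit from moving the policy far from the one that collected the data.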