---
tags:
- LunarLander-v3
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
library_name: stable-baselines3
---
# PPO Agent playing LunarLander-v3
This is a **PPO** agent trained on the **LunarLander-v3** environment.
## Usage
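```python
import torch
import gymnasium as gym

# Load the checkpoint saved during training
checkpoint = torch.load("model.pth", map_location="cpu")
# Network and config are not shipped with this snippet;
# define them so they match the architecture used in training
network = Network(config)
network.load_state_dict(checkpoint['model_state_dict'])
network.eval()

# Run one evaluation episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0
while not done:
    with torch.no_grad():
        # Convert state to a tensor first if your Network expects one
        action, _, _, _ = network.get_action_and_value(state)
    # int() accepts a Python int, a NumPy scalar, or a one-element tensor
    state, reward, terminated, truncated, _ = env.step(int(action))
    total_reward += reward
    done = terminated or truncated
env.close()
print(f"Total reward: {total_reward}")
```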
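The snippet above leaves `Network` undefined. As a rough starting point, here is a minimal sketch of an actor-critic with a shared feature trunk, which is how the architecture is described under Algorithm Details. Everything specific in it is an assumption rather than the training code: the constructor arguments (the original takes a `config` object whose contents are not shown), the hidden size, the Tanh activations, and the return order (action, log-probability, entropy, value). The observation size of 8 and the 4 discrete actions are those of the discrete LunarLander environment.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared trunk (not the original class)."""

    def __init__(self, obs_dim=8, n_actions=4, hidden_dim=64):
        super().__init__()
        # Shared feature extractor used by both heads
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
        )
        self.actor = nn.Linear(hidden_dim, n_actions)  # action logits
        self.critic = nn.Linear(hidden_dim, 1)         # state-value estimate

    def get_action_and_value(self, state, action=None):
        # Accept NumPy arrays, lists, or tensors
        state = torch.as_tensor(state, dtype=torch.float32)
        features = self.shared(state)
        dist = Categorical(logits=self.actor(features))
        if action is None:
            action = dist.sample()
        value = self.critic(features).squeeze(-1)
        # Assumed return order: action, log-prob, entropy, value
        return action, dist.log_prob(action), dist.entropy(), value
```

The weights in `model.pth` will only load into a class whose layer names and shapes match the one used during training, so treat this as a template to adapt rather than a drop-in replacement.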
## Training Results
- **Environment**: LunarLander-v3
- **Training Episodes**: 3000
- **Final Performance**: 212.4 ± 113.1
- **Best Episode**: 332.43
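Figures of the form "mean ± spread" and "best episode" can be computed from a list of per-episode returns. The sketch below shows one way to do that. It assumes the spread is a population standard deviation, which this card does not state, and the helper name `summarize_returns` is hypothetical.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, population std, best) for a list of episode returns."""
    mean = statistics.fmean(episode_returns)
    # Assumption: population standard deviation; use statistics.stdev
    # instead if a sample standard deviation is wanted
    std = statistics.pstdev(episode_returns)
    best = max(episode_returns)
    return mean, std, best


# Made-up returns for illustration only; these are not this agent's results
mean, std, best = summarize_returns([100.0, 200.0, 300.0])
print(f"{mean:.1f} ± {std:.1f} (best: {best:.1f})")
```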
## Algorithm Details
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Network Architecture**: Actor-Critic with shared features
- **Learning Rate**: 0.0003
- **Clip Epsilon**: 0.2
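To illustrate what the clip epsilon of 0.2 controls, here is a minimal sketch of the standard PPO clipped surrogate objective for a single sample, min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the probability ratio between the new and old policy and A is the advantage estimate. It is written in plain Python for readability and is not taken from this agent's training code; the function name is hypothetical.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate objective for one (ratio, advantage) sample."""
    # Keep the probability ratio inside [1 - eps, 1 + eps]
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# With a positive advantage, gains from pushing the ratio above 1 + eps are capped
capped = clipped_surrogate(ratio=1.5, advantage=1.0)
# With a negative advantage, the unclipped (worse) term is kept as the penalty
penalized = clipped_surrogate(ratio=1.5, advantage=-1.0)
```

In training, this quantity is averaged over a batch and maximized (or its negative is minimized), so the policy gains nothing from moving the ratio beyond the clip range in the direction the advantage favors.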