---
tags:
- LunarLander-v3
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
library_name: stable-baselines3
---

# PPO Agent playing LunarLander-v3

This is a **PPO** agent trained on the **LunarLander-v3** environment.

## Usage

```python
import gymnasium as gym
import torch

# Load the checkpoint on CPU so it also works on machines without a GPU
checkpoint = torch.load("model.pth", map_location="cpu")

# Network and config are not shipped with this snippet:
# define them exactly as in the training code before running it
network = Network(config)
network.load_state_dict(checkpoint["model_state_dict"])
network.eval()

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0

while not done:
    # Convert the NumPy observation to a float tensor; no gradients are needed
    # (add .unsqueeze(0) if your network expects a batch dimension)
    with torch.no_grad():
        obs = torch.as_tensor(state, dtype=torch.float32)
        action, _, _, _ = network.get_action_and_value(obs)
    # env.step expects a plain integer for the discrete action
    state, reward, terminated, truncated, _ = env.step(int(action.item()))
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Total reward: {total_reward}")
```

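
The usage snippet relies on a `Network` class that this card does not include. As a rough guide, here is a minimal sketch of what an actor-critic with shared features could look like. Everything in it is an assumption rather than the trained architecture: the layer names, the hidden size of 64, the `Tanh` activations, the constructor arguments (the original takes a `config` object), and the return order of `get_action_and_value`. Only the observation size (8) and the number of discrete actions (4) come from the LunarLander environment itself.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature trunk (not the trained model)."""

    def __init__(self, obs_dim=8, n_actions=4, hidden_dim=64):
        super().__init__()
        # Shared feature extractor used by both heads
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
        )
        # Actor head: one logit per discrete action
        self.actor = nn.Linear(hidden_dim, n_actions)
        # Critic head: scalar state-value estimate
        self.critic = nn.Linear(hidden_dim, 1)

    def get_action_and_value(self, obs, action=None):
        """Return (action, log_prob, entropy, value); this order is assumed."""
        hidden = self.features(obs)
        dist = Categorical(logits=self.actor(hidden))
        if action is None:
            # Sample from the policy when no action is supplied
            action = dist.sample()
        value = self.critic(hidden).squeeze(-1)
        return action, dist.log_prob(action), dist.entropy(), value


# Quick shape check on a dummy observation
net = Network()
action, log_prob, entropy, value = net.get_action_and_value(torch.zeros(8))
print(int(action.item()), float(value.item()))
```

`load_state_dict` will only accept the checkpoint if the layer names and shapes match the ones used in training, so treat this as a template to adapt, not a drop-in definition.
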
## Training Results

- **Environment**: LunarLander-v3
- **Training Episodes**: 3000
- **Final Performance**: 212.4 ± 113.1
- **Best Episode**: 332.43

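
The card does not say how the "Final Performance" figure was computed. Assuming it is the mean and standard deviation of per-episode returns over some evaluation window, a summary of that kind could be produced as sketched below. The helper name `summarize_returns`, the choice of population standard deviation, and the example returns are all illustrative assumptions; the numbers are made up and are not this model's results.

```python
from statistics import mean, pstdev


def summarize_returns(returns):
    """Return (mean, population std, best) for a list of episode returns."""
    if not returns:
        raise ValueError("need at least one episode return")
    return mean(returns), pstdev(returns), max(returns)


# Hypothetical returns for illustration only; not taken from this model
example_returns = [250.0, 150.0, 200.0, 300.0, 100.0]
avg, std, best = summarize_returns(example_returns)
print(f"Final performance: {avg:.1f} ± {std:.1f} (best episode: {best:.1f})")
```

A spread as large as the one reported here suggests that individual episodes vary widely, so averaging over many evaluation episodes gives a more reliable picture than any single run.
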
## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Network Architecture**: Actor-Critic with shared features
- **Learning Rate**: 0.0003
- **Clip Epsilon**: 0.2
- **Training Episodes**: 3000
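
To illustrate what the clip epsilon controls, the sketch below computes PPO's clipped surrogate objective on plain Python scalars. It follows the standard PPO formulation, taking the minimum of the unclipped and clipped ratio-weighted advantage. It is not the training code behind this model, which presumably operated on tensors, and the function names are hypothetical.

```python
import math

CLIP_EPSILON = 0.2  # value listed in this card


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_epsilon=CLIP_EPSILON):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    # Probability ratio between the updated policy and the data-collecting policy
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_epsilon=CLIP_EPSILON):
    """Negative mean of the clipped objective, suitable for minimization."""
    terms = [
        clipped_surrogate(n, o, a, clip_epsilon)
        for n, o, a in zip(new_log_probs, old_log_probs, advantages)
    ]
    return -sum(terms) / len(terms)


# A ratio of 1.5 with a positive advantage is capped at 1 + eps = 1.2
print(clipped_surrogate(math.log(1.5), 0.0, 1.0))
```

With epsilon at 0.2, a single update gains nothing from moving the action probability more than 20% away from the old policy in the direction the advantage favors, which keeps policy updates conservative.
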