---
tags:
- LunarLander-v3
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
library_name: stable-baselines3
---

# PPO Agent playing LunarLander-v3

This is a **PPO** agent trained on the **LunarLander-v3** environment.

## Usage

```python
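import gymnasium as gym
import torch

# Load the checkpoint; the weights are stored under 'model_state_dict'
checkpoint = torch.load("model.pth", map_location="cpu")
network = Network(config)  # You need to define the Network class and its config
network.load_state_dict(checkpoint["model_state_dict"])
network.eval()  # inference mode

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0

while not done:
    # Convert the NumPy observation to a batched float tensor
    obs = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action, _, _, _ = network.get_action_and_value(obs)
    # LunarLander-v3 has a discrete action space, so pass a plain Python int
    state, reward, terminated, truncated, _ = env.step(action.item())
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Total reward: {total_reward:.1f}")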
```
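The card does not publish the `Network` class. The sketch below is one possible definition, assuming an actor-critic with a shared feature trunk (as listed under Algorithm Details) and a `get_action_and_value` method returning `(action, log_prob, entropy, value)`, which matches the four-value unpacking in the usage code. LunarLander-v3 has an 8-dimensional observation and 4 discrete actions; everything else (the config keys, the hidden width of 64, Tanh activations, and the layer names `shared`, `actor`, `critic`) is hypothetical. `load_state_dict` will only succeed if these names and shapes match the saved checkpoint exactly.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature extractor.

    Layer names and sizes are assumptions; they must match the saved
    state dict for load_state_dict to succeed.
    """

    def __init__(self, config):
        super().__init__()
        obs_dim = config["obs_dim"]            # 8 for LunarLander-v3
        n_actions = config["n_actions"]        # 4 discrete actions
        hidden = config.get("hidden_dim", 64)  # assumed hidden width
        # Shared trunk feeding both the policy and value heads
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # policy logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def get_action_and_value(self, x, action=None):
        features = self.shared(x)
        dist = Categorical(logits=self.actor(features))
        if action is None:
            action = dist.sample()  # sample an action when none is given
        value = self.critic(features).squeeze(-1)
        return action, dist.log_prob(action), dist.entropy(), value


# Hypothetical config; the key names are assumptions
config = {"obs_dim": 8, "n_actions": 4, "hidden_dim": 64}
network = Network(config)
```

Sharing the trunk lets the policy and value heads reuse the same features; the usage loop only consumes the sampled action and discards the other three outputs.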

## Training Results

- **Environment**: LunarLander-v3
- **Training Episodes**: 3000
- **Final Performance** (episode reward, mean ± std): 212.4 ± 113.1
- **Best Episode Reward**: 332.4
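The card does not say how many evaluation episodes these figures cover or which standard-deviation estimator was used. As a sketch, assuming a list of per-episode returns, the mean ± std summary and the best episode could be computed as below; the population standard deviation (what `numpy.std` computes by default) is an assumption, and `summarize_returns` is a hypothetical helper.

```python
import statistics


def summarize_returns(returns):
    """Return ('mean ± std', best) for a list of per-episode returns.

    Uses the population standard deviation; the estimator behind the
    card's figures is not stated, so this is an assumption.
    """
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return f"{mean:.1f} ± {std:.1f}", max(returns)
```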

## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Network Architecture**: Actor-Critic with shared features
- **Learning Rate**: 0.0003
- **Clip Epsilon**: 0.2
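In PPO, the clip epsilon bounds the probability ratio $r = \pi_\text{new}(a \mid s) / \pi_\text{old}(a \mid s)$ inside the clipped surrogate objective $\min\big(r A,\ \text{clip}(r, 1-\epsilon, 1+\epsilon)\, A\big)$, where $A$ is the advantage estimate. The plain-Python sketch below, assuming this standard clipped objective with $\epsilon = 0.2$, shows how the clip limits each update; `clipped_surrogate` is a hypothetical helper, and a real training loop would compute the same quantity on tensors and minimize its negative.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] = A_i.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)
```

For a positive advantage, a ratio of 1.5 contributes only 1.2 × A, so the policy gains nothing from moving further than 20% away from the old policy in a single update.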