---
tags:
- LunarLander-v3
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
library_name: stable-baselines3
---

# PPO Agent playing LunarLander-v3

This is a **PPO** agent trained on the **LunarLander-v3** environment.

## Usage

```python
import gymnasium as gym
import torch

# Load the model checkpoint (a dict containing 'model_state_dict')
checkpoint = torch.load("model.pth", map_location="cpu")
network = Network(config)  # You need to define the Network class and its config
network.load_state_dict(checkpoint["model_state_dict"])
network.eval()

# Test the agent for one episode
env = gym.make("LunarLander-v3")
state, _ = env.reset()
done = False
total_reward = 0.0

while not done:
    # Convert the NumPy observation to a batched float tensor for the network
    obs = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        action, _, _, _ = network.get_action_and_value(obs)
    # env.step expects a plain Python int for a discrete action
    state, reward, terminated, truncated, _ = env.step(action.item())
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"Total reward: {total_reward}")
```

## Training Results

- **Environment**: LunarLander-v3
- **Training Episodes**: 3000
- **Final Performance**: 212.4 ± 113.1 (episode reward)
- **Best Episode**: 332.4 (episode reward)

## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Network Architecture**: Actor-Critic with shared features
- **Learning Rate**: 0.0003
- **Clip Epsilon**: 0.2
- **Training Episodes**: 3000
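The usage snippet needs a `Network` class exposing `get_action_and_value`, and the architecture is described as an actor-critic with shared features. Below is a minimal, hypothetical sketch in that style. It assumes a CleanRL-like interface returning `(action, log_prob, entropy, value)`, the 8-dimensional observation and 4 discrete actions of LunarLander-v3, and two hidden layers of 64 units. The layer sizes and constructor arguments are assumptions (the original constructor takes an unspecified `config`), so the released checkpoint will only load into this class if its architecture and parameter names happen to match.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class Network(nn.Module):
    """Hypothetical actor-critic with a shared feature trunk (sizes are assumptions)."""

    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        # Shared feature extractor used by both the policy head and the value head
        self.features = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, n_actions)  # logits over discrete actions
        self.critic = nn.Linear(hidden, 1)         # state-value estimate V(s)

    def get_action_and_value(self, x: torch.Tensor, action=None):
        h = self.features(x)
        dist = Categorical(logits=self.actor(h))
        if action is None:
            action = dist.sample()  # sample from the current policy
        return action, dist.log_prob(action), dist.entropy(), self.critic(h)
```

For example, `Network().get_action_and_value(torch.zeros(1, 8))` returns tensors of shape `(1,)`, `(1,)`, `(1,)` and `(1, 1)`. Sharing the trunk means both heads learn from the same features; separate actor and critic trunks are an equally common choice.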
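The **Clip Epsilon** of 0.2 is the ε in PPO's clipped surrogate objective, `min(r * A, clip(r, 1 - ε, 1 + ε) * A)`, where `r` is the probability ratio between the new and old policy and `A` is the advantage. The sketch below computes the negated objective in plain Python for a batch of log-probabilities and advantages. It is the standard textbook form, not taken from this model's training code, and it omits the value loss and entropy bonus that PPO implementations usually add.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # r = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)  # negate because optimizers minimize
```

With a positive advantage, a ratio of 1.5 is capped at 1.2, so the policy gains nothing from moving further than ε away from the old policy in one update; with a negative advantage the unclipped, more pessimistic term is kept.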