PPO Agent playing LunarLander-v2

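This is a Proximal Policy Optimization (PPO) agent for LunarLander-v2, trained with a custom PyTorch implementation. The checkpoint comes from a short 100,000-timestep run and does not yet solve the environment: no training episode exceeded a reward of 200 (see Training Results).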

Usage

import gymnasium as gym
import torch
import torch.nn as nn

# Actor network: maps an observation to one logit per discrete action
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_size=64):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_dim, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, action_dim)
        )
    
    def forward(self, x):
        return self.network(x)

# Load the checkpoint (expects the keys 'config' and 'actor_state_dict')
checkpoint = torch.load("model.pt", map_location='cpu')
actor = Actor(state_dim=8, action_dim=4, hidden_size=checkpoint['config']['hidden_size'])
actor.load_state_dict(checkpoint['actor_state_dict'])
actor.eval()

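# Run one evaluation episode with greedy (argmax) actions.
# Recent Gymnasium releases register this environment as "LunarLander-v3";
# change the ID if "LunarLander-v2" is not available in your installation.
env = gym.make("LunarLander-v2")
state, _ = env.reset()
total_reward = 0.0

for _ in range(1000):  # Safety cap on steps per episode
    with torch.no_grad():
        state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        logits = actor(state_tensor)
        action = torch.argmax(logits, dim=-1).item()

    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward

    if terminated or truncated:
        break

env.close()
print(f"Total reward: {total_reward:.2f}")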
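The loop above always takes the highest-scoring action. As a hedged sketch, the helper below shows how the same logits could instead be sampled through torch.distributions.Categorical, which is the usual way a discrete PPO policy acts during training. The name select_action is hypothetical and not part of the checkpoint, and whether this model's training code sampled in exactly this way is an assumption.

```python
import torch
from torch.distributions import Categorical


def select_action(logits: torch.Tensor, deterministic: bool = True) -> int:
    """Pick one action from a (1, action_dim) or (action_dim,) logits tensor.

    Hypothetical helper for illustration; it is not stored in model.pt.
    """
    if deterministic:
        # Greedy choice, equivalent to the argmax used in the loop above.
        return int(torch.argmax(logits, dim=-1).item())
    # Stochastic choice: sample an action index from softmax(logits).
    return int(Categorical(logits=logits).sample().item())
```

Greedy selection is the common choice for evaluation, while sampling keeps the exploration behaviour the policy had during training; the two can give noticeably different returns for a policy that is not yet well trained.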

Training Results

  • Mean reward: -9.92 ± 91.50
  • Best reward: 143.33
  • Success rate: 0.0% (episodes with reward > 200)
  • Total episodes: 353
  • Total timesteps: 100,000
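
For reference, summary figures of this kind can be derived from a list of per-episode returns. The sketch below is an assumption about how such numbers might be computed: the function name summarize_returns is hypothetical, and the spread is taken to be a population standard deviation, which this card does not state. Only the 200-point success threshold comes from the list above.

```python
from statistics import fmean, pstdev


def summarize_returns(returns, success_threshold=200.0):
    """Summarize episode returns: mean, spread, best, and success rate."""
    if not returns:
        raise ValueError("returns must contain at least one episode")
    successes = sum(1 for r in returns if r > success_threshold)
    return {
        "mean": fmean(returns),
        "std": pstdev(returns),  # assumption: population standard deviation
        "best": max(returns),
        "success_rate": 100.0 * successes / len(returns),  # in percent
        "episodes": len(returns),
    }


# Made-up returns for illustration only, not this model's actual episodes.
stats = summarize_returns([-120.0, 35.5, 210.0, 143.33])
print(f"Mean reward: {stats['mean']:.2f} ± {stats['std']:.2f}")
print(f"Success rate: {stats['success_rate']:.1f}%")
```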

Algorithm Configuration

  • Algorithm: Proximal Policy Optimization (PPO)
  • Learning rate: 0.0003
  • Batch size: 2048
  • Clip coefficient: 0.2
  • Entropy coefficient: 0.01
  • Value coefficient: 0.5
  • Gamma: 0.99
  • GAE Lambda: 0.95
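
To show where these hyperparameters enter, here is a minimal sketch of generalized advantage estimation and the clipped PPO loss. It assumes the standard PPO formulation rather than this model's actual training code, which is not included here; the function names, tensor layouts, and the unclipped squared-error value loss are illustrative choices.

```python
import torch


def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized advantage estimation over one rollout of length T.

    rewards, values, dones: 1-D float tensors of length T, where dones[t] is
    1.0 if the episode ended at step t. last_value: critic estimate for the
    state that follows the final step of the rollout.
    """
    advantages = torch.zeros_like(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(rewards.shape[0])):
        not_done = 1.0 - dones[t]
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages, advantages + values  # advantages, value targets


def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropy,
             clip_coef=0.2, ent_coef=0.01, vf_coef=0.5):
    """Clipped surrogate policy loss plus value loss minus entropy bonus."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    value_loss = 0.5 * (returns - values).pow(2).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy.mean()
```

The default arguments mirror the values listed above. In a typical setup, each update would collect a rollout of batch-size transitions (2048 here), compute advantages once, and then minimize this loss with the listed learning rate.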

Training Environment

  • Environment: LunarLander-v2
  • Framework: PyTorch + Gymnasium
  • Training date: 2025-09-04

This model was trained as part of the Hugging Face Deep RL Course.
