Proximal Policy Optimization (PPO) Agent playing LunarLander-v3

This is a trained Proximal Policy Optimization (PPO) agent for the Gymnasium LunarLander-v3 environment.

Model Details

The model was trained using the code available here.

Usage

To load and use this model for inference:

import torch
import json
import gymnasium as gym

from agent import SimpleAgent
from environment import make_env

# Load the configuration
with open("config.json", "r") as f:
    config = json.load(f)

env_id = config["env_id"]
hidden_dim = config["hidden_dim"]

# Create the environment and get the state and action dimensions
env, state_size, action_size = make_env(
    env_id,
    render_mode="human",
    normalise_obs=config["normalise_obs"],
)

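# Instantiate the agent and load the trained policy network.
# map_location="cpu" lets a checkpoint saved on a GPU load on a CPU-only machine.
agent = SimpleAgent(state_size, action_size, hidden_dim)
agent.policy.load_state_dict(torch.load("model.pt", map_location="cpu"))
agent.policy.eval()  # inference mode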

# Enjoy the agent!
state, _ = env.reset()
done = False

while not done:
    action = agent.select_action(state)
    state, reward, terminated, truncated, _ = env.step(action)

    done = terminated or truncated

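    # With render_mode="human", Gymnasium renders each step automatically,
    # so no explicit env.render() call is needed here.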

env.close()

Evaluation results

mean_reward on LunarLander-v3 (self-reported): 284.77 +/- 15.62
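
A figure of this form (mean return plus or minus a spread) is typically obtained by running the policy for a number of episodes and summarising the undiscounted episode returns. The sketch below shows one way to do that. It is not the evaluation script behind the number above: the function name `evaluate`, the episode count, the seeding scheme, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics
from typing import Callable, Tuple


def evaluate(
    env,
    select_action: Callable,
    n_episodes: int = 10,
    seed: int = 0,
) -> Tuple[float, float]:
    """Run n_episodes and return (mean, std) of the undiscounted returns.

    `env` is any object following the Gymnasium reset/step API.
    `evaluate` is a hypothetical helper, not part of the original code.
    """
    returns = []
    for episode in range(n_episodes):
        # A different seed per episode gives varied but reproducible starts.
        state, _ = env.reset(seed=seed + episode)
        done = False
        total = 0.0
        while not done:
            action = select_action(state)
            state, reward, terminated, truncated, _ = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    # Population standard deviation (assumption; a sample std is equally common).
    return statistics.mean(returns), statistics.pstdev(returns)
```

With the objects from the Usage section, a call could look like `mean, std = evaluate(env, agent.select_action, n_episodes=100)`. For evaluation it is usually faster to create the environment without human rendering, if `make_env` supports that.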