Proximal Policy Optimization (PPO) Agent playing LunarLander-v3
This is a trained Proximal Policy Optimization (PPO) agent for the Gymnasium LunarLander-v3 environment.
Model Details
The model was trained using the code available here.
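PPO updates the policy by maximising a clipped surrogate objective. Each sampled action is weighted by its advantage estimate. The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to [1 - eps, 1 + eps], so a single update cannot move the policy too far. Below is a minimal plain-Python sketch of the textbook form of this loss for one batch. It is not taken from the training code referenced above, which may add a value-function loss, an entropy bonus, or other defaults. The function name `ppo_clip_loss` is hypothetical, and `clip_eps=0.2` is an assumed common default rather than this model's setting.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed common default, not this model's setting
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch."""
    if not advantages or not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed from log-probs
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to the interval [1 - eps, 1 + eps]
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate: optimisers minimise, while PPO maximises the surrogate objective
    return -total / len(advantages)
```

Taking the minimum means the objective never rewards pushing the ratio beyond the clip range in the direction the advantage favours. This keeps each update close to the policy that collected the data.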
Usage
To load and use this model for inference:
import json
import torch
# Local modules from the training code
from agent import SimpleAgent
from environment import make_env
# Load the configuration used during training
with open("config.json", "r") as f:
    config = json.load(f)
env_id = config["env_id"]
hidden_dim = config["hidden_dim"]
# Create the environment and get the state and action space dimensions
env, state_size, action_size = make_env(
    env_id,
    render_mode="human",
    normalise_obs=config["normalise_obs"],
)
# Instantiate the agent and load the trained policy network weights
agent = SimpleAgent(state_size, action_size, hidden_dim)
agent.policy.load_state_dict(torch.load("model.pt", map_location="cpu"))
agent.policy.eval()
# Enjoy the agent! Run one episode
state, _ = env.reset()
done = False
while not done:
    action = agent.select_action(state)
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    # With render_mode="human", Gymnasium draws each frame inside env.step(),
    # so no explicit env.render() call is needed.
env.close()
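The loop above runs a single rendered episode and discards the rewards. To estimate performance, you can instead run several episodes and record each episode's undiscounted return. The helper below is a minimal sketch that assumes only the Gymnasium `reset`/`step` API. `evaluate` is a hypothetical name, not part of the training code. `select_action` can be any callable that maps a state to an action, such as `agent.select_action`.

```python
from typing import Any, Callable, List, Optional


def evaluate(
    env: Any,
    select_action: Callable[[Any], Any],
    n_episodes: int = 10,
    seed: Optional[int] = None,
) -> List[float]:
    """Run n_episodes full episodes and return the undiscounted return of each."""
    returns: List[float] = []
    for episode in range(n_episodes):
        # Seed only the first reset; later resets continue the seeded RNG stream
        state, _ = env.reset(seed=seed if episode == 0 else None)
        episode_return, done = 0.0, False
        while not done:
            action = select_action(state)
            state, reward, terminated, truncated, _ = env.step(action)
            episode_return += float(reward)
            # An episode ends on termination (e.g. landing or crashing) or truncation (time limit)
            done = terminated or truncated
        returns.append(episode_return)
    return returns
```

With the objects from the usage example, this could be called as `evaluate(env, agent.select_action, n_episodes=100)`. For fast evaluation, create the environment with `render_mode=None` so that no window is drawn.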
Evaluation results
- mean_reward on LunarLander-v3 (self-reported): 284.77 +/- 15.62
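The reported figure is the mean +/- standard deviation of episode returns. The card does not state how many episodes were used or which standard deviation was computed. The sketch below assumes the population standard deviation (`statistics.pstdev`). Gymnasium's LunarLander documentation considers an episode a solution if it scores at least 200 points. Applying that threshold to the mean return, as done here, is a common convention rather than an official rule. `summarize`, `format_result`, and `SOLVED_THRESHOLD` are hypothetical names.

```python
import statistics
from typing import Sequence, Tuple

# Gymnasium's documented per-episode solution threshold for LunarLander
SOLVED_THRESHOLD = 200.0


def summarize(returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of episode returns."""
    return statistics.fmean(returns), statistics.pstdev(returns)


def format_result(returns: Sequence[float]) -> str:
    """Format returns as 'mean +/- std' and compare the mean with the threshold."""
    mean, std = summarize(returns)
    status = "solved" if mean >= SOLVED_THRESHOLD else "not solved"
    return f"{mean:.2f} +/- {std:.2f} ({status})"
```

By this measure, the self-reported 284.77 +/- 15.62 sits well above the 200-point threshold.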