metadata
tags:
- LunarLander-v3
- reinforcement-learning
- ppo
- lunarlander
- gymnasium
- pytorch
model-index:
- name: PPO-LunarLander
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v3
      type: LunarLander-v3
    metrics:
    - type: mean_reward
      value: 284.77 +/- 15.62
      name: mean_reward
      verified: false
Proximal Policy Optimization (PPO) Agent playing LunarLander-v3
This is a trained Proximal Policy Optimization (PPO) agent for the Gymnasium LunarLander-v3 environment.
Model Details
The model was trained using the code available here.
Usage
To load and use this model for inference:
import json

import gymnasium as gym
import torch

from agent import SimpleAgent
from environment import make_env

# Load the configuration
with open("config.json", "r") as f:
    config = json.load(f)

env_id = config["env_id"]
hidden_dim = config["hidden_dim"]

# Create the environment and get the state and action dimensions
env, state_size, action_size = make_env(
    env_id,
    render_mode="human",
    normalise_obs=config["normalise_obs"],
)

# Instantiate the agent and load the trained policy network
# (map_location="cpu" lets the weights load on machines without a GPU)
agent = SimpleAgent(state_size, action_size, hidden_dim)
agent.policy.load_state_dict(torch.load("model.pt", map_location="cpu"))

# Enjoy the agent!
state, _ = env.reset()
done = False
while not done:
    action = agent.select_action(state)
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    env.render()

env.close()
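
The mean_reward in the metadata is given as a mean plus or minus a spread. The evaluation procedure is not described in this card, so the following is only a minimal sketch of how such a figure could be computed over several episodes. The helper name evaluate_policy, the default episode count, and the choice of population standard deviation are assumptions made for illustration, not the author's method.

```python
import statistics


def evaluate_policy(env, select_action, n_episodes=10):
    """Return (mean, std) of the episodic returns over n_episodes.

    Hypothetical helper: env is any object following the Gymnasium
    reset()/step() API, and select_action maps an observation to an action.
    """
    returns = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        done = False
        total = 0.0
        while not done:
            action = select_action(state)
            state, reward, terminated, truncated, _ = env.step(action)
            total += float(reward)  # accumulate the undiscounted return
            done = terminated or truncated
        returns.append(total)
    # Population standard deviation is an assumption; the card does not say
    # which estimator produced the reported spread.
    return statistics.fmean(returns), statistics.pstdev(returns)
```

With the objects from the usage code, this could be called as evaluate_policy(env, agent.select_action), preferably on an environment created without render_mode="human" so that episodes run at full speed.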