metadata
tags:
  - LunarLander-v3
  - reinforcement-learning
  - ppo
  - lunarlander
  - gymnasium
  - pytorch
model-index:
  - name: PPO-LunarLander
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: LunarLander-v3
          type: LunarLander-v3
        metrics:
          - type: mean_reward
            value: 284.77 +/- 15.62
            name: mean_reward
            verified: false

Proximal Policy Optimization (PPO) Agent playing LunarLander-v3

This is a trained Proximal Policy Optimization (PPO) agent for the Gymnasium LunarLander-v3 environment.

Model Details

The model was trained using the code available here.

Usage

To load and use this model for inference:

import torch
import json
import gymnasium as gym

from agent import SimpleAgent
from environment import make_env

# Load the configuration
with open("config.json", "r") as f:
    config = json.load(f)

env_id = config["env_id"]
hidden_dim = config["hidden_dim"]

# Create the environment and get the state and action dimensions
env, state_size, action_size = make_env(
    env_id,
    render_mode="human",
    normalise_obs=config["normalise_obs"],
)

# Instantiate the agent and load the trained policy network
agent = SimpleAgent(state_size, action_size, hidden_dim)
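# map_location="cpu" lets the checkpoint load on machines without a GPU
agent.policy.load_state_dict(torch.load("model.pt", map_location="cpu"))
agent.policy.eval()  # inference mode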

# Enjoy the agent!
state, _ = env.reset()
done = False

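while not done:
    action = agent.select_action(state)
    # With render_mode="human", Gymnasium draws each frame inside step(),
    # so no explicit env.render() call is needed.
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated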

env.close()
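
The loop above plays a single episode. To get a figure comparable to the mean_reward metric, the same loop can be repeated over several episodes and the returns averaged. The following is a minimal sketch, not the evaluation code used for this model: the `evaluate` helper, its parameters, and the per-episode seeding scheme are assumptions made for illustration. It only relies on the Gymnasium `reset`/`step` interface and on a callable that maps a state to an action.

```python
import statistics
from typing import Any, Callable, Optional, Tuple


def evaluate(
    env: Any,
    select_action: Callable[[Any], Any],
    n_episodes: int = 10,
    seed: Optional[int] = None,
) -> Tuple[float, float]:
    """Hypothetical helper: return (mean, std) of undiscounted episode returns."""
    if n_episodes < 1:
        raise ValueError("n_episodes must be at least 1")

    returns = []
    for episode in range(n_episodes):
        # Assumption: seed + episode index gives each episode a different
        # but reproducible starting state.
        episode_seed = None if seed is None else seed + episode
        state, _ = env.reset(seed=episode_seed)

        done = False
        total_reward = 0.0
        while not done:
            action = select_action(state)
            state, reward, terminated, truncated, _ = env.step(action)
            total_reward += float(reward)
            done = terminated or truncated
        returns.append(total_reward)

    # Population standard deviation, the same convention as numpy.std's default.
    return statistics.fmean(returns), statistics.pstdev(returns)
```

With the objects from the snippet above, this could be called as `mean, std = evaluate(env, agent.select_action, n_episodes=10)`. The number of evaluation episodes behind the reported metric is not stated here, so the result may differ from it.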