Reinforce Agent playing Pixelcopter-PLE-v0

This is a trained Reinforce agent playing Pixelcopter-PLE-v0. The agent was trained with a custom PyTorch implementation of the Policy Gradient (Reinforce) algorithm.

🎮 Environment

  • Environment: Pixelcopter-PLE-v0 (PyGame Learning Environment, via the gym_pygame wrapper)
  • Goal: Navigate the copter through the tunnel without hitting walls or blocks.
  • State Space: 7
  • Action Space: 2 (Up/Accelerator, Do Nothing)

📊 Evaluation Results

  • Mean Reward: 44.60 +/- 32.36
  • Evaluation Episodes: 10
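The mean +/- std figures above can be reproduced with a simple loop like the following. This is a minimal sketch: the helper name `evaluate` is not part of the repository, and it assumes the old Gym step API (single `done` flag) used by gym_pygame.

```python
import numpy as np

def evaluate(env, policy, n_episodes=10, max_steps=10_000):
    """Roll out the policy for n_episodes and report mean/std of episode reward."""
    episode_rewards = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action, _ = policy.act(state)
            state, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        episode_rewards.append(total)
    return np.mean(episode_rewards), np.std(episode_rewards)
```

Calling `evaluate(env, model)` after loading the model (see the usage code below) yields the reported metrics, up to sampling noise.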

⚙️ Hyperparameters

The agent was trained using the following hyperparameters:

  • H_size (Hidden Neurons): 64
  • Total Training Episodes: 50,000
  • Max Steps per Episode: 10,000
  • Learning Rate: 1e-4
  • Gamma (Discount Factor): 0.99
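For reference, the core Reinforce update driven by these hyperparameters looks roughly like this. It is a sketch, not the exact training script: the function name is illustrative, `policy`, `optimizer`, and `env` are assumed to exist, and return standardization is a common variance-reduction choice that the original script may or may not use.

```python
import torch

def reinforce_episode(env, policy, optimizer, gamma=0.99, max_steps=10_000):
    """Collect one episode, then take one policy-gradient step on the Reinforce loss."""
    log_probs, rewards = [], []
    state = env.reset()
    for _ in range(max_steps):
        action, log_prob = policy.act(state)
        state, reward, done, _ = env.step(action)
        log_probs.append(log_prob)
        rewards.append(reward)
        if done:
            break
    # Discounted returns G_t, computed backwards through the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    # Standardizing returns is a common variance-reduction trick (assumption)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Reinforce loss: maximize sum of log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs).squeeze() * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return sum(rewards)
```

Training then amounts to calling this function for the 50,000 episodes listed above with `torch.optim.Adam(policy.parameters(), lr=1e-4)`.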

🧠 Model Architecture

The policy uses a deeper network for this environment compared to CartPole:

  • Input: 7 (State size)
  • Layer 1: Linear(7 -> 64) + ReLU
  • Layer 2: Linear(64 -> 128) + ReLU
  • Layer 3: Linear(128 -> 2)
  • Output: Softmax
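The layer list above maps directly onto a small PyTorch module. The class name `PixelcopterPolicy` here is purely illustrative (the checkpoint itself uses the `Policy` class shown in the usage code); this standalone sketch just sanity-checks the shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelcopterPolicy(nn.Module):  # illustrative name, not the checkpoint's class
    def __init__(self, s_size=7, a_size=2, h_size=64):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)      # 7 -> 64
        self.fc2 = nn.Linear(h_size, h_size * 2)  # 64 -> 128
        self.fc3 = nn.Linear(h_size * 2, a_size)  # 128 -> 2

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.softmax(self.fc3(x), dim=1)  # each row is a probability distribution

probs = PixelcopterPolicy()(torch.randn(1, 7))  # shape (1, 2), rows sum to 1
```

The network is tiny: 9,090 trainable parameters in total.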

🐍 Usage

To use this model, you need gym, gym_pygame, and torch installed.

โš ๏ธ Note: When loading the model with torch.load, you might get a warning about "suspicious pickle files" or "weights_only=False". This is standard for PyTorch models that save the full model structure. Please ignore the warning and trust the source if you are running this locally.

import gym
import gym_pygame
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# 1. Define the Policy class (MUST be present and match the training architecture)
class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super(Policy, self).__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, h_size*2) # Note the extra layer size
        self.fc3 = nn.Linear(h_size*2, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = torch.distributions.Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

# 2. Download the model
repo_id = "Tejas-Anvekar/Reinforce-Pixelcopter-PLE-v0" # Replace with your Repo ID
filename = "model.pt"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
# weights_only=False is required because the checkpoint pickles the full model object
model = torch.load(model_path, map_location=torch.device('cpu'), weights_only=False)
model.eval()

# 4. Evaluate
env = gym.make("Pixelcopter-PLE-v0")
state = env.reset()
done = False
total_reward = 0

print("Agent playing...")
while not done:
    action, _ = model.act(state)
    state, reward, done, _ = env.step(action)
    total_reward += reward
    env.render()

print(f"Game Over! Total Reward: {total_reward}")
env.close()