# Reinforce Agent playing Pixelcopter-PLE-v0
This is a trained Reinforce agent playing Pixelcopter-PLE-v0. It was trained with a custom PyTorch implementation of the Reinforce (Monte Carlo policy gradient) algorithm.
## 🎮 Environment
- Environment: Pixelcopter-PLE-v0 (PyGame Learning Environment, via `gym_pygame`)
- Goal: Navigate the copter through the tunnel without hitting walls or blocks.
- State Space: 7
- Action Space: 2 (Up/Accelerate, Do Nothing)
## 📊 Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 44.60 +/- 32.36 |
| Evaluation Episodes | 10 |
## ⚙️ Hyperparameters
The agent was trained using the following hyperparameters:
- H_size (Hidden Neurons): 64
- Total Training Episodes: 50,000
- Max Steps per Episode: 10,000
- Learning Rate: 1e-4
- Gamma (Discount Factor): 0.99
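With these hyperparameters, each completed episode yields one policy gradient update. The helper below is a minimal sketch of that update; the name `reinforce_update` and its signature are illustrative, not the repo's actual training code:

```python
import torch
from collections import deque

def reinforce_update(policy, optimizer, rewards, log_probs, gamma=0.99):
    """One Reinforce update from a single episode (illustrative sketch)."""
    # Discounted returns G_t, computed back to front.
    returns = deque()
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.appendleft(g)
    returns = torch.tensor(list(returns))
    # Policy-gradient loss: -sum_t log pi(a_t | s_t) * G_t
    loss = torch.stack([-lp * g for lp, g in zip(log_probs, returns)]).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At a learning rate of 1e-4, an update like this runs once per episode across the 50,000 training episodes.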
## 🧠 Model Architecture
The policy uses a deeper network for this environment than the one typically used for CartPole:
- Input: 7 (State size)
- Layer 1: Linear(7 -> 64) + ReLU
- Layer 2: Linear(64 -> 128) + ReLU
- Layer 3: Linear(128 -> 2)
- Output: Softmax
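The layer stack above can be sanity-checked in a few lines. This is a standalone sketch using `nn.Sequential`; the full `Policy` class appears in the Usage section below:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rebuild the layer stack described above and check the tensor shapes.
net = nn.Sequential(
    nn.Linear(7, 64), nn.ReLU(),    # Layer 1
    nn.Linear(64, 128), nn.ReLU(),  # Layer 2
    nn.Linear(128, 2),              # Layer 3
)
state = torch.randn(1, 7)             # a batch of one 7-dim observation
probs = F.softmax(net(state), dim=1)  # action probabilities
print(probs.shape)                    # torch.Size([1, 2])
print(float(probs.sum()))             # ~1.0: softmax normalizes over the 2 actions
```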
## 🚀 Usage
To use this model, you need `gym`, `gym_pygame`, and `torch` installed.

⚠️ Note: the checkpoint stores the full pickled model object, not just a `state_dict`, so it must be loaded with `weights_only=False` (recent PyTorch versions default to `weights_only=True` and will refuse to load it otherwise). Loading a full pickle executes arbitrary code, so only do this with checkpoints from a source you trust.
```python
import gym
import gym_pygame  # noqa: F401  (registers Pixelcopter-PLE-v0 with gym)
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# 1. Define the Policy class (MUST be present and match the training
#    architecture, because the checkpoint stores the full pickled module)
class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, h_size * 2)  # note the wider second layer
        self.fc3 = nn.Linear(h_size * 2, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = torch.distributions.Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

# 2. Download the model
repo_id = "Tejas-Anvekar/Reinforce-Pixelcopter-PLE-v0"  # Replace with your repo ID
filename = "model.pt"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
# weights_only=False is required because we load the full model structure,
# not just a state_dict; only use it with checkpoints you trust.
model = torch.load(model_path, map_location=torch.device("cpu"), weights_only=False)
model.eval()

# 4. Evaluate
env = gym.make("Pixelcopter-PLE-v0")
state = env.reset()
done = False
total_reward = 0
print("Agent playing...")
while not done:
    action, _ = model.act(state)
    state, reward, done, _ = env.step(action)  # old (gym < 0.26) step API
    total_reward += reward
    env.render()
print(f"Game Over! Total Reward: {total_reward}")
env.close()
```
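The mean ± std reported in the evaluation table can be reproduced with a small helper like the one below. `evaluate_agent` is an illustrative name, not part of the repo; it assumes the old `gym` step API used in the snippet above:

```python
import numpy as np

def evaluate_agent(env, policy, n_episodes=10, max_steps=10000):
    """Return (mean, std) of total episode rewards over n_episodes."""
    episode_rewards = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action, _ = policy.act(state)
            state, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        episode_rewards.append(total)
    return np.mean(episode_rewards), np.std(episode_rewards)
```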