# Reinforce Agent playing Pixelcopter-PLE-v0
This is a trained Reinforce agent playing Pixelcopter-PLE-v0. It was trained with a custom PyTorch implementation of the Reinforce (Monte Carlo policy gradient) algorithm.
## 🎮 Environment
- Environment: Pixelcopter-PLE-v0 (PyGame Learning Environment, via `gym_pygame`)
- Goal: Navigate the copter through the tunnel without hitting walls or blocks.
- State Space: 7
- Action Space: 2 (Up/Accelerate, Do Nothing)
## 📊 Evaluation Results
| Metric | Value |
|---|---|
| Mean Reward | 44.60 +/- 32.36 |
| Evaluation Episodes | 10 |
## ⚙️ Hyperparameters
The agent was trained using the following hyperparameters:
- H_size (Hidden Neurons): 64
- Total Training Episodes: 50,000
- Max Steps per Episode: 10,000
- Learning Rate: 1e-4
- Gamma (Discount Factor): 0.99
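With these hyperparameters, each completed episode yields one policy gradient update. The helper below is a minimal sketch of that update; the name `reinforce_update` and its signature are illustrative, not the repo's actual training code:

```python
import torch
from collections import deque

def reinforce_update(policy, optimizer, rewards, log_probs, gamma=0.99):
    """One Reinforce update from a single episode (illustrative sketch)."""
    # Discounted returns G_t, computed back to front.
    returns = deque()
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.appendleft(g)
    returns = torch.tensor(list(returns))
    # Policy-gradient loss: -sum_t log pi(a_t | s_t) * G_t
    loss = torch.stack([-lp * g for lp, g in zip(log_probs, returns)]).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At a learning rate of 1e-4, an update like this runs once per episode across the 50,000 training episodes.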
## 🧠 Model Architecture
The policy uses a deeper network for this environment than the one typically used for CartPole:
- Input: 7 (State size)
- Layer 1: Linear(7 -> 64) + ReLU
- Layer 2: Linear(64 -> 128) + ReLU
- Layer 3: Linear(128 -> 2)
- Output: Softmax
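The layer stack above can be sanity-checked in a few lines. This is a standalone sketch using `nn.Sequential`; the full `Policy` class appears in the Usage section below:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rebuild the layer stack described above and check the tensor shapes.
net = nn.Sequential(
    nn.Linear(7, 64), nn.ReLU(),    # Layer 1
    nn.Linear(64, 128), nn.ReLU(),  # Layer 2
    nn.Linear(128, 2),              # Layer 3
)
state = torch.randn(1, 7)             # a batch of one 7-dim observation
probs = F.softmax(net(state), dim=1)  # action probabilities
print(probs.shape)                    # torch.Size([1, 2])
print(float(probs.sum()))             # ~1.0: softmax normalizes over the 2 actions
```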
## 🚀 Usage
To use this model, you need `gym`, `gym_pygame`, and `torch` installed.

⚠️ Note: the checkpoint stores the full pickled model object, not just a `state_dict`, so it must be loaded with `weights_only=False` (recent PyTorch versions default to `weights_only=True` and will refuse to load it otherwise). Loading a full pickle executes arbitrary code, so only do this with checkpoints from a source you trust.
```python
import gym
import gym_pygame  # noqa: F401  (registers Pixelcopter-PLE-v0 with gym)
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# 1. Define the Policy class (MUST be present and match the training
#    architecture, because the checkpoint stores the full pickled module)
class Policy(nn.Module):
    def __init__(self, s_size, a_size, h_size):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)
        self.fc2 = nn.Linear(h_size, h_size * 2)  # note the wider second layer
        self.fc3 = nn.Linear(h_size * 2, a_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.softmax(x, dim=1)

    def act(self, state):
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        state = torch.from_numpy(state).float().unsqueeze(0).to(device)
        probs = self.forward(state).cpu()
        m = torch.distributions.Categorical(probs)
        action = m.sample()
        return action.item(), m.log_prob(action)

# 2. Download the model
repo_id = "Tejas-Anvekar/Reinforce-Pixelcopter-PLE-v0"  # Replace with your repo ID
filename = "model.pt"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)

# 3. Load the model
# weights_only=False is required because we load the full model structure,
# not just a state_dict; only use it with checkpoints you trust.
model = torch.load(model_path, map_location=torch.device("cpu"), weights_only=False)
model.eval()

# 4. Evaluate
env = gym.make("Pixelcopter-PLE-v0")
state = env.reset()
done = False
total_reward = 0
print("Agent playing...")
while not done:
    action, _ = model.act(state)
    state, reward, done, _ = env.step(action)  # old (gym < 0.26) step API
    total_reward += reward
    env.render()
print(f"Game Over! Total Reward: {total_reward}")
env.close()
```
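The mean ± std reported in the evaluation table can be reproduced with a small helper like the one below. `evaluate_agent` is an illustrative name, not part of the repo; it assumes the old `gym` step API used in the snippet above:

```python
import numpy as np

def evaluate_agent(env, policy, n_episodes=10, max_steps=10000):
    """Return (mean, std) of total episode rewards over n_episodes."""
    episode_rewards = []
    for _ in range(n_episodes):
        state = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action, _ = policy.act(state)
            state, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
        episode_rewards.append(total)
    return np.mean(episode_rewards), np.std(episode_rewards)
```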