# Reinforce Agent – Pixelcopter-PLE-v0
A policy gradient agent trained from scratch with the REINFORCE algorithm to play Pixelcopter, a challenging game with a continuous observation space and discrete actions, built on the PyGame Learning Environment (PLE).
## Performance
| Metric | Value |
|---|---|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |
## Algorithm – REINFORCE (Monte Carlo Policy Gradient)
REINFORCE is a classic policy gradient method that directly optimizes the policy by:
- Rolling out full episodes using the current policy
- Computing discounted returns G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ... for each timestep
- Updating the policy by gradient ascent on E[ log π_θ(a_t|s_t) · G_t ]
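The return computation above can be sketched in plain Python (the function name here is illustrative, not taken from the original code): a single backward pass over the episode's rewards accumulates the discounted sum.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_{t+1} + gamma*r_{t+2} + ... via one backward pass."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g          # G_t = r + gamma * G_{t+1}
        returns.insert(0, g)       # prepend so returns[t] matches timestep t
    return returns

# Example: three timesteps of +1 reward with gamma = 0.5
print(discounted_returns([1, 1, 1], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The backward recursion avoids recomputing each discounted sum from scratch, turning an O(T²) computation into O(T).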
The policy network is a simple feedforward neural network:
- Input: State observation vector
- Hidden layer: Fully connected + ReLU activation
- Output: Action probabilities via Softmax
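A minimal PyTorch sketch of such a network, with layer sizes taken from the hyperparameter table (the `act` helper matches the usage example later in this card, but its exact signature is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class Policy(nn.Module):
    def __init__(self, s_size=7, a_size=2, h_size=64):
        super().__init__()
        self.fc1 = nn.Linear(s_size, h_size)  # state -> hidden
        self.fc2 = nn.Linear(h_size, a_size)  # hidden -> action logits

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return F.softmax(self.fc2(x), dim=1)  # action probabilities

    def act(self, state):
        """Sample an action and return it with its log-probability."""
        state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        probs = self.forward(state)
        dist = Categorical(probs)
        action = dist.sample()
        return action.item(), dist.log_prob(action)
```

The stored log-probabilities are what the REINFORCE update multiplies by the returns G_t.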
## Hyperparameters
| Parameter | Value |
|---|---|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (γ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |
## About the Environment
Pixelcopter-PLE-v0 is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
- Observation space: 7 continuous values (player velocity, player y-position, wall positions, etc.)
- Action space: 2 discrete actions – throttle up or do nothing
- Reward: +1 for each timestep survived
- Episode ends: On collision with a wall or the ground/ceiling
## How to Use

```python
from ple.games.pixelcopter import Pixelcopter
from ple import PLE
import torch

# Load the trained policy (saved with torch.save)
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# Build the environment. The custom Gymnasium wrapper around PLE used
# during training is not shown here; instantiate it as `env` first.
env = ...

# Run inference
state, _ = env.reset()
action, _ = model.act(state)
```
## Training Details
- Framework: PyTorch
- Returns: Standardized per episode for training stability
- Environment API: PyGame Learning Environment (PLE) via custom Gymnasium wrapper
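The per-episode return standardization mentioned above is a common variance-reduction trick; a sketch using only the standard library (the epsilon guard against zero variance is an assumption):

```python
import statistics

def standardize(returns, eps=1e-8):
    """Shift returns to zero mean and scale to unit variance."""
    mean = statistics.mean(returns)
    std = statistics.pstdev(returns)  # population std over the episode
    return [(g - mean) / (std + eps) for g in returns]

print(standardize([3.0, 1.0, 2.0]))
```

Standardizing keeps the scale of the policy gradient roughly constant across episodes of very different lengths, which stabilizes training with a fixed learning rate.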
## Author
Trained by nirmanpatel as part of the Hugging Face Deep Reinforcement Learning Course.