๐Ÿš Reinforce Agent โ€” Pixelcopter-PLE-v0

A policy gradient agent trained from scratch with the REINFORCE algorithm to play Pixelcopter, a challenging game from the PyGame Learning Environment (PLE) with a continuous observation space and two discrete actions.


📊 Performance

| Metric | Value |
|---|---|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |

🧠 Algorithm: REINFORCE (Monte Carlo Policy Gradient)

REINFORCE is a classic policy gradient method that directly optimizes the policy by:

  1. Rolling out full episodes using the current policy
  2. Computing discounted returns Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ... for each timestep
  3. Updating the policy by maximizing E[ log π_θ(a|s) · Gₜ ]
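
Step 2 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the training code used for this model: the function name `discounted_returns` is hypothetical, and `rewards[t]` is assumed to hold the reward received after the action taken at step t (rₜ₊₁ in the notation above).

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, ..., G_{T-1}] for one finished episode.

    rewards[t] is assumed to be the reward received after the action
    at step t, so G_t = rewards[t] + gamma * G_{t+1}.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Small gamma chosen only so the arithmetic is easy to check by hand.
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Computing the returns backwards keeps the cost linear in the episode length, which matters when episodes can run for thousands of steps.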

The policy network is a simple feedforward neural network:

  • Input: State observation vector
  • Hidden layer: Fully connected + ReLU activation
  • Output: Action probabilities via Softmax
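
A network of this shape might look like the PyTorch sketch below. It assumes a single hidden layer and takes the sizes from elsewhere in this card (7 inputs, 64 hidden units, 2 actions); the class name `Policy`, the layer names, and the exact body of `act` are illustrative assumptions, not the released model's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical


class Policy(nn.Module):
    """Feedforward policy: state vector -> action probabilities."""

    def __init__(self, state_size=7, action_size=2, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        # Softmax over the action dimension yields a probability per action.
        return F.softmax(self.fc2(x), dim=1)

    def act(self, state):
        # Add a batch dimension, then sample an action from the distribution.
        state = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        dist = Categorical(self.forward(state))
        action = dist.sample()
        return action.item(), dist.log_prob(action)
```

Sampling from the distribution, rather than always taking the most likely action, is what provides exploration during training.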

โš™๏ธ Hyperparameters

| Parameter | Value |
|---|---|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (γ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |
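
To show how these values fit together, here is a hedged sketch of a single REINFORCE gradient step with Adam at a learning rate of 1e-4. The `nn.Sequential` network and the `reinforce_update` helper are stand-ins written for this example; the sketch also recomputes log-probabilities from stored states and actions, which is an assumption about bookkeeping and not necessarily how the original training loop did it.

```python
import torch
import torch.nn as nn

LEARNING_RATE = 1e-4
HIDDEN_SIZE = 64

# Stand-in policy: 7 state values in, probabilities over 2 actions out.
policy = nn.Sequential(
    nn.Linear(7, HIDDEN_SIZE),
    nn.ReLU(),
    nn.Linear(HIDDEN_SIZE, 2),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=LEARNING_RATE)


def reinforce_update(states, actions, returns):
    """Apply one policy-gradient step for a single finished episode."""
    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    returns = torch.as_tensor(returns, dtype=torch.float32)

    probs = policy(states)  # shape (T, 2)
    # Pick out the probability of the action actually taken at each step.
    taken = probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Minimizing the negative objective maximizes sum(log pi * G).
    loss = -(torch.log(taken) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A call such as `reinforce_update(episode_states, episode_actions, episode_returns)` would run once per episode, after the returns for that episode have been computed.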

🎮 About the Environment

Pixelcopter-PLE-v0 is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.

  • Observation space: 7 continuous values (player velocity, player y-position, wall positions, etc.)
  • Action space: 2 discrete actions (throttle up or do nothing)
  • Reward: +1 for each vertical block passed and a negative reward on reaching a terminal state (per the PLE documentation)
  • Episode ends: On collision with a wall or the ground/ceiling

🚀 How to Use

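import torch
from ple import PLE
from ple.games.pixelcopter import Pixelcopter

# Load the pickled policy on CPU. The policy class used during training must be
# importable; on PyTorch 2.6+ also pass weights_only=False to torch.load.
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# `env` is assumed to be an instance of the custom Gymnasium wrapper around
# PLE's Pixelcopter used for training; it is not defined in this snippet.
state, _ = env.reset()

# Run inference; the second return value is not needed here.
action, _ = model.act(state)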
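
Extending the snippet above to a full evaluation run could look like the following sketch. The `evaluate` helper is hypothetical; it assumes a Gymnasium-style environment (`reset()` returning `(state, info)` and `step()` returning a 5-tuple) and a policy whose `act` returns the action first. It uses the population standard deviation, which may differ from how the reported numbers were computed.

```python
import statistics


def evaluate(env, policy, n_episodes=10, max_steps=10_000):
    """Return (mean, std) of total reward over n_episodes rollouts."""
    episode_rewards = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        total = 0.0
        for _ in range(max_steps):
            action, _ = policy.act(state)
            state, reward, terminated, truncated, _ = env.step(action)
            total += reward
            if terminated or truncated:
                break
        episode_rewards.append(total)
    return statistics.mean(episode_rewards), statistics.pstdev(episode_rewards)
```

With the loaded model and wrapper, `mean_reward, std_reward = evaluate(env, model)` would produce figures comparable in kind to those in the Performance table.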

📚 Training Details

  • Framework: PyTorch
  • Returns: Standardized per episode for training stability
  • Environment API: PyGame Learning Environment (PLE) via custom Gymnasium wrapper
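
Per-episode return standardization, mentioned in the list above, can be sketched as follows. The helper name `standardize`, the `eps` value, and the use of the population standard deviation are assumptions for illustration; the original training code may differ in these details.

```python
import statistics


def standardize(returns, eps=1e-8):
    """Shift returns to zero mean and scale to roughly unit variance."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    # eps guards against division by zero when all returns are equal.
    return [(g - mean) / (std + eps) for g in returns]
```

Standardizing keeps the scale of the gradient roughly constant from episode to episode, which tends to help when episode lengths, and therefore returns, vary widely.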

👤 Author

Trained by nirmanpatel as part of the Hugging Face Deep Reinforcement Learning Course.
