---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 58.13 +/- 55.17
      name: mean_reward
      verified: false
---

# 🚁 Reinforce Agent — Pixelcopter-PLE-v0

A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging side-scrolling game with continuous observations and discrete actions, built on the PyGame Learning Environment (PLE).

---

## 📊 Performance

| Metric | Value |
|--------|-------|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46,000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |

---

## 🧠 Algorithm — REINFORCE (Monte Carlo Policy Gradient)

REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:

1. Rolling out full episodes using the current policy
2. Computing the discounted return **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]**

The policy network is a simple feedforward neural network:

- **Input:** State observation vector
- **Hidden layer:** Fully connected + ReLU activation
- **Output:** Action probabilities via Softmax

---

## ⚙️ Hyperparameters

| Parameter | Value |
|-----------|-------|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (γ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |

---

## 🎮 About the Environment

**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- **Action space:** 2 discrete actions — throttle up or do nothing
- **Reward:** +1 for each timestep survived
- **Episode ends:** On collision with a wall or the ground/ceiling

---

## 🚀 How to Use

The snippet below assumes `model.pt` stores the full serialized policy module, and that `env` is the custom Gymnasium wrapper around PLE used during training (the wrapper class ships with the training code, not with this card).

```python
from ple.games.pixelcopter import Pixelcopter
from ple import PLE
import torch

# Load the serialized policy (the Policy class must be importable)
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# `env` must be constructed from the custom Gymnasium wrapper
# around PLE's Pixelcopter before running this step.
state, _ = env.reset()

# Sample an action from the policy for the current state
action, _ = model.act(state)
```

---

## 📚 Training Details

- **Framework:** PyTorch
- **Returns:** Standardized per episode for training stability
- **Environment API:** PyGame Learning Environment (PLE) via a custom Gymnasium wrapper

---

## 👤 Author

Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).
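---

## 🔍 Algorithm Sketch

The core REINFORCE machinery described above — discounted returns (step 2), per-episode standardization (see Training Details), and gradient ascent on E[log π_θ(a|s) · Gₜ] (step 3) — can be sketched framework-agnostically. This is an illustrative NumPy sketch, not the card's actual PyTorch implementation: `LinearSoftmaxPolicy` and every name below are hypothetical, and it uses a single linear layer where the real model has a ReLU hidden layer trained with Adam via autograd.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Step 2: G_t = r_{t+1} + gamma * r_{t+2} + ... via one backward pass."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return np.array(out[::-1])

def standardize(returns, eps=1e-8):
    """Per-episode standardization, as noted under Training Details."""
    return (returns - returns.mean()) / (returns.std() + eps)

def softmax(z):
    z = z - z.max()  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

class LinearSoftmaxPolicy:
    """Toy linear-softmax policy (hypothetical stand-in for the real network)."""

    def __init__(self, n_obs, n_actions, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_actions, n_obs))
        self.lr = lr

    def act(self, state, rng):
        probs = softmax(self.W @ np.asarray(state))
        return rng.choice(len(probs), p=probs), probs

    def update(self, episode, gamma=0.99):
        """Step 3: gradient ascent on E[log pi_theta(a|s) * G_t]."""
        states, actions, rewards = zip(*episode)
        G = standardize(discounted_returns(list(rewards), gamma))
        for s, a, g in zip(states, actions, G):
            s = np.asarray(s)
            probs = softmax(self.W @ s)
            # For a linear-softmax policy, grad of log pi(a|s) w.r.t. W
            # is (onehot(a) - probs) s^T.
            grad = (np.eye(len(probs))[a] - probs)[:, None] * s[None, :]
            self.W += self.lr * g * grad
```

In the actual agent the hand-derived gradient is replaced by PyTorch autograd over the hidden-layer network, optimized with Adam using the hyperparameters listed above.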