---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
results:
- task:
type: reinforcement-learning
name: reinforcement-learning
dataset:
name: Pixelcopter-PLE-v0
type: Pixelcopter-PLE-v0
metrics:
- type: mean_reward
value: 58.13 +/- 55.17
name: mean_reward
verified: false
---
# ๐Ÿš Reinforce Agent โ€” Pixelcopter-PLE-v0
A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging continuous control game built on the PyGame Learning Environment (PLE).
---
## 📊 Performance
| Metric | Value |
|--------|-------|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |
---
## 🧠 Algorithm: REINFORCE (Monte Carlo Policy Gradient)
REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
1. Rolling out full episodes using the current policy
2. Computing the discounted return **G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ...** for each timestep
3. Updating the policy by maximizing **E[ log π_θ(a|s) · G_t ]**
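Step 2 above can be sketched in plain Python by walking the episode's rewards backwards, which computes every G_t in a single pass (the example rewards are hypothetical):

```python
# Compute discounted returns G_t = r_{t+1} + gamma*r_{t+2} + ... for one episode.
# Iterating in reverse lets each G_t reuse the already-computed G_{t+1}.
def discounted_returns(rewards, gamma):
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_{t+1} + gamma * G_{t+1}
        returns.append(g)
    returns.reverse()
    return returns

# Three survived timesteps at +1 reward each, gamma = 0.99:
print(discounted_returns([1.0, 1.0, 1.0], 0.99))  # [2.9701, 1.99, 1.0]
```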
The policy network is a simple feedforward neural network:
- **Input:** State observation vector
- **Hidden layer:** Fully connected + ReLU activation
- **Output:** Action probabilities via Softmax
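A minimal PyTorch sketch of that network, using the sizes from the hyperparameter table below (7 state features, 64 hidden units, 2 actions); the class and method names are illustrative, not necessarily those used in the checkpoint:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class Policy(nn.Module):
    """Feedforward policy: state -> softmax action probabilities."""
    def __init__(self, state_size=7, hidden_size=64, action_size=2):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))                 # hidden layer + ReLU
        return F.softmax(self.fc2(x), dim=1)    # action probabilities

    def act(self, state):
        """Sample an action and return it with its log-probability."""
        state = torch.from_numpy(state).float().unsqueeze(0)
        dist = Categorical(self.forward(state))
        action = dist.sample()
        return action.item(), dist.log_prob(action)

policy = Policy()
action, log_prob = policy.act(np.zeros(7, dtype=np.float32))
```

The `log_prob` returned by `act` is what the REINFORCE update multiplies by G_t.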
---
## โš™๏ธ Hyperparameters
| Parameter | Value |
|-----------|-------|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (ฮณ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |
---
## 🎮 About the Environment
**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- **Action space:** 2 discrete actions (throttle up or do nothing)
- **Reward:** +1 for each timestep survived
- **Episode ends:** On collision with a wall or the ground/ceiling
---
## 🚀 How to Use
```python
import torch

# Load the serialized policy (saved with torch.save(model, "model.pt"))
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# Run inference. `env` is the custom Gymnasium wrapper around
# Pixelcopter described under Training Details (construct it from
# ple.games.pixelcopter.Pixelcopter and ple.PLE in your own code).
state, _ = env.reset()
action, _ = model.act(state)  # sample an action from the policy
```
---
## 📚 Training Details
- **Framework:** PyTorch
- **Returns:** Standardized per episode for training stability
- **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
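The per-episode return standardization mentioned above can be sketched as follows (zero mean, unit variance, with a small epsilon to avoid division by zero; the function name is illustrative):

```python
import torch

def standardize(returns, eps=1e-8):
    """Normalize an episode's returns to zero mean and unit variance."""
    returns = torch.as_tensor(returns, dtype=torch.float32)
    return (returns - returns.mean()) / (returns.std() + eps)

g = standardize([2.9701, 1.99, 1.0])
```

Scaling the returns this way keeps policy-gradient magnitudes comparable across episodes of very different lengths.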
---
## 👤 Author
Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).