---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 58.13 +/- 55.17
      name: mean_reward
      verified: false
---

# Reinforce Agent – Pixelcopter-PLE-v0

A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging side-scrolling game built on the PyGame Learning Environment (PLE).

---

## Performance

| Metric | Value |
|--------|-------|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |

---

## Algorithm – REINFORCE (Monte Carlo Policy Gradient)

REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
1. Rolling out full episodes using the current policy
2. Computing discounted returns **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]** (a sketch follows this list)
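
A minimal PyTorch sketch of steps 2 and 3 above (illustrative only, not the exact training script used for this model; the function and variable names are assumed):

```python
import torch

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    """One REINFORCE update from a single rolled-out episode.

    log_probs: list of log pi_theta(a_t|s_t) tensors collected during the rollout
    rewards:   list of rewards received at each step
    """
    # Step 2: compute discounted returns G_t backwards through the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns, dtype=torch.float32)

    # Step 3: maximize E[log pi_theta(a|s) * G_t] by minimizing its negative
    logp = torch.stack([lp.squeeze() for lp in log_probs])  # shape (T,)
    loss = -(logp * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```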

The policy network is a simple feedforward neural network (a code sketch follows this list):
- **Input:** State observation vector
- **Hidden layer:** Fully connected + ReLU activation
- **Output:** Action probabilities via Softmax
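
A sketch of this architecture in PyTorch (the `Policy` class name and `act()` helper are assumed so that it matches the usage example further below; exact layer names may differ from the trained checkpoint):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class Policy(nn.Module):
    def __init__(self, state_size=7, action_size=2, hidden_size=64):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)    # input -> hidden
        self.fc2 = nn.Linear(hidden_size, action_size)   # hidden -> action logits

    def forward(self, x):
        x = F.relu(self.fc1(x))                  # fully connected + ReLU
        return F.softmax(self.fc2(x), dim=-1)    # action probabilities

    def act(self, state):
        """Sample an action and return it together with its log-probability."""
        state = torch.from_numpy(state).float().unsqueeze(0)
        probs = self.forward(state)
        dist = Categorical(probs)
        action = dist.sample()
        return action.item(), dist.log_prob(action)
```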

---

## Hyperparameters

| Parameter | Value |
|-----------|-------|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (γ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |
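
For reference, these values map onto a training configuration dictionary along these lines (the key names are illustrative, not necessarily the ones used in the training script):

```python
hyperparameters = {
    "h_size": 64,                   # hidden layer size
    "n_training_episodes": 50_000,  # training episodes
    "max_t": 10_000,                # max steps per episode
    "gamma": 0.99,                  # discount factor
    "lr": 1e-4,                     # Adam learning rate
}
```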

---

## About the Environment

**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing; a minimal setup snippet follows the list below.

- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- **Action space:** 2 discrete actions – throttle up or do nothing
- **Reward:** +1 for each timestep survived
- **Episode ends:** On collision with a wall or the ground/ceiling
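
A minimal setup snippet, assuming the custom PLE Gymnasium wrapper mentioned in Training Details is installed and has registered the `Pixelcopter-PLE-v0` id:

```python
import gymnasium as gym
# Requires the custom PLE -> Gymnasium wrapper (see Training Details) to be
# installed so that "Pixelcopter-PLE-v0" is registered.

env = gym.make("Pixelcopter-PLE-v0")

print(env.observation_space)  # Box with 7 continuous state features
print(env.action_space)       # Discrete(2): throttle up or do nothing
```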

---

## How to Use

```python
import torch
import gymnasium as gym
# Assumes the custom PLE -> Gymnasium wrapper used for training is installed
# and has registered the "Pixelcopter-PLE-v0" environment id.

# Create the environment
env = gym.make("Pixelcopter-PLE-v0")

# Load the trained policy network
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# Run a single inference step
state, _ = env.reset()
action, _ = model.act(state)
```
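
To reproduce the 10-episode evaluation reported above, a loop along these lines can be used (a sketch assuming the `env` and `model` objects from the snippet above and a Gymnasium-style `step()` API):

```python
import numpy as np

def evaluate(env, model, n_episodes=10, max_steps=10_000):
    """Roll out the policy and report mean and std of episode rewards."""
    episode_rewards = []
    for _ in range(n_episodes):
        state, _ = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            action, _ = model.act(state)
            state, reward, terminated, truncated, _ = env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        episode_rewards.append(total_reward)
    return np.mean(episode_rewards), np.std(episode_rewards)

mean_reward, std_reward = evaluate(env, model)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```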

---

## Training Details

- **Framework:** PyTorch
- **Returns:** Standardized per episode for training stability (a short sketch follows this list)
- **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
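
The per-episode return standardization mentioned above typically looks like the following helper (illustrative, not the exact training code):

```python
import torch

def standardize(returns: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Standardize one episode's discounted returns for training stability."""
    # eps avoids division by zero when all returns in an episode are equal
    return (returns - returns.mean()) / (returns.std() + eps)
```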

---

## Author

Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).