Upload folder using huggingface_hub
- README.md +8 -89
- model.pt +2 -2
- replay.mp4 +0 -0
- results.json +1 -1
README.md
CHANGED
|
@@ -15,94 +15,13 @@ model-index:
|
|
| 15 |
name: Pixelcopter-PLE-v0
|
| 16 |
type: Pixelcopter-PLE-v0
|
| 17 |
metrics:
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
---
|
| 23 |
|
| 24 |
-
#
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
---
|
| 29 |
-
|
| 30 |
-
## 📊 Performance
|
| 31 |
-
|
| 32 |
-
| Metric | Value |
|
| 33 |
-
|--------|-------|
|
| 34 |
-
| Mean Reward | 58.13 |
|
| 35 |
-
| Std of Reward | ±55.17 |
|
| 36 |
-
| Best Average Score | 80.65 (Episode 46000) |
|
| 37 |
-
| Evaluation Episodes | 10 |
|
| 38 |
-
| Training Episodes | 50,000 |
|
| 39 |
-
|
| 40 |
-
---
|
| 41 |
-
|
| 42 |
-
## 🧠 Algorithm — REINFORCE (Monte Carlo Policy Gradient)
|
| 43 |
-
|
| 44 |
-
REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
|
| 45 |
-
1. Rolling out full episodes using the current policy
|
| 46 |
-
2. Computing discounted returns **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
|
| 47 |
-
3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]**
|
| 48 |
-
|
| 49 |
-
The policy network is a simple feedforward neural network:
|
| 50 |
-
- **Input:** State observation vector
|
| 51 |
-
- **Hidden layer:** Fully connected + ReLU activation
|
| 52 |
-
- **Output:** Action probabilities via Softmax
|
| 53 |
-
|
| 54 |
-
---
|
| 55 |
-
|
| 56 |
-
## ⚙️ Hyperparameters
|
| 57 |
-
|
| 58 |
-
| Parameter | Value |
|
| 59 |
-
|-----------|-------|
|
| 60 |
-
| Hidden layer size | 64 |
|
| 61 |
-
| Training episodes | 50,000 |
|
| 62 |
-
| Max steps per episode | 10,000 |
|
| 63 |
-
| Discount factor (γ) | 0.99 |
|
| 64 |
-
| Learning rate | 1e-4 |
|
| 65 |
-
| Optimizer | Adam |
|
| 66 |
-
|
| 67 |
-
---
|
| 68 |
-
|
| 69 |
-
## 🎮 About the Environment
|
| 70 |
-
|
| 71 |
-
**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
|
| 72 |
-
|
| 73 |
-
- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
|
| 74 |
-
- **Action space:** 2 discrete actions — throttle up or do nothing
|
| 75 |
-
- **Reward:** +1 for each timestep survived
|
| 76 |
-
- **Episode ends:** On collision with a wall or the ground/ceiling
|
| 77 |
-
|
| 78 |
-
---
|
| 79 |
-
|
| 80 |
-
## 🚀 How to Use
|
| 81 |
-
|
| 82 |
-
```python
|
| 83 |
-
from ple.games.pixelcopter import Pixelcopter
|
| 84 |
-
from ple import PLE
|
| 85 |
-
import torch
|
| 86 |
-
|
| 87 |
-
# Load the model
|
| 88 |
-
model = torch.load("model.pt", map_location=torch.device("cpu"))
|
| 89 |
-
model.eval()
|
| 90 |
-
|
| 91 |
-
# Run inference
|
| 92 |
-
state, _ = env.reset()
|
| 93 |
-
action, _ = model.act(state)
|
| 94 |
-
```
|
| 95 |
-
|
| 96 |
-
---
|
| 97 |
-
|
| 98 |
-
## 📚 Training Details
|
| 99 |
-
|
| 100 |
-
- **Framework:** PyTorch
|
| 101 |
-
- **Returns:** Standardized per episode for training stability
|
| 102 |
-
- **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
|
| 103 |
-
|
| 104 |
-
---
|
| 105 |
-
|
| 106 |
-
## 👤 Author
|
| 107 |
-
|
| 108 |
-
Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).
|
|
|
|
| 15 |
name: Pixelcopter-PLE-v0
|
| 16 |
type: Pixelcopter-PLE-v0
|
| 17 |
metrics:
|
| 18 |
+
- type: mean_reward
|
| 19 |
+
value: 38.50 +/- 39.57
|
| 20 |
+
name: mean_reward
|
| 21 |
+
verified: false
|
| 22 |
---
|
| 23 |
|
| 24 |
+
# **Reinforce** Agent playing **Pixelcopter-PLE-v0**
|
| 25 |
+
This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0** .
|
| 26 |
+
To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
|
| 27 |
+
|
model.pt
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b15ba16aced601d688d0845329b4bd666ead02571b929bfcec35ee655118dc0c
|
| 3 |
+
size 40125
|
replay.mp4
CHANGED
|
Binary files a/replay.mp4 and b/replay.mp4 differ
|
|
|
results.json
CHANGED
|
@@ -1 +1 @@
|
|
| 1 |
-
{"env_id": "Pixelcopter-PLE-v0", "mean_reward":
|
|
|
|
| 1 |
+
{"env_id": "Pixelcopter-PLE-v0", "mean_reward": 38.5, "n_evaluation_episodes": 10, "eval_datetime": "2026-06-24T15:43:38.293120"}
|
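The README text removed in this commit describes REINFORCE's discounted return, Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ..., and notes that returns were standardized per episode. The snippet below is a minimal sketch of that computation in plain Python, not the author's training code: the function names `discounted_returns` and `standardize` are hypothetical, and `rewards[t]` is assumed to hold the reward received after the action at step t (that is, rₜ₊₁).

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t for every timestep of one finished episode.

    Assumes rewards[t] is the reward observed after acting at step t,
    so G_t = rewards[t] + gamma * G_{t+1}.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def standardize(values: List[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale to roughly unit variance.

    eps is an assumed small constant that avoids division by zero
    when every return in the episode is identical.
    """
    if not values:
        return []
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [(v - mean) / (std + eps) for v in values]


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0]  # toy episode, not real Pixelcopter data
    g = discounted_returns(episode_rewards, gamma=0.5)
    print(g)
    print(standardize(g))
```

In a policy-gradient update these standardized returns would weight the log-probabilities of the actions taken; that step depends on the policy network and is left out here.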