Committed by nirmanpatel
Commit 65feaed · verified · 1 Parent(s): e46b456

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +8 -89
  2. model.pt +2 -2
  3. replay.mp4 +0 -0
  4. results.json +1 -1
README.md CHANGED
@@ -15,94 +15,13 @@ model-index:
  name: Pixelcopter-PLE-v0
  type: Pixelcopter-PLE-v0
  metrics:
- - type: mean_reward
- value: 58.13 +/- 55.17
- name: mean_reward
- verified: false
+ - type: mean_reward
+ value: 38.50 +/- 39.57
+ name: mean_reward
+ verified: false
  ---
 
- # 🚁 Reinforce Agent Pixelcopter-PLE-v0
-
- A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging continuous control game built on the PyGame Learning Environment (PLE).
-
- ---
-
- ## 📊 Performance
-
- | Metric | Value |
- |--------|-------|
- | Mean Reward | 58.13 |
- | Std of Reward | ±55.17 |
- | Best Average Score | 80.65 (Episode 46000) |
- | Evaluation Episodes | 10 |
- | Training Episodes | 50,000 |
-
- ---
-
- ## 🧠 Algorithm — REINFORCE (Monte Carlo Policy Gradient)
-
- REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
- 1. Rolling out full episodes using the current policy
- 2. Computing discounted returns **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
- 3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]**
-
- The policy network is a simple feedforward neural network:
- - **Input:** State observation vector
- - **Hidden layer:** Fully connected + ReLU activation
- - **Output:** Action probabilities via Softmax
-
- ---
-
- ## ⚙️ Hyperparameters
-
- | Parameter | Value |
- |-----------|-------|
- | Hidden layer size | 64 |
- | Training episodes | 50,000 |
- | Max steps per episode | 10,000 |
- | Discount factor (γ) | 0.99 |
- | Learning rate | 1e-4 |
- | Optimizer | Adam |
-
- ---
-
- ## 🎮 About the Environment
-
- **Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
-
- - **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- - **Action space:** 2 discrete actions — throttle up or do nothing
- - **Reward:** +1 for each timestep survived
- - **Episode ends:** On collision with a wall or the ground/ceiling
-
- ---
-
- ## 🚀 How to Use
-
- ```python
- from ple.games.pixelcopter import Pixelcopter
- from ple import PLE
- import torch
-
- # Load the model
- model = torch.load("model.pt", map_location=torch.device("cpu"))
- model.eval()
-
- # Run inference
- state, _ = env.reset()
- action, _ = model.act(state)
- ```
-
- ---
-
- ## 📚 Training Details
-
- - **Framework:** PyTorch
- - **Returns:** Standardized per episode for training stability
- - **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
-
- ---
-
- ## 👤 Author
-
- Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).
+ # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
+ This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0** .
+ To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction
+
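Review note: the README text removed in this commit described computing discounted returns G_t with γ = 0.99 and standardizing them per episode. A minimal sketch of that step follows, using only the standard library. The function names `discounted_returns` and `standardize`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; the training code is not part of this commit and may differ.

```python
import statistics


def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return for every timestep of one episode.

    rewards[t] is the reward received after the action at step t, so the
    result at index t is rewards[t] + gamma * rewards[t+1] + ...
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def standardize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (hypothetical helper).

    eps guards against division by zero when all returns are equal.
    Population standard deviation is an assumption, not the author's choice.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


if __name__ == "__main__":
    episode_rewards = [1.0, 1.0, 1.0, 1.0]  # e.g. +1 per surviving step
    print(standardize(discounted_returns(episode_rewards)))
```

The backward pass keeps the computation linear in episode length, which matters with the 10,000-step cap listed in the removed hyperparameter table.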
model.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0a7233d26e547dbe19a38de85294602c0a69a5efd0a350a16df8b118e2937455
- size 40253
+ oid sha256:b15ba16aced601d688d0845329b4bd666ead02571b929bfcec35ee655118dc0c
+ size 40125
replay.mp4 CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
 
results.json CHANGED
@@ -1 +1 @@
- {"env_id": "Pixelcopter-PLE-v0", "mean_reward": 67.3, "n_evaluation_episodes": 10, "eval_datetime": "2026-04-26T18:04:03.285810"}
+ {"env_id": "Pixelcopter-PLE-v0", "mean_reward": 38.5, "n_evaluation_episodes": 10, "eval_datetime": "2026-06-24T15:43:38.293120"}
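
Review note: the `results.json` change lowers `mean_reward` from 67.3 to 38.5 over the same 10 evaluation episodes. The sketch below quantifies that change from the two payloads shown in this diff. The helper name `reward_delta` is hypothetical. With only 10 episodes and the large spread reported in the README metadata (38.50 +/- 39.57), the difference may be mostly evaluation noise.

```python
import json

# The two payloads exactly as they appear in this commit's results.json diff.
OLD = (
    '{"env_id": "Pixelcopter-PLE-v0", "mean_reward": 67.3, '
    '"n_evaluation_episodes": 10, "eval_datetime": "2026-04-26T18:04:03.285810"}'
)
NEW = (
    '{"env_id": "Pixelcopter-PLE-v0", "mean_reward": 38.5, '
    '"n_evaluation_episodes": 10, "eval_datetime": "2026-06-24T15:43:38.293120"}'
)


def reward_delta(old_json, new_json):
    """Return (absolute, relative) change in mean_reward between two results."""
    old = json.loads(old_json)
    new = json.loads(new_json)
    # Comparing scores across different environments would be meaningless.
    if old["env_id"] != new["env_id"]:
        raise ValueError("results come from different environments")
    absolute = new["mean_reward"] - old["mean_reward"]
    relative = absolute / old["mean_reward"]
    return absolute, relative


if __name__ == "__main__":
    absolute, relative = reward_delta(OLD, NEW)
    print(f"mean_reward change: {absolute:+.1f} ({relative:+.1%})")
```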