Upload folder using huggingface_hub

Files changed (4)

README.md +8 -89
model.pt +2 -2
replay.mp4 +0 -0
results.json +1 -1

README.md CHANGED

@@ -15,94 +15,13 @@ model-index:
       name: Pixelcopter-PLE-v0
       type: Pixelcopter-PLE-v0
     metrics:
-        - type: mean_reward
-          value: 58.13 +/- 55.17
-          name: mean_reward
-          verified: false
 ---
-# 🚁 Reinforce Agent — Pixelcopter-PLE-v0
-A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging continuous control game built on the PyGame Learning Environment (PLE).
----
-## 📊 Performance
-| Metric | Value |
-|--------|-------|
-| Mean Reward | 58.13 |
-| Std of Reward | ±55.17 |
-| Best Average Score | 80.65 (Episode 46000) |
-| Evaluation Episodes | 10 |
-| Training Episodes | 50,000 |
----
-## 🧠 Algorithm — REINFORCE (Monte Carlo Policy Gradient)
-REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
-1. Rolling out full episodes using the current policy
-2. Computing discounted returns **Gₜ = rₜ₊₁ + γrₜ₊₂ + γ²rₜ₊₃ + ...** for each timestep
-3. Updating the policy by maximizing **E[ log π_θ(a|s) · Gₜ ]**
-The policy network is a simple feedforward neural network:
-- **Input:** State observation vector
-- **Hidden layer:** Fully connected + ReLU activation
-- **Output:** Action probabilities via Softmax
----
-## ⚙️ Hyperparameters
-| Parameter | Value |
-|-----------|-------|
-| Hidden layer size | 64 |
-| Training episodes | 50,000 |
-| Max steps per episode | 10,000 |
-| Discount factor (γ) | 0.99 |
-| Learning rate | 1e-4 |
-| Optimizer | Adam |
----
-## 🎮 About the Environment
-**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
-- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
-- **Action space:** 2 discrete actions — throttle up or do nothing
-- **Reward:** +1 for each timestep survived
-- **Episode ends:** On collision with a wall or the ground/ceiling
----
-## 🚀 How to Use
-```python
-from ple.games.pixelcopter import Pixelcopter
-from ple import PLE
-import torch
-# Load the model
-model = torch.load("model.pt", map_location=torch.device("cpu"))
-model.eval()
-# Run inference
-state, _ = env.reset()
-action, _ = model.act(state)
-```
----
-## 📚 Training Details
-- **Framework:** PyTorch
-- **Returns:** Standardized per episode for training stability
-- **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
----
-## 👤 Author
-Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).

       name: Pixelcopter-PLE-v0
       type: Pixelcopter-PLE-v0
     metrics:
+    - type: mean_reward
+      value: 38.50 +/- 39.57
+      name: mean_reward
+      verified: false
 ---
+  # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
+  This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0** .
+  To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction

model.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0a7233d26e547dbe19a38de85294602c0a69a5efd0a350a16df8b118e2937455
-size 40253

 version https://git-lfs.github.com/spec/v1
+oid sha256:b15ba16aced601d688d0845329b4bd666ead02571b929bfcec35ee655118dc0c
+size 40125

replay.mp4 CHANGED

Binary files a/replay.mp4 and b/replay.mp4 differ

results.json CHANGED

@@ -1 +1 @@
-{"env_id": "Pixelcopter-PLE-v0", "mean_reward": 67.3, "n_evaluation_episodes": 10, "eval_datetime": "2026-04-26T18:04:03.285810"}
+{"env_id": "Pixelcopter-PLE-v0", "mean_reward": 38.5, "n_evaluation_episodes": 10, "eval_datetime": "2026-06-24T15:43:38.293120"}