KraTUZen/Reinforce-PixelCopter

Tags: Reinforcement Learning, Pixelcopter-PLE-v0, custom-implementation

KraTUZen committed on Mar 13

Commit bca8553 · verified · 1 Parent(s): b0c5799

Update README.md

Files changed (1)

README.md +69 -4


@@ -21,7 +21,72 @@ model-index:
       verified: false
 ---
-  # **Reinforce** Agent playing **Pixelcopter-PLE-v0**
-  This is a trained model of a **Reinforce** agent playing **Pixelcopter-PLE-v0** .
-  To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction

+# 🚁 **Reinforce Agent on Pixelcopter-PLE-v0**
+This repository contains a trained **Reinforce (Policy Gradient)** agent for the **Pixelcopter-PLE-v0** environment.
+
+---
+## 📊 Model Card
+- **Model Name:** `Reinforce-Pixelcopter-PLE-v0`
+- **Environment:** `Pixelcopter-PLE-v0`
+- **Algorithm:** Reinforce (Monte Carlo Policy Gradient)
+- **Performance:**
+  - The agent is reported to fly stably and avoid obstacles across evaluation runs
+  - No numeric score is listed in this section; evaluation results are recorded in the `model-index` metadata at the top of this file
+---
+## 🚀 Usage
+```python
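+import pickle
+
+import gym
+import gym_pygame  # noqa: F401 (gym-games package; registers the PLE environments with gym)
+from huggingface_hub import hf_hub_download
+
+# Download the serialized policy from the Hub; returns a local file path
+model_path = hf_hub_download(
+    repo_id="KraTUZen/Reinforce-PixelCopter",
+    filename="reinforce.pkl",
+)
+# Load the trained Reinforce model (only unpickle files you trust)
+with open(model_path, "rb") as f:
+    model = pickle.load(f)
+
+# Initialize environment (assumes the pickle is a dict with an "env_id" key)
+env = gym.make(model["env_id"])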
+```
+---
+## 🧠 Notes
+- The agent is trained using the **Reinforce algorithm**, which updates policy parameters based on episodic returns.
+- The environment is **Pixelcopter-PLE-v0**, a side-scrolling game from the PyGame Learning Environment (PLE) in which the agent must keep the helicopter flying while avoiding obstacles.
+- The serialized policy is stored in `reinforce.pkl`.
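
The "episodic returns" mentioned in the notes can be illustrated with a minimal sketch. It assumes the standard discounted-return definition; the function name `discounted_returns`, the discount `gamma`, and the reward sequence are hypothetical and are not taken from this repository's training code.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    rewards[t] is the reward received after the action taken at step t.
    """
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical 4-step episode: a reward at step 2, a penalty at the final step.
episode_rewards = [0.0, 0.0, 1.0, -1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
```

Each return weights the log-probability of the action taken at the same step, so actions followed by high return become more likely after the update.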
+---
+## 📂 Repository Structure
+- `reinforce.pkl` → Trained policy weights
+- `README.md` → Documentation and usage guide
+---
+## ✅ Results
+- The agent learns to maintain altitude and avoid collisions with obstacles.
+- Demonstrates convergence to a stable policy using **policy gradient methods**.
+---
+## 🔎 Environment Overview
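+- **Observation Space:** Low-dimensional state vector (helicopter position and velocity, distances to the floor, the ceiling, and the next block), not raw pixels
+- **Action Space:** Discrete, 2 actions (accelerate upward or do nothing)
+- **Objective:** Keep the helicopter flying while avoiding obstacles
+- **Reward:** Positive reward for each block passed, negative reward when the episode ends in a collision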
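
With a discrete two-action space, a stochastic policy typically produces one logit per action and samples from the resulting softmax distribution. The sketch below shows only that sampling step in plain Python; `softmax`, `sample_action`, and the logit values are hypothetical stand-ins, since the actual policy network in this repository is not shown here.

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: List[float], rng: random.Random) -> Tuple[int, float]:
    """Sample an action index and return it with its log-probability."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


# Hypothetical logits for the two actions (index 0 and index 1).
rng = random.Random(0)
action, log_prob = sample_action([2.0, -1.0], rng)
print(action, log_prob)
```

The log-probability is returned alongside the action because Reinforce needs it later to form the gradient estimate.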
+---
+## 📚 Learning Highlights
+- **Algorithm:** Reinforce (Policy Gradient)
+- **Update Rule:** Policy parameters updated using returns from sampled episodes
+- **Strengths:** Effective for environments with discrete actions and episodic rewards
+- **Limitations:** High variance in gradient estimates, so training typically needs many episodes to converge
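
The update rule summarized above can be written out explicitly. This is the textbook Monte Carlo policy gradient, not something read from this repository's training code; the symbols $\theta$ (policy parameters), $\alpha$ (learning rate), $\gamma$ (discount), $G_t$ (return from step $t$), and the optional baseline $b$ are notation introduced here.

```latex
\begin{aligned}
G_t &= \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_{k+1}, \\
\nabla_\theta J(\theta) &\approx \sum_{t=0}^{T-1} \bigl(G_t - b\bigr)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t), \\
\theta &\leftarrow \theta + \alpha\, \nabla_\theta J(\theta).
\end{aligned}
```

Setting $b = 0$ gives plain Reinforce. Subtracting a baseline that does not depend on the action leaves the estimate unbiased while typically lowering its variance, which is the limitation noted in the list above.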