---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 12.03
      name: mean_reward
      verified: false
---


# 🚁 **Reinforce Agent on Pixelcopter-PLE-v0**

This repository contains a **Reinforce (Policy Gradient)** agent trained to play the **Pixelcopter-PLE-v0** environment.

---

## 📊 Model Card

**Model Name:** `Reinforce-Pixelcopter-PLE-v0`  
**Environment:** `Pixelcopter-PLE-v0`  
**Algorithm:** Reinforce (Monte Carlo Policy Gradient)  
**Performance Metric:**  
- Mean reward: 12.03 (self-reported in the card metadata, `verified: false`)  
- Qualitatively, the agent sustains flight and avoids obstacles across evaluation runs  

---

## 🚀 Usage

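```python
import pickle

import gym
import gym_pygame  # noqa: F401  (assumed: importing it registers the PLE environments with gym)
from huggingface_hub import hf_hub_download

# Download the serialized policy from the Hub (returns a local file path)
model_path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0",
    filename="reinforce.pkl",
)

# Load the trained Reinforce model; only unpickle files from sources you trust
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Initialize environment (assumes the pickle is a dict that stores "env_id")
env = gym.make(model["env_id"])
```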

---

## 🧠 Notes
- The agent is trained using the **Reinforce algorithm**, which updates policy parameters based on episodic returns.  
- The environment is **Pixelcopter-PLE-v0**, a side-scrolling game from the PyGame Learning Environment (PLE) in which the agent must keep the helicopter flying while avoiding obstacles.  
- The serialized policy is stored in `reinforce.pkl`.  

---

## 📂 Repository Structure
- `reinforce.pkl` → Trained policy weights  
- `README.md` → Documentation and usage guide  

---

## ✅ Results
- The agent learns to maintain altitude and avoid collisions with obstacles.  
- Demonstrates convergence to a stable policy using **policy gradient methods**.  
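
The `mean_reward` value in this card's metadata is the kind of number an evaluation loop produces. The sketch below shows one way to compute it; it is not the script used for this model. The `evaluate` helper and its parameters are hypothetical, and it assumes the classic Gym API, where `reset()` returns only the observation and `step()` returns a 4-tuple.

```python
from statistics import mean, pstdev
from typing import Callable, Tuple


def evaluate(env, select_action: Callable, n_episodes: int = 10,
             max_steps: int = 1000) -> Tuple[float, float]:
    """Run n_episodes and return (mean, std) of the undiscounted episode reward."""
    episode_rewards = []
    for _ in range(n_episodes):
        obs = env.reset()  # classic Gym API assumed: returns the observation only
        total = 0.0
        for _ in range(max_steps):
            # classic Gym API assumed: step() returns (obs, reward, done, info)
            obs, reward, done, _info = env.step(select_action(obs))
            total += reward
            if done:
                break
        episode_rewards.append(total)
    return mean(episode_rewards), pstdev(episode_rewards)
```

Here `select_action` is any callable that maps an observation to an action, for example a thin wrapper around the loaded policy. Newer Gym and Gymnasium releases return `(obs, info)` from `reset()` and a 5-tuple from `step()`, so the two unpacking lines would need adjusting there.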

---

## 🔎 Environment Overview
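- **Observation Space:** 7-dimensional state vector as exposed by the `gym_pygame` wrapper used in the Deep RL Class (player height and velocity, distances to floor and ceiling, position of the next block), not raw pixels  
- **Action Space:** Discrete, 2 actions (accelerate upward or do nothing)  
- **Objective:** Keep the helicopter flying while avoiding obstacles  
- **Reward:** +1 for each block passed, -1 when the episode ends in a crash  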
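
With a small discrete action space like this one, a stochastic policy typically outputs one score (logit) per action and samples from the resulting softmax distribution. The following is a minimal pure-Python sketch of that step; `sample_action` is a hypothetical helper, not code from this repository, and a real implementation would usually rely on a deep learning framework's categorical distribution instead.

```python
import math
import random
from typing import Sequence, Tuple


def sample_action(logits: Sequence[float], rng: random.Random) -> Tuple[int, float]:
    """Sample an action index from softmax(logits); return (action, log_prob)."""
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index with probability proportional to its softmax weight
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])
```

The returned log-probability is the quantity a Reinforce update weights by the episode return, which is why it is kept alongside the action.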

---

## 📚 Learning Highlights
- **Algorithm:** Reinforce (Policy Gradient)  
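- **Update Rule:** Policy parameters are updated by gradient ascent on the log-probabilities of the actions taken, weighted by the returns of sampled episodes  
- **Strengths:** Effective for environments with discrete actions and episodic rewards  
- **Limitations:** High variance in the gradient estimates, because each update relies on Monte Carlo returns from whole episodes; mitigated with sufficient training episodes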
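
The update rule above can be illustrated with a small, framework-free sketch. It computes the discounted return for each step of one episode and the surrogate loss that Reinforce minimizes. The function names and the discount factor `gamma=0.99` are assumptions for illustration; this card does not record the training hyperparameters.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs: Sequence[float], returns: Sequence[float]) -> float:
    """Surrogate loss: minus the sum of log pi(a_t | s_t) * G_t over the episode."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In an actual training loop the log-probabilities would be autograd tensors, so that minimizing this loss performs gradient ascent on expected return. Plain floats are used here only to show the arithmetic.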