---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 12.03
      name: mean_reward
      verified: false
---
# 🚁 **Reinforce Agent on Pixelcopter-PLE-v0**
This repository contains a trained **Reinforce (Policy Gradient)** agent that successfully plays the **Pixelcopter-PLE-v0** environment.
---
## 📊 Model Card
- **Model Name:** `Reinforce-Pixelcopter-PLE-v0`
- **Environment:** `Pixelcopter-PLE-v0`
- **Algorithm:** Reinforce (Monte Carlo Policy Gradient)
- **Performance Metric:**
  - Mean reward: 12.03 (self-reported in the `model-index` metadata; `verified: false`)
  - Achieves stable flight and obstacle avoidance across evaluation runs
---
## 🚀 Usage
```python
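import pickle

import gym
import gym_pygame  # registers the PLE environments (incl. Pixelcopter-PLE-v0) with gym
from huggingface_hub import hf_hub_download

# Download the serialized agent from the Hub and unpickle it
# (any custom policy class stored in the pickle must be importable before loading)
path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-Pixelcopter-PLE-v0",
    filename="reinforce.pkl",
)
with open(path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment from the env id stored in the pickled dict
env = gym.make(model["env_id"])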
```
---
## 🧠 Notes
- The agent is trained using the **Reinforce algorithm**, which updates policy parameters based on episodic returns.
- The environment is **Pixelcopter-PLE-v0**, a PyGame Learning Environment (PLE) game where the agent must keep the helicopter flying while avoiding obstacles.
- The serialized policy is stored in `reinforce.pkl`.
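To make the Reinforce update described above concrete, here is a minimal, self-contained sketch of Monte Carlo policy gradient (reward-to-go returns, no baseline) on a hypothetical toy task. It is not the code used to train this model and does not touch Pixelcopter: the task (action 1 earns reward 1, action 0 earns 0), the state-independent softmax policy, and the names `discounted_returns` and `train_toy_reinforce` are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def train_toy_reinforce(episodes=500, horizon=5, gamma=0.9, lr=0.02, seed=0):
    """Hypothetical toy task: action 1 earns reward 1, action 0 earns 0.

    The policy is a state-independent softmax over two logits, so
    d/d(theta_k) log pi(a) = 1[k == a] - pi(k).
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        probs = softmax(theta)  # theta stays fixed while the episode is sampled
        # 1) Sample a complete episode with the current policy (Monte Carlo rollout).
        actions, rewards = [], []
        for _ in range(horizon):
            a = 0 if rng.random() < probs[0] else 1
            actions.append(a)
            rewards.append(1.0 if a == 1 else 0.0)
        # 2) Compute the return from each time step onward.
        returns = discounted_returns(rewards, gamma)
        # 3) Gradient ascent on sum_t G_t * grad log pi(a_t).
        grad = [0.0, 0.0]
        for a, g in zip(actions, returns):
            for k in range(2):
                grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
        theta = [t + lr * gk for t, gk in zip(theta, grad)]
    return softmax(theta)


if __name__ == "__main__":
    print("P(action 1) after training:", train_toy_reinforce()[1])
```

The policy parameters are only updated after a full episode has been sampled, which is what makes this the Monte Carlo variant; the probability of the rewarded action should climb toward 1 as training proceeds.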
---
## 📂 Repository Structure
- `reinforce.pkl` → Trained policy weights
- `README.md` → Documentation and usage guide
---
## ✅ Results
- The agent learns to maintain altitude and avoid collisions with obstacles.
- Demonstrates convergence to a stable policy using **policy gradient methods**.
---
## 🔎 Environment Overview
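- **Observation Space:** 7-dimensional game-state vector (player y position and velocity, distances to the floor and ceiling, x distance to the next block, and the top and bottom y positions of the next block)
- **Action Space:** Discrete, 2 actions (accelerate upward or do nothing)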
- **Objective:** Keep the helicopter flying while avoiding obstacles
- **Reward:** Positive reward for survival, penalties for collisions
---
## 📚 Learning Highlights
- **Algorithm:** Reinforce (Policy Gradient)
- **Update Rule:** Policy parameters updated using returns from sampled episodes
- **Strengths:** Effective for environments with discrete actions and episodic rewards
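- **Limitations:** High variance in the gradient estimates, since each update relies on sampled episode returns; training over enough episodes (and, commonly, normalizing returns) helps offset it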
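Written out, the update rule above takes the standard REINFORCE form below. The discount factor $\gamma$ and learning rate $\alpha$ are generic symbols; the values used to train this particular model are not documented here. For an episode $s_0, a_0, r_1, \dots, s_{T-1}, a_{T-1}, r_T$:

$$
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_{k+1},
\qquad
\theta \leftarrow \theta + \alpha \sum_{t=0}^{T-1} G_t \,\nabla_\theta \log \pi_\theta(a_t \mid s_t)
$$

Because each $G_t$ is a single-sample Monte Carlo estimate of the return, the resulting gradient estimate is noisy, which is the high-variance limitation noted in the list above.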