---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 12.03
      name: mean_reward
      verified: false
---


# 🚁 Reinforce Agent on Pixelcopter-PLE-v0

This repository contains a Reinforce (Monte Carlo policy gradient) agent trained to play the Pixelcopter-PLE-v0 environment.

---

## 📊 Model Card

- Model Name: `Reinforce-PixelCopter`
- Environment: `Pixelcopter-PLE-v0`
- Algorithm: Reinforce (Monte Carlo Policy Gradient)
- Performance Metric:
  - Mean reward: 12.03 (self-reported in the model-index metadata, marked `verified: false`)
  - Keeps the helicopter airborne and avoids obstacles across evaluation runs

---

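## 🚀 Usage

```python
import pickle

import gym
import gym_pygame  # noqa: F401  (importing it registers Pixelcopter-PLE-v0 with gym)
from huggingface_hub import hf_hub_download

# Download the serialized policy from the Hub (returns a local file path)
path = hf_hub_download(
    repo_id="KraTUZen/Reinforce-PixelCopter",
    filename="reinforce.pkl",
)

# Unpickle it; any custom Policy class used in training must be importable here.
# As in the original snippet, this assumes the pickle is a dict with an "env_id" key.
with open(path, "rb") as f:
    model = pickle.load(f)

# Initialize the environment the agent was trained on
env = gym.make(model["env_id"])
```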
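A `mean_reward` figure like the one in the metadata is typically obtained by running the agent for several episodes and averaging the episode returns. The sketch below is a minimal, dependency-free version of such a loop, not the evaluation code used for this card. It assumes the classic gym API (`reset()` returns an observation, `step()` returns `(obs, reward, done, info)`), and `act` is a hypothetical callable mapping an observation to an action index; how the unpickled policy exposes action selection is not documented here.

```python
import statistics
from typing import Any, Callable, List, Tuple


def evaluate_agent(
    env: Any,
    act: Callable[[Any], int],  # hypothetical: maps an observation to an action index
    n_episodes: int,
    max_steps: int = 10_000,
) -> Tuple[float, float]:
    """Return (mean, population std) of undiscounted episode returns.

    Assumes the classic gym API: reset() -> obs, step(a) -> (obs, reward, done, info).
    """
    episode_returns: List[float] = []
    for _ in range(n_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):  # cap episode length in case `done` never fires
            obs, reward, done, _info = env.step(act(obs))
            total += reward
            if done:
                break
        episode_returns.append(total)
    return statistics.mean(episode_returns), statistics.pstdev(episode_returns)
```

With the environment created above and a suitable `act` wrapper, a call such as `evaluate_agent(env, act, n_episodes=10)` returns the mean and standard deviation of the episode rewards.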

---

## 🧠 Notes
- The agent is trained with the Reinforce algorithm, which updates the policy parameters using the returns of complete sampled episodes.
- Pixelcopter-PLE-v0 is the Pixelcopter game from the PyGame Learning Environment (PLE), in which the agent must keep the helicopter flying while avoiding obstacles.
- The serialized policy is stored in `reinforce.pkl`.
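For reference, the textbook form of the Reinforce update is shown below, where $\pi_\theta$ is the policy, $G_t$ the discounted return from step $t$ of an episode of length $T$, $\gamma$ the discount factor, and $\alpha$ the learning rate. The discount factor, learning rate, and any baseline used for this particular agent are not stated in this card.

$$
\nabla_\theta J(\theta) \approx \sum_{t=0}^{T-1} G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
\qquad
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t} \, r_{k+1},
\qquad
\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
$$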

---

## 📂 Repository Structure
- `reinforce.pkl` → Trained policy weights
- `README.md` → Documentation and usage guide

---

## ✅ Results
- The agent learns to maintain altitude and avoid collisions with obstacles.
- Reported mean reward: 12.03 (self-reported in the metadata and not independently verified).

---

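## 🔎 Environment Overview
- Observation Space: a 7-dimensional game-state vector (player y position, player velocity, distances to the floor and ceiling, horizontal distance to the next block, and the next block's top and bottom y positions), not raw pixels
- Action Space: Discrete(2) (accelerate upward or do nothing)
- Objective: Keep the helicopter flying while avoiding obstacles
- Reward: Positive reward for each obstacle passed, a penalty when the helicopter crashes (ending the episode)

---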

## 📚 Learning Highlights
- Algorithm: Reinforce (Monte Carlo Policy Gradient)
- Update Rule: Policy parameters are updated using the returns of complete sampled episodes
- Strengths: Simple and model-free; well suited to episodic tasks with a small discrete action space such as this one
- Limitations: High-variance gradient estimates; training over many episodes helps, and subtracting a baseline or normalizing returns is a common further remedy
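The update rule and the high-variance limitation above can be illustrated without any ML framework. The sketch below computes discounted returns for one episode, standardizes them (a common variance-reduction step), and applies one gradient-ascent step to a linear softmax policy, $\pi(a \mid s) = \mathrm{softmax}(\theta s)$. The linear policy and the `gamma` and `lr` values are illustrative assumptions; the card does not document the network or hyperparameters actually used for this agent.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = r_{t+1} + gamma * G_{t+1}, accumulated backwards over one episode."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def standardize(values: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale to unit std (a common variance-reduction trick)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta: Matrix, state: Sequence[float]) -> List[float]:
    """pi(.|s) for a linear softmax policy: one weight row per action."""
    return softmax([sum(w * x for w, x in zip(row, state)) for row in theta])


def reinforce_step(
    theta: Matrix,
    states: Sequence[Sequence[float]],
    actions: Sequence[int],
    returns: Sequence[float],
    lr: float,
) -> Matrix:
    """One gradient-ascent step: theta += lr * sum_t G_t * grad log pi(a_t|s_t)."""
    grad = [[0.0] * len(row) for row in theta]
    for s, a, g in zip(states, actions, returns):
        probs = action_probs(theta, s)  # probabilities under the pre-update theta
        for b, p in enumerate(probs):
            # d log softmax_a / d theta[b][j] = (1[b == a] - p_b) * s_j
            coeff = ((1.0 if b == a else 0.0) - p) * g
            for j, x in enumerate(s):
                grad[b][j] += coeff * x
    return [[w + lr * dw for w, dw in zip(row, drow)] for row, drow in zip(theta, grad)]
```

In a training loop, one would roll out a full episode, compute `standardize(discounted_returns(rewards, gamma))`, and pass the result to `reinforce_step`. Actions followed by above-average returns become more likely, and those followed by below-average returns become less likely.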