---
tags:
- Pixelcopter-PLE-v0
- reinforce
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: Reinforce-PixelCopter
results:
- task:
type: reinforcement-learning
name: reinforcement-learning
dataset:
name: Pixelcopter-PLE-v0
type: Pixelcopter-PLE-v0
metrics:
- type: mean_reward
value: 58.13 +/- 55.17
name: mean_reward
verified: false
---
# ๐Ÿš Reinforce Agent โ€” Pixelcopter-PLE-v0
A policy gradient agent trained from scratch using the **REINFORCE** algorithm to play [Pixelcopter](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html), a challenging continuous control game built on the PyGame Learning Environment (PLE).
---
## 📊 Performance
| Metric | Value |
|--------|-------|
| Mean Reward | 58.13 |
| Std of Reward | ±55.17 |
| Best Average Score | 80.65 (Episode 46000) |
| Evaluation Episodes | 10 |
| Training Episodes | 50,000 |
---
## 🧠 Algorithm: REINFORCE (Monte Carlo Policy Gradient)
REINFORCE is a classic **policy gradient** method that directly optimizes the policy by:
1. Rolling out full episodes using the current policy
2. Computing the discounted return **G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ...** for each timestep
3. Updating the policy by maximizing **E[ log π_θ(a|s) · G_t ]**
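Step 2 above can be sketched in plain Python by walking the episode's rewards backwards, which computes every G_t in a single pass (the example rewards are hypothetical):

```python
# Compute discounted returns G_t = r_{t+1} + gamma*r_{t+2} + ... for one episode.
# Iterating in reverse lets each G_t reuse the already-computed G_{t+1}.
def discounted_returns(rewards, gamma):
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # G_t = r_{t+1} + gamma * G_{t+1}
        returns.append(g)
    returns.reverse()
    return returns

# Three survived timesteps at +1 reward each, gamma = 0.99:
print(discounted_returns([1.0, 1.0, 1.0], 0.99))  # [2.9701, 1.99, 1.0]
```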
The policy network is a simple feedforward neural network:
- **Input:** State observation vector
- **Hidden layer:** Fully connected + ReLU activation
- **Output:** Action probabilities via Softmax
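A minimal PyTorch sketch of that network, using the sizes from the hyperparameter table below (7 state features, 64 hidden units, 2 actions); the class and method names are illustrative, not necessarily those used in the checkpoint:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class Policy(nn.Module):
    """Feedforward policy: state -> softmax action probabilities."""
    def __init__(self, state_size=7, hidden_size=64, action_size=2):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, action_size)

    def forward(self, x):
        x = F.relu(self.fc1(x))                 # hidden layer + ReLU
        return F.softmax(self.fc2(x), dim=1)    # action probabilities

    def act(self, state):
        """Sample an action and return it with its log-probability."""
        state = torch.from_numpy(state).float().unsqueeze(0)
        dist = Categorical(self.forward(state))
        action = dist.sample()
        return action.item(), dist.log_prob(action)

policy = Policy()
action, log_prob = policy.act(np.zeros(7, dtype=np.float32))
```

The `log_prob` returned by `act` is what the REINFORCE update multiplies by G_t.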
---
## โš™๏ธ Hyperparameters
| Parameter | Value |
|-----------|-------|
| Hidden layer size | 64 |
| Training episodes | 50,000 |
| Max steps per episode | 10,000 |
| Discount factor (ฮณ) | 0.99 |
| Learning rate | 1e-4 |
| Optimizer | Adam |
---
## 🎮 About the Environment
**Pixelcopter-PLE-v0** is a side-scrolling game where the agent controls a helicopter and must navigate through gaps in walls without crashing.
- **Observation space:** 7 continuous values (player velocity, player y-position, wall positions, etc.)
- **Action space:** 2 discrete actions (throttle up or do nothing)
- **Reward:** +1 for each timestep survived
- **Episode ends:** On collision with a wall or the ground/ceiling
---
## 🚀 How to Use
```python
import torch

# Load the serialized policy (saved with torch.save(model, "model.pt"))
model = torch.load("model.pt", map_location=torch.device("cpu"))
model.eval()

# Run inference. `env` is the custom Gymnasium wrapper around
# Pixelcopter described under Training Details (construct it from
# ple.games.pixelcopter.Pixelcopter and ple.PLE in your own code).
state, _ = env.reset()
action, _ = model.act(state)  # sample an action from the policy
```
---
## 📚 Training Details
- **Framework:** PyTorch
- **Returns:** Standardized per episode for training stability
- **Environment API:** PyGame Learning Environment (PLE) via custom Gymnasium wrapper
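The per-episode return standardization mentioned above can be sketched as follows (zero mean, unit variance, with a small epsilon to avoid division by zero; the function name is illustrative):

```python
import torch

def standardize(returns, eps=1e-8):
    """Normalize an episode's returns to zero mean and unit variance."""
    returns = torch.as_tensor(returns, dtype=torch.float32)
    return (returns - returns.mean()) / (returns.std() + eps)

g = standardize([2.9701, 1.99, 1.0])
```

Scaling the returns this way keeps policy-gradient magnitudes comparable across episodes of very different lengths.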
---
## 👤 Author
Trained by **nirmanpatel** as part of the [Hugging Face Deep Reinforcement Learning Course](https://huggingface.co/deep-rl-course/intro/README).