	---
	tags:
	- ML-Agents-Pyramids
	- ppo
	- deep-reinforcement-learning
	- reinforcement-learning
	- ml-agents
	model-index:
	- name: PPO
	  results:
	  - task:
	      type: reinforcement-learning
	      name: reinforcement-learning
	    dataset:
	      name: ML-Agents-Pyramids
	      type: ML-Agents-Pyramids
	    metrics:
	    - type: mean_reward
	      value: 5.10 +/- 0.85
	      name: mean_reward
	      verified: false
	---

	# PPO Agent playing ML-Agents-Pyramids

	This is a trained model of a PPO agent playing ML-Agents-Pyramids using Unity ML-Agents.

	## Usage

	```python
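	import torch

	# Load the checkpoint on CPU. Rebuilding the policy also requires the
	# matching network architecture, which is not included in this snippet.
	checkpoint = torch.load("model.pt", map_location="cpu")

	# The checkpoint layout is not documented here, so inspect it before use.
	if isinstance(checkpoint, dict):
	    print(list(checkpoint.keys()))

	# The model can be used with the Pyramids environment.
	# See the repository for complete usage instructions.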
	```

	## Training Results

	- Mean reward: 5.10 ± 0.85
	- Average pyramids completed: 5.0 per episode
	- Training episodes: 3,000
	- Target achievement: ✅ SUCCESS (target: 1.75)
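
As a rough illustration of how a figure such as `5.10 ± 0.85` is derived and checked against the target, the sketch below computes the mean and standard deviation of per-episode returns. The `episode_returns` values are made-up placeholders rather than this model's evaluation data, `summarize_returns` is a hypothetical helper, and comparing the mean (rather than some other statistic) against the target is an assumption. Only the 1.75 target comes from this card.

```python
from statistics import mean, pstdev

TARGET_REWARD = 1.75  # target stated in this card


def summarize_returns(episode_returns, target=TARGET_REWARD):
    """Return (mean, std, passed) for a list of per-episode returns."""
    avg = mean(episode_returns)
    std = pstdev(episode_returns)  # population standard deviation
    # Assumption: the target is compared against the mean return.
    return avg, std, avg >= target


# Placeholder returns for illustration only (not real evaluation data).
episode_returns = [4.0, 5.0, 6.0, 5.0]
avg, std, passed = summarize_returns(episode_returns)
print(f"mean_reward: {avg:.2f} +/- {std:.2f}, target met: {passed}")
```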

	## Algorithm Details

	- Algorithm: Proximal Policy Optimization (PPO)
	- Environment: ML-Agents-Pyramids
	- Task: Multi-step pyramid completion with curiosity-driven exploration
	- Network: Deep neural network with curiosity mechanism
	- Training Framework: PyTorch
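
For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines. This is a generic, framework-free illustration on plain Python floats, not the code used to train this model. The function name `ppo_clipped_loss` and the `clip_eps=0.2` default are assumptions chosen for the example; in a PyTorch training loop the same arithmetic would typically be written with tensor operations.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate objective over a batch of samples."""
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio within [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        terms.append(min(ratio * adv, clipped * adv))
    # Negate because optimizers minimize while PPO maximizes the objective.
    return -sum(terms) / len(terms)


# Illustrative values only: identical policies give a ratio of 1.0 per sample.
loss = ppo_clipped_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, 0.5])
print(loss)
```

The clipping removes the incentive to push the ratio far from 1 in a single update, which is what keeps each policy step "proximal" to the previous policy.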

	## Task Description

	The agent learns to:

	1. Find and press buttons to spawn pyramids
	2. Navigate to pyramids and knock them over
	3. Collect gold bricks from fallen pyramids
	4. Repeat efficiently to maximize score

	This complex task requires:
	- Exploration in a sparse-reward environment
	- Multi-step planning and execution
	- Spatial navigation and object interaction
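
Sparse rewards are why curiosity helps in a task like this: the agent gets little feedback until a full sequence of steps succeeds. One common form of curiosity adds an intrinsic bonus proportional to how poorly a learned forward model predicts the next state's features. The sketch below shows only that reward arithmetic. The names `intrinsic_reward` and `total_reward` and the `eta` and `beta` scaling factors are hypothetical, and this card does not state which curiosity formulation the model actually used.

```python
def intrinsic_reward(predicted_features, actual_features, eta=1.0):
    """Curiosity bonus: scaled squared prediction error of a forward model."""
    squared_error = sum(
        (p - a) ** 2 for p, a in zip(predicted_features, actual_features)
    )
    return 0.5 * eta * squared_error


def total_reward(extrinsic, predicted_features, actual_features, beta=0.01):
    """Combine the environment reward with a weighted curiosity bonus."""
    bonus = intrinsic_reward(predicted_features, actual_features)
    return extrinsic + beta * bonus


# Illustrative numbers only: no environment reward, imperfect prediction.
reward = total_reward(0.0, [0.0, 1.0], [1.0, 1.0])
print(reward)
```

States the forward model already predicts well yield a bonus near zero, so the agent is nudged toward unfamiliar situations even when the environment reward is absent.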

	## Performance Milestones

	- Episodes 0-500: Learning basic movement and object interaction
	- Episodes 500-1500: Developing pyramid completion strategy
	- Episodes 1500-3000: Optimizing efficiency and consistency

	## Training Environment

	- Environment: ML-Agents-Pyramids
	- Framework: Custom PyTorch implementation with ML-Agents compatibility
	- Training date: 2025-09-05
	- Course: Hugging Face Deep RL Course Unit 5

	This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).