---
tags:
- ML-Agents-Pyramids
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- ml-agents
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: ML-Agents-Pyramids
      type: ML-Agents-Pyramids
    metrics:
    - type: mean_reward
      value: 5.10 +/- 0.85
      name: mean_reward
      verified: false
---
# **PPO** Agent playing **ML-Agents-Pyramids**
This is a trained model of a **PPO** agent playing **ML-Agents-Pyramids** using Unity ML-Agents.
## Usage
```python
import torch

# Load the checkpoint onto the CPU. Rebuilding the policy also requires
# the network architecture, which this snippet does not include.
checkpoint = torch.load("model.pt", map_location="cpu")

# The loaded policy can then be run in the Pyramids environment.
# See the repository for complete usage instructions.
```
## Training Results
- **Mean reward**: 5.10 ± 0.85
- **Average pyramids completed**: 5.0 per episode
- **Training episodes**: 3,000
- **Target achievement**: ✅ SUCCESS (target: 1.75)
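
The summary figures above can be derived from per-episode rewards roughly as follows. This is a minimal sketch, not the evaluation script used for this model: `summarize_rewards` is a hypothetical helper, the reward values are placeholders rather than real evaluation data, and it assumes the reported spread is a population standard deviation and that success means the mean reward reaches the 1.75 target.

```python
from statistics import fmean, pstdev

TARGET_REWARD = 1.75  # target quoted in the results above


def summarize_rewards(episode_rewards, target=TARGET_REWARD):
    """Return (mean, std, passed) for a list of per-episode rewards."""
    if not episode_rewards:
        raise ValueError("need at least one episode reward")
    mean = fmean(episode_rewards)
    std = pstdev(episode_rewards)  # assumption: population standard deviation
    return mean, std, mean >= target


# Placeholder rewards for illustration only (not real evaluation data).
episode_rewards = [4.0, 5.0, 6.0, 5.0]
mean, std, passed = summarize_rewards(episode_rewards)
print(f"mean_reward = {mean:.2f} +/- {std:.2f}, target met: {passed}")
```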
## Algorithm Details
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Environment**: ML-Agents-Pyramids
- **Task**: Multi-step pyramid completion with curiosity-driven exploration
- **Network**: Deep neural network with curiosity mechanism
- **Training Framework**: PyTorch
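
PPO optimizes a clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The sketch below computes that objective in plain Python for a toy batch. The clip range `epsilon=0.2` is a commonly used default and only an assumption here, since this card does not list hyperparameters, and `clipped_surrogate` is an illustrative name rather than a function from this repository.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize)."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Toy batch: one action became more likely, one less likely.
new_lp = [math.log(0.6), math.log(0.2)]
old_lp = [math.log(0.4), math.log(0.4)]
adv = [1.0, -1.0]
objective = clipped_surrogate(new_lp, old_lp, adv)
```

In this toy batch the first ratio (1.5) is capped at 1.2 and the second (0.5) is raised to 0.8, so the two samples contribute 1.2 and -0.8 and the objective is about 0.2.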
## Task Description
The agent learns to:
1. **Find and press buttons** to spawn pyramids
2. **Navigate to pyramids** and knock them over
3. **Collect gold bricks** from fallen pyramids
4. **Repeat efficiently** to maximize score

This complex task requires:
- Exploration in a sparse-reward environment
- Multi-step planning and execution
- Spatial navigation and object interaction
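
Curiosity-driven exploration is commonly implemented by giving the agent an extra, intrinsic reward for transitions that a learned forward model predicts poorly, which helps when the environment's own rewards are sparse. The following is a minimal sketch of that idea, not this model's training code: it assumes the intrinsic reward is a scaled squared prediction error over next-state features, and the function names, feature vectors and `strength` value are all illustrative.

```python
def intrinsic_reward(predicted_next, actual_next, strength=0.02):
    """Scaled squared error of a forward model's next-state prediction."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("feature vectors must have the same length")
    error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return strength * 0.5 * error


def total_reward(extrinsic, predicted_next, actual_next, strength=0.02):
    """Environment reward plus the curiosity bonus."""
    return extrinsic + intrinsic_reward(predicted_next, actual_next, strength)


# A well-predicted (familiar) transition earns no bonus.
familiar = intrinsic_reward([1.0, 0.0], [1.0, 0.0])
# A poorly predicted (novel) transition earns a positive bonus.
novel = intrinsic_reward([1.0, 0.0], [0.0, 1.0])
```

Because the bonus shrinks as the forward model improves, states the agent has visited often stop being rewarding, which pushes it toward unexplored parts of the arena.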
## Performance Milestones
- Episodes 0-500: Learning basic movement and object interaction
- Episodes 500-1500: Developing pyramid completion strategy
- Episodes 1500-3000: Optimizing efficiency and consistency
## Training Environment
- **Environment**: ML-Agents-Pyramids
- **Framework**: Custom PyTorch implementation with ML-Agents compatibility
- **Training date**: 2025-09-05
- **Course**: Hugging Face Deep RL Course Unit 5

This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).