---
tags:
- ML-Agents-Pyramids
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- ml-agents
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: ML-Agents-Pyramids
      type: ML-Agents-Pyramids
    metrics:
    - type: mean_reward
      value: 5.10 +/- 0.85
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **ML-Agents-Pyramids**

This is a trained model of a **PPO** agent playing **ML-Agents-Pyramids** using Unity ML-Agents.

## Usage

```python
import torch

# Load the checkpoint onto the CPU. Rebuilding the policy also requires
# the matching network architecture, which this snippet does not define.
checkpoint = torch.load("model.pt", map_location="cpu")

# The model can be used with the Pyramids environment.
# See the repository for complete usage instructions.
```

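Because the layout of `model.pt` is not documented here, it can help to inspect the loaded checkpoint before trying to rebuild the network. The following is a minimal sketch that assumes the checkpoint is a (possibly nested) dictionary whose leaves are tensors or plain Python values. The helper name `summarize_checkpoint` is hypothetical and not part of this repository.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, prefix=""):
    """Return (name, description) pairs for every leaf of a nested checkpoint.

    Assumption: the checkpoint is a mapping (for example a state dict or a
    dict of state dicts). Leaves that expose `.shape` are reported by shape,
    everything else by type name.
    """
    rows = []
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            rows.extend(summarize_checkpoint(value, name))
    else:
        shape = getattr(obj, "shape", None)
        if shape is not None:
            description = f"shape={tuple(shape)}"
        else:
            description = type(obj).__name__
        rows.append((prefix, description))
    return rows


# Example usage after loading the file:
#   checkpoint = torch.load("model.pt", map_location="cpu")
#   for name, description in summarize_checkpoint(checkpoint):
#       print(name, description)
```

The printed parameter names and shapes indicate which layers the matching network definition needs to provide.
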
## Training Results

- **Mean reward**: 5.10 ± 0.85
- **Average pyramids completed**: 5.0 per episode
- **Training episodes**: 3,000
- **Target achievement**: ✅ SUCCESS (target: 1.75)

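A "mean ± deviation" figure such as the one above is typically computed from per-episode evaluation rewards and then compared with the target. The sketch below illustrates that arithmetic. The reward values are made up for illustration and are not the data behind the reported result, and the use of the population standard deviation is an assumption, since the evaluation procedure is not described here.

```python
import statistics


def summarize_rewards(episode_rewards, target):
    """Return (mean, std, target_met) for a list of per-episode rewards."""
    mean = statistics.fmean(episode_rewards)
    # Assumption: population standard deviation; a sample estimate
    # (statistics.stdev) would give a slightly larger value.
    std = statistics.pstdev(episode_rewards)
    return mean, std, mean >= target


# Hypothetical evaluation rewards, for illustration only.
rewards = [4.0, 5.0, 6.0, 5.5, 4.5]
mean, std, target_met = summarize_rewards(rewards, target=1.75)
print(f"mean_reward = {mean:.2f} +/- {std:.2f}, target met: {target_met}")
```
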
## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Environment**: ML-Agents-Pyramids
- **Task**: Multi-step pyramid completion with curiosity-driven exploration
- **Network**: Deep neural network with curiosity mechanism
- **Training Framework**: PyTorch

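For readers unfamiliar with PPO, its core is the clipped surrogate objective, which limits how far one update can move the policy away from the policy that collected the data. The following is a dependency-free sketch of that objective, not the training code used for this model. The clipping range `clip_eps=0.2` is a commonly used value and an assumption here, since the hyperparameters of this run are not listed.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    For each sample, the probability ratio r = pi_new(a|s) / pi_old(a|s)
    is clipped to [1 - clip_eps, 1 + clip_eps], and the smaller of the
    clipped and unclipped terms is kept.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a PyTorch training loop the same expression would be computed on tensors and negated to form a loss, usually alongside a value-function loss and an entropy bonus.
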
## Task Description

The agent learns to:

1. **Find and press buttons** to spawn pyramids
2. **Navigate to pyramids** and knock them over
3. **Collect gold bricks** from fallen pyramids
4. **Repeat efficiently** to maximize score

This complex task requires:
- Exploration in a sparse-reward environment
- Multi-step planning and execution
- Spatial navigation and object interaction

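In a sparse-reward task like this one, curiosity-driven exploration usually means adding an intrinsic reward that is large when the agent encounters transitions it cannot yet predict. This card does not describe the exact curiosity mechanism used, so the sketch below shows one common formulation as an assumption: the intrinsic reward is the prediction error of a forward model on next-state features, scaled by a small weight. The function names and the `strength=0.02` default are hypothetical.

```python
def intrinsic_reward(predicted_next, actual_next):
    """Half the squared error between predicted and observed next-state features."""
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def total_reward(extrinsic, predicted_next, actual_next, strength=0.02):
    """Combine the environment reward with a scaled curiosity bonus.

    Assumption: `strength` is an illustrative weight, not a value taken
    from this model's training configuration.
    """
    return extrinsic + strength * intrinsic_reward(predicted_next, actual_next)
```

As the forward model improves on familiar transitions, the bonus shrinks, so the agent is pushed toward parts of the environment it has not yet explored, such as an unpressed button.
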
## Performance Milestones

- Episodes 0-500: Learning basic movement and object interaction
- Episodes 500-1500: Developing pyramid completion strategy
- Episodes 1500-3000: Optimizing efficiency and consistency

## Training Environment

- **Environment**: ML-Agents-Pyramids
- **Framework**: Custom PyTorch implementation with ML-Agents compatibility
- **Training date**: 2025-09-05
- **Course**: Hugging Face Deep RL Course Unit 5

This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).