---
tags:
- ML-Agents-Pyramids
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- ml-agents
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: ML-Agents-Pyramids
      type: ML-Agents-Pyramids
    metrics:
    - type: mean_reward
      value: 5.10 +/- 0.85
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **ML-Agents-Pyramids**

This is a trained model of a **PPO** agent playing **ML-Agents-Pyramids** using Unity ML-Agents.

## Usage

```python
import torch
import numpy as np

# Load the checkpoint (you'll need the network architecture to rebuild the model)
checkpoint = torch.load("model.pt", map_location="cpu")

# The model can be used with the Pyramids environment
# See the repository for complete usage instructions
```

## Training Results

- **Mean reward**: 5.10 ± 0.85
- **Average pyramids completed**: 5.0 per episode
- **Training episodes**: 3,000
- **Target achievement**: ✅ SUCCESS (target: 1.75)

## Algorithm Details

- **Algorithm**: Proximal Policy Optimization (PPO)
- **Environment**: ML-Agents-Pyramids
- **Task**: Multi-step pyramid completion with curiosity-driven exploration
- **Network**: Deep neural network with curiosity mechanism
- **Training Framework**: PyTorch

## Task Description

The agent learns to:

1. **Find and press buttons** to spawn pyramids
2. **Navigate to pyramids** and knock them over
3. **Collect gold bricks** from fallen pyramids
4. **Repeat efficiently** to maximize score

This complex task requires:

- Exploration in a sparse-reward environment
- Multi-step planning and execution
- Spatial navigation and object interaction

## Performance Milestones

- Episodes 0-500: Learning basic movement and object interaction
- Episodes 500-1500: Developing pyramid completion strategy
- Episodes 1500-3000: Optimizing efficiency and consistency

## Training Environment

- **Environment**: ML-Agents-Pyramids
- **Framework**: Custom PyTorch implementation with ML-Agents compatibility
- **Training date**: 2025-09-05
- **Course**: Hugging Face Deep RL Course Unit 5

This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).
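
## PPO Objective Sketch

The Algorithm Details section names Proximal Policy Optimization but does not show its objective. As a minimal, illustrative sketch (not the training code behind this model), the standard PPO clipped surrogate objective can be written in plain Python as follows. The function name `ppo_clipped_objective` and the clipping value `clip_eps=0.2` are assumptions chosen for illustration; this card does not state the hyperparameters actually used.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    All names and the default clip_eps are illustrative assumptions,
    not values taken from this model's training run.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# When the new and old policies agree, the ratio is 1 and the objective
# reduces to the mean advantage.
same_policy = ppo_clipped_objective([-1.0, -2.0], [-1.0, -2.0], [1.0, 3.0])

# A ratio of 1.5 with a positive advantage is capped at 1 + clip_eps.
capped = ppo_clipped_objective([math.log(1.5)], [0.0], [1.0])
```

The `min` over the clipped and unclipped terms is what limits how much credit a single update can claim from moving the policy far from the one that collected the data. In training, the negative of this quantity is typically minimized alongside value and entropy terms, which are omitted here.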