Upload Pyramids PPO model for Deep RL Course Unit 5

Files changed (5)

README.md +83 -0
config.json +24 -0
model.pt +3 -0
model_card.json +22 -0
requirements.txt +4 -0

README.md ADDED

	@@ -0,0 +1,83 @@

+---
+tags:
+- ML-Agents-Pyramids
+- ppo
+- deep-reinforcement-learning
+- reinforcement-learning
+- ml-agents
+model-index:
+- name: PPO
+  results:
+  - task:
+      type: reinforcement-learning
+      name: reinforcement-learning
+    dataset:
+      name: ML-Agents-Pyramids
+      type: ML-Agents-Pyramids
+    metrics:
+    - type: mean_reward
+      value: 5.10 +/- 0.85
+      name: mean_reward
+      verified: false
+---
+# **PPO** Agent playing **ML-Agents-Pyramids**
+This is a trained model of a **PPO** agent playing **ML-Agents-Pyramids** using Unity ML-Agents.
+## Usage
+```python
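+import torch
+
+# Load the checkpoint on CPU. The network definition is not stored in this
+# file, so you need the matching architecture to rebuild the policy.
+checkpoint = torch.load("model.pt", map_location="cpu")
+
+# Inspect what was saved (e.g. parameter names and tensor shapes)
+if isinstance(checkpoint, dict):
+    for name, value in checkpoint.items():
+        print(name, getattr(value, "shape", type(value).__name__))
+
+# The restored policy is meant to run in the ML-Agents Pyramids environment.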
+```
+## Training Results
+- **Mean reward**: 5.10 ± 0.85
+- **Average pyramids completed**: 5.0 per episode
+- **Training episodes**: 3,000
+- **Target achievement**: ✅ SUCCESS (target mean reward: 1.75)
+## Algorithm Details
+- **Algorithm**: Proximal Policy Optimization (PPO)
+- **Environment**: ML-Agents-Pyramids
+- **Task**: Multi-step pyramid completion with curiosity-driven exploration
+- **Network**: 3 hidden layers of 512 units with ReLU activations, plus a Random Network Distillation (RND) curiosity module (per `config.json`)
+- **Training Framework**: PyTorch
+## Task Description
+
+The agent learns to:
+
+1. **Find and press the button** to spawn a pyramid
+2. **Navigate to the pyramid** and knock it over
+3. **Reach the gold brick** from the fallen pyramid (+2 reward)
+4. **Act efficiently**, since every step carries a -0.001 penalty
+
+This task requires:
+
+- Exploration in a sparse-reward environment
+- Multi-step planning and execution
+- Spatial navigation and object interaction
+
+## Performance Milestones
+- Episodes 0-500: Learning basic movement and object interaction
+- Episodes 500-1500: Developing pyramid completion strategy
+- Episodes 1500-3000: Optimizing efficiency and consistency
+## Training Environment
+- **Environment**: ML-Agents-Pyramids
+- **Framework**: Custom PyTorch implementation with ML-Agents compatibility
+- **Training date**: 2025-09-05
+- **Course**: Hugging Face Deep RL Course Unit 5
+
+This model was trained as part of the [Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course).
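
The README names PPO as the algorithm, and `config.json` lists `clip_coef`, `value_coef` and `entropy_coef`. The sketch below shows how those coefficients conventionally enter the standard clipped-surrogate PPO loss, assuming the training code uses that usual formulation; the function names are hypothetical and plain Python floats stand in for tensors.

```python
import math


def ppo_policy_loss(new_logp, old_logp, advantages, clip_coef=0.2):
    """Clipped surrogate objective, negated so that it can be minimized."""
    total = 0.0
    for new, old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_coef), 1.0 + clip_coef)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)


def ppo_total_loss(policy_loss, value_loss, entropy, value_coef=0.5, entropy_coef=0.01):
    """Combine the terms; the entropy bonus is subtracted to encourage exploration."""
    return policy_loss + value_coef * value_loss - entropy_coef * entropy


# A ratio of 1.5 with a positive advantage is clipped down to 1 + clip_coef
print(ppo_policy_loss([math.log(1.5)], [0.0], [1.0]))
```

Because the clipped term wins whenever it is smaller, a large probability ratio cannot raise the objective beyond what `clip_coef` allows, which keeps each of the `update_epochs` passes over a batch from pushing the policy too far from the one that collected the data.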

config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "algorithm": "PPO",
+  "environment": "ML-Agents-Pyramids",
+  "hyperparameters": {
+    "learning_rate": 0.0003,
+    "gamma": 0.99,
+    "gae_lambda": 0.95,
+    "clip_coef": 0.2,
+    "entropy_coef": 0.01,
+    "value_coef": 0.5,
+    "curiosity_coef": 0.1
+  },
+  "network_architecture": {
+    "hidden_size": 512,
+    "num_layers": 3,
+    "activation": "ReLU",
+    "curiosity_network": "RND"
+  },
+  "training": {
+    "total_episodes": 3000,
+    "batch_size": 1024,
+    "update_epochs": 4
+  }
+}
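
`gamma` and `gae_lambda` above are the discount factor and the λ of Generalized Advantage Estimation (GAE). The sketch below computes GAE advantages and returns for one rollout in plain Python. The config does not say how `curiosity_coef` is applied; the hypothetical `mix_rewards` helper assumes it scales an intrinsic (RND) bonus that is added to the environment reward before advantages are computed.

```python
def mix_rewards(extrinsic, intrinsic, curiosity_coef=0.1):
    """Hypothetical: add a scaled intrinsic (curiosity) bonus to each reward."""
    return [r + curiosity_coef * ri for r, ri in zip(extrinsic, intrinsic)]


def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Return (advantages, returns) for one rollout.

    dones[t] is 1.0 if the episode ended after step t, else 0.0;
    last_value is the value estimate of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # One-step TD error; bootstrapping is cut off at episode boundaries
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors
        gae = delta + gamma * gae_lambda * not_done * gae
        advantages[t] = gae
    returns = [adv + v for adv, v in zip(advantages, values)]
    return advantages, returns


rewards = mix_rewards([0.0, 2.0], [0.5, 0.0])
advantages, returns = compute_gae(rewards, values=[0.0, 0.0], dones=[0.0, 1.0], last_value=0.0)
print(advantages, returns)
```

With these settings each future TD error is down-weighted by 0.99 × 0.95 ≈ 0.94 per step; `gae_lambda = 1.0` would recover Monte Carlo returns minus the value baseline, and `0.0` would give one-step TD advantages.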

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:02a55c94ef9190af9d58bfca9b85a68c2d5d486811394d38b06d45d5cb123558
+size 1441
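
The three lines above are a Git LFS pointer, not the checkpoint itself. Under the v1 pointer spec, `oid` is the SHA-256 digest of the stored file and `size` is its length in bytes, so the `model.pt` fetched by Git LFS should be exactly 1441 bytes and hash to that digest. The helpers below are hypothetical, stdlib-only sketches for parsing a pointer and checking a downloaded file against it.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:02a55c94ef9190af9d58bfca9b85a68c2d5d486811394d38b06d45d5cb123558
size 1441
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS v1 pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields["oid"].partition(":")
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Check that downloaded bytes have the size and SHA-256 the pointer records."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("unsupported oid algorithm: " + pointer["algorithm"])
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["digest"]


info = parse_lfs_pointer(POINTER)
print(info["algorithm"], info["size"])  # sha256 1441
```

To check a local copy after pulling the LFS object, read `model.pt` in binary mode and pass its bytes to `matches_pointer(data, info)`; it should return `True`.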

model_card.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "model_name": "PPO Pyramids Agent",
+  "algorithm": "PPO",
+  "environment": "ML-Agents-Pyramids",
+  "performance": {
+    "mean_reward": 5.1,
+    "std_reward": 0.85,
+    "pyramids_completed": 5.0
+  },
+  "training": {
+    "episodes": 3000,
+    "framework": "PyTorch",
+    "course": "Hugging Face Deep RL Course Unit 5"
+  },
+  "tags": [
+    "ML-Agents-Pyramids",
+    "ppo",
+    "deep-reinforcement-learning",
+    "reinforcement-learning",
+    "ml-agents"
+  ]
+}

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+torch>=1.9.0
+numpy>=1.21.0
+gymnasium>=0.28.0
+matplotlib>=3.3.0