@@ -32,4 +32,119 @@ tags:
   2. Step 1: Find your model_id: jetfan-xin/ppo-Pyramids
   3. Step 2: Select your *.nn /*.onnx file
   4. Click on Watch the agent play 👀

+# 🧠 PPO Agent Trained on Unity Pyramids Environment
+This repository contains a reinforcement learning agent trained using **Proximal Policy Optimization (PPO)** on Unity’s **Pyramids** environment via **ML-Agents**.
+## 📌 Model Overview
+- **Algorithm**: PPO with RND (Random Network Distillation)
+- **Environment**: Unity Pyramids (3D sparse-reward environment)
+- **Framework**: ML-Agents v1.2.0.dev0
+- **Backend**: PyTorch 2.7.1 (CUDA-enabled)
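+In Pyramids, the agent must press a switch to spawn a pyramid, knock the pyramid over, and reach the gold brick on top. Because this reward is sparse, training combines the extrinsic environment reward with an intrinsic curiosity reward from RND.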
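The intrinsic reward mentioned above can be illustrated with a toy sketch. This is not the ML-Agents implementation: it assumes a purely linear "network" written in plain Python, and the class name `TinyRND`, the dimensions, and the learning rate are all hypothetical. The idea it shows is the core of RND: a frozen random target, a predictor trained to imitate it, and a reward equal to the prediction error, which shrinks for observations the agent has already seen.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRND:
    """Linear stand-in for RND: a frozen random target and a trained predictor."""

    def __init__(self, obs_dim, feat_dim, lr=0.05, seed=0):
        rng = random.Random(seed)

        def random_matrix():
            return [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
                    for _ in range(feat_dim)]

        self.target = random_matrix()     # never updated
        self.predictor = random_matrix()  # trained to imitate the target
        self.lr = lr

    def errors(self, obs):
        """Per-feature difference between predictor and target outputs."""
        target_out = matvec(self.target, obs)
        predictor_out = matvec(self.predictor, obs)
        return [p - t for p, t in zip(predictor_out, target_out)]

    def intrinsic_reward(self, obs):
        """Squared prediction error: large for unfamiliar observations."""
        return sum(e * e for e in self.errors(obs))

    def update(self, obs):
        """One gradient step on the squared error for this observation."""
        for row, err in zip(self.predictor, self.errors(obs)):
            for j, x in enumerate(obs):
                row[j] -= self.lr * 2.0 * err * x


RND_STRENGTH = 0.01  # `rnd` strength listed in the training configuration

rnd = TinyRND(obs_dim=3, feat_dim=4)
familiar, novel = [1.0, 0.0, 0.5], [0.0, 1.0, 0.0]  # hypothetical observations

before = rnd.intrinsic_reward(familiar)
for _ in range(50):
    rnd.update(familiar)  # the agent keeps seeing the same observation
after = rnd.intrinsic_reward(familiar)

print(f"familiar observation: {before:.4f} -> {after:.6f}")
print(f"novel observation (scaled): {RND_STRENGTH * rnd.intrinsic_reward(novel):.4f}")
```

After repeated visits the reward for the familiar observation collapses toward zero, while the unvisited one keeps its bonus, which is what pushes the agent to explore.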
+---
+## 🚀 How to Use This Model
+The exported `Pyramids.onnx` model can be used directly in a Unity project that has the ML-Agents package installed.
+### ✅ Steps:
+1. **Download the model**
+   Clone the repository or download `Pyramids.onnx`:
+   ```bash
+   git lfs install
+   git clone https://huggingface.co/jetfan-xin/ppo-Pyramids
+   ```
+2. **Place in Unity project**
+   Put the model file in your Unity project under:
+   ```
+   Assets/ML-Agents/Examples/Pyramids/Pyramids.onnx
+   ```
+3. **Assign in Unity Editor**
+   - Select your agent GameObject.
+   - In `Behavior Parameters`, assign `Pyramids.onnx` as the model.
+   - Make sure the Behavior Name matches the behavior name used in `configuration.yaml`.
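Steps 1 and 2 above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `place_model` and the directory names in the usage comment are hypothetical. It also checks that the file is not a Git LFS pointer, which is what `git clone` leaves behind when Git LFS is not installed.

```python
import shutil
from pathlib import Path

# Destination inside the Unity project, as given in step 2.
MODEL_DEST = Path("Assets/ML-Agents/Examples/Pyramids/Pyramids.onnx")

# Git LFS pointer files are small text files that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is an LFS pointer rather than the real model."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def place_model(repo_dir, unity_project_dir):
    """Copy Pyramids.onnx from a cloned repo into a Unity project."""
    source = Path(repo_dir) / "Pyramids.onnx"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    if is_lfs_pointer(source):
        raise ValueError(f"{source} is a Git LFS pointer; run `git lfs pull`")
    target = Path(unity_project_dir) / MODEL_DEST
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target


# Example call (both directory names are placeholders):
# place_model("ppo-Pyramids", "MyUnityProject")
```

Assigning the model in `Behavior Parameters` (step 3) still has to be done in the Unity Editor.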
+---
+## ⚙️ Training Configuration
+Key settings from `configuration.yaml`:
+- `trainer_type`: `ppo`
+- `max_steps`: `1000000`
+- `batch_size`: `128`, `buffer_size`: `2048`
+- `learning_rate`: `3e-4`
+- `reward_signals`:
+  - `extrinsic`: γ=0.99, strength=1.0
+  - `rnd`: γ=0.99, strength=0.01
+- `hidden_units`: `512`, `num_layers`: `2`
+- `summary_freq`: `30000`
+See `configuration.yaml` for full details.
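A few quantities follow directly from the settings listed above. The sketch below only does arithmetic on those values; the variable names are illustrative, and settings not listed in this section are deliberately left out.

```python
# Values copied from the settings listed above.
config = {
    "max_steps": 1_000_000,
    "batch_size": 128,
    "buffer_size": 2048,
    "summary_freq": 30_000,
}

# Each filled buffer is split into this many minibatches per pass.
minibatches_per_pass = config["buffer_size"] // config["batch_size"]

# Number of full summary intervals that fit into the run.
summaries = config["max_steps"] // config["summary_freq"]

# Step of the last summary written before max_steps is reached.
last_summary_step = summaries * config["summary_freq"]

print(minibatches_per_pass)  # 16
print(summaries)             # 33
print(last_summary_step)     # 990000
```

The last summary step, 990,000, is the final row of the training performance table.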
+---
+## 📈 Training Performance
+Mean reward at selected steps from the training log:
+| Step      | Mean Reward |
+|-----------|-------------|
+| 300,000   | -0.22       |
+| 480,000   |  0.35       |
+| 660,000   |  1.14       |
+| 840,000   |  1.47       |
+| 990,000   |  1.54       |
+✅ Model exported to `Pyramids.onnx` after reaching max steps.
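To make the learning curve easier to read, the table can be turned into a rate of improvement. The sketch below uses only the five sampled rows above, so it is a coarse estimate; the function name `gain_per_100k` is illustrative.

```python
# (step, mean reward) pairs copied from the table above.
samples = [
    (300_000, -0.22),
    (480_000, 0.35),
    (660_000, 1.14),
    (840_000, 1.47),
    (990_000, 1.54),
]


def gain_per_100k(points):
    """Mean-reward change per 100,000 steps between consecutive samples."""
    rates = []
    for (step0, reward0), (step1, reward1) in zip(points, points[1:]):
        rates.append((reward1 - reward0) / (step1 - step0) * 100_000)
    return rates


rates = gain_per_100k(samples)
for (step0, _), (step1, _), rate in zip(samples, samples[1:], rates):
    print(f"{step0:>7,} -> {step1:>7,}: {rate:+.2f} per 100k steps")
```

On these samples the gain peaks between 480,000 and 660,000 steps and then tapers off, suggesting the policy was close to a plateau by the end of the run.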
+---
+## 🖥️ Training Setup
+- **Run ID**: `PyramidsGPUTest`
+- **GPU**: NVIDIA A100 80GB PCIe
+- **Training time**: ~26 minutes
+- **ML-Agents Envs**: v1.2.0.dev0
+- **Communicator API**: v1.5.0
+---
+## 📁 Repository Contents
+| File / Folder         | Description                                  |
+|------------------------|----------------------------------------------|
+| `Pyramids.onnx`        | Exported trained PPO agent                  |
+| `configuration.yaml`   | Full PPO + RND training config              |
+| `run_logs/`            | Training logs from ML-Agents                |
+| `Pyramids/`            | Environment-specific output folder          |
+| `config.json`          | Metadata for Hugging Face model card        |
+---
+## 📚 Citation
+If you use this model, please consider citing:
+```
+@misc{ppoPyramidsJetfan,
+  author = {Jingfan Xin},
+  title = {PPO Agent Trained on Unity Pyramids Environment},
+  year = {2025},
+  howpublished = {\url{https://huggingface.co/jetfan-xin/ppo-Pyramids}},
+}
+```