jetfan-xin committed · verified · Commit a1b5377 · Parent(s): f08c1a3

Update README.md

Files changed (1): README.md (+116 −1)

README.md CHANGED
@@ -32,4 +32,119 @@ tags:
2. Step 1: Find your model_id: jetfan-xin/ppo-Pyramids
3. Step 2: Select your *.nn /*.onnx file
4. Click on Watch the agent play 👀

# 🧠 PPO Agent Trained on Unity Pyramids Environment

This repository contains a reinforcement learning agent trained with **Proximal Policy Optimization (PPO)** on Unity's **Pyramids** environment via **ML-Agents**.

## 📌 Model Overview

- **Algorithm**: PPO with RND (Random Network Distillation)
- **Environment**: Unity Pyramids (3D sparse-reward maze)
- **Framework**: ML-Agents v1.2.0.dev0
- **Backend**: PyTorch 2.7.1 (CUDA-enabled)

The agent learns to navigate a 3D maze and reach the goal area by combining extrinsic and intrinsic rewards.
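The extrinsic/intrinsic combination can be sketched as a weighted sum using the strength values from the training config below. ML-Agents computes this internally; the function here is illustrative, not part of its API:

```python
# Illustrative sketch: how PPO + RND reward signals combine per step.
# Strengths mirror the values in configuration.yaml (extrinsic 1.0, rnd 0.01).
def combined_reward(extrinsic: float, rnd_intrinsic: float,
                    extrinsic_strength: float = 1.0,
                    rnd_strength: float = 0.01) -> float:
    """Weighted sum of the environment reward and the RND novelty bonus
    (the prediction error of the randomly initialized target network)."""
    return extrinsic_strength * extrinsic + rnd_strength * rnd_intrinsic

# In a sparse-reward maze the extrinsic term is usually 0.0, so early
# exploration is driven almost entirely by the (small) novelty bonus:
print(round(combined_reward(0.0, 2.0), 2))  # -> 0.02
```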

---

## 🚀 How to Use This Model

You can use the `.onnx` model directly in Unity.

### ✅ Steps

1. **Download the model**

   Clone the repository or download `Pyramids.onnx`:

   ```bash
   git lfs install
   git clone https://huggingface.co/jetfan-xin/ppo-Pyramids
   ```

2. **Place it in your Unity project**

   Put the model file in your Unity project under:

   ```
   Assets/ML-Agents/Examples/Pyramids/Pyramids.onnx
   ```

3. **Assign it in the Unity Editor**

   - Select your agent GameObject.
   - In `Behavior Parameters`, assign `Pyramids.onnx` as the model.
   - Make sure the Behavior Name matches your training config.
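One common pitfall with step 1: if `git lfs install` is skipped, the clone leaves `Pyramids.onnx` as a small Git LFS pointer text file rather than the real model, and Unity will fail to load it. A small sanity check (the helper is illustrative, not part of any library; LFS pointer files begin with a fixed version line):

```python
# Detect a Git LFS pointer file left behind by cloning without git-lfs.
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: str) -> bool:
    """True if `path` looks like a Git LFS pointer file, not real content."""
    head = Path(path).open("rb").read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX

# Usage sketch:
# if is_lfs_pointer("ppo-Pyramids/Pyramids.onnx"):
#     raise SystemExit("Run `git lfs install` and re-clone (or `git lfs pull`).")
```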

---

## ⚙️ Training Configuration

Key settings from `configuration.yaml`:

- `trainer_type`: `ppo`
- `max_steps`: `1000000`
- `batch_size`: `128`, `buffer_size`: `2048`
- `learning_rate`: `3e-4`
- `reward_signals`:
  - `extrinsic`: γ=0.99, strength=1.0
  - `rnd`: γ=0.99, strength=0.01
- `hidden_units`: `512`, `num_layers`: `2`
- `summary_freq`: `30000`

See `configuration.yaml` for full details.
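For readers unfamiliar with the YAML layout, the values above can be mirrored as a plain dict (the nesting follows ML-Agents' standard PPO schema; this is a sketch of the shape, not a drop-in replacement for the YAML file):

```python
# Key configuration.yaml values mirrored as a Python dict (illustrative).
config = {
    "trainer_type": "ppo",
    "max_steps": 1_000_000,
    "hyperparameters": {
        "batch_size": 128,
        "buffer_size": 2048,
        "learning_rate": 3e-4,
    },
    "network_settings": {"hidden_units": 512, "num_layers": 2},
    "reward_signals": {
        "extrinsic": {"gamma": 0.99, "strength": 1.0},
        "rnd": {"gamma": 0.99, "strength": 0.01},
    },
    "summary_freq": 30_000,
}

# PPO gathers buffer_size steps per update, then splits them into minibatches,
# so buffer_size should be a multiple of batch_size:
hp = config["hyperparameters"]
print(hp["buffer_size"] // hp["batch_size"])  # -> 16 minibatches per update
```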

---

## 📈 Training Performance

Sample mean rewards from the training log:

| Step      | Mean Reward |
|-----------|-------------|
| 300,000   | -0.22       |
| 480,000   | 0.35        |
| 660,000   | 1.14        |
| 840,000   | 1.47        |
| 990,000   | 1.54        |

✅ Model exported to `Pyramids.onnx` after reaching max steps.
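The logged checkpoints above can be summarized in a couple of lines of pure Python (values copied from the table; the improvement-rate metric is just an illustration, not something ML-Agents reports):

```python
# Mean-reward checkpoints from the training log: (step, mean reward).
history = [
    (300_000, -0.22),
    (480_000, 0.35),
    (660_000, 1.14),
    (840_000, 1.47),
    (990_000, 1.54),
]

# Total improvement over the logged window, and average gain per 100k steps.
steps = history[-1][0] - history[0][0]   # 690_000
gain = history[-1][1] - history[0][1]    # ~1.76
print(round(gain, 2), round(gain / steps * 100_000, 3))
```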

---

## 🖥️ Training Setup

- **Run ID**: `PyramidsGPUTest`
- **GPU**: NVIDIA A100 80GB PCIe
- **Training time**: ~26 minutes
- **ML-Agents Envs**: v1.2.0.dev0
- **Communicator API**: v1.5.0
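A rough throughput figure follows from the numbers above, assuming the full 1,000,000 steps completed in the ~26 minutes of wall-clock time (so the result is approximate):

```python
# Approximate environment steps per second for the A100 run.
max_steps = 1_000_000
minutes = 26  # "~26 minutes" from the run log, treated as exact here
steps_per_sec = max_steps / (minutes * 60)
print(round(steps_per_sec))  # -> 641
```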

---

## 📁 Repository Contents

| File / Folder        | Description                              |
|----------------------|------------------------------------------|
| `Pyramids.onnx`      | Exported trained PPO agent               |
| `configuration.yaml` | Full PPO + RND training config           |
| `run_logs/`          | Training logs from ML-Agents             |
| `Pyramids/`          | Environment-specific output folder       |
| `config.json`        | Metadata for the Hugging Face model card |

---

## 📚 Citation

If you use this model, please consider citing:

```bibtex
@misc{ppoPyramidsJetfan,
  author       = {Jingfan Xin},
  title        = {PPO Agent Trained on Unity Pyramids Environment},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/jetfan-xin/ppo-Pyramids}},
}
```