Update README.md
README.md
CHANGED
|
@@ -110,6 +110,131 @@ Sample rewards from training log:
|
|
| 110 |
| 840,000 | 1.47 |
|
| 111 |
| 990,000 | 1.54 |
|
| 112 |
|
| 113 |
Model exported to `Pyramids.onnx` after reaching max steps.
|
| 114 |
|
| 115 |
---
|
|
|
|
| 110 |
| 840,000 | 1.47 |
|
| 111 |
| 990,000 | 1.54 |
|
| 112 |
|
| 113 |
+
details:
|
| 114 |
+
```
|
| 115 |
+
(rl_py310) 4xin@ltgpu3:~/deep_rl/unit5/ml-agents$ CUDA_VISIBLE_DEVICES=3 mlagents-learn ./config/ppo/PyramidsRND.yaml \
|
| 116 |
+
--env=./training-envs-executables/linux/Pyramids/Pyramids.x86_64 \
|
| 117 |
+
--run-id="PyramidsGPUTest" \
|
| 118 |
+
--no-graphics
|
| 119 |
+
|
| 120 |
+
[Unity ASCII-art logo banner omitted]
|
| 134 |
+
|
| 135 |
+
Version information:
|
| 136 |
+
ml-agents: 1.2.0.dev0,
|
| 137 |
+
ml-agents-envs: 1.2.0.dev0,
|
| 138 |
+
Communicator API: 1.5.0,
|
| 139 |
+
PyTorch: 2.7.1+cu126
|
| 140 |
+
[INFO] Connected to Unity environment with package version 2.2.1-exp.1 and communication version 1.5.0
|
| 141 |
+
[INFO] Connected new brain: Pyramids?team=0
|
| 142 |
+
[INFO] Hyperparameters for behavior name Pyramids:
|
| 143 |
+
trainer_type: ppo
|
| 144 |
+
hyperparameters:
|
| 145 |
+
batch_size: 128
|
| 146 |
+
buffer_size: 2048
|
| 147 |
+
learning_rate: 0.0003
|
| 148 |
+
beta: 0.01
|
| 149 |
+
epsilon: 0.2
|
| 150 |
+
lambd: 0.95
|
| 151 |
+
num_epoch: 3
|
| 152 |
+
shared_critic: False
|
| 153 |
+
learning_rate_schedule: linear
|
| 154 |
+
beta_schedule: linear
|
| 155 |
+
epsilon_schedule: linear
|
| 156 |
+
checkpoint_interval: 500000
|
| 157 |
+
network_settings:
|
| 158 |
+
normalize: False
|
| 159 |
+
hidden_units: 512
|
| 160 |
+
num_layers: 2
|
| 161 |
+
vis_encode_type: simple
|
| 162 |
+
memory: None
|
| 163 |
+
goal_conditioning_type: hyper
|
| 164 |
+
deterministic: False
|
| 165 |
+
reward_signals:
|
| 166 |
+
extrinsic:
|
| 167 |
+
gamma: 0.99
|
| 168 |
+
strength: 1.0
|
| 169 |
+
network_settings:
|
| 170 |
+
normalize: False
|
| 171 |
+
hidden_units: 128
|
| 172 |
+
num_layers: 2
|
| 173 |
+
vis_encode_type: simple
|
| 174 |
+
memory: None
|
| 175 |
+
goal_conditioning_type: hyper
|
| 176 |
+
deterministic: False
|
| 177 |
+
rnd:
|
| 178 |
+
gamma: 0.99
|
| 179 |
+
strength: 0.01
|
| 180 |
+
network_settings:
|
| 181 |
+
normalize: False
|
| 182 |
+
hidden_units: 64
|
| 183 |
+
num_layers: 3
|
| 184 |
+
vis_encode_type: simple
|
| 185 |
+
memory: None
|
| 186 |
+
goal_conditioning_type: hyper
|
| 187 |
+
deterministic: False
|
| 188 |
+
learning_rate: 0.0001
|
| 189 |
+
encoding_size: None
|
| 190 |
+
init_path: None
|
| 191 |
+
keep_checkpoints: 5
|
| 192 |
+
even_checkpoints: False
|
| 193 |
+
max_steps: 1000000
|
| 194 |
+
time_horizon: 128
|
| 195 |
+
summary_freq: 30000
|
| 196 |
+
threaded: False
|
| 197 |
+
self_play: None
|
| 198 |
+
behavioral_cloning: None
|
| 199 |
+
[INFO] Pyramids. Step: 30000. Time Elapsed: 45.356 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
|
| 200 |
+
[INFO] Pyramids. Step: 60000. Time Elapsed: 90.519 s. Mean Reward: -0.853. Std of Reward: 0.588. Training.
|
| 201 |
+
[INFO] Pyramids. Step: 90000. Time Elapsed: 136.319 s. Mean Reward: -0.797. Std of Reward: 0.646. Training.
|
| 202 |
+
[INFO] Pyramids. Step: 120000. Time Elapsed: 182.893 s. Mean Reward: -0.831. Std of Reward: 0.654. Training.
|
| 203 |
+
[INFO] Pyramids. Step: 150000. Time Elapsed: 227.995 s. Mean Reward: -0.715. Std of Reward: 0.760. Training.
|
| 204 |
+
[INFO] Pyramids. Step: 180000. Time Elapsed: 270.527 s. Mean Reward: -0.731. Std of Reward: 0.712. Training.
|
| 205 |
+
[INFO] Pyramids. Step: 210000. Time Elapsed: 316.617 s. Mean Reward: -0.699. Std of Reward: 0.810. Training.
|
| 206 |
+
[INFO] Pyramids. Step: 240000. Time Elapsed: 361.434 s. Mean Reward: -0.640. Std of Reward: 0.822. Training.
|
| 207 |
+
[INFO] Pyramids. Step: 270000. Time Elapsed: 407.787 s. Mean Reward: -0.520. Std of Reward: 0.969. Training.
|
| 208 |
+
[INFO] Pyramids. Step: 300000. Time Elapsed: 451.612 s. Mean Reward: -0.222. Std of Reward: 1.135. Training.
|
| 209 |
+
[INFO] Pyramids. Step: 330000. Time Elapsed: 496.996 s. Mean Reward: -0.328. Std of Reward: 1.124. Training.
|
| 210 |
+
[INFO] Pyramids. Step: 360000. Time Elapsed: 541.248 s. Mean Reward: -0.452. Std of Reward: 0.995. Training.
|
| 211 |
+
[INFO] Pyramids. Step: 390000. Time Elapsed: 587.186 s. Mean Reward: -0.411. Std of Reward: 1.044. Training.
|
| 212 |
+
[INFO] Pyramids. Step: 420000. Time Elapsed: 630.923 s. Mean Reward: -0.042. Std of Reward: 1.228. Training.
|
| 213 |
+
[INFO] Pyramids. Step: 450000. Time Elapsed: 675.866 s. Mean Reward: 0.009. Std of Reward: 1.237. Training.
|
| 214 |
+
[INFO] Pyramids. Step: 480000. Time Elapsed: 721.391 s. Mean Reward: 0.351. Std of Reward: 1.271. Training.
|
| 215 |
+
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-499992.onnx
|
| 216 |
+
[INFO] Pyramids. Step: 510000. Time Elapsed: 767.344 s. Mean Reward: 0.647. Std of Reward: 1.140. Training.
|
| 217 |
+
[INFO] Pyramids. Step: 540000. Time Elapsed: 812.656 s. Mean Reward: 0.526. Std of Reward: 1.178. Training.
|
| 218 |
+
[INFO] Pyramids. Step: 570000. Time Elapsed: 857.156 s. Mean Reward: 0.525. Std of Reward: 1.236. Training.
|
| 219 |
+
[INFO] Pyramids. Step: 600000. Time Elapsed: 900.647 s. Mean Reward: 0.979. Std of Reward: 0.977. Training.
|
| 220 |
+
[INFO] Pyramids. Step: 630000. Time Elapsed: 949.947 s. Mean Reward: 1.044. Std of Reward: 1.040. Training.
|
| 221 |
+
[INFO] Pyramids. Step: 660000. Time Elapsed: 1006.810 s. Mean Reward: 1.143. Std of Reward: 0.937. Training.
|
| 222 |
+
[INFO] Pyramids. Step: 690000. Time Elapsed: 1062.833 s. Mean Reward: 1.151. Std of Reward: 0.997. Training.
|
| 223 |
+
[INFO] Pyramids. Step: 720000. Time Elapsed: 1119.948 s. Mean Reward: 1.499. Std of Reward: 0.563. Training.
|
| 224 |
+
[INFO] Pyramids. Step: 750000. Time Elapsed: 1178.547 s. Mean Reward: 1.308. Std of Reward: 0.835. Training.
|
| 225 |
+
[INFO] Pyramids. Step: 780000. Time Elapsed: 1226.204 s. Mean Reward: 1.278. Std of Reward: 0.866. Training.
|
| 226 |
+
[INFO] Pyramids. Step: 810000. Time Elapsed: 1275.499 s. Mean Reward: 1.318. Std of Reward: 0.856. Training.
|
| 227 |
+
[INFO] Pyramids. Step: 840000. Time Elapsed: 1322.302 s. Mean Reward: 1.477. Std of Reward: 0.641. Training.
|
| 228 |
+
[INFO] Pyramids. Step: 870000. Time Elapsed: 1370.429 s. Mean Reward: 1.367. Std of Reward: 0.816. Training.
|
| 229 |
+
[INFO] Pyramids. Step: 900000. Time Elapsed: 1418.228 s. Mean Reward: 1.471. Std of Reward: 0.689. Training.
|
| 230 |
+
[INFO] Pyramids. Step: 930000. Time Elapsed: 1465.721 s. Mean Reward: 1.514. Std of Reward: 0.619. Training.
|
| 231 |
+
[INFO] Pyramids. Step: 960000. Time Elapsed: 1513.116 s. Mean Reward: 1.403. Std of Reward: 0.810. Training.
|
| 232 |
+
[INFO] Pyramids. Step: 990000. Time Elapsed: 1563.057 s. Mean Reward: 1.544. Std of Reward: 0.666. Training.
|
| 233 |
+
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-999909.onnx
|
| 234 |
+
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-1000037.onnx
|
| 235 |
+
[INFO] Copied results/PyramidsGPUTest/Pyramids/Pyramids-1000037.onnx to results/PyramidsGPUTest/Pyramids.onnx.
|
| 236 |
+
```
|
| 237 |
+
|
| 238 |
Model exported to `Pyramids.onnx` after reaching max steps.
|
| 239 |
|
| 240 |
---
|
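The sample-rewards table can be rebuilt from the `[INFO] Pyramids. Step: ...` summary lines in the log above. The following is a minimal sketch, assuming the log has been saved as plain text with the line format shown in this README; the names `LOG_LINE` and `parse_summaries` are hypothetical helpers for illustration, not part of ML-Agents.

```python
import re

# Matches the periodic summary lines as they appear in the log above.
# The exact line format is taken from this README; other ML-Agents
# versions may print it differently (assumption).
LOG_LINE = re.compile(
    r"Step: (?P<step>\d+)\. "
    r"Time Elapsed: (?P<elapsed>[\d.]+) s\. "
    r"Mean Reward: (?P<mean>-?[\d.]+)\. "
    r"Std of Reward: (?P<std>-?[\d.]+)\."
)


def parse_summaries(text):
    """Return one dict per summary line; other lines (exports, etc.) are skipped."""
    rows = []
    for line in text.splitlines():
        m = LOG_LINE.search(line)
        if m:
            rows.append({
                "step": int(m["step"]),
                "elapsed_s": float(m["elapsed"]),
                "mean_reward": float(m["mean"]),
                "std_reward": float(m["std"]),
            })
    return rows


# Three lines copied verbatim from the log above.
sample = """\
[INFO] Pyramids. Step: 840000. Time Elapsed: 1322.302 s. Mean Reward: 1.477. Std of Reward: 0.641. Training.
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-999909.onnx
[INFO] Pyramids. Step: 990000. Time Elapsed: 1563.057 s. Mean Reward: 1.544. Std of Reward: 0.666. Training.
"""

rows = parse_summaries(sample)
for r in rows:
    # Print table rows in the same "| steps | reward |" shape as the README table.
    print(f"| {r['step']:,} | {r['mean_reward']:.3f} |")
```

Keeping three decimals preserves the values exactly as logged, so any rounding or truncation for the table is left as an explicit choice.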