jetfan-xin committed 8af299c (verified; parent a1b5377): Update README.md

Files changed (1): `README.md` (+125 -0), in the "Sample rewards from training log:" section starting at line 110:

| 840,000 | 1.47 |
| 990,000 | 1.54 |

Details (full training command and console output):
```
(rl_py310) 4xin@ltgpu3:~/deep_rl/unit5/ml-agents$ CUDA_VISIBLE_DEVICES=3 mlagents-learn ./config/ppo/PyramidsRND.yaml \
  --env=./training-envs-executables/linux/Pyramids/Pyramids.x86_64 \
  --run-id="PyramidsGPUTest" \
  --no-graphics

Version information:
  ml-agents: 1.2.0.dev0,
  ml-agents-envs: 1.2.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 2.7.1+cu126
[INFO] Connected to Unity environment with package version 2.2.1-exp.1 and communication version 1.5.0
[INFO] Connected new brain: Pyramids?team=0
[INFO] Hyperparameters for behavior name Pyramids:
  trainer_type: ppo
  hyperparameters:
    batch_size: 128
    buffer_size: 2048
    learning_rate: 0.0003
    beta: 0.01
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    shared_critic: False
    learning_rate_schedule: linear
    beta_schedule: linear
    epsilon_schedule: linear
  checkpoint_interval: 500000
  network_settings:
    normalize: False
    hidden_units: 512
    num_layers: 2
    vis_encode_type: simple
    memory: None
    goal_conditioning_type: hyper
    deterministic: False
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
      network_settings:
        normalize: False
        hidden_units: 128
        num_layers: 2
        vis_encode_type: simple
        memory: None
        goal_conditioning_type: hyper
        deterministic: False
    rnd:
      gamma: 0.99
      strength: 0.01
      network_settings:
        normalize: False
        hidden_units: 64
        num_layers: 3
        vis_encode_type: simple
        memory: None
        goal_conditioning_type: hyper
        deterministic: False
      learning_rate: 0.0001
      encoding_size: None
  init_path: None
  keep_checkpoints: 5
  even_checkpoints: False
  max_steps: 1000000
  time_horizon: 128
  summary_freq: 30000
  threaded: False
  self_play: None
  behavioral_cloning: None
[INFO] Pyramids. Step: 30000. Time Elapsed: 45.356 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
[INFO] Pyramids. Step: 60000. Time Elapsed: 90.519 s. Mean Reward: -0.853. Std of Reward: 0.588. Training.
[INFO] Pyramids. Step: 90000. Time Elapsed: 136.319 s. Mean Reward: -0.797. Std of Reward: 0.646. Training.
[INFO] Pyramids. Step: 120000. Time Elapsed: 182.893 s. Mean Reward: -0.831. Std of Reward: 0.654. Training.
[INFO] Pyramids. Step: 150000. Time Elapsed: 227.995 s. Mean Reward: -0.715. Std of Reward: 0.760. Training.
[INFO] Pyramids. Step: 180000. Time Elapsed: 270.527 s. Mean Reward: -0.731. Std of Reward: 0.712. Training.
[INFO] Pyramids. Step: 210000. Time Elapsed: 316.617 s. Mean Reward: -0.699. Std of Reward: 0.810. Training.
[INFO] Pyramids. Step: 240000. Time Elapsed: 361.434 s. Mean Reward: -0.640. Std of Reward: 0.822. Training.
[INFO] Pyramids. Step: 270000. Time Elapsed: 407.787 s. Mean Reward: -0.520. Std of Reward: 0.969. Training.
[INFO] Pyramids. Step: 300000. Time Elapsed: 451.612 s. Mean Reward: -0.222. Std of Reward: 1.135. Training.
[INFO] Pyramids. Step: 330000. Time Elapsed: 496.996 s. Mean Reward: -0.328. Std of Reward: 1.124. Training.
[INFO] Pyramids. Step: 360000. Time Elapsed: 541.248 s. Mean Reward: -0.452. Std of Reward: 0.995. Training.
[INFO] Pyramids. Step: 390000. Time Elapsed: 587.186 s. Mean Reward: -0.411. Std of Reward: 1.044. Training.
[INFO] Pyramids. Step: 420000. Time Elapsed: 630.923 s. Mean Reward: -0.042. Std of Reward: 1.228. Training.
[INFO] Pyramids. Step: 450000. Time Elapsed: 675.866 s. Mean Reward: 0.009. Std of Reward: 1.237. Training.
[INFO] Pyramids. Step: 480000. Time Elapsed: 721.391 s. Mean Reward: 0.351. Std of Reward: 1.271. Training.
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-499992.onnx
[INFO] Pyramids. Step: 510000. Time Elapsed: 767.344 s. Mean Reward: 0.647. Std of Reward: 1.140. Training.
[INFO] Pyramids. Step: 540000. Time Elapsed: 812.656 s. Mean Reward: 0.526. Std of Reward: 1.178. Training.
[INFO] Pyramids. Step: 570000. Time Elapsed: 857.156 s. Mean Reward: 0.525. Std of Reward: 1.236. Training.
[INFO] Pyramids. Step: 600000. Time Elapsed: 900.647 s. Mean Reward: 0.979. Std of Reward: 0.977. Training.
[INFO] Pyramids. Step: 630000. Time Elapsed: 949.947 s. Mean Reward: 1.044. Std of Reward: 1.040. Training.
[INFO] Pyramids. Step: 660000. Time Elapsed: 1006.810 s. Mean Reward: 1.143. Std of Reward: 0.937. Training.
[INFO] Pyramids. Step: 690000. Time Elapsed: 1062.833 s. Mean Reward: 1.151. Std of Reward: 0.997. Training.
[INFO] Pyramids. Step: 720000. Time Elapsed: 1119.948 s. Mean Reward: 1.499. Std of Reward: 0.563. Training.
[INFO] Pyramids. Step: 750000. Time Elapsed: 1178.547 s. Mean Reward: 1.308. Std of Reward: 0.835. Training.
[INFO] Pyramids. Step: 780000. Time Elapsed: 1226.204 s. Mean Reward: 1.278. Std of Reward: 0.866. Training.
[INFO] Pyramids. Step: 810000. Time Elapsed: 1275.499 s. Mean Reward: 1.318. Std of Reward: 0.856. Training.
[INFO] Pyramids. Step: 840000. Time Elapsed: 1322.302 s. Mean Reward: 1.477. Std of Reward: 0.641. Training.
[INFO] Pyramids. Step: 870000. Time Elapsed: 1370.429 s. Mean Reward: 1.367. Std of Reward: 0.816. Training.
[INFO] Pyramids. Step: 900000. Time Elapsed: 1418.228 s. Mean Reward: 1.471. Std of Reward: 0.689. Training.
[INFO] Pyramids. Step: 930000. Time Elapsed: 1465.721 s. Mean Reward: 1.514. Std of Reward: 0.619. Training.
[INFO] Pyramids. Step: 960000. Time Elapsed: 1513.116 s. Mean Reward: 1.403. Std of Reward: 0.810. Training.
[INFO] Pyramids. Step: 990000. Time Elapsed: 1563.057 s. Mean Reward: 1.544. Std of Reward: 0.666. Training.
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-999909.onnx
[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-1000037.onnx
[INFO] Copied results/PyramidsGPUTest/Pyramids/Pyramids-1000037.onnx to results/PyramidsGPUTest/Pyramids.onnx.
```
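
The sample-rewards table is a subset of the periodic summary lines in this log, which ML-Agents prints every `summary_freq` (30,000) steps. A minimal, standard-library Python sketch for pulling those rows out of a saved copy of the console output follows. The regular expression is written against the line format shown in this log only and may need adjusting for other ML-Agents versions; `parse_summaries`, `reward_table`, `Summary` and the table header are hypothetical. It truncates rather than rounds to two decimals, which is what turns the logged 1.477 and 1.544 into the 1.47 and 1.54 shown in the table.

```python
import re
from decimal import ROUND_DOWN, Decimal
from typing import Iterable, List, NamedTuple

# Matches summary lines such as:
# [INFO] Pyramids. Step: 30000. Time Elapsed: 45.356 s. Mean Reward: -1.000. Std of Reward: 0.000. Training.
SUMMARY_RE = re.compile(
    r"Step: (?P<step>\d+)\. Time Elapsed: (?P<elapsed>\d+\.\d+) s\. "
    r"Mean Reward: (?P<mean>-?\d+\.\d+)\. Std of Reward: (?P<std>\d+\.\d+)\."
)


class Summary(NamedTuple):
    step: int
    elapsed_s: float
    mean_reward: Decimal  # Decimal keeps the logged digits exact
    std_reward: Decimal


def parse_summaries(lines: Iterable[str]) -> List[Summary]:
    """Return one Summary per periodic summary line; all other lines are skipped."""
    rows: List[Summary] = []
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            rows.append(Summary(int(m["step"]), float(m["elapsed"]),
                                Decimal(m["mean"]), Decimal(m["std"])))
    return rows


def reward_table(rows: Iterable[Summary], steps: Iterable[int]) -> str:
    """Markdown table of mean reward at the requested steps, truncated to 2 decimals."""
    wanted = set(steps)
    out = ["| Step | Mean Reward |", "|---|---|"]
    for r in rows:
        if r.step in wanted:
            mean = r.mean_reward.quantize(Decimal("0.01"), rounding=ROUND_DOWN)
            out.append(f"| {r.step:,} | {mean} |")
    return "\n".join(out)


if __name__ == "__main__":
    log = [
        "[INFO] Pyramids. Step: 840000. Time Elapsed: 1322.302 s. Mean Reward: 1.477. Std of Reward: 0.641. Training.",
        "[INFO] Exported results/PyramidsGPUTest/Pyramids/Pyramids-999909.onnx",
        "[INFO] Pyramids. Step: 990000. Time Elapsed: 1563.057 s. Mean Reward: 1.544. Std of Reward: 0.666. Training.",
    ]
    rows = parse_summaries(log)
    print(reward_table(rows, [840_000, 990_000]))
    last = rows[-1]
    print(f"~{last.step / last.elapsed_s:.0f} env steps/s")
```

Run against the full log, the same rows also give a rough throughput figure: the last summary line (990,000 steps at 1563.057 s) works out to about 633 environment steps per second.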

✅ After reaching `max_steps` (1,000,000), the final checkpoint was exported and copied to `results/PyramidsGPUTest/Pyramids.onnx`.
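
The hyperparameter dump sets `learning_rate_schedule`, `beta_schedule` and `epsilon_schedule` to `linear`, so those three values shrink as training approaches `max_steps`. A minimal Python sketch of such a linear decay, assuming each value is interpolated from its configured setting at step 0 down to a small floor at `max_steps` and clamped afterwards. The floor values (1e-10, 1e-5, 0.1) are assumptions, not something this log reports, and should be checked against the installed `ml-agents` source; `linear_decay` and `SCHEDULES` are hypothetical names.

```python
from typing import Dict, Tuple


def linear_decay(initial: float, floor: float, max_steps: int, step: int) -> float:
    """Interpolate linearly from `initial` (step 0) to `floor` (step >= max_steps)."""
    step = min(step, max_steps)  # clamp so the value never goes below `floor`
    return (initial - floor) * (1.0 - step / max_steps) + floor


MAX_STEPS = 1_000_000  # max_steps from the hyperparameter dump

# name -> (initial value from the dump, ASSUMED floor)
SCHEDULES: Dict[str, Tuple[float, float]] = {
    "learning_rate": (0.0003, 1e-10),
    "beta": (0.01, 1e-5),
    "epsilon": (0.2, 0.1),
}

if __name__ == "__main__":
    for step in (0, 500_000, 1_000_000):
        values = {
            name: linear_decay(init, floor, MAX_STEPS, step)
            for name, (init, floor) in SCHEDULES.items()
        }
        print(step, {name: f"{v:.3g}" for name, v in values.items()})
```

Under these assumptions the learning rate is roughly halved (about 1.5e-4) around the 500,000-step checkpoint and is close to zero by the time `Pyramids-1000037.onnx` is exported.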
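
The `PyramidsRND.yaml` run adds an `rnd` reward signal (`strength: 0.01`) next to the extrinsic one (`strength: 1.0`). Random Network Distillation pays an intrinsic reward equal to the squared error between a fixed, randomly initialised target network and a predictor trained to imitate it, so observations the agent has seen often earn little bonus while unfamiliar ones keep earning more. What follows is a toy, pure-Python illustration of that idea, not ML-Agents' implementation: it swaps the MLPs for linear maps (the logged RND network uses 3 layers of 64 units), and `combined_reward` assumes each signal is simply scaled by its `strength` and summed. All names are hypothetical.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class LinearRND:
    """Toy Random Network Distillation with linear maps instead of MLPs (hypothetical)."""

    def __init__(self, obs_dim: int, feat_dim: int, lr: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Fixed, randomly initialised target: it is never trained.
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(feat_dim)]
        # Predictor starts at zero and learns to imitate the target on visited observations.
        self.pred = [[0.0] * obs_dim for _ in range(feat_dim)]
        self.lr = lr

    def intrinsic_reward(self, obs: Vector) -> float:
        """Squared prediction error: small for familiar observations, large for novel ones."""
        t, p = matvec(self.target, obs), matvec(self.pred, obs)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t))

    def update(self, obs: Vector) -> None:
        """One SGD step on the squared error for a single observation."""
        t, p = matvec(self.target, obs), matvec(self.pred, obs)
        for i, (pi, ti) in enumerate(zip(p, t)):
            grad = 2.0 * (pi - ti)  # derivative of the error w.r.t. p_i
            for j, xj in enumerate(obs):
                self.pred[i][j] -= self.lr * grad * xj


def combined_reward(extrinsic: float, intrinsic: float,
                    ext_strength: float = 1.0, rnd_strength: float = 0.01) -> float:
    """ASSUMED combination: each signal scaled by its `strength`, then summed."""
    return ext_strength * extrinsic + rnd_strength * intrinsic


if __name__ == "__main__":
    rnd = LinearRND(obs_dim=3, feat_dim=4)
    visited = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    for _ in range(200):
        for obs in visited:
            rnd.update(obs)
    print("familiar:", rnd.intrinsic_reward(visited[0]))       # close to 0
    print("novel:   ", rnd.intrinsic_reward([0.0, 0.0, 1.0]))  # stays well above 0
```

The novel observation keeps its bonus because the predictor weights for the third input never receive a gradient from the visited observations; with `rnd_strength` at 0.01, that bonus stays small next to an extrinsic reward on the order of the logged mean rewards.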
 
  ---