2025-03-29 14:27:03 | [rl2_trainer] Logging to /home/h2khalil/MetaRL-Assistive-Robotics/data/local/experiment/rl2_trainer
2025-03-29 14:27:14 | [rl2_trainer] Obtaining samples...
2025-03-29 14:31:58 | [rl2_trainer] epoch #0 | Optimizing policy...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Fitting baseline...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Computing loss before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Computing KL before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Optimizing
2025-03-29 14:32:11 | [rl2_trainer] epoch #0 | Computing KL after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Computing loss after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saving snapshot...
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saved
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Time 298.18 s
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | EpochTime 298.18 s
----------------------------------------  ------------
Average/AverageDiscountedReturn            -42.9028
Average/AverageReturn                      -69.0759
Average/Iteration                            0
Average/MaxReturn                            5.14373
Average/MinReturn                         -121.89
Average/NumEpisodes                         40
Average/StdReturn                           26.7746
Average/TerminationRate                      0
LinearFeatureBaseline/ExplainedVariance      0.814994
TotalEnvSteps                             4000
__unnamed_task__/AverageDiscountedReturn   -42.9028
__unnamed_task__/AverageReturn             -69.0759
__unnamed_task__/Iteration                   0
__unnamed_task__/MaxReturn                   5.14373
__unnamed_task__/MinReturn                -121.89
__unnamed_task__/NumEpisodes                40
__unnamed_task__/StdReturn                  26.7746
__unnamed_task__/TerminationRate             0
policy/Entropy                               9.91254
policy/KL                                    0.0179773
policy/KLBefore                              0
policy/LossAfter                            -0.172905
policy/LossBefore                            0.0100782
policy/dLoss                                 0.182983
----------------------------------------  ------------
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing policy...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Fitting baseline...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing loss before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing KL before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing KL after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing loss after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saving snapshot...
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saved
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Time 597.11 s
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | EpochTime 298.93 s
----------------------------------------  ------------
Average/AverageDiscountedReturn            -46.6949
Average/AverageReturn                      -74.2172
Average/Iteration                            1
Average/MaxReturn                          -35.2002
Average/MinReturn                         -127.671
Average/NumEpisodes                         40
Average/StdReturn                           23.4651
Average/TerminationRate                      0
LinearFeatureBaseline/ExplainedVariance      0.887116
TotalEnvSteps                             8000
__unnamed_task__/AverageDiscountedReturn   -46.6949
__unnamed_task__/AverageReturn             -74.2172
__unnamed_task__/Iteration                   1
__unnamed_task__/MaxReturn                 -35.2002
__unnamed_task__/MinReturn                -127.671
__unnamed_task__/NumEpisodes                40
__unnamed_task__/StdReturn                  23.4651
__unnamed_task__/TerminationRate             0
policy/Entropy                               9.90552
policy/KL                                    0.0104231
policy/KLBefore                              0
policy/LossAfter                            -0.108461
policy/LossBefore                            0.0091655
policy/dLoss                                 0.117626
----------------------------------------  ------------
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing policy...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Fitting baseline...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing loss before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing KL before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing KL after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing loss after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saving snapshot...
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saved
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Time 938.81 s
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | EpochTime 341.68 s
----------------------------------------  --------------
Average/AverageDiscountedReturn             -45.4614
Average/AverageReturn                       -72.7992
Average/Iteration                             2
Average/MaxReturn                           -26.0289
Average/MinReturn                          -137.031
Average/NumEpisodes                          40
Average/StdReturn                            26.9881
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.840131
TotalEnvSteps                             12000
__unnamed_task__/AverageDiscountedReturn    -45.4614
__unnamed_task__/AverageReturn              -72.7992
__unnamed_task__/Iteration                    2
__unnamed_task__/MaxReturn                  -26.0289
__unnamed_task__/MinReturn                 -137.031
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   26.9881
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.88918
policy/KL                                     0.00923636
policy/KLBefore                               0
policy/LossAfter                             -0.140978
policy/LossBefore                            -0.0310702
policy/dLoss                                  0.109907
----------------------------------------  --------------
2025-03-29 14:44:19 | [rl2_trainer] epoch #3 | Optimizing policy...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Fitting baseline...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing loss before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing KL before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Optimizing
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing KL after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing loss after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saving snapshot...
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saved
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Time 1027.97 s
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | EpochTime 89.16 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -42.7249
Average/AverageReturn                       -68.2275
Average/Iteration                             3
Average/MaxReturn                           -35.9495
Average/MinReturn                          -119.74
Average/NumEpisodes                          40
Average/StdReturn                            22.0106
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.895101
TotalEnvSteps                             16000
__unnamed_task__/AverageDiscountedReturn    -42.7249
__unnamed_task__/AverageReturn              -68.2275
__unnamed_task__/Iteration                    3
__unnamed_task__/MaxReturn                  -35.9495
__unnamed_task__/MinReturn                 -119.74
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   22.0106
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.85707
policy/KL                                     0.0100265
policy/KLBefore                               0
policy/LossAfter                             -0.130342
policy/LossBefore                            -0.0353351
policy/dLoss                                  0.0950072
----------------------------------------  -------------
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing policy...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Fitting baseline...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing loss before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing KL before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing KL after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing loss after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saving snapshot...
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saved
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Time 1122.14 s
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | EpochTime 94.17 s
----------------------------------------  --------------
Average/AverageDiscountedReturn             -41.9613
Average/AverageReturn                       -66.2673
Average/Iteration                             4
Average/MaxReturn                           -33.9462
Average/MinReturn                          -121.742
Average/NumEpisodes                          40
Average/StdReturn                            24.5891
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.909156
TotalEnvSteps                             20000
__unnamed_task__/AverageDiscountedReturn    -41.9613
__unnamed_task__/AverageReturn              -66.2673
__unnamed_task__/Iteration                    4
__unnamed_task__/MaxReturn                  -33.9462
__unnamed_task__/MinReturn                 -121.742
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   24.5891
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.81839
policy/KL                                     0.0102138
policy/KLBefore                               0
policy/LossAfter                             -0.0962488
policy/LossBefore                             0.00132629
policy/dLoss                                  0.0975751
----------------------------------------  --------------
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing policy...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Fitting baseline...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing loss before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing KL before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing KL after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing loss after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saving snapshot...
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saved
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Time 1251.93 s
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | EpochTime 129.78 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -38.2055
Average/AverageReturn                       -61.7326
Average/Iteration                             5
Average/MaxReturn                           134.172
Average/MinReturn                          -125.595
Average/NumEpisodes                          40
Average/StdReturn                            42.322
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.652002
TotalEnvSteps                             24000
__unnamed_task__/AverageDiscountedReturn    -38.2055
__unnamed_task__/AverageReturn              -61.7326
__unnamed_task__/Iteration                    5
__unnamed_task__/MaxReturn                  134.172
__unnamed_task__/MinReturn                 -125.595
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   42.322
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.80804
policy/KL                                     0.0122716
policy/KLBefore                               0
policy/LossAfter                             -0.204539
policy/LossBefore                             0.0500677
policy/dLoss                                  0.254606
----------------------------------------  -------------
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing policy...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Fitting baseline...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing loss before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing KL before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing
2025-03-29 14:51:35 | [rl2_trainer] epoch #6 | Computing KL after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Computing loss after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saving snapshot...
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saved
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Time 1461.75 s
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | EpochTime 209.82 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -42.1921
Average/AverageReturn                       -67.1612
Average/Iteration                             6
Average/MaxReturn                           -33.1935
Average/MinReturn                          -110.057
Average/NumEpisodes                          40
Average/StdReturn                            24.1351
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.848234
TotalEnvSteps                             28000
__unnamed_task__/AverageDiscountedReturn    -42.1921
__unnamed_task__/AverageReturn              -67.1612
__unnamed_task__/Iteration                    6
__unnamed_task__/MaxReturn                  -33.1935
__unnamed_task__/MinReturn                 -110.057
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   24.1351
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.80043
policy/KL                                     0.014637
policy/KLBefore                               0
policy/LossAfter                             -0.114569
policy/LossBefore                            -0.0141929
policy/dLoss                                  0.100376
----------------------------------------  -------------
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing policy...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Fitting baseline...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing loss before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing KL before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing KL after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing loss after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saving snapshot...
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saved
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Time 1701.84 s
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | EpochTime 240.09 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -42.4082
Average/AverageReturn                       -67.878
Average/Iteration                             7
Average/MaxReturn                           -34.1169
Average/MinReturn                          -111.115
Average/NumEpisodes                          40
Average/StdReturn                            19.5859
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.865991
TotalEnvSteps                             32000
__unnamed_task__/AverageDiscountedReturn    -42.4082
__unnamed_task__/AverageReturn              -67.878
__unnamed_task__/Iteration                    7
__unnamed_task__/MaxReturn                  -34.1169
__unnamed_task__/MinReturn                 -111.115
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   19.5859
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.79624
policy/KL                                     0.0104825
policy/KLBefore                               0
policy/LossAfter                             -0.13989
policy/LossBefore                            -0.0309541
policy/dLoss                                  0.108936
----------------------------------------  -------------
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Optimizing policy...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Fitting baseline...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Computing loss before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Computing KL before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Optimizing
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing KL after
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing loss after
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saving snapshot...
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saved
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Time 1945.55 s
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | EpochTime 243.70 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -39.7762
Average/AverageReturn                       -63.9139
Average/Iteration                             8
Average/MaxReturn                           -35.6858
Average/MinReturn                          -110.7
Average/NumEpisodes                          40
Average/StdReturn                            20.7657
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.906608
TotalEnvSteps                             36000
__unnamed_task__/AverageDiscountedReturn    -39.7762
__unnamed_task__/AverageReturn              -63.9139
__unnamed_task__/Iteration                    8
__unnamed_task__/MaxReturn                  -35.6858
__unnamed_task__/MinReturn                 -110.7
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   20.7657
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.78585
policy/KL                                     0.0106836
policy/KLBefore                               0
policy/LossAfter                             -0.0940088
policy/LossBefore                            -0.0208258
policy/dLoss                                  0.073183
----------------------------------------  -------------
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Optimizing policy...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Fitting baseline...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing loss before
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing KL before
2025-03-29 15:03:43 | [rl2_trainer] epoch #9 | Optimizing
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing KL after
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing loss after
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saving snapshot...
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saved
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Time 2197.58 s
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | EpochTime 252.03 s
----------------------------------------  --------------
Average/AverageDiscountedReturn             -38.8162
Average/AverageReturn                       -61.6066
Average/Iteration                             9
Average/MaxReturn                           -11.7124
Average/MinReturn                          -113.375
Average/NumEpisodes                          40
Average/StdReturn                            21.625
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.827891
TotalEnvSteps                             40000
__unnamed_task__/AverageDiscountedReturn    -38.8162
__unnamed_task__/AverageReturn              -61.6066
__unnamed_task__/Iteration                    9
__unnamed_task__/MaxReturn                  -11.7124
__unnamed_task__/MinReturn                 -113.375
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   21.625
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.77166
policy/KL                                     0.00887517
policy/KLBefore                               0
policy/LossAfter                             -0.146794
policy/LossBefore                            -0.021343
policy/dLoss                                  0.125451
----------------------------------------  --------------