2025-03-29 14:27:03 | [rl2_trainer] Logging to /home/h2khalil/MetaRL-Assistive-Robotics/data/local/experiment/rl2_trainer
2025-03-29 14:27:14 | [rl2_trainer] Obtaining samples...
2025-03-29 14:31:58 | [rl2_trainer] epoch #0 | Optimizing policy...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Fitting baseline...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Computing loss before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Computing KL before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Optimizing
2025-03-29 14:32:11 | [rl2_trainer] epoch #0 | Computing KL after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Computing loss after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saving snapshot...
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saved
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Time 298.18 s
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | EpochTime 298.18 s
---------------------------------------- ------------
Average/AverageDiscountedReturn -42.9028
Average/AverageReturn -69.0759
Average/Iteration 0
Average/MaxReturn 5.14373
Average/MinReturn -121.89
Average/NumEpisodes 40
Average/StdReturn 26.7746
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.814994
TotalEnvSteps 4000
__unnamed_task__/AverageDiscountedReturn -42.9028
__unnamed_task__/AverageReturn -69.0759
__unnamed_task__/Iteration 0
__unnamed_task__/MaxReturn 5.14373
__unnamed_task__/MinReturn -121.89
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 26.7746
__unnamed_task__/TerminationRate 0
policy/Entropy 9.91254
policy/KL 0.0179773
policy/KLBefore 0
policy/LossAfter -0.172905
policy/LossBefore 0.0100782
policy/dLoss 0.182983
---------------------------------------- ------------
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing policy...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Fitting baseline...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing loss before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing KL before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing KL after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing loss after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saving snapshot...
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saved
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Time 597.11 s
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | EpochTime 298.93 s
---------------------------------------- ------------
Average/AverageDiscountedReturn -46.6949
Average/AverageReturn -74.2172
Average/Iteration 1
Average/MaxReturn -35.2002
Average/MinReturn -127.671
Average/NumEpisodes 40
Average/StdReturn 23.4651
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.887116
TotalEnvSteps 8000
__unnamed_task__/AverageDiscountedReturn -46.6949
__unnamed_task__/AverageReturn -74.2172
__unnamed_task__/Iteration 1
__unnamed_task__/MaxReturn -35.2002
__unnamed_task__/MinReturn -127.671
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 23.4651
__unnamed_task__/TerminationRate 0
policy/Entropy 9.90552
policy/KL 0.0104231
policy/KLBefore 0
policy/LossAfter -0.108461
policy/LossBefore 0.0091655
policy/dLoss 0.117626
---------------------------------------- ------------
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing policy...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Fitting baseline...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing loss before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing KL before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing KL after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing loss after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saving snapshot...
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saved
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Time 938.81 s
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | EpochTime 341.68 s
---------------------------------------- --------------
Average/AverageDiscountedReturn -45.4614
Average/AverageReturn -72.7992
Average/Iteration 2
Average/MaxReturn -26.0289
Average/MinReturn -137.031
Average/NumEpisodes 40
Average/StdReturn 26.9881
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.840131
TotalEnvSteps 12000
__unnamed_task__/AverageDiscountedReturn -45.4614
__unnamed_task__/AverageReturn -72.7992
__unnamed_task__/Iteration 2
__unnamed_task__/MaxReturn -26.0289
__unnamed_task__/MinReturn -137.031
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 26.9881
__unnamed_task__/TerminationRate 0
policy/Entropy 9.88918
policy/KL 0.00923636
policy/KLBefore 0
policy/LossAfter -0.140978
policy/LossBefore -0.0310702
policy/dLoss 0.109907
---------------------------------------- --------------
2025-03-29 14:44:19 | [rl2_trainer] epoch #3 | Optimizing policy...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Fitting baseline...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing loss before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing KL before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Optimizing
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing KL after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing loss after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saving snapshot...
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saved
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Time 1027.97 s
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | EpochTime 89.16 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -42.7249
Average/AverageReturn -68.2275
Average/Iteration 3
Average/MaxReturn -35.9495
Average/MinReturn -119.74
Average/NumEpisodes 40
Average/StdReturn 22.0106
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.895101
TotalEnvSteps 16000
__unnamed_task__/AverageDiscountedReturn -42.7249
__unnamed_task__/AverageReturn -68.2275
__unnamed_task__/Iteration 3
__unnamed_task__/MaxReturn -35.9495
__unnamed_task__/MinReturn -119.74
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 22.0106
__unnamed_task__/TerminationRate 0
policy/Entropy 9.85707
policy/KL 0.0100265
policy/KLBefore 0
policy/LossAfter -0.130342
policy/LossBefore -0.0353351
policy/dLoss 0.0950072
---------------------------------------- -------------
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing policy...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Fitting baseline...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing loss before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing KL before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing KL after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing loss after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saving snapshot...
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saved
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Time 1122.14 s
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | EpochTime 94.17 s
---------------------------------------- --------------
Average/AverageDiscountedReturn -41.9613
Average/AverageReturn -66.2673
Average/Iteration 4
Average/MaxReturn -33.9462
Average/MinReturn -121.742
Average/NumEpisodes 40
Average/StdReturn 24.5891
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.909156
TotalEnvSteps 20000
__unnamed_task__/AverageDiscountedReturn -41.9613
__unnamed_task__/AverageReturn -66.2673
__unnamed_task__/Iteration 4
__unnamed_task__/MaxReturn -33.9462
__unnamed_task__/MinReturn -121.742
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 24.5891
__unnamed_task__/TerminationRate 0
policy/Entropy 9.81839
policy/KL 0.0102138
policy/KLBefore 0
policy/LossAfter -0.0962488
policy/LossBefore 0.00132629
policy/dLoss 0.0975751
---------------------------------------- --------------
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing policy...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Fitting baseline...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing loss before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing KL before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing KL after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing loss after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saving snapshot...
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saved
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Time 1251.93 s
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | EpochTime 129.78 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -38.2055
Average/AverageReturn -61.7326
Average/Iteration 5
Average/MaxReturn 134.172
Average/MinReturn -125.595
Average/NumEpisodes 40
Average/StdReturn 42.322
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.652002
TotalEnvSteps 24000
__unnamed_task__/AverageDiscountedReturn -38.2055
__unnamed_task__/AverageReturn -61.7326
__unnamed_task__/Iteration 5
__unnamed_task__/MaxReturn 134.172
__unnamed_task__/MinReturn -125.595
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 42.322
__unnamed_task__/TerminationRate 0
policy/Entropy 9.80804
policy/KL 0.0122716
policy/KLBefore 0
policy/LossAfter -0.204539
policy/LossBefore 0.0500677
policy/dLoss 0.254606
---------------------------------------- -------------
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing policy...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Fitting baseline...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing loss before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing KL before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing
2025-03-29 14:51:35 | [rl2_trainer] epoch #6 | Computing KL after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Computing loss after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saving snapshot...
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saved
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Time 1461.75 s
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | EpochTime 209.82 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -42.1921
Average/AverageReturn -67.1612
Average/Iteration 6
Average/MaxReturn -33.1935
Average/MinReturn -110.057
Average/NumEpisodes 40
Average/StdReturn 24.1351
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.848234
TotalEnvSteps 28000
__unnamed_task__/AverageDiscountedReturn -42.1921
__unnamed_task__/AverageReturn -67.1612
__unnamed_task__/Iteration 6
__unnamed_task__/MaxReturn -33.1935
__unnamed_task__/MinReturn -110.057
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 24.1351
__unnamed_task__/TerminationRate 0
policy/Entropy 9.80043
policy/KL 0.014637
policy/KLBefore 0
policy/LossAfter -0.114569
policy/LossBefore -0.0141929
policy/dLoss 0.100376
---------------------------------------- -------------
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing policy...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Fitting baseline...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing loss before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing KL before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing KL after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing loss after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saving snapshot...
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saved
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Time 1701.84 s
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | EpochTime 240.09 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -42.4082
Average/AverageReturn -67.878
Average/Iteration 7
Average/MaxReturn -34.1169
Average/MinReturn -111.115
Average/NumEpisodes 40
Average/StdReturn 19.5859
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.865991
TotalEnvSteps 32000
__unnamed_task__/AverageDiscountedReturn -42.4082
__unnamed_task__/AverageReturn -67.878
__unnamed_task__/Iteration 7
__unnamed_task__/MaxReturn -34.1169
__unnamed_task__/MinReturn -111.115
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 19.5859
__unnamed_task__/TerminationRate 0
policy/Entropy 9.79624
policy/KL 0.0104825
policy/KLBefore 0
policy/LossAfter -0.13989
policy/LossBefore -0.0309541
policy/dLoss 0.108936
---------------------------------------- -------------
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Optimizing policy...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Fitting baseline...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Computing loss before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Computing KL before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Optimizing
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing KL after
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing loss after
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saving snapshot...
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saved
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Time 1945.55 s
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | EpochTime 243.70 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -39.7762
Average/AverageReturn -63.9139
Average/Iteration 8
Average/MaxReturn -35.6858
Average/MinReturn -110.7
Average/NumEpisodes 40
Average/StdReturn 20.7657
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.906608
TotalEnvSteps 36000
__unnamed_task__/AverageDiscountedReturn -39.7762
__unnamed_task__/AverageReturn -63.9139
__unnamed_task__/Iteration 8
__unnamed_task__/MaxReturn -35.6858
__unnamed_task__/MinReturn -110.7
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 20.7657
__unnamed_task__/TerminationRate 0
policy/Entropy 9.78585
policy/KL 0.0106836
policy/KLBefore 0
policy/LossAfter -0.0940088
policy/LossBefore -0.0208258
policy/dLoss 0.073183
---------------------------------------- -------------
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Optimizing policy...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Fitting baseline...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing loss before
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing KL before
2025-03-29 15:03:43 | [rl2_trainer] epoch #9 | Optimizing
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing KL after
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing loss after
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saving snapshot...
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saved
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Time 2197.58 s
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | EpochTime 252.03 s
---------------------------------------- --------------
Average/AverageDiscountedReturn -38.8162
Average/AverageReturn -61.6066
Average/Iteration 9
Average/MaxReturn -11.7124
Average/MinReturn -113.375
Average/NumEpisodes 40
Average/StdReturn 21.625
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.827891
TotalEnvSteps 40000
__unnamed_task__/AverageDiscountedReturn -38.8162
__unnamed_task__/AverageReturn -61.6066
__unnamed_task__/Iteration 9
__unnamed_task__/MaxReturn -11.7124
__unnamed_task__/MinReturn -113.375
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 21.625
__unnamed_task__/TerminationRate 0
policy/Entropy 9.77166
policy/KL 0.00887517
policy/KLBefore 0
policy/LossAfter -0.146794
policy/LossBefore -0.021343
policy/dLoss 0.125451
---------------------------------------- --------------