2025-03-29 14:27:03 | [rl2_trainer] Logging to /home/h2khalil/MetaRL-Assistive-Robotics/data/local/experiment/rl2_trainer
2025-03-29 14:27:14 | [rl2_trainer] Obtaining samples...
2025-03-29 14:31:58 | [rl2_trainer] epoch #0 | Optimizing policy...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Fitting baseline...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Computing loss before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Computing KL before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Optimizing
2025-03-29 14:32:11 | [rl2_trainer] epoch #0 | Computing KL after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Computing loss after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saving snapshot...
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saved
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Time 298.18 s
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | EpochTime 298.18 s
---------------------------------------- ------------
Average/AverageDiscountedReturn -42.9028
Average/AverageReturn -69.0759
Average/Iteration 0
Average/MaxReturn 5.14373
Average/MinReturn -121.89
Average/NumEpisodes 40
Average/StdReturn 26.7746
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.814994
TotalEnvSteps 4000
__unnamed_task__/AverageDiscountedReturn -42.9028
__unnamed_task__/AverageReturn -69.0759
__unnamed_task__/Iteration 0
__unnamed_task__/MaxReturn 5.14373
__unnamed_task__/MinReturn -121.89
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 26.7746
__unnamed_task__/TerminationRate 0
policy/Entropy 9.91254
policy/KL 0.0179773
policy/KLBefore 0
policy/LossAfter -0.172905
policy/LossBefore 0.0100782
policy/dLoss 0.182983
---------------------------------------- ------------
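The first tabular block reports both `Average/AverageReturn` (the plain sum of rewards per episode) and `Average/AverageDiscountedReturn` (the same sum with a discount factor applied per step). The discount factor used in this run is not shown in the log; a minimal sketch of the distinction, assuming some `gamma`:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a single episode's rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0]
print(discounted_return(rewards, 1.0))   # undiscounted: the plain sum, 4.0
print(discounted_return(rewards, 0.99))  # later rewards weighted down
```

With negative per-step rewards (as here), discounting shrinks the magnitude, which is why `AverageDiscountedReturn` (-42.9) sits above `AverageReturn` (-69.1).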
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing policy...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Fitting baseline...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing loss before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing KL before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing KL after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing loss after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saving snapshot...
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saved
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Time 597.11 s
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | EpochTime 298.93 s
---------------------------------------- ------------
Average/AverageDiscountedReturn -46.6949
Average/AverageReturn -74.2172
Average/Iteration 1
Average/MaxReturn -35.2002
Average/MinReturn -127.671
Average/NumEpisodes 40
Average/StdReturn 23.4651
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.887116
TotalEnvSteps 8000
__unnamed_task__/AverageDiscountedReturn -46.6949
__unnamed_task__/AverageReturn -74.2172
__unnamed_task__/Iteration 1
__unnamed_task__/MaxReturn -35.2002
__unnamed_task__/MinReturn -127.671
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 23.4651
__unnamed_task__/TerminationRate 0
policy/Entropy 9.90552
policy/KL 0.0104231
policy/KLBefore 0
policy/LossAfter -0.108461
policy/LossBefore 0.0091655
policy/dLoss 0.117626
---------------------------------------- ------------
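`LinearFeatureBaseline/ExplainedVariance` measures how well the fitted baseline predicts the empirical returns; values near 1 (0.887 this epoch) mean the baseline captures most of the return variance. A sketch of the standard definition, not necessarily the framework's exact implementation:

```python
import numpy as np

def explained_variance(y_pred, y_true):
    """1 - Var(residuals) / Var(targets).
    1.0 is a perfect fit; 0.0 is no better than predicting the mean."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    var_y = np.var(y_true)
    if var_y == 0:
        return 0.0
    return 1.0 - np.var(y_true - y_pred) / var_y

returns = [10.0, -5.0, 3.0, 7.0]
print(explained_variance(returns, returns))  # perfect predictions -> 1.0
```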
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing policy...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Fitting baseline...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing loss before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing KL before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing KL after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing loss after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saving snapshot...
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saved
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Time 938.81 s
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | EpochTime 341.68 s
---------------------------------------- --------------
Average/AverageDiscountedReturn -45.4614
Average/AverageReturn -72.7992
Average/Iteration 2
Average/MaxReturn -26.0289
Average/MinReturn -137.031
Average/NumEpisodes 40
Average/StdReturn 26.9881
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.840131
TotalEnvSteps 12000
__unnamed_task__/AverageDiscountedReturn -45.4614
__unnamed_task__/AverageReturn -72.7992
__unnamed_task__/Iteration 2
__unnamed_task__/MaxReturn -26.0289
__unnamed_task__/MinReturn -137.031
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 26.9881
__unnamed_task__/TerminationRate 0
policy/Entropy 9.88918
policy/KL 0.00923636
policy/KLBefore 0
policy/LossAfter -0.140978
policy/LossBefore -0.0310702
policy/dLoss 0.109907
---------------------------------------- --------------
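`policy/KLBefore` is 0 by construction (the KL divergence of the pre-update policy against itself), and `policy/KL` is the mean divergence after the update; in this run it hovers near 0.01, consistent with a TRPO-style step-size constraint. For a Gaussian policy, the closed-form KL per dimension looks like the sketch below (illustrative, not the framework's code):

```python
import math

def kl_gauss(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) for scalar Gaussians."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2 * sigma1 ** 2)
            - 0.5)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0
print(kl_gauss(0.0, 1.0, 1.0, 1.0))  # mean shifted by one sigma -> 0.5
```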
2025-03-29 14:44:19 | [rl2_trainer] epoch #3 | Optimizing policy...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Fitting baseline...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing loss before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing KL before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Optimizing
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing KL after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing loss after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saving snapshot...
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saved
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Time 1027.97 s
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | EpochTime 89.16 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -42.7249
Average/AverageReturn -68.2275
Average/Iteration 3
Average/MaxReturn -35.9495
Average/MinReturn -119.74
Average/NumEpisodes 40
Average/StdReturn 22.0106
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.895101
TotalEnvSteps 16000
__unnamed_task__/AverageDiscountedReturn -42.7249
__unnamed_task__/AverageReturn -68.2275
__unnamed_task__/Iteration 3
__unnamed_task__/MaxReturn -35.9495
__unnamed_task__/MinReturn -119.74
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 22.0106
__unnamed_task__/TerminationRate 0
policy/Entropy 9.85707
policy/KL 0.0100265
policy/KLBefore 0
policy/LossAfter -0.130342
policy/LossBefore -0.0353351
policy/dLoss 0.0950072
---------------------------------------- -------------
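`policy/dLoss` is just the improvement `LossBefore - LossAfter`, which can be verified directly against the numbers in the table above (agreement is up to logging precision):

```python
# Values from this epoch's table
loss_before, loss_after = -0.0353351, -0.130342
d_loss = loss_before - loss_after
print(round(d_loss, 5))  # ~0.09501, matching the logged policy/dLoss 0.0950072
```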
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing policy...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Fitting baseline...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing loss before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing KL before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing KL after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing loss after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saving snapshot...
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saved
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Time 1122.14 s
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | EpochTime 94.17 s
---------------------------------------- --------------
Average/AverageDiscountedReturn -41.9613
Average/AverageReturn -66.2673
Average/Iteration 4
Average/MaxReturn -33.9462
Average/MinReturn -121.742
Average/NumEpisodes 40
Average/StdReturn 24.5891
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.909156
TotalEnvSteps 20000
__unnamed_task__/AverageDiscountedReturn -41.9613
__unnamed_task__/AverageReturn -66.2673
__unnamed_task__/Iteration 4
__unnamed_task__/MaxReturn -33.9462
__unnamed_task__/MinReturn -121.742
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 24.5891
__unnamed_task__/TerminationRate 0
policy/Entropy 9.81839
policy/KL 0.0102138
policy/KLBefore 0
policy/LossAfter -0.0962488
policy/LossBefore 0.00132629
policy/dLoss 0.0975751
---------------------------------------- --------------
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing policy...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Fitting baseline...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing loss before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing KL before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing KL after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing loss after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saving snapshot...
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saved
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Time 1251.93 s
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | EpochTime 129.78 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -38.2055
Average/AverageReturn -61.7326
Average/Iteration 5
Average/MaxReturn 134.172
Average/MinReturn -125.595
Average/NumEpisodes 40
Average/StdReturn 42.322
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.652002
TotalEnvSteps 24000
__unnamed_task__/AverageDiscountedReturn -38.2055
__unnamed_task__/AverageReturn -61.7326
__unnamed_task__/Iteration 5
__unnamed_task__/MaxReturn 134.172
__unnamed_task__/MinReturn -125.595
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 42.322
__unnamed_task__/TerminationRate 0
policy/Entropy 9.80804
policy/KL 0.0122716
policy/KLBefore 0
policy/LossAfter -0.204539
policy/LossBefore 0.0500677
policy/dLoss 0.254606
---------------------------------------- -------------
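`policy/Entropy` drifts down slowly across epochs (9.91 at epoch #0, 9.81 here), the expected signature of a Gaussian policy gradually narrowing as it trains. For reference, the differential entropy of a diagonal Gaussian is `0.5 * log(2*pi*e*sigma^2)` summed over action dimensions; a sketch (the actual action dimensionality and stds of this policy are not shown in the log):

```python
import math

def diag_gaussian_entropy(sigmas):
    """Differential entropy of a diagonal Gaussian, summed over dimensions."""
    return sum(0.5 * math.log(2 * math.pi * math.e * s ** 2) for s in sigmas)

print(diag_gaussian_entropy([1.0] * 7))  # ~9.93 for 7 unit-variance dims
```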
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing policy...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Fitting baseline...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing loss before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing KL before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing
2025-03-29 14:51:35 | [rl2_trainer] epoch #6 | Computing KL after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Computing loss after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saving snapshot...
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saved
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Time 1461.75 s
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | EpochTime 209.82 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -42.1921
Average/AverageReturn -67.1612
Average/Iteration 6
Average/MaxReturn -33.1935
Average/MinReturn -110.057
Average/NumEpisodes 40
Average/StdReturn 24.1351
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.848234
TotalEnvSteps 28000
__unnamed_task__/AverageDiscountedReturn -42.1921
__unnamed_task__/AverageReturn -67.1612
__unnamed_task__/Iteration 6
__unnamed_task__/MaxReturn -33.1935
__unnamed_task__/MinReturn -110.057
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 24.1351
__unnamed_task__/TerminationRate 0
policy/Entropy 9.80043
policy/KL 0.014637
policy/KLBefore 0
policy/LossAfter -0.114569
policy/LossBefore -0.0141929
policy/dLoss 0.100376
---------------------------------------- -------------
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing policy...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Fitting baseline...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing loss before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing KL before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing KL after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing loss after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saving snapshot...
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saved
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Time 1701.84 s
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | EpochTime 240.09 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -42.4082
Average/AverageReturn -67.878
Average/Iteration 7
Average/MaxReturn -34.1169
Average/MinReturn -111.115
Average/NumEpisodes 40
Average/StdReturn 19.5859
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.865991
TotalEnvSteps 32000
__unnamed_task__/AverageDiscountedReturn -42.4082
__unnamed_task__/AverageReturn -67.878
__unnamed_task__/Iteration 7
__unnamed_task__/MaxReturn -34.1169
__unnamed_task__/MinReturn -111.115
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 19.5859
__unnamed_task__/TerminationRate 0
policy/Entropy 9.79624
policy/KL 0.0104825
policy/KLBefore 0
policy/LossAfter -0.13989
policy/LossBefore -0.0309541
policy/dLoss 0.108936
---------------------------------------- -------------
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Optimizing policy...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Fitting baseline...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Computing loss before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Computing KL before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Optimizing
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing KL after
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing loss after
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saving snapshot...
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saved
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Time 1945.55 s
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | EpochTime 243.70 s
---------------------------------------- -------------
Average/AverageDiscountedReturn -39.7762
Average/AverageReturn -63.9139
Average/Iteration 8
Average/MaxReturn -35.6858
Average/MinReturn -110.7
Average/NumEpisodes 40
Average/StdReturn 20.7657
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.906608
TotalEnvSteps 36000
__unnamed_task__/AverageDiscountedReturn -39.7762
__unnamed_task__/AverageReturn -63.9139
__unnamed_task__/Iteration 8
__unnamed_task__/MaxReturn -35.6858
__unnamed_task__/MinReturn -110.7
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 20.7657
__unnamed_task__/TerminationRate 0
policy/Entropy 9.78585
policy/KL 0.0106836
policy/KLBefore 0
policy/LossAfter -0.0940088
policy/LossBefore -0.0208258
policy/dLoss 0.073183
---------------------------------------- -------------
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Optimizing policy...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Fitting baseline...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing loss before
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing KL before
2025-03-29 15:03:43 | [rl2_trainer] epoch #9 | Optimizing
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing KL after
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing loss after
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saving snapshot...
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saved
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Time 2197.58 s
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | EpochTime 252.03 s
---------------------------------------- --------------
Average/AverageDiscountedReturn -38.8162
Average/AverageReturn -61.6066
Average/Iteration 9
Average/MaxReturn -11.7124
Average/MinReturn -113.375
Average/NumEpisodes 40
Average/StdReturn 21.625
Average/TerminationRate 0
LinearFeatureBaseline/ExplainedVariance 0.827891
TotalEnvSteps 40000
__unnamed_task__/AverageDiscountedReturn -38.8162
__unnamed_task__/AverageReturn -61.6066
__unnamed_task__/Iteration 9
__unnamed_task__/MaxReturn -11.7124
__unnamed_task__/MinReturn -113.375
__unnamed_task__/NumEpisodes 40
__unnamed_task__/StdReturn 21.625
__unnamed_task__/TerminationRate 0
policy/Entropy 9.77166
policy/KL 0.00887517
policy/KLBefore 0
policy/LossAfter -0.146794
policy/LossBefore -0.021343
policy/dLoss 0.125451
---------------------------------------- --------------