2025-03-29 14:27:03 | [rl2_trainer] Logging to /home/h2khalil/MetaRL-Assistive-Robotics/data/local/experiment/rl2_trainer
2025-03-29 14:27:14 | [rl2_trainer] Obtaining samples...
2025-03-29 14:31:58 | [rl2_trainer] epoch #0 | Optimizing policy...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Fitting baseline...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Computing loss before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Computing KL before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Optimizing
2025-03-29 14:32:11 | [rl2_trainer] epoch #0 | Computing KL after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Computing loss after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saving snapshot...
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saved
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Time 298.18 s
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | EpochTime 298.18 s
---------------------------------------- ------------
Average/AverageDiscountedReturn          -42.9028
Average/AverageReturn                    -69.0759
Average/Iteration                        0
Average/MaxReturn                        5.14373
Average/MinReturn                        -121.89
Average/NumEpisodes                      40
Average/StdReturn                        26.7746
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.814994
TotalEnvSteps                            4000
__unnamed_task__/AverageDiscountedReturn -42.9028
__unnamed_task__/AverageReturn           -69.0759
__unnamed_task__/Iteration               0
__unnamed_task__/MaxReturn               5.14373
__unnamed_task__/MinReturn               -121.89
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               26.7746
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.91254
policy/KL                                0.0179773
policy/KLBefore                          0
policy/LossAfter                         -0.172905
policy/LossBefore                        0.0100782
policy/dLoss                             0.182983
---------------------------------------- ------------
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing policy...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Fitting baseline...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing loss before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing KL before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing KL after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing loss after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saving snapshot...
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saved
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Time 597.11 s
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | EpochTime 298.93 s
---------------------------------------- ------------
Average/AverageDiscountedReturn          -46.6949
Average/AverageReturn                    -74.2172
Average/Iteration                        1
Average/MaxReturn                        -35.2002
Average/MinReturn                        -127.671
Average/NumEpisodes                      40
Average/StdReturn                        23.4651
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.887116
TotalEnvSteps                            8000
__unnamed_task__/AverageDiscountedReturn -46.6949
__unnamed_task__/AverageReturn           -74.2172
__unnamed_task__/Iteration               1
__unnamed_task__/MaxReturn               -35.2002
__unnamed_task__/MinReturn               -127.671
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               23.4651
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.90552
policy/KL                                0.0104231
policy/KLBefore                          0
policy/LossAfter                         -0.108461
policy/LossBefore                        0.0091655
policy/dLoss                             0.117626
---------------------------------------- ------------
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing policy...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Fitting baseline...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing loss before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing KL before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing KL after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing loss after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saving snapshot...
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saved
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Time 938.81 s
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | EpochTime 341.68 s
---------------------------------------- --------------
Average/AverageDiscountedReturn          -45.4614
Average/AverageReturn                    -72.7992
Average/Iteration                        2
Average/MaxReturn                        -26.0289
Average/MinReturn                        -137.031
Average/NumEpisodes                      40
Average/StdReturn                        26.9881
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.840131
TotalEnvSteps                            12000
__unnamed_task__/AverageDiscountedReturn -45.4614
__unnamed_task__/AverageReturn           -72.7992
__unnamed_task__/Iteration               2
__unnamed_task__/MaxReturn               -26.0289
__unnamed_task__/MinReturn               -137.031
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               26.9881
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.88918
policy/KL                                0.00923636
policy/KLBefore                          0
policy/LossAfter                         -0.140978
policy/LossBefore                        -0.0310702
policy/dLoss                             0.109907
---------------------------------------- --------------
2025-03-29 14:44:19 | [rl2_trainer] epoch #3 | Optimizing policy...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Fitting baseline...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing loss before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing KL before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Optimizing
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing KL after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing loss after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saving snapshot...
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saved
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Time 1027.97 s
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | EpochTime 89.16 s
---------------------------------------- -------------
Average/AverageDiscountedReturn          -42.7249
Average/AverageReturn                    -68.2275
Average/Iteration                        3
Average/MaxReturn                        -35.9495
Average/MinReturn                        -119.74
Average/NumEpisodes                      40
Average/StdReturn                        22.0106
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.895101
TotalEnvSteps                            16000
__unnamed_task__/AverageDiscountedReturn -42.7249
__unnamed_task__/AverageReturn           -68.2275
__unnamed_task__/Iteration               3
__unnamed_task__/MaxReturn               -35.9495
__unnamed_task__/MinReturn               -119.74
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               22.0106
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.85707
policy/KL                                0.0100265
policy/KLBefore                          0
policy/LossAfter                         -0.130342
policy/LossBefore                        -0.0353351
policy/dLoss                             0.0950072
---------------------------------------- -------------
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing policy...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Fitting baseline...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing loss before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing KL before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing KL after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing loss after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saving snapshot...
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saved
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Time 1122.14 s
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | EpochTime 94.17 s
---------------------------------------- --------------
Average/AverageDiscountedReturn          -41.9613
Average/AverageReturn                    -66.2673
Average/Iteration                        4
Average/MaxReturn                        -33.9462
Average/MinReturn                        -121.742
Average/NumEpisodes                      40
Average/StdReturn                        24.5891
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.909156
TotalEnvSteps                            20000
__unnamed_task__/AverageDiscountedReturn -41.9613
__unnamed_task__/AverageReturn           -66.2673
__unnamed_task__/Iteration               4
__unnamed_task__/MaxReturn               -33.9462
__unnamed_task__/MinReturn               -121.742
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               24.5891
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.81839
policy/KL                                0.0102138
policy/KLBefore                          0
policy/LossAfter                         -0.0962488
policy/LossBefore                        0.00132629
policy/dLoss                             0.0975751
---------------------------------------- --------------
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing policy...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Fitting baseline...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing loss before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing KL before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing KL after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing loss after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saving snapshot...
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saved
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Time 1251.93 s
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | EpochTime 129.78 s
---------------------------------------- -------------
Average/AverageDiscountedReturn          -38.2055
Average/AverageReturn                    -61.7326
Average/Iteration                        5
Average/MaxReturn                        134.172
Average/MinReturn                        -125.595
Average/NumEpisodes                      40
Average/StdReturn                        42.322
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.652002
TotalEnvSteps                            24000
__unnamed_task__/AverageDiscountedReturn -38.2055
__unnamed_task__/AverageReturn           -61.7326
__unnamed_task__/Iteration               5
__unnamed_task__/MaxReturn               134.172
__unnamed_task__/MinReturn               -125.595
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               42.322
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.80804
policy/KL                                0.0122716
policy/KLBefore                          0
policy/LossAfter                         -0.204539
policy/LossBefore                        0.0500677
policy/dLoss                             0.254606
---------------------------------------- -------------
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing policy...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Fitting baseline...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing loss before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing KL before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing
2025-03-29 14:51:35 | [rl2_trainer] epoch #6 | Computing KL after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Computing loss after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saving snapshot...
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saved
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Time 1461.75 s
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | EpochTime 209.82 s
---------------------------------------- -------------
Average/AverageDiscountedReturn          -42.1921
Average/AverageReturn                    -67.1612
Average/Iteration                        6
Average/MaxReturn                        -33.1935
Average/MinReturn                        -110.057
Average/NumEpisodes                      40
Average/StdReturn                        24.1351
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.848234
TotalEnvSteps                            28000
__unnamed_task__/AverageDiscountedReturn -42.1921
__unnamed_task__/AverageReturn           -67.1612
__unnamed_task__/Iteration               6
__unnamed_task__/MaxReturn               -33.1935
__unnamed_task__/MinReturn               -110.057
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               24.1351
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.80043
policy/KL                                0.014637
policy/KLBefore                          0
policy/LossAfter                         -0.114569
policy/LossBefore                        -0.0141929
policy/dLoss                             0.100376
---------------------------------------- -------------
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing policy...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Fitting baseline...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing loss before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing KL before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing KL after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing loss after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saving snapshot...
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saved
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Time 1701.84 s
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | EpochTime 240.09 s
---------------------------------------- -------------
Average/AverageDiscountedReturn          -42.4082
Average/AverageReturn                    -67.878
Average/Iteration                        7
Average/MaxReturn                        -34.1169
Average/MinReturn                        -111.115
Average/NumEpisodes                      40
Average/StdReturn                        19.5859
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.865991
TotalEnvSteps                            32000
__unnamed_task__/AverageDiscountedReturn -42.4082
__unnamed_task__/AverageReturn           -67.878
__unnamed_task__/Iteration               7
__unnamed_task__/MaxReturn               -34.1169
__unnamed_task__/MinReturn               -111.115
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               19.5859
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.79624
policy/KL                                0.0104825
policy/KLBefore                          0
policy/LossAfter                         -0.13989
policy/LossBefore                        -0.0309541
policy/dLoss                             0.108936
---------------------------------------- -------------
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Optimizing policy...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Fitting baseline...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Computing loss before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Computing KL before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Optimizing
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing KL after
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing loss after
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saving snapshot...
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saved
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Time 1945.55 s
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | EpochTime 243.70 s
---------------------------------------- -------------
Average/AverageDiscountedReturn          -39.7762
Average/AverageReturn                    -63.9139
Average/Iteration                        8
Average/MaxReturn                        -35.6858
Average/MinReturn                        -110.7
Average/NumEpisodes                      40
Average/StdReturn                        20.7657
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.906608
TotalEnvSteps                            36000
__unnamed_task__/AverageDiscountedReturn -39.7762
__unnamed_task__/AverageReturn           -63.9139
__unnamed_task__/Iteration               8
__unnamed_task__/MaxReturn               -35.6858
__unnamed_task__/MinReturn               -110.7
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               20.7657
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.78585
policy/KL                                0.0106836
policy/KLBefore                          0
policy/LossAfter                         -0.0940088
policy/LossBefore                        -0.0208258
policy/dLoss                             0.073183
---------------------------------------- -------------
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Optimizing policy...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Fitting baseline...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing loss before
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing KL before
2025-03-29 15:03:43 | [rl2_trainer] epoch #9 | Optimizing
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing KL after
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing loss after
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saving snapshot...
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saved
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Time 2197.58 s
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | EpochTime 252.03 s
---------------------------------------- --------------
Average/AverageDiscountedReturn          -38.8162
Average/AverageReturn                    -61.6066
Average/Iteration                        9
Average/MaxReturn                        -11.7124
Average/MinReturn                        -113.375
Average/NumEpisodes                      40
Average/StdReturn                        21.625
Average/TerminationRate                  0
LinearFeatureBaseline/ExplainedVariance  0.827891
TotalEnvSteps                            40000
__unnamed_task__/AverageDiscountedReturn -38.8162
__unnamed_task__/AverageReturn           -61.6066
__unnamed_task__/Iteration               9
__unnamed_task__/MaxReturn               -11.7124
__unnamed_task__/MinReturn               -113.375
__unnamed_task__/NumEpisodes             40
__unnamed_task__/StdReturn               21.625
__unnamed_task__/TerminationRate         0
policy/Entropy                           9.77166
policy/KL                                0.00887517
policy/KLBefore                          0
policy/LossAfter                         -0.146794
policy/LossBefore                        -0.021343
policy/dLoss                             0.125451
---------------------------------------- --------------
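The trainer prints its per-epoch metrics as `key value` tables between dashed separator lines, which makes the log easy to scrape back into Python for plotting learning curves (e.g. `Average/AverageReturn` vs. `TotalEnvSteps`). A minimal sketch of such a scraper; the function name and regexes are my own, and it assumes only the table layout shown above:

```python
import re

def parse_metric_tables(log_text):
    """Collect each dashed-delimited key/value table into a dict,
    returning one dict per epoch in the order they appear."""
    tables = []
    current = None
    for line in log_text.splitlines():
        stripped = line.strip()
        # A separator line ("---- ----") either opens or closes a table.
        if re.fullmatch(r"-+\s+-+", stripped):
            if current is None:
                current = {}
            else:
                tables.append(current)
                current = None
            continue
        if current is not None:
            # Inside a table: "Metric/Name   -69.0759"
            m = re.match(r"(\S+)\s+(-?[\d.]+)\s*$", stripped)
            if m:
                key, value = m.groups()
                current[key] = float(value)
    return tables

sample = """\
---------------------------------------- ------------
Average/AverageReturn                    -69.0759
TotalEnvSteps                            4000
---------------------------------------- ------------
"""
metrics = parse_metric_tables(sample)
print(metrics[0]["Average/AverageReturn"])  # -69.0759
```

Fed the full log above, this yields ten dicts whose `Average/AverageReturn` values trace the (slowly improving) return from roughly -69 to -62 over 40,000 environment steps.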