2025-03-29 14:27:03 | [rl2_trainer] Logging to /home/h2khalil/MetaRL-Assistive-Robotics/data/local/experiment/rl2_trainer
2025-03-29 14:27:14 | [rl2_trainer] Obtaining samples...
2025-03-29 14:31:58 | [rl2_trainer] epoch #0 | Optimizing policy...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Fitting baseline...
2025-03-29 14:32:02 | [rl2_trainer] epoch #0 | Computing loss before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Computing KL before
2025-03-29 14:32:03 | [rl2_trainer] epoch #0 | Optimizing
2025-03-29 14:32:11 | [rl2_trainer] epoch #0 | Computing KL after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Computing loss after
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saving snapshot...
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Saved
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | Time 298.18 s
2025-03-29 14:32:12 | [rl2_trainer] epoch #0 | EpochTime 298.18 s
----------------------------------------  ------------
Average/AverageDiscountedReturn             -42.9028
Average/AverageReturn                       -69.0759
Average/Iteration                             0
Average/MaxReturn                             5.14373
Average/MinReturn                          -121.89
Average/NumEpisodes                          40
Average/StdReturn                            26.7746
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.814994
TotalEnvSteps                              4000
__unnamed_task__/AverageDiscountedReturn    -42.9028
__unnamed_task__/AverageReturn              -69.0759
__unnamed_task__/Iteration                    0
__unnamed_task__/MaxReturn                    5.14373
__unnamed_task__/MinReturn                 -121.89
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   26.7746
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.91254
policy/KL                                     0.0179773
policy/KLBefore                               0
policy/LossAfter                             -0.172905
policy/LossBefore                             0.0100782
policy/dLoss                                  0.182983
----------------------------------------  ------------
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing policy...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Fitting baseline...
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing loss before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Computing KL before
2025-03-29 14:36:59 | [rl2_trainer] epoch #1 | Optimizing
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing KL after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Computing loss after
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saving snapshot...
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Saved
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | Time 597.11 s
2025-03-29 14:37:11 | [rl2_trainer] epoch #1 | EpochTime 298.93 s
----------------------------------------  ------------
Average/AverageDiscountedReturn             -46.6949
Average/AverageReturn                       -74.2172
Average/Iteration                             1
Average/MaxReturn                           -35.2002
Average/MinReturn                          -127.671
Average/NumEpisodes                          40
Average/StdReturn                            23.4651
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.887116
TotalEnvSteps                              8000
__unnamed_task__/AverageDiscountedReturn    -46.6949
__unnamed_task__/AverageReturn              -74.2172
__unnamed_task__/Iteration                    1
__unnamed_task__/MaxReturn                  -35.2002
__unnamed_task__/MinReturn                 -127.671
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   23.4651
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.90552
policy/KL                                     0.0104231
policy/KLBefore                               0
policy/LossAfter                             -0.108461
policy/LossBefore                             0.0091655
policy/dLoss                                  0.117626
----------------------------------------  ------------
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing policy...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Fitting baseline...
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing loss before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Computing KL before
2025-03-29 14:42:50 | [rl2_trainer] epoch #2 | Optimizing
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing KL after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Computing loss after
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saving snapshot...
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Saved
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | Time 938.81 s
2025-03-29 14:42:53 | [rl2_trainer] epoch #2 | EpochTime 341.68 s
----------------------------------------  --------------
Average/AverageDiscountedReturn             -45.4614
Average/AverageReturn                       -72.7992
Average/Iteration                             2
Average/MaxReturn                           -26.0289
Average/MinReturn                          -137.031
Average/NumEpisodes                          40
Average/StdReturn                            26.9881
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.840131
TotalEnvSteps                              12000
__unnamed_task__/AverageDiscountedReturn    -45.4614
__unnamed_task__/AverageReturn              -72.7992
__unnamed_task__/Iteration                    2
__unnamed_task__/MaxReturn                  -26.0289
__unnamed_task__/MinReturn                 -137.031
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   26.9881
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.88918
policy/KL                                     0.00923636
policy/KLBefore                               0
policy/LossAfter                             -0.140978
policy/LossBefore                            -0.0310702
policy/dLoss                                  0.109907
----------------------------------------  --------------
2025-03-29 14:44:19 | [rl2_trainer] epoch #3 | Optimizing policy...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Fitting baseline...
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing loss before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Computing KL before
2025-03-29 14:44:20 | [rl2_trainer] epoch #3 | Optimizing
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing KL after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Computing loss after
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saving snapshot...
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Saved
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | Time 1027.97 s
2025-03-29 14:44:22 | [rl2_trainer] epoch #3 | EpochTime 89.16 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -42.7249
Average/AverageReturn                       -68.2275
Average/Iteration                             3
Average/MaxReturn                           -35.9495
Average/MinReturn                          -119.74
Average/NumEpisodes                          40
Average/StdReturn                            22.0106
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.895101
TotalEnvSteps                              16000
__unnamed_task__/AverageDiscountedReturn    -42.7249
__unnamed_task__/AverageReturn              -68.2275
__unnamed_task__/Iteration                    3
__unnamed_task__/MaxReturn                  -35.9495
__unnamed_task__/MinReturn                 -119.74
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   22.0106
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.85707
policy/KL                                     0.0100265
policy/KLBefore                               0
policy/LossAfter                             -0.130342
policy/LossBefore                            -0.0353351
policy/dLoss                                  0.0950072
----------------------------------------  -------------
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing policy...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Fitting baseline...
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing loss before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Computing KL before
2025-03-29 14:45:54 | [rl2_trainer] epoch #4 | Optimizing
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing KL after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Computing loss after
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saving snapshot...
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Saved
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | Time 1122.14 s
2025-03-29 14:45:56 | [rl2_trainer] epoch #4 | EpochTime 94.17 s
----------------------------------------  --------------
Average/AverageDiscountedReturn             -41.9613
Average/AverageReturn                       -66.2673
Average/Iteration                             4
Average/MaxReturn                           -33.9462
Average/MinReturn                          -121.742
Average/NumEpisodes                          40
Average/StdReturn                            24.5891
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.909156
TotalEnvSteps                              20000
__unnamed_task__/AverageDiscountedReturn    -41.9613
__unnamed_task__/AverageReturn              -66.2673
__unnamed_task__/Iteration                    4
__unnamed_task__/MaxReturn                  -33.9462
__unnamed_task__/MinReturn                 -121.742
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   24.5891
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.81839
policy/KL                                     0.0102138
policy/KLBefore                               0
policy/LossAfter                             -0.0962488
policy/LossBefore                             0.00132629
policy/dLoss                                  0.0975751
----------------------------------------  --------------
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing policy...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Fitting baseline...
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing loss before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Computing KL before
2025-03-29 14:48:00 | [rl2_trainer] epoch #5 | Optimizing
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing KL after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Computing loss after
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saving snapshot...
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Saved
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | Time 1251.93 s
2025-03-29 14:48:06 | [rl2_trainer] epoch #5 | EpochTime 129.78 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -38.2055
Average/AverageReturn                       -61.7326
Average/Iteration                             5
Average/MaxReturn                           134.172
Average/MinReturn                          -125.595
Average/NumEpisodes                          40
Average/StdReturn                            42.322
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.652002
TotalEnvSteps                              24000
__unnamed_task__/AverageDiscountedReturn    -38.2055
__unnamed_task__/AverageReturn              -61.7326
__unnamed_task__/Iteration                    5
__unnamed_task__/MaxReturn                  134.172
__unnamed_task__/MinReturn                 -125.595
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   42.322
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.80804
policy/KL                                     0.0122716
policy/KLBefore                               0
policy/LossAfter                             -0.204539
policy/LossBefore                             0.0500677
policy/dLoss                                  0.254606
----------------------------------------  -------------
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing policy...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Fitting baseline...
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing loss before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Computing KL before
2025-03-29 14:51:28 | [rl2_trainer] epoch #6 | Optimizing
2025-03-29 14:51:35 | [rl2_trainer] epoch #6 | Computing KL after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Computing loss after
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saving snapshot...
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Saved
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | Time 1461.75 s
2025-03-29 14:51:36 | [rl2_trainer] epoch #6 | EpochTime 209.82 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -42.1921
Average/AverageReturn                       -67.1612
Average/Iteration                             6
Average/MaxReturn                           -33.1935
Average/MinReturn                          -110.057
Average/NumEpisodes                          40
Average/StdReturn                            24.1351
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.848234
TotalEnvSteps                              28000
__unnamed_task__/AverageDiscountedReturn    -42.1921
__unnamed_task__/AverageReturn              -67.1612
__unnamed_task__/Iteration                    6
__unnamed_task__/MaxReturn                  -33.1935
__unnamed_task__/MinReturn                 -110.057
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   24.1351
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.80043
policy/KL                                     0.014637
policy/KLBefore                               0
policy/LossAfter                             -0.114569
policy/LossBefore                            -0.0141929
policy/dLoss                                  0.100376
----------------------------------------  -------------
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing policy...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Fitting baseline...
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing loss before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Computing KL before
2025-03-29 14:55:29 | [rl2_trainer] epoch #7 | Optimizing
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing KL after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Computing loss after
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saving snapshot...
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Saved
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | Time 1701.84 s
2025-03-29 14:55:36 | [rl2_trainer] epoch #7 | EpochTime 240.09 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -42.4082
Average/AverageReturn                       -67.878
Average/Iteration                             7
Average/MaxReturn                           -34.1169
Average/MinReturn                          -111.115
Average/NumEpisodes                          40
Average/StdReturn                            19.5859
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.865991
TotalEnvSteps                              32000
__unnamed_task__/AverageDiscountedReturn    -42.4082
__unnamed_task__/AverageReturn              -67.878
__unnamed_task__/Iteration                    7
__unnamed_task__/MaxReturn                  -34.1169
__unnamed_task__/MinReturn                 -111.115
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   19.5859
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.79624
policy/KL                                     0.0104825
policy/KLBefore                               0
policy/LossAfter                             -0.13989
policy/LossBefore                            -0.0309541
policy/dLoss                                  0.108936
----------------------------------------  -------------
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Optimizing policy...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Fitting baseline...
2025-03-29 14:59:31 | [rl2_trainer] epoch #8 | Computing loss before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Computing KL before
2025-03-29 14:59:32 | [rl2_trainer] epoch #8 | Optimizing
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing KL after
2025-03-29 14:59:39 | [rl2_trainer] epoch #8 | Computing loss after
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saving snapshot...
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Saved
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | Time 1945.55 s
2025-03-29 14:59:40 | [rl2_trainer] epoch #8 | EpochTime 243.70 s
----------------------------------------  -------------
Average/AverageDiscountedReturn             -39.7762
Average/AverageReturn                       -63.9139
Average/Iteration                             8
Average/MaxReturn                           -35.6858
Average/MinReturn                          -110.7
Average/NumEpisodes                          40
Average/StdReturn                            20.7657
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.906608
TotalEnvSteps                              36000
__unnamed_task__/AverageDiscountedReturn    -39.7762
__unnamed_task__/AverageReturn              -63.9139
__unnamed_task__/Iteration                    8
__unnamed_task__/MaxReturn                  -35.6858
__unnamed_task__/MinReturn                 -110.7
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   20.7657
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.78585
policy/KL                                     0.0106836
policy/KLBefore                               0
policy/LossAfter                             -0.0940088
policy/LossBefore                            -0.0208258
policy/dLoss                                  0.073183
----------------------------------------  -------------
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Optimizing policy...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Fitting baseline...
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing loss before
2025-03-29 15:03:42 | [rl2_trainer] epoch #9 | Computing KL before
2025-03-29 15:03:43 | [rl2_trainer] epoch #9 | Optimizing
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing KL after
2025-03-29 15:03:51 | [rl2_trainer] epoch #9 | Computing loss after
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saving snapshot...
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Saved
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | Time 2197.58 s
2025-03-29 15:03:52 | [rl2_trainer] epoch #9 | EpochTime 252.03 s
----------------------------------------  --------------
Average/AverageDiscountedReturn             -38.8162
Average/AverageReturn                       -61.6066
Average/Iteration                             9
Average/MaxReturn                           -11.7124
Average/MinReturn                          -113.375
Average/NumEpisodes                          40
Average/StdReturn                            21.625
Average/TerminationRate                       0
LinearFeatureBaseline/ExplainedVariance       0.827891
TotalEnvSteps                              40000
__unnamed_task__/AverageDiscountedReturn    -38.8162
__unnamed_task__/AverageReturn              -61.6066
__unnamed_task__/Iteration                    9
__unnamed_task__/MaxReturn                  -11.7124
__unnamed_task__/MinReturn                 -113.375
__unnamed_task__/NumEpisodes                 40
__unnamed_task__/StdReturn                   21.625
__unnamed_task__/TerminationRate              0
policy/Entropy                                9.77166
policy/KL                                     0.00887517
policy/KLBefore                               0
policy/LossAfter                             -0.146794
policy/LossBefore                            -0.021343
policy/dLoss                                  0.125451
----------------------------------------  --------------
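To track metrics such as `Average/AverageReturn` or `policy/KL` across epochs, the dashed metric tables above can be scraped back out of the log. The sketch below is a hypothetical helper, not part of the trainer: it only assumes the layout shown in this log, i.e. each table is a run of `key  value` rows enclosed between two rows of dashes.

```python
import re

# One metric row: a non-whitespace key, whitespace, then a plain or
# signed decimal number (matches the "key  value" rows in the tables above).
METRIC_RE = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)$")


def parse_metric_tables(text):
    """Return one {metric_name: float_value} dict per dashed table in `text`."""
    tables = []
    current = None  # None while outside a table, a dict while inside one
    for line in text.splitlines():
        if line.startswith("----"):
            if current is None:
                current = {}            # opening separator: start a table
            else:
                tables.append(current)  # closing separator: finish the table
                current = None
            continue
        if current is not None:
            match = METRIC_RE.match(line.strip())
            if match:
                current[match.group(1)] = float(match.group(2))
    return tables
```

With the whole log loaded into a string, the learning curve for this run would then be `[t["Average/AverageReturn"] for t in parse_metric_tables(log_text)]`, giving one value per epoch.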