[2023-09-30 11:54:20,812][117662] Saving configuration to ./train_atari/atari_kangaroo/config.json...
[2023-09-30 11:54:21,129][117662] Rollout worker 0 uses device cpu
[2023-09-30 11:54:21,130][117662] Rollout worker 1 uses device cpu
[2023-09-30 11:54:21,130][117662] Rollout worker 2 uses device cpu
[2023-09-30 11:54:21,131][117662] Rollout worker 3 uses device cpu
[2023-09-30 11:54:21,131][117662] Rollout worker 4 uses device cpu
[2023-09-30 11:54:21,132][117662] Rollout worker 5 uses device cpu
[2023-09-30 11:54:21,132][117662] Rollout worker 6 uses device cpu
[2023-09-30 11:54:21,133][117662] Rollout worker 7 uses device cpu
[2023-09-30 11:54:21,133][117662] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-30 11:54:21,179][117662] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-30 11:54:21,179][117662] InferenceWorker_p0-w0: min num requests: 1
[2023-09-30 11:54:21,182][117662] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-30 11:54:21,183][117662] InferenceWorker_p1-w0: min num requests: 1
[2023-09-30 11:54:21,205][117662] Starting all processes...
[2023-09-30 11:54:21,206][117662] Starting process learner_proc0
[2023-09-30 11:54:22,835][117662] Starting process learner_proc1
[2023-09-30 11:54:22,838][118358] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-30 11:54:22,839][118358] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-30 11:54:22,856][118358] Num visible devices: 1
[2023-09-30 11:54:22,872][118358] Starting seed is not provided
[2023-09-30 11:54:22,872][118358] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-30 11:54:22,872][118358] Initializing actor-critic model on device cuda:0
[2023-09-30 11:54:22,873][118358] RunningMeanStd input shape: (4, 84, 84)
[2023-09-30 11:54:22,873][118358] RunningMeanStd input shape: (1,)
[2023-09-30 11:54:22,884][118358] ConvEncoder: input_channels=4
[2023-09-30 11:54:23,037][118358] Conv encoder output size: 512
[2023-09-30 11:54:23,039][118358] Created Actor Critic model with architecture:
[2023-09-30 11:54:23,039][118358] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=18, bias=True)
  )
)
[2023-09-30 11:54:23,627][118358] Using optimizer
[2023-09-30 11:54:23,627][118358] No checkpoints found
[2023-09-30 11:54:23,627][118358] Did not load from checkpoint, starting from scratch!
[2023-09-30 11:54:23,628][118358] Initialized policy 0 weights for model version 0
[2023-09-30 11:54:23,629][118358] LearnerWorker_p0 finished initialization!
[2023-09-30 11:54:23,629][118358] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-30 11:54:24,467][117662] Starting all processes...
[2023-09-30 11:54:24,471][118438] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-30 11:54:24,471][118438] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-30 11:54:24,475][117662] Starting process inference_proc0-0
[2023-09-30 11:54:24,475][117662] Starting process inference_proc1-0
[2023-09-30 11:54:24,475][117662] Starting process rollout_proc0
[2023-09-30 11:54:24,475][117662] Starting process rollout_proc1
[2023-09-30 11:54:24,490][118438] Num visible devices: 1
[2023-09-30 11:54:24,476][117662] Starting process rollout_proc2
[2023-09-30 11:54:24,476][117662] Starting process rollout_proc3
[2023-09-30 11:54:24,476][117662] Starting process rollout_proc4
[2023-09-30 11:54:24,511][118438] Starting seed is not provided
[2023-09-30 11:54:24,511][118438] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-30 11:54:24,511][118438] Initializing actor-critic model on device cuda:0
[2023-09-30 11:54:24,511][118438] RunningMeanStd input shape: (4, 84, 84)
[2023-09-30 11:54:24,512][118438] RunningMeanStd input shape: (1,)
[2023-09-30 11:54:24,477][117662] Starting process rollout_proc5
[2023-09-30 11:54:24,480][117662] Starting process rollout_proc6
[2023-09-30 11:54:24,481][117662] Starting process rollout_proc7
[2023-09-30 11:54:24,524][118438] ConvEncoder: input_channels=4
[2023-09-30 11:54:24,848][118438] Conv encoder output size: 512
[2023-09-30 11:54:24,850][118438] Created Actor Critic model with architecture:
[2023-09-30 11:54:24,850][118438] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=18, bias=True)
  )
)
[2023-09-30 11:54:25,470][118438] Using optimizer
[2023-09-30 11:54:25,471][118438] No checkpoints found
[2023-09-30 11:54:25,471][118438] Did not load from checkpoint, starting from scratch!
[2023-09-30 11:54:25,471][118438] Initialized policy 1 weights for model version 0
[2023-09-30 11:54:25,472][118438] LearnerWorker_p1 finished initialization!
[2023-09-30 11:54:25,473][118438] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-30 11:54:26,449][118534] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-30 11:54:26,452][118569] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-30 11:54:26,454][118566] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-30 11:54:26,485][118532] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-30 11:54:26,485][118532] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-30 11:54:26,498][118532] Num visible devices: 1
[2023-09-30 11:54:26,507][118567] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-30 11:54:26,510][118571] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-30 11:54:26,517][118531] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-30 11:54:26,517][118531] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-30 11:54:26,528][118572] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-30 11:54:26,532][118570] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-30 11:54:26,550][118573] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-30 11:54:26,560][118531] Num visible devices: 1
[2023-09-30 11:54:26,947][117662] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan, 1: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-09-30 11:54:27,169][118532] RunningMeanStd input shape: (4, 84, 84)
[2023-09-30 11:54:27,169][118532] RunningMeanStd input shape: (1,)
[2023-09-30 11:54:27,171][118531] RunningMeanStd input shape: (4, 84, 84)
[2023-09-30 11:54:27,172][118531] RunningMeanStd input shape: (1,)
[2023-09-30 11:54:27,180][118532] ConvEncoder: input_channels=4
[2023-09-30 11:54:27,183][118531] ConvEncoder: input_channels=4
[2023-09-30 11:54:27,283][118532] Conv encoder output size: 512
[2023-09-30 11:54:27,289][117662] Inference worker 0-0 is ready!
[2023-09-30 11:54:27,296][118531] Conv encoder output size: 512
[2023-09-30 11:54:27,302][117662] Inference worker 1-0 is ready!
[2023-09-30 11:54:27,302][117662] All inference workers are ready! Signal rollout workers to start!
[2023-09-30 11:54:27,743][118570] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,743][118534] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,753][118567] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,758][118572] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,760][118571] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,761][118566] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,808][118573] Decorrelating experience for 0 frames...
[2023-09-30 11:54:27,876][118569] Decorrelating experience for 0 frames...
[2023-09-30 11:54:31,947][117662] Fps is (10 sec: 1638.3, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 8192. Throughput: 0: 204.8, 1: 204.8. Samples: 2048. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 11:54:31,948][117662] Avg episode reward: [(0, '0.000'), (1, '0.000')]
[2023-09-30 11:54:36,949][117662] Fps is (10 sec: 3276.2, 60 sec: 3276.2, 300 sec: 3276.2). Total num frames: 32768. Throughput: 0: 372.0, 1: 376.4. Samples: 7486. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 11:54:36,950][117662] Avg episode reward: [(0, '0.091'), (1, '0.042')]
[2023-09-30 11:54:41,167][117662] Heartbeat connected on Batcher_0
[2023-09-30 11:54:41,170][117662] Heartbeat connected on LearnerWorker_p0
[2023-09-30 11:54:41,173][117662] Heartbeat connected on Batcher_1
[2023-09-30 11:54:41,175][117662] Heartbeat connected on LearnerWorker_p1
[2023-09-30 11:54:41,181][117662] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-30 11:54:41,185][117662] Heartbeat connected on InferenceWorker_p1-w0
[2023-09-30 11:54:41,186][117662] Heartbeat connected on RolloutWorker_w0
[2023-09-30 11:54:41,189][117662] Heartbeat connected on RolloutWorker_w1
[2023-09-30 11:54:41,191][117662] Heartbeat connected on RolloutWorker_w2
[2023-09-30 11:54:41,194][117662] Heartbeat connected on RolloutWorker_w3
[2023-09-30 11:54:41,198][117662] Heartbeat connected on RolloutWorker_w4
[2023-09-30 11:54:41,200][117662] Heartbeat connected on RolloutWorker_w5
[2023-09-30 11:54:41,203][117662] Heartbeat connected on RolloutWorker_w6
[2023-09-30 11:54:41,205][117662] Heartbeat connected on RolloutWorker_w7
[2023-09-30 11:54:41,947][117662] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3822.9). Total num frames: 57344. Throughput: 0: 407.8, 1: 409.1. Samples: 12253. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 11:54:41,948][117662] Avg episode reward: [(0, '0.057'), (1, '0.038')]
[2023-09-30 11:54:44,738][118532] Updated weights for policy 0, policy_version 160 (0.0018)
[2023-09-30 11:54:44,738][118531] Updated weights for policy 1, policy_version 160 (0.0016)
[2023-09-30 11:54:46,947][117662] Fps is (10 sec: 5735.5, 60 sec: 4505.6, 300 sec: 4505.6). Total num frames: 90112. Throughput: 0: 539.0, 1: 540.5. Samples: 21591. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 11:54:46,948][117662] Avg episode reward: [(0, '0.052'), (1, '0.038')]
[2023-09-30 11:54:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 122880. Throughput: 0: 614.4, 1: 614.4. Samples: 30720. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:54:51,948][117662] Avg episode reward: [(0, '0.041'), (1, '0.040')]
[2023-09-30 11:54:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 5188.3, 300 sec: 5188.3). Total num frames: 155648. Throughput: 0: 583.0, 1: 583.6. Samples: 35000. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:54:56,948][117662] Avg episode reward: [(0, '0.040'), (1, '0.040')]
[2023-09-30 11:54:58,122][118532] Updated weights for policy 0, policy_version 320 (0.0018)
[2023-09-30 11:54:58,123][118531] Updated weights for policy 1, policy_version 320 (0.0018)
[2023-09-30 11:55:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 5149.3, 300 sec: 5149.3). Total num frames: 180224. Throughput: 0: 638.5, 1: 638.7. Samples: 44703. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:55:01,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.050')]
[2023-09-30 11:55:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 5324.8, 300 sec: 5324.8). Total num frames: 212992. Throughput: 0: 673.6, 1: 673.9. Samples: 53898. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:55:06,948][117662] Avg episode reward: [(0, '0.040'), (1, '0.060')]
[2023-09-30 11:55:06,948][118358] Saving new best policy, reward=0.040!
[2023-09-30 11:55:06,948][118438] Saving new best policy, reward=0.060!
[2023-09-30 11:55:11,316][118531] Updated weights for policy 1, policy_version 480 (0.0019)
[2023-09-30 11:55:11,316][118532] Updated weights for policy 0, policy_version 480 (0.0017)
[2023-09-30 11:55:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 5461.4, 300 sec: 5461.4). Total num frames: 245760. Throughput: 0: 652.2, 1: 652.3. Samples: 58699.
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 11:55:11,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.060')]
[2023-09-30 11:55:11,948][118358] Saving new best policy, reward=0.050!
[2023-09-30 11:55:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 5570.6, 300 sec: 5570.6). Total num frames: 278528. Throughput: 0: 728.7, 1: 729.0. Samples: 67645. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 11:55:16,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.100')]
[2023-09-30 11:55:16,954][118358] Saving new best policy, reward=0.070!
[2023-09-30 11:55:16,954][118438] Saving new best policy, reward=0.100!
[2023-09-30 11:55:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 5660.0, 300 sec: 5660.0). Total num frames: 311296. Throughput: 0: 775.5, 1: 775.7. Samples: 77288. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:55:21,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.080')]
[2023-09-30 11:55:24,388][118531] Updated weights for policy 1, policy_version 640 (0.0019)
[2023-09-30 11:55:24,388][118532] Updated weights for policy 0, policy_version 640 (0.0017)
[2023-09-30 11:55:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 5734.4, 300 sec: 5734.4). Total num frames: 344064. Throughput: 0: 774.3, 1: 773.9. Samples: 81920. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:55:26,947][117662] Avg episode reward: [(0, '0.060'), (1, '0.070')]
[2023-09-30 11:55:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5671.4). Total num frames: 368640. Throughput: 0: 777.7, 1: 777.6. Samples: 91578. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:55:31,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.060')]
[2023-09-30 11:55:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.2, 300 sec: 5734.4). Total num frames: 401408. Throughput: 0: 777.8, 1: 777.7. Samples: 100720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 11:55:36,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.090')]
[2023-09-30 11:55:37,393][118532] Updated weights for policy 0, policy_version 800 (0.0017)
[2023-09-30 11:55:37,393][118531] Updated weights for policy 1, policy_version 800 (0.0017)
[2023-09-30 11:55:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 5789.0). Total num frames: 434176. Throughput: 0: 784.1, 1: 784.9. Samples: 105604. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 11:55:41,948][117662] Avg episode reward: [(0, '0.060'), (1, '0.090')]
[2023-09-30 11:55:46,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 5836.8). Total num frames: 466944. Throughput: 0: 779.9, 1: 779.5. Samples: 114877. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 11:55:46,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.100')]
[2023-09-30 11:55:50,495][118532] Updated weights for policy 0, policy_version 960 (0.0017)
[2023-09-30 11:55:50,496][118531] Updated weights for policy 1, policy_version 960 (0.0017)
[2023-09-30 11:55:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5879.0). Total num frames: 499712. Throughput: 0: 783.5, 1: 783.4. Samples: 124406. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 11:55:51,948][117662] Avg episode reward: [(0, '0.090'), (1, '0.120')]
[2023-09-30 11:55:51,949][118358] Saving new best policy, reward=0.090!
[2023-09-30 11:55:51,949][118438] Saving new best policy, reward=0.120!
[2023-09-30 11:55:56,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 5825.4). Total num frames: 524288. Throughput: 0: 781.4, 1: 781.3. Samples: 129024. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 11:55:56,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.090')]
[2023-09-30 11:56:01,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 5863.7). Total num frames: 557056. Throughput: 0: 787.4, 1: 787.2. Samples: 138501. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:56:01,948][117662] Avg episode reward: [(0, '0.040'), (1, '0.120')]
[2023-09-30 11:56:03,532][118531] Updated weights for policy 1, policy_version 1120 (0.0017)
[2023-09-30 11:56:03,532][118532] Updated weights for policy 0, policy_version 1120 (0.0018)
[2023-09-30 11:56:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5898.2). Total num frames: 589824. Throughput: 0: 781.8, 1: 780.9. Samples: 147611. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:56:06,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.150')]
[2023-09-30 11:56:06,949][118438] Saving new best policy, reward=0.150!
[2023-09-30 11:56:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 5929.5). Total num frames: 622592. Throughput: 0: 781.4, 1: 782.0. Samples: 152270. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:56:11,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.120')]
[2023-09-30 11:56:16,788][118532] Updated weights for policy 0, policy_version 1280 (0.0017)
[2023-09-30 11:56:16,788][118531] Updated weights for policy 1, policy_version 1280 (0.0020)
[2023-09-30 11:56:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5957.8). Total num frames: 655360. Throughput: 0: 780.4, 1: 779.9. Samples: 161792. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 11:56:16,948][117662] Avg episode reward: [(0, '0.030'), (1, '0.150')]
[2023-09-30 11:56:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000001280_327680.pth...
[2023-09-30 11:56:16,959][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000001280_327680.pth...
[2023-09-30 11:56:21,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 5912.5). Total num frames: 679936. Throughput: 0: 779.5, 1: 779.6. Samples: 170882.
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 11:56:21,948][117662] Avg episode reward: [(0, '0.040'), (1, '0.100')]
[2023-09-30 11:56:26,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 5939.2). Total num frames: 712704. Throughput: 0: 779.3, 1: 777.4. Samples: 175657. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 11:56:26,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.060')]
[2023-09-30 11:56:30,018][118532] Updated weights for policy 0, policy_version 1440 (0.0017)
[2023-09-30 11:56:30,018][118531] Updated weights for policy 1, policy_version 1440 (0.0018)
[2023-09-30 11:56:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 5963.8). Total num frames: 745472. Throughput: 0: 777.2, 1: 777.6. Samples: 184843. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 11:56:31,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.080')]
[2023-09-30 11:56:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5986.5). Total num frames: 778240. Throughput: 0: 778.6, 1: 778.9. Samples: 194494. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:56:36,947][117662] Avg episode reward: [(0, '0.040'), (1, '0.100')]
[2023-09-30 11:56:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6007.5). Total num frames: 811008. Throughput: 0: 776.4, 1: 776.6. Samples: 198906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:56:41,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.110')]
[2023-09-30 11:56:43,052][118532] Updated weights for policy 0, policy_version 1600 (0.0017)
[2023-09-30 11:56:43,052][118531] Updated weights for policy 1, policy_version 1600 (0.0014)
[2023-09-30 11:56:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 5968.5). Total num frames: 835584. Throughput: 0: 781.1, 1: 779.6. Samples: 208732. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 11:56:46,948][117662] Avg episode reward: [(0, '0.060'), (1, '0.140')]
[2023-09-30 11:56:51,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 5988.6). Total num frames: 868352. Throughput: 0: 779.1, 1: 778.9. Samples: 217720. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:56:51,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.110')]
[2023-09-30 11:56:56,098][118532] Updated weights for policy 0, policy_version 1760 (0.0018)
[2023-09-30 11:56:56,098][118531] Updated weights for policy 1, policy_version 1760 (0.0017)
[2023-09-30 11:56:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6007.5). Total num frames: 901120. Throughput: 0: 782.0, 1: 781.8. Samples: 222641. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 11:56:56,948][117662] Avg episode reward: [(0, '0.040'), (1, '0.110')]
[2023-09-30 11:57:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6025.1). Total num frames: 933888. Throughput: 0: 777.5, 1: 777.6. Samples: 231772. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:57:01,947][117662] Avg episode reward: [(0, '0.030'), (1, '0.120')]
[2023-09-30 11:57:06,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6041.6). Total num frames: 966656. Throughput: 0: 781.4, 1: 781.8. Samples: 241225. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 11:57:06,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.100')]
[2023-09-30 11:57:09,404][118532] Updated weights for policy 0, policy_version 1920 (0.0017)
[2023-09-30 11:57:09,404][118531] Updated weights for policy 1, policy_version 1920 (0.0018)
[2023-09-30 11:57:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6057.1). Total num frames: 999424. Throughput: 0: 778.6, 1: 779.2. Samples: 245760. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:57:11,947][117662] Avg episode reward: [(0, '0.030'), (1, '0.110')]
[2023-09-30 11:57:16,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6023.5). Total num frames: 1024000. Throughput: 0: 782.5, 1: 782.8. Samples: 255284. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 11:57:16,948][117662] Avg episode reward: [(0, '0.030'), (1, '0.100')]
[2023-09-30 11:57:21,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6038.7). Total num frames: 1056768. Throughput: 0: 776.3, 1: 776.0. Samples: 264344. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 11:57:21,947][117662] Avg episode reward: [(0, '0.040'), (1, '0.080')]
[2023-09-30 11:57:22,521][118532] Updated weights for policy 0, policy_version 2080 (0.0018)
[2023-09-30 11:57:22,521][118531] Updated weights for policy 1, policy_version 2080 (0.0017)
[2023-09-30 11:57:26,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6053.0). Total num frames: 1089536. Throughput: 0: 781.0, 1: 781.7. Samples: 269226. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:57:26,947][117662] Avg episode reward: [(0, '0.040'), (1, '0.110')]
[2023-09-30 11:57:31,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6066.5). Total num frames: 1122304. Throughput: 0: 775.3, 1: 776.8. Samples: 278577. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 11:57:31,947][117662] Avg episode reward: [(0, '0.030'), (1, '0.110')]
[2023-09-30 11:57:35,453][118532] Updated weights for policy 0, policy_version 2240 (0.0018)
[2023-09-30 11:57:35,454][118531] Updated weights for policy 1, policy_version 2240 (0.0018)
[2023-09-30 11:57:36,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6079.3). Total num frames: 1155072. Throughput: 0: 783.1, 1: 784.8. Samples: 288272.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:57:36,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.120')]
[2023-09-30 11:57:41,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6049.5). Total num frames: 1179648. Throughput: 0: 779.8, 1: 779.8. Samples: 292820. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 11:57:41,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.090')]
[2023-09-30 11:57:46,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6062.1). Total num frames: 1212416. Throughput: 0: 778.0, 1: 778.6. Samples: 301823. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:57:46,947][117662] Avg episode reward: [(0, '0.060'), (1, '0.120')]
[2023-09-30 11:57:48,853][118532] Updated weights for policy 0, policy_version 2400 (0.0019)
[2023-09-30 11:57:48,854][118531] Updated weights for policy 1, policy_version 2400 (0.0018)
[2023-09-30 11:57:51,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6074.1). Total num frames: 1245184. Throughput: 0: 778.8, 1: 778.4. Samples: 311296. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 11:57:51,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.130')]
[2023-09-30 11:57:56,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6085.5). Total num frames: 1277952. Throughput: 0: 779.0, 1: 779.4. Samples: 315888. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 11:57:56,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.130')]
[2023-09-30 11:58:01,835][118532] Updated weights for policy 0, policy_version 2560 (0.0018)
[2023-09-30 11:58:01,835][118531] Updated weights for policy 1, policy_version 2560 (0.0017)
[2023-09-30 11:58:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6096.4). Total num frames: 1310720. Throughput: 0: 781.1, 1: 781.2. Samples: 325587. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:01,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.110')]
[2023-09-30 11:58:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6069.5). Total num frames: 1335296. Throughput: 0: 784.1, 1: 783.7. Samples: 334892. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:06,948][117662] Avg episode reward: [(0, '0.130'), (1, '0.070')]
[2023-09-30 11:58:07,004][118358] Saving new best policy, reward=0.130!
[2023-09-30 11:58:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6080.3). Total num frames: 1368064. Throughput: 0: 782.4, 1: 781.9. Samples: 339619. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 11:58:11,947][117662] Avg episode reward: [(0, '0.150'), (1, '0.040')]
[2023-09-30 11:58:11,948][118358] Saving new best policy, reward=0.150!
[2023-09-30 11:58:14,890][118532] Updated weights for policy 0, policy_version 2720 (0.0016)
[2023-09-30 11:58:14,890][118531] Updated weights for policy 1, policy_version 2720 (0.0018)
[2023-09-30 11:58:16,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6090.6). Total num frames: 1400832. Throughput: 0: 782.2, 1: 781.6. Samples: 348951. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:16,947][117662] Avg episode reward: [(0, '0.130'), (1, '0.050')]
[2023-09-30 11:58:16,957][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000002736_700416.pth...
[2023-09-30 11:58:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000002736_700416.pth...
[2023-09-30 11:58:21,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6100.4). Total num frames: 1433600. Throughput: 0: 780.1, 1: 778.3. Samples: 358400. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:21,948][117662] Avg episode reward: [(0, '0.120'), (1, '0.060')]
[2023-09-30 11:58:26,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6109.9). Total num frames: 1466368. Throughput: 0: 777.1, 1: 776.9. Samples: 362751. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:26,948][117662] Avg episode reward: [(0, '0.080'), (1, '0.070')]
[2023-09-30 11:58:28,065][118532] Updated weights for policy 0, policy_version 2880 (0.0017)
[2023-09-30 11:58:28,065][118531] Updated weights for policy 1, policy_version 2880 (0.0018)
[2023-09-30 11:58:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6085.5). Total num frames: 1490944. Throughput: 0: 785.2, 1: 785.3. Samples: 372497. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:31,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.080')]
[2023-09-30 11:58:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6094.8). Total num frames: 1523712. Throughput: 0: 780.3, 1: 780.6. Samples: 381536. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:36,948][117662] Avg episode reward: [(0, '0.080'), (1, '0.110')]
[2023-09-30 11:58:41,244][118532] Updated weights for policy 0, policy_version 3040 (0.0018)
[2023-09-30 11:58:41,245][118531] Updated weights for policy 1, policy_version 3040 (0.0016)
[2023-09-30 11:58:41,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6103.8). Total num frames: 1556480. Throughput: 0: 782.6, 1: 782.4. Samples: 386312. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:41,948][117662] Avg episode reward: [(0, '0.100'), (1, '0.130')]
[2023-09-30 11:58:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6112.5). Total num frames: 1589248. Throughput: 0: 774.5, 1: 773.9. Samples: 395264. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:46,948][117662] Avg episode reward: [(0, '0.100'), (1, '0.130')]
[2023-09-30 11:58:51,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6089.9). Total num frames: 1613824. Throughput: 0: 768.9, 1: 769.7. Samples: 404126.
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 11:58:51,947][117662] Avg episode reward: [(0, '0.110'), (1, '0.140')]
[2023-09-30 11:58:54,972][118532] Updated weights for policy 0, policy_version 3200 (0.0015)
[2023-09-30 11:58:54,974][118531] Updated weights for policy 1, policy_version 3200 (0.0018)
[2023-09-30 11:58:56,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6098.5). Total num frames: 1646592. Throughput: 0: 768.7, 1: 770.3. Samples: 408877. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:58:56,948][117662] Avg episode reward: [(0, '0.130'), (1, '0.140')]
[2023-09-30 11:59:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6106.8). Total num frames: 1679360. Throughput: 0: 766.4, 1: 767.1. Samples: 417957. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:01,948][117662] Avg episode reward: [(0, '0.130'), (1, '0.120')]
[2023-09-30 11:59:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6114.7). Total num frames: 1712128. Throughput: 0: 770.7, 1: 770.7. Samples: 427761. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:06,948][117662] Avg episode reward: [(0, '0.120'), (1, '0.130')]
[2023-09-30 11:59:08,017][118532] Updated weights for policy 0, policy_version 3360 (0.0017)
[2023-09-30 11:59:08,020][118531] Updated weights for policy 1, policy_version 3360 (0.0019)
[2023-09-30 11:59:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6093.7). Total num frames: 1736704. Throughput: 0: 771.0, 1: 770.7. Samples: 432128. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:11,948][117662] Avg episode reward: [(0, '0.110'), (1, '0.120')]
[2023-09-30 11:59:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6101.6). Total num frames: 1769472. Throughput: 0: 767.5, 1: 767.3. Samples: 441565. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:16,948][117662] Avg episode reward: [(0, '0.100'), (1, '0.090')]
[2023-09-30 11:59:21,253][118531] Updated weights for policy 1, policy_version 3520 (0.0019)
[2023-09-30 11:59:21,253][118532] Updated weights for policy 0, policy_version 3520 (0.0018)
[2023-09-30 11:59:21,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6109.3). Total num frames: 1802240. Throughput: 0: 767.8, 1: 767.7. Samples: 450632. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 11:59:21,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.050')]
[2023-09-30 11:59:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 1835008. Throughput: 0: 768.6, 1: 769.6. Samples: 455532. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:26,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.030')]
[2023-09-30 11:59:31,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 1867776. Throughput: 0: 773.7, 1: 773.7. Samples: 464896. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:31,947][117662] Avg episode reward: [(0, '0.030'), (1, '0.050')]
[2023-09-30 11:59:34,403][118532] Updated weights for policy 0, policy_version 3680 (0.0019)
[2023-09-30 11:59:34,403][118531] Updated weights for policy 1, policy_version 3680 (0.0016)
[2023-09-30 11:59:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 1892352. Throughput: 0: 779.4, 1: 778.8. Samples: 474247. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:36,948][117662] Avg episode reward: [(0, '0.030'), (1, '0.080')]
[2023-09-30 11:59:41,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 1925120. Throughput: 0: 778.9, 1: 776.8. Samples: 478884. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:41,947][117662] Avg episode reward: [(0, '0.030'), (1, '0.110')]
[2023-09-30 11:59:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 1957888. Throughput: 0: 775.4, 1: 775.2. Samples: 487734. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 11:59:46,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.110')]
[2023-09-30 11:59:47,744][118531] Updated weights for policy 1, policy_version 3840 (0.0016)
[2023-09-30 11:59:47,745][118532] Updated weights for policy 0, policy_version 3840 (0.0017)
[2023-09-30 11:59:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 1990656. Throughput: 0: 775.0, 1: 774.7. Samples: 497497. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 11:59:51,947][117662] Avg episode reward: [(0, '0.050'), (1, '0.110')]
[2023-09-30 11:59:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2015232. Throughput: 0: 773.7, 1: 773.7. Samples: 501760. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 11:59:56,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.100')]
[2023-09-30 12:00:01,003][118532] Updated weights for policy 0, policy_version 4000 (0.0019)
[2023-09-30 12:00:01,003][118531] Updated weights for policy 1, policy_version 4000 (0.0020)
[2023-09-30 12:00:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2048000. Throughput: 0: 772.7, 1: 772.8. Samples: 511113. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:00:01,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.120')]
[2023-09-30 12:00:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2080768. Throughput: 0: 776.9, 1: 776.8. Samples: 520548.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:00:06,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.100')] [2023-09-30 12:00:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 2113536. Throughput: 0: 775.9, 1: 775.4. Samples: 525338. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-30 12:00:11,947][117662] Avg episode reward: [(0, '0.090'), (1, '0.130')] [2023-09-30 12:00:14,046][118532] Updated weights for policy 0, policy_version 4160 (0.0015) [2023-09-30 12:00:14,047][118531] Updated weights for policy 1, policy_version 4160 (0.0017) [2023-09-30 12:00:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2146304. Throughput: 0: 773.7, 1: 773.8. Samples: 534533. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-30 12:00:16,948][117662] Avg episode reward: [(0, '0.110'), (1, '0.170')] [2023-09-30 12:00:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000004192_1073152.pth... [2023-09-30 12:00:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000004192_1073152.pth... [2023-09-30 12:00:16,993][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000001280_327680.pth [2023-09-30 12:00:16,997][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000001280_327680.pth [2023-09-30 12:00:17,001][118438] Saving new best policy, reward=0.170! [2023-09-30 12:00:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2179072. Throughput: 0: 775.4, 1: 776.0. Samples: 544057. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-30 12:00:21,947][117662] Avg episode reward: [(0, '0.110'), (1, '0.180')] [2023-09-30 12:00:21,949][118438] Saving new best policy, reward=0.180! [2023-09-30 12:00:26,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2203648. Throughput: 0: 777.5, 1: 777.6. Samples: 548864. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:00:26,948][117662] Avg episode reward: [(0, '0.110'), (1, '0.180')] [2023-09-30 12:00:27,282][118532] Updated weights for policy 0, policy_version 4320 (0.0015) [2023-09-30 12:00:27,283][118531] Updated weights for policy 1, policy_version 4320 (0.0017) [2023-09-30 12:00:31,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2236416. Throughput: 0: 776.3, 1: 776.3. Samples: 557599. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-30 12:00:31,948][117662] Avg episode reward: [(0, '0.100'), (1, '0.160')] [2023-09-30 12:00:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 2269184. Throughput: 0: 769.9, 1: 769.5. Samples: 566767. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-30 12:00:36,947][117662] Avg episode reward: [(0, '0.060'), (1, '0.120')] [2023-09-30 12:00:40,697][118531] Updated weights for policy 1, policy_version 4480 (0.0017) [2023-09-30 12:00:40,697][118532] Updated weights for policy 0, policy_version 4480 (0.0016) [2023-09-30 12:00:41,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2293760. Throughput: 0: 773.7, 1: 773.7. Samples: 571392. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-30 12:00:41,948][117662] Avg episode reward: [(0, '0.060'), (1, '0.160')] [2023-09-30 12:00:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2326528. Throughput: 0: 774.9, 1: 774.6. Samples: 580841. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-30 12:00:46,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.170')] [2023-09-30 12:00:51,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2359296. Throughput: 0: 770.2, 1: 770.2. Samples: 589865. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:00:51,947][117662] Avg episode reward: [(0, '0.090'), (1, '0.140')] [2023-09-30 12:00:53,989][118532] Updated weights for policy 0, policy_version 4640 (0.0019) [2023-09-30 12:00:53,989][118531] Updated weights for policy 1, policy_version 4640 (0.0020) [2023-09-30 12:00:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2392064. Throughput: 0: 768.1, 1: 768.2. Samples: 594473. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:00:56,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.140')] [2023-09-30 12:01:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2424832. Throughput: 0: 771.9, 1: 772.4. Samples: 604027. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-30 12:01:01,947][117662] Avg episode reward: [(0, '0.090'), (1, '0.150')] [2023-09-30 12:01:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2449408. Throughput: 0: 771.3, 1: 770.8. Samples: 613453. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-30 12:01:06,948][117662] Avg episode reward: [(0, '0.090'), (1, '0.200')] [2023-09-30 12:01:07,018][118438] Saving new best policy, reward=0.200! [2023-09-30 12:01:07,020][118531] Updated weights for policy 1, policy_version 4800 (0.0017) [2023-09-30 12:01:07,021][118532] Updated weights for policy 0, policy_version 4800 (0.0017) [2023-09-30 12:01:11,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2482176. Throughput: 0: 771.9, 1: 772.9. Samples: 618377. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:01:11,948][117662] Avg episode reward: [(0, '0.060'), (1, '0.280')] [2023-09-30 12:01:11,950][118438] Saving new best policy, reward=0.280! [2023-09-30 12:01:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2514944. Throughput: 0: 776.5, 1: 776.0. 
Samples: 627463. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-30 12:01:16,948][117662] Avg episode reward: [(0, '0.080'), (1, '0.270')] [2023-09-30 12:01:20,308][118532] Updated weights for policy 0, policy_version 4960 (0.0017) [2023-09-30 12:01:20,308][118531] Updated weights for policy 1, policy_version 4960 (0.0017) [2023-09-30 12:01:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2547712. Throughput: 0: 775.8, 1: 778.1. Samples: 636692. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:01:21,948][117662] Avg episode reward: [(0, '0.080'), (1, '0.240')] [2023-09-30 12:01:26,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2580480. Throughput: 0: 773.8, 1: 773.8. Samples: 641033. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-30 12:01:26,948][117662] Avg episode reward: [(0, '0.090'), (1, '0.190')] [2023-09-30 12:01:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2605056. Throughput: 0: 773.3, 1: 772.8. Samples: 650416. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-30 12:01:31,949][117662] Avg episode reward: [(0, '0.070'), (1, '0.260')] [2023-09-30 12:01:33,641][118532] Updated weights for policy 0, policy_version 5120 (0.0018) [2023-09-30 12:01:33,641][118531] Updated weights for policy 1, policy_version 5120 (0.0015) [2023-09-30 12:01:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2637824. Throughput: 0: 775.5, 1: 775.7. Samples: 659671. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:01:36,948][117662] Avg episode reward: [(0, '0.070'), (1, '0.300')] [2023-09-30 12:01:36,949][118438] Saving new best policy, reward=0.300! [2023-09-30 12:01:41,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2670592. Throughput: 0: 779.4, 1: 779.0. Samples: 664600. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:01:41,948][117662] Avg episode reward: [(0, '0.030'), (1, '0.340')] [2023-09-30 12:01:41,950][118438] Saving new best policy, reward=0.340! [2023-09-30 12:01:46,613][118532] Updated weights for policy 0, policy_version 5280 (0.0017) [2023-09-30 12:01:46,614][118531] Updated weights for policy 1, policy_version 5280 (0.0018) [2023-09-30 12:01:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2703360. Throughput: 0: 775.6, 1: 775.0. Samples: 673801. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-30 12:01:46,948][117662] Avg episode reward: [(0, '0.040'), (1, '0.360')] [2023-09-30 12:01:46,957][118438] Saving new best policy, reward=0.360! [2023-09-30 12:01:51,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2736128. Throughput: 0: 777.3, 1: 777.1. Samples: 683402. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:01:51,948][117662] Avg episode reward: [(0, '0.030'), (1, '0.330')] [2023-09-30 12:01:56,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2760704. Throughput: 0: 775.5, 1: 774.5. Samples: 688128. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:01:56,948][117662] Avg episode reward: [(0, '0.050'), (1, '0.380')] [2023-09-30 12:01:57,111][118438] Saving new best policy, reward=0.380! [2023-09-30 12:01:59,796][118531] Updated weights for policy 1, policy_version 5440 (0.0016) [2023-09-30 12:01:59,796][118532] Updated weights for policy 0, policy_version 5440 (0.0016) [2023-09-30 12:02:01,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2793472. Throughput: 0: 774.0, 1: 776.0. Samples: 697215. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:01,947][117662] Avg episode reward: [(0, '0.060'), (1, '0.370')] [2023-09-30 12:02:06,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 2826240. Throughput: 0: 777.1, 1: 775.5. Samples: 706560. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-30 12:02:06,947][117662] Avg episode reward: [(0, '0.060'), (1, '0.360')] [2023-09-30 12:02:11,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2859008. Throughput: 0: 778.9, 1: 779.3. Samples: 711152. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-30 12:02:11,948][117662] Avg episode reward: [(0, '0.060'), (1, '0.450')] [2023-09-30 12:02:11,950][118438] Saving new best policy, reward=0.450! [2023-09-30 12:02:13,029][118532] Updated weights for policy 0, policy_version 5600 (0.0017) [2023-09-30 12:02:13,029][118531] Updated weights for policy 1, policy_version 5600 (0.0018) [2023-09-30 12:02:16,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6206.5). Total num frames: 2887680. Throughput: 0: 780.1, 1: 781.0. Samples: 720665. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-30 12:02:16,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.440')] [2023-09-30 12:02:16,961][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000005648_1445888.pth... [2023-09-30 12:02:16,962][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000005648_1445888.pth... [2023-09-30 12:02:16,996][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000002736_700416.pth [2023-09-30 12:02:16,997][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000002736_700416.pth [2023-09-30 12:02:21,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2916352. Throughput: 0: 778.0, 1: 778.7. Samples: 729723. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:21,947][117662] Avg episode reward: [(0, '0.070'), (1, '0.440')] [2023-09-30 12:02:26,118][118532] Updated weights for policy 0, policy_version 5760 (0.0018) [2023-09-30 12:02:26,118][118531] Updated weights for policy 1, policy_version 5760 (0.0019) [2023-09-30 12:02:26,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2949120. Throughput: 0: 778.3, 1: 778.4. Samples: 734650. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:02:26,948][117662] Avg episode reward: [(0, '0.090'), (1, '0.460')] [2023-09-30 12:02:26,949][118438] Saving new best policy, reward=0.460! [2023-09-30 12:02:31,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 2981888. Throughput: 0: 777.2, 1: 778.0. Samples: 743782. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:31,947][117662] Avg episode reward: [(0, '0.090'), (1, '0.440')] [2023-09-30 12:02:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3014656. Throughput: 0: 777.8, 1: 777.5. Samples: 753393. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:36,948][117662] Avg episode reward: [(0, '0.100'), (1, '0.480')] [2023-09-30 12:02:36,949][118438] Saving new best policy, reward=0.480! [2023-09-30 12:02:39,263][118532] Updated weights for policy 0, policy_version 5920 (0.0017) [2023-09-30 12:02:39,264][118531] Updated weights for policy 1, policy_version 5920 (0.0018) [2023-09-30 12:02:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3047424. Throughput: 0: 774.2, 1: 774.4. Samples: 757813. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:02:41,948][117662] Avg episode reward: [(0, '0.100'), (1, '0.550')] [2023-09-30 12:02:41,949][118438] Saving new best policy, reward=0.550! 
[2023-09-30 12:02:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3072000. Throughput: 0: 779.6, 1: 778.8. Samples: 767343. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:46,948][117662] Avg episode reward: [(0, '0.090'), (1, '0.580')] [2023-09-30 12:02:46,958][118438] Saving new best policy, reward=0.580! [2023-09-30 12:02:51,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3104768. Throughput: 0: 774.2, 1: 774.3. Samples: 776241. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:51,948][117662] Avg episode reward: [(0, '0.080'), (1, '0.620')] [2023-09-30 12:02:51,950][118438] Saving new best policy, reward=0.620! [2023-09-30 12:02:52,677][118532] Updated weights for policy 0, policy_version 6080 (0.0018) [2023-09-30 12:02:52,677][118531] Updated weights for policy 1, policy_version 6080 (0.0019) [2023-09-30 12:02:56,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 3137536. Throughput: 0: 773.1, 1: 773.1. Samples: 780732. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:02:56,947][117662] Avg episode reward: [(0, '0.080'), (1, '0.670')] [2023-09-30 12:02:56,948][118438] Saving new best policy, reward=0.670! [2023-09-30 12:03:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3162112. Throughput: 0: 772.6, 1: 772.2. Samples: 790183. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-30 12:03:01,948][117662] Avg episode reward: [(0, '0.150'), (1, '0.680')] [2023-09-30 12:03:01,994][118438] Saving new best policy, reward=0.680! [2023-09-30 12:03:06,066][118532] Updated weights for policy 0, policy_version 6240 (0.0018) [2023-09-30 12:03:06,066][118531] Updated weights for policy 1, policy_version 6240 (0.0017) [2023-09-30 12:03:06,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3194880. 
Throughput: 0: 771.6, 1: 770.4. Samples: 799113. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-30 12:03:06,948][117662] Avg episode reward: [(0, '0.180'), (1, '0.730')] [2023-09-30 12:03:06,949][118358] Saving new best policy, reward=0.180! [2023-09-30 12:03:06,949][118438] Saving new best policy, reward=0.730! [2023-09-30 12:03:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3227648. Throughput: 0: 768.9, 1: 768.5. Samples: 803833. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:11,947][117662] Avg episode reward: [(0, '0.180'), (1, '0.710')] [2023-09-30 12:03:16,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6212.3, 300 sec: 6192.6). Total num frames: 3260416. Throughput: 0: 773.2, 1: 772.7. Samples: 813349. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:16,947][117662] Avg episode reward: [(0, '0.200'), (1, '0.660')] [2023-09-30 12:03:16,954][118358] Saving new best policy, reward=0.200! [2023-09-30 12:03:19,087][118531] Updated weights for policy 1, policy_version 6400 (0.0016) [2023-09-30 12:03:19,088][118532] Updated weights for policy 0, policy_version 6400 (0.0016) [2023-09-30 12:03:21,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 3293184. Throughput: 0: 773.9, 1: 774.4. Samples: 823069. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-30 12:03:21,948][117662] Avg episode reward: [(0, '0.310'), (1, '0.750')] [2023-09-30 12:03:21,948][118358] Saving new best policy, reward=0.310! [2023-09-30 12:03:21,949][118438] Saving new best policy, reward=0.750! [2023-09-30 12:03:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3325952. Throughput: 0: 773.3, 1: 773.2. Samples: 827403. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-30 12:03:26,948][117662] Avg episode reward: [(0, '0.360'), (1, '0.780')] [2023-09-30 12:03:26,949][118358] Saving new best policy, reward=0.360! 
[2023-09-30 12:03:26,949][118438] Saving new best policy, reward=0.780! [2023-09-30 12:03:31,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3350528. Throughput: 0: 773.9, 1: 772.2. Samples: 836917. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:31,947][117662] Avg episode reward: [(0, '0.380'), (1, '0.800')] [2023-09-30 12:03:31,955][118358] Saving new best policy, reward=0.380! [2023-09-30 12:03:31,955][118438] Saving new best policy, reward=0.800! [2023-09-30 12:03:32,268][118531] Updated weights for policy 1, policy_version 6560 (0.0017) [2023-09-30 12:03:32,268][118532] Updated weights for policy 0, policy_version 6560 (0.0018) [2023-09-30 12:03:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3383296. Throughput: 0: 776.1, 1: 776.7. Samples: 846116. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:36,948][117662] Avg episode reward: [(0, '0.490'), (1, '0.860')] [2023-09-30 12:03:36,949][118358] Saving new best policy, reward=0.490! [2023-09-30 12:03:36,949][118438] Saving new best policy, reward=0.860! [2023-09-30 12:03:41,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3416064. Throughput: 0: 780.6, 1: 780.9. Samples: 850996. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-30 12:03:41,948][117662] Avg episode reward: [(0, '0.550'), (1, '0.810')] [2023-09-30 12:03:41,950][118358] Saving new best policy, reward=0.550! [2023-09-30 12:03:45,418][118532] Updated weights for policy 0, policy_version 6720 (0.0018) [2023-09-30 12:03:45,418][118531] Updated weights for policy 1, policy_version 6720 (0.0018) [2023-09-30 12:03:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3448832. Throughput: 0: 777.7, 1: 777.4. Samples: 860160. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:46,947][117662] Avg episode reward: [(0, '0.570'), (1, '0.820')] [2023-09-30 12:03:46,955][118358] Saving new best policy, reward=0.570! [2023-09-30 12:03:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3481600. Throughput: 0: 785.6, 1: 786.4. Samples: 869853. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:51,948][117662] Avg episode reward: [(0, '0.640'), (1, '0.890')] [2023-09-30 12:03:51,949][118358] Saving new best policy, reward=0.640! [2023-09-30 12:03:51,949][118438] Saving new best policy, reward=0.890! [2023-09-30 12:03:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3506176. Throughput: 0: 785.1, 1: 785.2. Samples: 874496. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:03:56,948][117662] Avg episode reward: [(0, '0.640'), (1, '0.880')] [2023-09-30 12:03:58,275][118531] Updated weights for policy 1, policy_version 6880 (0.0019) [2023-09-30 12:03:58,275][118532] Updated weights for policy 0, policy_version 6880 (0.0019) [2023-09-30 12:04:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 3538944. Throughput: 0: 786.4, 1: 785.8. Samples: 884100. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-30 12:04:01,948][117662] Avg episode reward: [(0, '0.670'), (1, '0.980')] [2023-09-30 12:04:01,957][118358] Saving new best policy, reward=0.670! [2023-09-30 12:04:01,958][118438] Saving new best policy, reward=0.980! [2023-09-30 12:04:06,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3571712. Throughput: 0: 776.9, 1: 777.1. Samples: 892998. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-30 12:04:06,948][117662] Avg episode reward: [(0, '0.720'), (1, '0.960')] [2023-09-30 12:04:06,949][118358] Saving new best policy, reward=0.720! 
[2023-09-30 12:04:11,534][118532] Updated weights for policy 0, policy_version 7040 (0.0019) [2023-09-30 12:04:11,535][118531] Updated weights for policy 1, policy_version 7040 (0.0018) [2023-09-30 12:04:11,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3604480. Throughput: 0: 780.1, 1: 780.1. Samples: 897609. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-30 12:04:11,947][117662] Avg episode reward: [(0, '0.780'), (1, '0.950')] [2023-09-30 12:04:11,948][118358] Saving new best policy, reward=0.780! [2023-09-30 12:04:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3637248. Throughput: 0: 781.3, 1: 781.9. Samples: 907264. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-30 12:04:16,947][117662] Avg episode reward: [(0, '0.720'), (1, '1.000')] [2023-09-30 12:04:16,957][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000007104_1818624.pth... [2023-09-30 12:04:16,957][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000007104_1818624.pth... [2023-09-30 12:04:16,987][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000004192_1073152.pth [2023-09-30 12:04:16,990][118438] Saving new best policy, reward=1.000! [2023-09-30 12:04:16,992][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000004192_1073152.pth [2023-09-30 12:04:21,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3670016. Throughput: 0: 785.2, 1: 785.1. Samples: 916781. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-30 12:04:21,948][117662] Avg episode reward: [(0, '0.710'), (1, '1.060')] [2023-09-30 12:04:21,949][118438] Saving new best policy, reward=1.060! 
[2023-09-30 12:04:24,484][118531] Updated weights for policy 1, policy_version 7200 (0.0019) [2023-09-30 12:04:24,484][118532] Updated weights for policy 0, policy_version 7200 (0.0019) [2023-09-30 12:04:26,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3694592. Throughput: 0: 784.9, 1: 784.1. Samples: 921600. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:04:26,948][117662] Avg episode reward: [(0, '0.710'), (1, '1.040')] [2023-09-30 12:04:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3727360. Throughput: 0: 780.2, 1: 780.4. Samples: 930386. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:04:31,948][117662] Avg episode reward: [(0, '0.650'), (1, '1.060')] [2023-09-30 12:04:36,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3760128. Throughput: 0: 780.0, 1: 779.5. Samples: 940031. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-30 12:04:36,947][117662] Avg episode reward: [(0, '0.650'), (1, '1.100')] [2023-09-30 12:04:36,948][118438] Saving new best policy, reward=1.100! [2023-09-30 12:04:37,825][118531] Updated weights for policy 1, policy_version 7360 (0.0017) [2023-09-30 12:04:37,826][118532] Updated weights for policy 0, policy_version 7360 (0.0018) [2023-09-30 12:04:41,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 3792896. Throughput: 0: 777.1, 1: 777.9. Samples: 944471. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-30 12:04:41,947][117662] Avg episode reward: [(0, '0.690'), (1, '1.150')] [2023-09-30 12:04:41,948][118438] Saving new best policy, reward=1.150! [2023-09-30 12:04:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3817472. Throughput: 0: 774.1, 1: 775.8. Samples: 953846. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:04:46,948][117662] Avg episode reward: [(0, '0.760'), (1, '1.190')]
[2023-09-30 12:04:46,960][118438] Saving new best policy, reward=1.190!
[2023-09-30 12:04:51,167][118532] Updated weights for policy 0, policy_version 7520 (0.0017)
[2023-09-30 12:04:51,169][118531] Updated weights for policy 1, policy_version 7520 (0.0016)
[2023-09-30 12:04:51,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3850240. Throughput: 0: 775.6, 1: 775.4. Samples: 962789. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:04:51,948][117662] Avg episode reward: [(0, '0.850'), (1, '1.200')]
[2023-09-30 12:04:51,949][118358] Saving new best policy, reward=0.850!
[2023-09-30 12:04:51,949][118438] Saving new best policy, reward=1.200!
[2023-09-30 12:04:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3883008. Throughput: 0: 776.9, 1: 777.6. Samples: 967561. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:04:56,948][117662] Avg episode reward: [(0, '0.970'), (1, '1.140')]
[2023-09-30 12:04:56,949][118358] Saving new best policy, reward=0.970!
[2023-09-30 12:05:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3915776. Throughput: 0: 773.7, 1: 773.7. Samples: 976898. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:05:01,948][117662] Avg episode reward: [(0, '1.030'), (1, '1.090')]
[2023-09-30 12:05:01,958][118358] Saving new best policy, reward=1.030!
[2023-09-30 12:05:04,509][118532] Updated weights for policy 0, policy_version 7680 (0.0015)
[2023-09-30 12:05:04,509][118531] Updated weights for policy 1, policy_version 7680 (0.0017)
[2023-09-30 12:05:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3940352. Throughput: 0: 768.8, 1: 767.7. Samples: 985924. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:05:06,948][117662] Avg episode reward: [(0, '1.090'), (1, '1.030')]
[2023-09-30 12:05:06,949][118358] Saving new best policy, reward=1.090!
[2023-09-30 12:05:11,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 3973120. Throughput: 0: 768.4, 1: 767.9. Samples: 990732. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:05:11,947][117662] Avg episode reward: [(0, '1.100'), (1, '1.090')]
[2023-09-30 12:05:11,948][118358] Saving new best policy, reward=1.100!
[2023-09-30 12:05:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4005888. Throughput: 0: 770.7, 1: 770.8. Samples: 999754. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:05:16,948][117662] Avg episode reward: [(0, '1.070'), (1, '1.130')]
[2023-09-30 12:05:17,712][118531] Updated weights for policy 1, policy_version 7840 (0.0015)
[2023-09-30 12:05:17,713][118532] Updated weights for policy 0, policy_version 7840 (0.0014)
[2023-09-30 12:05:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4038656. Throughput: 0: 770.1, 1: 770.2. Samples: 1009342. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:05:21,947][117662] Avg episode reward: [(0, '1.000'), (1, '1.210')]
[2023-09-30 12:05:21,948][118438] Saving new best policy, reward=1.210!
[2023-09-30 12:05:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 4071424. Throughput: 0: 770.9, 1: 770.4. Samples: 1013828. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:05:26,947][117662] Avg episode reward: [(0, '0.920'), (1, '1.210')]
[2023-09-30 12:05:30,967][118532] Updated weights for policy 0, policy_version 8000 (0.0016)
[2023-09-30 12:05:30,969][118531] Updated weights for policy 1, policy_version 8000 (0.0015)
[2023-09-30 12:05:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4096000. Throughput: 0: 771.6, 1: 772.2. Samples: 1023317. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:05:31,948][117662] Avg episode reward: [(0, '0.880'), (1, '1.250')]
[2023-09-30 12:05:31,957][118438] Saving new best policy, reward=1.250!
[2023-09-30 12:05:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4128768. Throughput: 0: 772.6, 1: 772.8. Samples: 1032333. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:05:36,948][117662] Avg episode reward: [(0, '0.900'), (1, '1.290')]
[2023-09-30 12:05:36,949][118438] Saving new best policy, reward=1.290!
[2023-09-30 12:05:41,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4161536. Throughput: 0: 773.1, 1: 771.9. Samples: 1037084. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:05:41,947][117662] Avg episode reward: [(0, '0.900'), (1, '1.330')]
[2023-09-30 12:05:41,948][118438] Saving new best policy, reward=1.330!
[2023-09-30 12:05:44,168][118532] Updated weights for policy 0, policy_version 8160 (0.0017)
[2023-09-30 12:05:44,168][118531] Updated weights for policy 1, policy_version 8160 (0.0018)
[2023-09-30 12:05:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4194304. Throughput: 0: 773.7, 1: 773.6. Samples: 1046528. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:05:46,948][117662] Avg episode reward: [(0, '0.980'), (1, '1.450')]
[2023-09-30 12:05:46,958][118438] Saving new best policy, reward=1.450!
[2023-09-30 12:05:51,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4218880. Throughput: 0: 773.1, 1: 774.0. Samples: 1055545. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:05:51,948][117662] Avg episode reward: [(0, '1.030'), (1, '1.650')]
[2023-09-30 12:05:51,952][118438] Saving new best policy, reward=1.650!
[2023-09-30 12:05:56,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4251648. Throughput: 0: 774.4, 1: 775.2. Samples: 1060462. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:05:56,947][117662] Avg episode reward: [(0, '1.040'), (1, '1.700')]
[2023-09-30 12:05:56,948][118438] Saving new best policy, reward=1.700!
[2023-09-30 12:05:57,395][118531] Updated weights for policy 1, policy_version 8320 (0.0017)
[2023-09-30 12:05:57,395][118532] Updated weights for policy 0, policy_version 8320 (0.0018)
[2023-09-30 12:06:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4284416. Throughput: 0: 776.0, 1: 776.4. Samples: 1069614. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:06:01,948][117662] Avg episode reward: [(0, '1.040'), (1, '1.700')]
[2023-09-30 12:06:06,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4317184. Throughput: 0: 777.3, 1: 777.2. Samples: 1079295. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:06:06,948][117662] Avg episode reward: [(0, '1.040'), (1, '1.750')]
[2023-09-30 12:06:06,949][118438] Saving new best policy, reward=1.750!
[2023-09-30 12:06:10,363][118532] Updated weights for policy 0, policy_version 8480 (0.0017)
[2023-09-30 12:06:10,365][118531] Updated weights for policy 1, policy_version 8480 (0.0017)
[2023-09-30 12:06:11,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4349952. Throughput: 0: 777.4, 1: 777.2. Samples: 1083781. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:06:11,947][117662] Avg episode reward: [(0, '1.010'), (1, '1.760')]
[2023-09-30 12:06:11,949][118438] Saving new best policy, reward=1.760!
[2023-09-30 12:06:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4382720. Throughput: 0: 780.5, 1: 779.4. Samples: 1093516. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:16,948][117662] Avg episode reward: [(0, '1.020'), (1, '1.650')]
[2023-09-30 12:06:16,956][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000008560_2191360.pth...
[2023-09-30 12:06:16,957][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000008560_2191360.pth...
[2023-09-30 12:06:16,989][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000005648_1445888.pth
[2023-09-30 12:06:16,994][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000005648_1445888.pth
[2023-09-30 12:06:21,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4407296. Throughput: 0: 779.8, 1: 779.8. Samples: 1102515. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:21,948][117662] Avg episode reward: [(0, '1.060'), (1, '1.620')]
[2023-09-30 12:06:23,516][118532] Updated weights for policy 0, policy_version 8640 (0.0020)
[2023-09-30 12:06:23,517][118531] Updated weights for policy 1, policy_version 8640 (0.0018)
[2023-09-30 12:06:26,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4440064. Throughput: 0: 781.4, 1: 782.0. Samples: 1107439. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:26,948][117662] Avg episode reward: [(0, '1.060'), (1, '1.620')]
[2023-09-30 12:06:31,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 4472832. Throughput: 0: 779.0, 1: 778.9. Samples: 1116635. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:31,947][117662] Avg episode reward: [(0, '1.110'), (1, '1.700')]
[2023-09-30 12:06:31,955][118358] Saving new best policy, reward=1.110!
[2023-09-30 12:06:36,817][118531] Updated weights for policy 1, policy_version 8800 (0.0016)
[2023-09-30 12:06:36,818][118532] Updated weights for policy 0, policy_version 8800 (0.0016)
[2023-09-30 12:06:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4505600. Throughput: 0: 783.9, 1: 781.2. Samples: 1125974. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:36,948][117662] Avg episode reward: [(0, '1.130'), (1, '1.680')]
[2023-09-30 12:06:36,949][118358] Saving new best policy, reward=1.130!
[2023-09-30 12:06:41,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4530176. Throughput: 0: 778.3, 1: 778.0. Samples: 1130496. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:41,948][117662] Avg episode reward: [(0, '1.170'), (1, '1.720')]
[2023-09-30 12:06:42,042][118358] Saving new best policy, reward=1.170!
[2023-09-30 12:06:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4562944. Throughput: 0: 781.4, 1: 781.9. Samples: 1139963. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:46,948][117662] Avg episode reward: [(0, '1.180'), (1, '1.840')]
[2023-09-30 12:06:46,957][118358] Saving new best policy, reward=1.180!
[2023-09-30 12:06:46,957][118438] Saving new best policy, reward=1.840!
[2023-09-30 12:06:49,871][118531] Updated weights for policy 1, policy_version 8960 (0.0016)
[2023-09-30 12:06:49,871][118532] Updated weights for policy 0, policy_version 8960 (0.0018)
[2023-09-30 12:06:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4595712. Throughput: 0: 775.9, 1: 776.3. Samples: 1149145. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:51,948][117662] Avg episode reward: [(0, '1.300'), (1, '1.890')]
[2023-09-30 12:06:51,949][118358] Saving new best policy, reward=1.300!
[2023-09-30 12:06:51,949][118438] Saving new best policy, reward=1.890!
[2023-09-30 12:06:56,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4628480. Throughput: 0: 777.4, 1: 777.7. Samples: 1153760. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:06:56,948][117662] Avg episode reward: [(0, '1.380'), (1, '1.870')]
[2023-09-30 12:06:56,949][118358] Saving new best policy, reward=1.380!
[2023-09-30 12:07:01,947][117662] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6206.5). Total num frames: 4657152. Throughput: 0: 774.6, 1: 774.2. Samples: 1163214. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:01,947][117662] Avg episode reward: [(0, '1.400'), (1, '1.790')]
[2023-09-30 12:07:01,979][118358] Saving new best policy, reward=1.400!
[2023-09-30 12:07:03,304][118532] Updated weights for policy 0, policy_version 9120 (0.0018)
[2023-09-30 12:07:03,304][118531] Updated weights for policy 1, policy_version 9120 (0.0019)
[2023-09-30 12:07:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4685824. Throughput: 0: 771.4, 1: 770.9. Samples: 1171917. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:07:06,947][117662] Avg episode reward: [(0, '1.440'), (1, '1.800')]
[2023-09-30 12:07:06,948][118358] Saving new best policy, reward=1.440!
[2023-09-30 12:07:11,947][117662] Fps is (10 sec: 6143.8, 60 sec: 6144.0, 300 sec: 6206.5). Total num frames: 4718592. Throughput: 0: 769.0, 1: 768.8. Samples: 1176643. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:11,948][117662] Avg episode reward: [(0, '1.490'), (1, '1.750')]
[2023-09-30 12:07:11,949][118358] Saving new best policy, reward=1.490!
[2023-09-30 12:07:16,622][118531] Updated weights for policy 1, policy_version 9280 (0.0018)
[2023-09-30 12:07:16,622][118532] Updated weights for policy 0, policy_version 9280 (0.0015)
[2023-09-30 12:07:16,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4751360. Throughput: 0: 768.4, 1: 768.6. Samples: 1185797. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:16,948][117662] Avg episode reward: [(0, '1.440'), (1, '1.780')]
[2023-09-30 12:07:21,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6206.5). Total num frames: 4780032. Throughput: 0: 768.3, 1: 770.5. Samples: 1195221. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:07:21,948][117662] Avg episode reward: [(0, '1.390'), (1, '1.760')]
[2023-09-30 12:07:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4808704. Throughput: 0: 773.6, 1: 773.7. Samples: 1200125. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:26,948][117662] Avg episode reward: [(0, '1.440'), (1, '1.920')]
[2023-09-30 12:07:27,101][118438] Saving new best policy, reward=1.920!
[2023-09-30 12:07:29,825][118532] Updated weights for policy 0, policy_version 9440 (0.0018)
[2023-09-30 12:07:29,825][118531] Updated weights for policy 1, policy_version 9440 (0.0018)
[2023-09-30 12:07:31,947][117662] Fps is (10 sec: 6144.1, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4841472. Throughput: 0: 769.4, 1: 769.1. Samples: 1209195. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:31,947][117662] Avg episode reward: [(0, '1.470'), (1, '2.050')]
[2023-09-30 12:07:31,955][118438] Saving new best policy, reward=2.050!
[2023-09-30 12:07:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4874240. Throughput: 0: 771.6, 1: 771.1. Samples: 1218565. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:07:36,948][117662] Avg episode reward: [(0, '1.470'), (1, '2.150')]
[2023-09-30 12:07:36,949][118438] Saving new best policy, reward=2.150!
[2023-09-30 12:07:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4907008. Throughput: 0: 772.7, 1: 772.4. Samples: 1223291. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:41,948][117662] Avg episode reward: [(0, '1.490'), (1, '2.220')]
[2023-09-30 12:07:41,950][118438] Saving new best policy, reward=2.220!
[2023-09-30 12:07:43,037][118531] Updated weights for policy 1, policy_version 9600 (0.0018)
[2023-09-30 12:07:43,037][118532] Updated weights for policy 0, policy_version 9600 (0.0014)
[2023-09-30 12:07:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4931584. Throughput: 0: 770.0, 1: 768.4. Samples: 1232438. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:46,948][117662] Avg episode reward: [(0, '1.580'), (1, '2.270')]
[2023-09-30 12:07:47,091][118358] Saving new best policy, reward=1.580!
[2023-09-30 12:07:47,095][118438] Saving new best policy, reward=2.270!
[2023-09-30 12:07:51,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 4964352. Throughput: 0: 771.6, 1: 772.1. Samples: 1241382. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:51,947][117662] Avg episode reward: [(0, '1.520'), (1, '2.240')]
[2023-09-30 12:07:56,344][118532] Updated weights for policy 0, policy_version 9760 (0.0017)
[2023-09-30 12:07:56,344][118531] Updated weights for policy 1, policy_version 9760 (0.0015)
[2023-09-30 12:07:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4997120. Throughput: 0: 773.5, 1: 772.2. Samples: 1246203. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:07:56,948][117662] Avg episode reward: [(0, '1.520'), (1, '2.270')]
[2023-09-30 12:08:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 5029888. Throughput: 0: 775.2, 1: 775.7. Samples: 1255584. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 12:08:01,947][117662] Avg episode reward: [(0, '1.540'), (1, '2.240')]
[2023-09-30 12:08:06,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5062656. Throughput: 0: 777.8, 1: 777.9. Samples: 1265227. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:08:06,948][117662] Avg episode reward: [(0, '1.590'), (1, '2.270')]
[2023-09-30 12:08:06,949][118358] Saving new best policy, reward=1.590!
[2023-09-30 12:08:09,437][118532] Updated weights for policy 0, policy_version 9920 (0.0017)
[2023-09-30 12:08:09,438][118531] Updated weights for policy 1, policy_version 9920 (0.0017)
[2023-09-30 12:08:11,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 5087232. Throughput: 0: 773.8, 1: 773.7. Samples: 1269760. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:08:11,948][117662] Avg episode reward: [(0, '1.590'), (1, '2.260')]
[2023-09-30 12:08:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 5120000. Throughput: 0: 778.8, 1: 777.4. Samples: 1279224. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:08:16,948][117662] Avg episode reward: [(0, '1.590'), (1, '2.280')]
[2023-09-30 12:08:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000010000_2560000.pth...
[2023-09-30 12:08:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000010000_2560000.pth...
[2023-09-30 12:08:16,989][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000007104_1818624.pth
[2023-09-30 12:08:16,993][118438] Saving new best policy, reward=2.280!
[2023-09-30 12:08:16,993][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000007104_1818624.pth
[2023-09-30 12:08:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6212.3, 300 sec: 6192.6). Total num frames: 5152768. Throughput: 0: 773.8, 1: 774.0. Samples: 1288215. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:08:21,948][117662] Avg episode reward: [(0, '1.620'), (1, '2.260')]
[2023-09-30 12:08:21,949][118358] Saving new best policy, reward=1.620!
[2023-09-30 12:08:22,606][118531] Updated weights for policy 1, policy_version 10080 (0.0017)
[2023-09-30 12:08:22,606][118532] Updated weights for policy 0, policy_version 10080 (0.0018)
[2023-09-30 12:08:26,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5185536. Throughput: 0: 775.5, 1: 775.8. Samples: 1293101. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:08:26,948][117662] Avg episode reward: [(0, '1.670'), (1, '2.350')]
[2023-09-30 12:08:26,949][118358] Saving new best policy, reward=1.670!
[2023-09-30 12:08:26,949][118438] Saving new best policy, reward=2.350!
[2023-09-30 12:08:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5218304. Throughput: 0: 778.4, 1: 779.5. Samples: 1302544. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:08:31,947][117662] Avg episode reward: [(0, '1.700'), (1, '2.380')]
[2023-09-30 12:08:31,955][118358] Saving new best policy, reward=1.700!
[2023-09-30 12:08:31,955][118438] Saving new best policy, reward=2.380!
[2023-09-30 12:08:35,478][118532] Updated weights for policy 0, policy_version 10240 (0.0017)
[2023-09-30 12:08:35,478][118531] Updated weights for policy 1, policy_version 10240 (0.0016)
[2023-09-30 12:08:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5251072. Throughput: 0: 788.4, 1: 788.2. Samples: 1312328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:08:36,948][117662] Avg episode reward: [(0, '1.730'), (1, '2.340')]
[2023-09-30 12:08:36,949][118358] Saving new best policy, reward=1.730!
[2023-09-30 12:08:41,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5283840. Throughput: 0: 784.7, 1: 786.4. Samples: 1316902. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:08:41,948][117662] Avg episode reward: [(0, '1.740'), (1, '2.460')]
[2023-09-30 12:08:41,949][118358] Saving new best policy, reward=1.740!
[2023-09-30 12:08:41,950][118438] Saving new best policy, reward=2.460!
[2023-09-30 12:08:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 5308416. Throughput: 0: 789.5, 1: 789.6. Samples: 1326642. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:08:46,948][117662] Avg episode reward: [(0, '1.780'), (1, '2.460')]
[2023-09-30 12:08:47,042][118358] Saving new best policy, reward=1.780!
[2023-09-30 12:08:48,320][118532] Updated weights for policy 0, policy_version 10400 (0.0017)
[2023-09-30 12:08:48,320][118531] Updated weights for policy 1, policy_version 10400 (0.0017)
[2023-09-30 12:08:51,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5341184. Throughput: 0: 785.5, 1: 784.9. Samples: 1335893. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:08:51,948][117662] Avg episode reward: [(0, '1.790'), (1, '2.510')]
[2023-09-30 12:08:51,950][118358] Saving new best policy, reward=1.790!
[2023-09-30 12:08:51,950][118438] Saving new best policy, reward=2.510!
[2023-09-30 12:08:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5373952. Throughput: 0: 787.5, 1: 788.1. Samples: 1340659. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:08:56,948][117662] Avg episode reward: [(0, '1.930'), (1, '2.530')]
[2023-09-30 12:08:56,949][118358] Saving new best policy, reward=1.930!
[2023-09-30 12:08:56,949][118438] Saving new best policy, reward=2.530!
[2023-09-30 12:09:01,527][118531] Updated weights for policy 1, policy_version 10560 (0.0018)
[2023-09-30 12:09:01,528][118532] Updated weights for policy 0, policy_version 10560 (0.0019)
[2023-09-30 12:09:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5406720. Throughput: 0: 782.6, 1: 783.5. Samples: 1349698. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:01,948][117662] Avg episode reward: [(0, '1.900'), (1, '2.470')]
[2023-09-30 12:09:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5439488. Throughput: 0: 791.6, 1: 792.7. Samples: 1359508. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:06,948][117662] Avg episode reward: [(0, '1.880'), (1, '2.460')]
[2023-09-30 12:09:11,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 5464064. Throughput: 0: 787.6, 1: 787.2. Samples: 1363968. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:11,947][117662] Avg episode reward: [(0, '1.880'), (1, '2.480')]
[2023-09-30 12:09:14,627][118532] Updated weights for policy 0, policy_version 10720 (0.0017)
[2023-09-30 12:09:14,627][118531] Updated weights for policy 1, policy_version 10720 (0.0016)
[2023-09-30 12:09:16,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 5496832. Throughput: 0: 788.2, 1: 788.4. Samples: 1373492. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:16,947][117662] Avg episode reward: [(0, '1.970'), (1, '2.480')]
[2023-09-30 12:09:16,957][118358] Saving new best policy, reward=1.970!
[2023-09-30 12:09:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 5529600. Throughput: 0: 781.8, 1: 781.8. Samples: 1382692. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:09:21,947][117662] Avg episode reward: [(0, '1.960'), (1, '2.480')]
[2023-09-30 12:09:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5562368. Throughput: 0: 786.9, 1: 786.3. Samples: 1387696. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:26,948][117662] Avg episode reward: [(0, '1.960'), (1, '2.460')]
[2023-09-30 12:09:27,571][118531] Updated weights for policy 1, policy_version 10880 (0.0019)
[2023-09-30 12:09:27,572][118532] Updated weights for policy 0, policy_version 10880 (0.0019)
[2023-09-30 12:09:31,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5595136. Throughput: 0: 781.6, 1: 781.1. Samples: 1396961. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:31,947][117662] Avg episode reward: [(0, '1.880'), (1, '2.560')]
[2023-09-30 12:09:31,955][118438] Saving new best policy, reward=2.560!
[2023-09-30 12:09:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5627904. Throughput: 0: 785.4, 1: 787.1. Samples: 1406655. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:36,947][117662] Avg episode reward: [(0, '1.860'), (1, '2.570')]
[2023-09-30 12:09:36,948][118438] Saving new best policy, reward=2.570!
[2023-09-30 12:09:40,695][118532] Updated weights for policy 0, policy_version 11040 (0.0017)
[2023-09-30 12:09:40,695][118531] Updated weights for policy 1, policy_version 11040 (0.0017)
[2023-09-30 12:09:41,947][117662] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6234.3). Total num frames: 5656576. Throughput: 0: 782.7, 1: 782.1. Samples: 1411072. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:41,947][117662] Avg episode reward: [(0, '1.810'), (1, '2.610')]
[2023-09-30 12:09:41,948][118438] Saving new best policy, reward=2.610!
[2023-09-30 12:09:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5685248. Throughput: 0: 789.0, 1: 789.7. Samples: 1420738. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:46,948][117662] Avg episode reward: [(0, '1.800'), (1, '2.570')]
[2023-09-30 12:09:51,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 5718016. Throughput: 0: 782.5, 1: 781.7. Samples: 1429898. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:51,948][117662] Avg episode reward: [(0, '1.760'), (1, '2.580')]
[2023-09-30 12:09:53,750][118532] Updated weights for policy 0, policy_version 11200 (0.0016)
[2023-09-30 12:09:53,750][118531] Updated weights for policy 1, policy_version 11200 (0.0018)
[2023-09-30 12:09:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5750784. Throughput: 0: 785.0, 1: 785.7. Samples: 1434651. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:09:56,948][117662] Avg episode reward: [(0, '1.730'), (1, '2.610')]
[2023-09-30 12:10:01,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5783552. Throughput: 0: 782.9, 1: 782.7. Samples: 1443945. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:10:01,947][117662] Avg episode reward: [(0, '1.730'), (1, '2.500')]
[2023-09-30 12:10:06,763][118532] Updated weights for policy 0, policy_version 11360 (0.0016)
[2023-09-30 12:10:06,763][118531] Updated weights for policy 1, policy_version 11360 (0.0018)
[2023-09-30 12:10:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5816320. Throughput: 0: 788.3, 1: 788.5. Samples: 1453651. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:10:06,948][117662] Avg episode reward: [(0, '1.750'), (1, '2.570')]
[2023-09-30 12:10:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6248.1). Total num frames: 5849088. Throughput: 0: 782.9, 1: 783.3. Samples: 1458176. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:10:11,948][117662] Avg episode reward: [(0, '1.800'), (1, '2.540')]
[2023-09-30 12:10:16,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5873664. Throughput: 0: 788.6, 1: 787.2. Samples: 1467871. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:10:16,948][117662] Avg episode reward: [(0, '1.850'), (1, '2.520')]
[2023-09-30 12:10:17,117][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000011488_2940928.pth...
[2023-09-30 12:10:17,145][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000008560_2191360.pth
[2023-09-30 12:10:17,147][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000011488_2940928.pth...
[2023-09-30 12:10:17,175][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000008560_2191360.pth
[2023-09-30 12:10:19,840][118532] Updated weights for policy 0, policy_version 11520 (0.0017)
[2023-09-30 12:10:19,841][118531] Updated weights for policy 1, policy_version 11520 (0.0017)
[2023-09-30 12:10:21,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 5906432. Throughput: 0: 779.1, 1: 778.2. Samples: 1476735. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:10:21,947][117662] Avg episode reward: [(0, '1.860'), (1, '2.540')]
[2023-09-30 12:10:26,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5939200. Throughput: 0: 783.7, 1: 782.8. Samples: 1481568. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:10:26,947][117662] Avg episode reward: [(0, '1.970'), (1, '2.560')]
[2023-09-30 12:10:31,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5971968. Throughput: 0: 780.6, 1: 779.6. Samples: 1490944. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:10:31,948][117662] Avg episode reward: [(0, '2.010'), (1, '2.610')]
[2023-09-30 12:10:31,955][118358] Saving new best policy, reward=2.010!
[2023-09-30 12:10:33,020][118532] Updated weights for policy 0, policy_version 11680 (0.0016)
[2023-09-30 12:10:33,020][118531] Updated weights for policy 1, policy_version 11680 (0.0016)
[2023-09-30 12:10:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6004736. Throughput: 0: 784.9, 1: 785.0. Samples: 1500543. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:10:36,947][117662] Avg episode reward: [(0, '2.000'), (1, '2.640')]
[2023-09-30 12:10:36,948][118438] Saving new best policy, reward=2.640!
[2023-09-30 12:10:41,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6212.2, 300 sec: 6220.4). Total num frames: 6029312. Throughput: 0: 785.1, 1: 784.4. Samples: 1505280. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:10:41,948][117662] Avg episode reward: [(0, '2.040'), (1, '2.640')]
[2023-09-30 12:10:42,048][118358] Saving new best policy, reward=2.040!
[2023-09-30 12:10:45,938][118532] Updated weights for policy 0, policy_version 11840 (0.0018)
[2023-09-30 12:10:45,938][118531] Updated weights for policy 1, policy_version 11840 (0.0018)
[2023-09-30 12:10:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 6062080. Throughput: 0: 785.7, 1: 785.9. Samples: 1514666. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:10:46,947][117662] Avg episode reward: [(0, '2.150'), (1, '2.650')]
[2023-09-30 12:10:46,957][118358] Saving new best policy, reward=2.150!
[2023-09-30 12:10:46,957][118438] Saving new best policy, reward=2.650!
[2023-09-30 12:10:51,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6094848. Throughput: 0: 778.7, 1: 778.2. Samples: 1523712. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:10:51,947][117662] Avg episode reward: [(0, '2.090'), (1, '2.600')]
[2023-09-30 12:10:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 6127616. Throughput: 0: 779.8, 1: 780.4. Samples: 1528383. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:10:56,947][117662] Avg episode reward: [(0, '2.120'), (1, '2.620')]
[2023-09-30 12:10:59,230][118532] Updated weights for policy 0, policy_version 12000 (0.0018)
[2023-09-30 12:10:59,230][118531] Updated weights for policy 1, policy_version 12000 (0.0018)
[2023-09-30 12:11:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6160384. Throughput: 0: 779.0, 1: 779.7. Samples: 1538016. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:11:01,947][117662] Avg episode reward: [(0, '2.110'), (1, '2.610')]
[2023-09-30 12:11:06,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 6184960. Throughput: 0: 784.0, 1: 784.1. Samples: 1547298. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:11:06,948][117662] Avg episode reward: [(0, '2.190'), (1, '2.660')]
[2023-09-30 12:11:07,042][118438] Saving new best policy, reward=2.660!
[2023-09-30 12:11:07,046][118358] Saving new best policy, reward=2.190!
[2023-09-30 12:11:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 6217728. Throughput: 0: 784.4, 1: 784.4. Samples: 1552166. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:11:11,947][117662] Avg episode reward: [(0, '2.220'), (1, '2.700')]
[2023-09-30 12:11:11,948][118358] Saving new best policy, reward=2.220!
[2023-09-30 12:11:11,948][118438] Saving new best policy, reward=2.700!
[2023-09-30 12:11:12,314][118531] Updated weights for policy 1, policy_version 12160 (0.0018)
[2023-09-30 12:11:12,315][118532] Updated weights for policy 0, policy_version 12160 (0.0018)
[2023-09-30 12:11:16,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 6250496. Throughput: 0: 783.1, 1: 782.1. Samples: 1561379. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:11:16,947][117662] Avg episode reward: [(0, '2.240'), (1, '2.750')]
[2023-09-30 12:11:16,954][118358] Saving new best policy, reward=2.240!
[2023-09-30 12:11:16,955][118438] Saving new best policy, reward=2.750!
[2023-09-30 12:11:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6283264. Throughput: 0: 780.1, 1: 779.8. Samples: 1570739. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:11:21,947][117662] Avg episode reward: [(0, '2.230'), (1, '2.790')]
[2023-09-30 12:11:21,948][118438] Saving new best policy, reward=2.790!
[2023-09-30 12:11:25,581][118532] Updated weights for policy 0, policy_version 12320 (0.0017)
[2023-09-30 12:11:25,581][118531] Updated weights for policy 1, policy_version 12320 (0.0016)
[2023-09-30 12:11:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6316032. Throughput: 0: 774.2, 1: 774.5. Samples: 1574971. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:11:26,948][117662] Avg episode reward: [(0, '2.310'), (1, '2.750')]
[2023-09-30 12:11:26,949][118358] Saving new best policy, reward=2.310!
[2023-09-30 12:11:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 6340608. Throughput: 0: 776.9, 1: 777.2. Samples: 1584599. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:11:31,947][117662] Avg episode reward: [(0, '2.530'), (1, '2.720')]
[2023-09-30 12:11:32,125][118358] Saving new best policy, reward=2.530!
[2023-09-30 12:11:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6373376. Throughput: 0: 779.9, 1: 780.2. Samples: 1593917. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:11:36,948][117662] Avg episode reward: [(0, '2.590'), (1, '2.700')]
[2023-09-30 12:11:36,949][118358] Saving new best policy, reward=2.590!
[2023-09-30 12:11:38,628][118531] Updated weights for policy 1, policy_version 12480 (0.0017)
[2023-09-30 12:11:38,629][118532] Updated weights for policy 0, policy_version 12480 (0.0017)
[2023-09-30 12:11:41,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 6406144. Throughput: 0: 781.4, 1: 780.8. Samples: 1598681. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:11:41,947][117662] Avg episode reward: [(0, '2.590'), (1, '2.730')]
[2023-09-30 12:11:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6438912. Throughput: 0: 774.2, 1: 775.0. Samples: 1607732. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:11:46,948][117662] Avg episode reward: [(0, '2.730'), (1, '2.730')]
[2023-09-30 12:11:46,961][118358] Saving new best policy, reward=2.730!
[2023-09-30 12:11:51,722][118531] Updated weights for policy 1, policy_version 12640 (0.0016)
[2023-09-30 12:11:51,722][118532] Updated weights for policy 0, policy_version 12640 (0.0017)
[2023-09-30 12:11:51,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6471680. Throughput: 0: 780.2, 1: 780.4. Samples: 1617527. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:11:51,948][117662] Avg episode reward: [(0, '2.750'), (1, '2.720')]
[2023-09-30 12:11:51,949][118358] Saving new best policy, reward=2.750!
[2023-09-30 12:11:56,948][117662] Fps is (10 sec: 6553.1, 60 sec: 6280.4, 300 sec: 6262.0). Total num frames: 6504448. Throughput: 0: 775.6, 1: 776.6. Samples: 1622016. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:11:56,949][117662] Avg episode reward: [(0, '2.630'), (1, '2.770')]
[2023-09-30 12:12:01,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6529024. Throughput: 0: 778.5, 1: 779.6. Samples: 1631495. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:12:01,948][117662] Avg episode reward: [(0, '2.770'), (1, '2.760')]
[2023-09-30 12:12:01,957][118358] Saving new best policy, reward=2.770!
[2023-09-30 12:12:05,020][118532] Updated weights for policy 0, policy_version 12800 (0.0017)
[2023-09-30 12:12:05,020][118531] Updated weights for policy 1, policy_version 12800 (0.0017)
[2023-09-30 12:12:06,947][117662] Fps is (10 sec: 5734.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6561792. Throughput: 0: 774.9, 1: 775.0. Samples: 1640488. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:12:06,948][117662] Avg episode reward: [(0, '2.850'), (1, '2.740')]
[2023-09-30 12:12:06,949][118358] Saving new best policy, reward=2.850!
[2023-09-30 12:12:11,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6594560. Throughput: 0: 780.5, 1: 780.0. Samples: 1645191. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-30 12:12:11,947][117662] Avg episode reward: [(0, '2.830'), (1, '2.740')] [2023-09-30 12:12:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6262.0). Total num frames: 6627328. Throughput: 0: 780.1, 1: 779.6. Samples: 1654784. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-30 12:12:16,948][117662] Avg episode reward: [(0, '2.660'), (1, '2.720')] [2023-09-30 12:12:16,959][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000012944_3313664.pth... [2023-09-30 12:12:16,959][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000012944_3313664.pth... [2023-09-30 12:12:16,988][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000010000_2560000.pth [2023-09-30 12:12:16,998][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000010000_2560000.pth [2023-09-30 12:12:18,015][118532] Updated weights for policy 0, policy_version 12960 (0.0015) [2023-09-30 12:12:18,016][118531] Updated weights for policy 1, policy_version 12960 (0.0018) [2023-09-30 12:12:21,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6660096. Throughput: 0: 782.3, 1: 782.6. Samples: 1664335. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:12:21,948][117662] Avg episode reward: [(0, '2.760'), (1, '2.760')] [2023-09-30 12:12:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6684672. Throughput: 0: 782.6, 1: 782.7. Samples: 1669120. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:12:26,948][117662] Avg episode reward: [(0, '2.660'), (1, '2.810')] [2023-09-30 12:12:27,099][118438] Saving new best policy, reward=2.810! 
[2023-09-30 12:12:31,058][118532] Updated weights for policy 0, policy_version 13120 (0.0018) [2023-09-30 12:12:31,058][118531] Updated weights for policy 1, policy_version 13120 (0.0018) [2023-09-30 12:12:31,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6717440. Throughput: 0: 784.1, 1: 784.8. Samples: 1678336. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:12:31,947][117662] Avg episode reward: [(0, '2.760'), (1, '2.840')] [2023-09-30 12:12:31,955][118438] Saving new best policy, reward=2.840! [2023-09-30 12:12:36,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 6750208. Throughput: 0: 778.3, 1: 777.8. Samples: 1687552. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:12:36,947][117662] Avg episode reward: [(0, '2.860'), (1, '2.850')] [2023-09-30 12:12:36,948][118358] Saving new best policy, reward=2.860! [2023-09-30 12:12:36,948][118438] Saving new best policy, reward=2.850! [2023-09-30 12:12:41,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6782976. Throughput: 0: 777.3, 1: 777.3. Samples: 1691968. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:12:41,948][117662] Avg episode reward: [(0, '2.930'), (1, '2.850')] [2023-09-30 12:12:41,949][118358] Saving new best policy, reward=2.930! [2023-09-30 12:12:44,390][118532] Updated weights for policy 0, policy_version 13280 (0.0017) [2023-09-30 12:12:44,390][118531] Updated weights for policy 1, policy_version 13280 (0.0014) [2023-09-30 12:12:46,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 6811648. Throughput: 0: 779.3, 1: 779.2. Samples: 1701629. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:12:46,948][117662] Avg episode reward: [(0, '3.000'), (1, '2.870')] [2023-09-30 12:12:46,958][118358] Saving new best policy, reward=3.000! 
[2023-09-30 12:12:46,959][118438] Saving new best policy, reward=2.870! [2023-09-30 12:12:51,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6840320. Throughput: 0: 783.6, 1: 782.5. Samples: 1710961. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:12:51,948][117662] Avg episode reward: [(0, '2.990'), (1, '2.860')] [2023-09-30 12:12:56,947][117662] Fps is (10 sec: 6144.1, 60 sec: 6144.1, 300 sec: 6248.1). Total num frames: 6873088. Throughput: 0: 787.2, 1: 787.8. Samples: 1716065. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:12:56,947][117662] Avg episode reward: [(0, '3.170'), (1, '2.860')] [2023-09-30 12:12:56,948][118358] Saving new best policy, reward=3.170! [2023-09-30 12:12:57,195][118531] Updated weights for policy 1, policy_version 13440 (0.0018) [2023-09-30 12:12:57,195][118532] Updated weights for policy 0, policy_version 13440 (0.0017) [2023-09-30 12:13:01,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 6905856. Throughput: 0: 786.0, 1: 786.4. Samples: 1725538. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:13:01,947][117662] Avg episode reward: [(0, '3.350'), (1, '2.860')] [2023-09-30 12:13:01,955][118358] Saving new best policy, reward=3.350! [2023-09-30 12:13:06,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6938624. Throughput: 0: 781.8, 1: 781.2. Samples: 1734670. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-30 12:13:06,948][117662] Avg episode reward: [(0, '3.380'), (1, '2.910')] [2023-09-30 12:13:06,949][118358] Saving new best policy, reward=3.380! [2023-09-30 12:13:06,950][118438] Saving new best policy, reward=2.910! 
[2023-09-30 12:13:10,204][118531] Updated weights for policy 1, policy_version 13600 (0.0018) [2023-09-30 12:13:10,204][118532] Updated weights for policy 0, policy_version 13600 (0.0017) [2023-09-30 12:13:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6971392. Throughput: 0: 781.8, 1: 782.4. Samples: 1739508. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-30 12:13:11,947][117662] Avg episode reward: [(0, '3.420'), (1, '2.900')] [2023-09-30 12:13:11,948][118358] Saving new best policy, reward=3.420! [2023-09-30 12:13:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7004160. Throughput: 0: 785.4, 1: 784.6. Samples: 1748988. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-30 12:13:16,948][117662] Avg episode reward: [(0, '3.560'), (1, '2.860')] [2023-09-30 12:13:16,958][118358] Saving new best policy, reward=3.560! [2023-09-30 12:13:21,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7028736. Throughput: 0: 779.1, 1: 779.2. Samples: 1757674. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-30 12:13:21,947][117662] Avg episode reward: [(0, '3.860'), (1, '2.800')] [2023-09-30 12:13:21,948][118358] Saving new best policy, reward=3.860! [2023-09-30 12:13:23,689][118532] Updated weights for policy 0, policy_version 13760 (0.0015) [2023-09-30 12:13:23,691][118531] Updated weights for policy 1, policy_version 13760 (0.0018) [2023-09-30 12:13:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7061504. Throughput: 0: 782.6, 1: 783.3. Samples: 1762431. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-30 12:13:26,948][117662] Avg episode reward: [(0, '3.850'), (1, '2.720')] [2023-09-30 12:13:31,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7094272. Throughput: 0: 777.8, 1: 778.2. Samples: 1771652. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-30 12:13:31,948][117662] Avg episode reward: [(0, '3.920'), (1, '2.700')] [2023-09-30 12:13:31,958][118358] Saving new best policy, reward=3.920! [2023-09-30 12:13:36,752][118531] Updated weights for policy 1, policy_version 13920 (0.0018) [2023-09-30 12:13:36,753][118532] Updated weights for policy 0, policy_version 13920 (0.0017) [2023-09-30 12:13:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7127040. Throughput: 0: 781.7, 1: 783.0. Samples: 1781374. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:13:36,948][117662] Avg episode reward: [(0, '3.900'), (1, '2.690')] [2023-09-30 12:13:41,950][117662] Fps is (10 sec: 6552.0, 60 sec: 6280.3, 300 sec: 6275.8). Total num frames: 7159808. Throughput: 0: 775.6, 1: 775.2. Samples: 1785856. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:13:41,951][117662] Avg episode reward: [(0, '4.050'), (1, '2.670')] [2023-09-30 12:13:41,951][118358] Saving new best policy, reward=4.050! [2023-09-30 12:13:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 7184384. Throughput: 0: 777.4, 1: 776.6. Samples: 1795467. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:13:46,948][117662] Avg episode reward: [(0, '4.310'), (1, '2.700')] [2023-09-30 12:13:47,141][118358] Saving new best policy, reward=4.310! [2023-09-30 12:13:49,769][118532] Updated weights for policy 0, policy_version 14080 (0.0015) [2023-09-30 12:13:49,769][118531] Updated weights for policy 1, policy_version 14080 (0.0017) [2023-09-30 12:13:51,947][117662] Fps is (10 sec: 5735.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7217152. Throughput: 0: 778.2, 1: 778.4. Samples: 1804717. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:13:51,948][117662] Avg episode reward: [(0, '4.190'), (1, '2.740')] [2023-09-30 12:13:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7249920. Throughput: 0: 780.3, 1: 779.5. Samples: 1809700. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:13:56,948][117662] Avg episode reward: [(0, '4.240'), (1, '2.820')] [2023-09-30 12:14:01,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7282688. Throughput: 0: 777.2, 1: 777.3. Samples: 1818941. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:14:01,947][117662] Avg episode reward: [(0, '4.210'), (1, '2.790')] [2023-09-30 12:14:02,875][118532] Updated weights for policy 0, policy_version 14240 (0.0018) [2023-09-30 12:14:02,875][118531] Updated weights for policy 1, policy_version 14240 (0.0014) [2023-09-30 12:14:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7315456. Throughput: 0: 786.0, 1: 786.0. Samples: 1828416. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:14:06,948][117662] Avg episode reward: [(0, '4.420'), (1, '2.800')] [2023-09-30 12:14:06,949][118358] Saving new best policy, reward=4.420! [2023-09-30 12:14:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7348224. Throughput: 0: 784.0, 1: 783.4. Samples: 1832966. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:14:11,947][117662] Avg episode reward: [(0, '4.260'), (1, '2.840')] [2023-09-30 12:14:15,729][118532] Updated weights for policy 0, policy_version 14400 (0.0017) [2023-09-30 12:14:15,730][118531] Updated weights for policy 1, policy_version 14400 (0.0017) [2023-09-30 12:14:16,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7372800. Throughput: 0: 790.9, 1: 791.4. Samples: 1842856. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:14:16,948][117662] Avg episode reward: [(0, '4.110'), (1, '2.840')] [2023-09-30 12:14:17,033][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000014416_3690496.pth... [2023-09-30 12:14:17,038][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000014416_3690496.pth... [2023-09-30 12:14:17,066][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000011488_2940928.pth [2023-09-30 12:14:17,077][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000011488_2940928.pth [2023-09-30 12:14:21,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7405568. Throughput: 0: 785.1, 1: 785.0. Samples: 1852027. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:14:21,947][117662] Avg episode reward: [(0, '4.530'), (1, '2.840')] [2023-09-30 12:14:21,948][118358] Saving new best policy, reward=4.530! [2023-09-30 12:14:26,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7438336. Throughput: 0: 785.1, 1: 786.4. Samples: 1856566. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:14:26,947][117662] Avg episode reward: [(0, '4.780'), (1, '2.850')] [2023-09-30 12:14:26,948][118358] Saving new best policy, reward=4.780! [2023-09-30 12:14:29,132][118531] Updated weights for policy 1, policy_version 14560 (0.0018) [2023-09-30 12:14:29,133][118532] Updated weights for policy 0, policy_version 14560 (0.0017) [2023-09-30 12:14:31,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7471104. Throughput: 0: 780.5, 1: 780.9. Samples: 1865728. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:14:31,947][117662] Avg episode reward: [(0, '5.040'), (1, '2.790')] [2023-09-30 12:14:31,957][118358] Saving new best policy, reward=5.040! 
[2023-09-30 12:14:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6234.2). Total num frames: 7495680. Throughput: 0: 780.3, 1: 780.6. Samples: 1874956. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:14:36,948][117662] Avg episode reward: [(0, '5.320'), (1, '2.770')] [2023-09-30 12:14:37,034][118358] Saving new best policy, reward=5.320! [2023-09-30 12:14:41,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.3, 300 sec: 6248.1). Total num frames: 7528448. Throughput: 0: 778.0, 1: 779.3. Samples: 1879778. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:14:41,947][117662] Avg episode reward: [(0, '5.060'), (1, '2.710')] [2023-09-30 12:14:42,265][118532] Updated weights for policy 0, policy_version 14720 (0.0018) [2023-09-30 12:14:42,265][118531] Updated weights for policy 1, policy_version 14720 (0.0018) [2023-09-30 12:14:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7561216. Throughput: 0: 777.9, 1: 778.1. Samples: 1888960. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:14:46,948][117662] Avg episode reward: [(0, '5.340'), (1, '2.730')] [2023-09-30 12:14:46,959][118358] Saving new best policy, reward=5.340! [2023-09-30 12:14:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7593984. Throughput: 0: 778.7, 1: 778.6. Samples: 1898496. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:14:51,947][117662] Avg episode reward: [(0, '5.530'), (1, '2.640')] [2023-09-30 12:14:51,948][118358] Saving new best policy, reward=5.530! [2023-09-30 12:14:55,326][118531] Updated weights for policy 1, policy_version 14880 (0.0017) [2023-09-30 12:14:55,326][118532] Updated weights for policy 0, policy_version 14880 (0.0017) [2023-09-30 12:14:56,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7626752. Throughput: 0: 779.2, 1: 779.3. Samples: 1903098. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-30 12:14:56,947][117662] Avg episode reward: [(0, '5.560'), (1, '2.650')] [2023-09-30 12:14:56,948][118358] Saving new best policy, reward=5.560! [2023-09-30 12:15:01,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7659520. Throughput: 0: 775.9, 1: 775.7. Samples: 1912680. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-30 12:15:01,948][117662] Avg episode reward: [(0, '6.060'), (1, '2.630')] [2023-09-30 12:15:01,961][118358] Saving new best policy, reward=6.060! [2023-09-30 12:15:06,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 7684096. Throughput: 0: 778.1, 1: 778.2. Samples: 1922060. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-30 12:15:06,948][117662] Avg episode reward: [(0, '6.240'), (1, '2.650')] [2023-09-30 12:15:07,040][118358] Saving new best policy, reward=6.240! [2023-09-30 12:15:08,371][118531] Updated weights for policy 1, policy_version 15040 (0.0018) [2023-09-30 12:15:08,371][118532] Updated weights for policy 0, policy_version 15040 (0.0018) [2023-09-30 12:15:11,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7716864. Throughput: 0: 781.9, 1: 780.8. Samples: 1926889. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-30 12:15:11,947][117662] Avg episode reward: [(0, '6.510'), (1, '2.650')] [2023-09-30 12:15:11,948][118358] Saving new best policy, reward=6.510! [2023-09-30 12:15:16,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7749632. Throughput: 0: 781.4, 1: 781.7. Samples: 1936067. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:16,948][117662] Avg episode reward: [(0, '6.510'), (1, '2.680')] [2023-09-30 12:15:21,552][118532] Updated weights for policy 0, policy_version 15200 (0.0017) [2023-09-30 12:15:21,553][118531] Updated weights for policy 1, policy_version 15200 (0.0017) [2023-09-30 12:15:21,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7782400. Throughput: 0: 785.2, 1: 784.3. Samples: 1945586. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:21,948][117662] Avg episode reward: [(0, '6.720'), (1, '2.690')] [2023-09-30 12:15:21,949][118358] Saving new best policy, reward=6.720! [2023-09-30 12:15:26,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7815168. Throughput: 0: 780.9, 1: 780.0. Samples: 1950019. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:26,947][117662] Avg episode reward: [(0, '7.080'), (1, '2.730')] [2023-09-30 12:15:26,948][118358] Saving new best policy, reward=7.080! [2023-09-30 12:15:31,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 7839744. Throughput: 0: 785.6, 1: 783.3. Samples: 1959562. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:31,949][117662] Avg episode reward: [(0, '7.050'), (1, '2.760')] [2023-09-30 12:15:34,644][118531] Updated weights for policy 1, policy_version 15360 (0.0016) [2023-09-30 12:15:34,645][118532] Updated weights for policy 0, policy_version 15360 (0.0018) [2023-09-30 12:15:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7872512. Throughput: 0: 780.6, 1: 781.2. Samples: 1968777. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:36,947][117662] Avg episode reward: [(0, '6.870'), (1, '2.810')] [2023-09-30 12:15:41,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7905280. 
Throughput: 0: 783.2, 1: 783.7. Samples: 1973608. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:41,947][117662] Avg episode reward: [(0, '7.060'), (1, '2.800')] [2023-09-30 12:15:46,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7938048. Throughput: 0: 778.4, 1: 778.5. Samples: 1982741. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:46,948][117662] Avg episode reward: [(0, '7.050'), (1, '2.860')] [2023-09-30 12:15:47,783][118532] Updated weights for policy 0, policy_version 15520 (0.0019) [2023-09-30 12:15:47,783][118531] Updated weights for policy 1, policy_version 15520 (0.0018) [2023-09-30 12:15:51,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7970816. Throughput: 0: 782.9, 1: 782.5. Samples: 1992505. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:51,948][117662] Avg episode reward: [(0, '7.240'), (1, '2.890')] [2023-09-30 12:15:51,949][118358] Saving new best policy, reward=7.240! [2023-09-30 12:15:56,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8003584. Throughput: 0: 778.1, 1: 778.1. Samples: 1996916. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:15:56,947][117662] Avg episode reward: [(0, '7.260'), (1, '2.910')] [2023-09-30 12:15:56,948][118358] Saving new best policy, reward=7.260! [2023-09-30 12:16:00,960][118532] Updated weights for policy 0, policy_version 15680 (0.0018) [2023-09-30 12:16:00,960][118531] Updated weights for policy 1, policy_version 15680 (0.0018) [2023-09-30 12:16:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8028160. Throughput: 0: 780.0, 1: 782.9. Samples: 2006398. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:01,949][117662] Avg episode reward: [(0, '7.370'), (1, '2.900')] [2023-09-30 12:16:01,958][118358] Saving new best policy, reward=7.370! 
[2023-09-30 12:16:06,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8060928. Throughput: 0: 776.3, 1: 777.0. Samples: 2015482. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:06,948][117662] Avg episode reward: [(0, '7.480'), (1, '2.910')] [2023-09-30 12:16:06,949][118358] Saving new best policy, reward=7.480! [2023-09-30 12:16:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8093696. Throughput: 0: 781.7, 1: 782.0. Samples: 2020389. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:11,948][117662] Avg episode reward: [(0, '7.630'), (1, '2.850')] [2023-09-30 12:16:11,950][118358] Saving new best policy, reward=7.630! [2023-09-30 12:16:13,917][118531] Updated weights for policy 1, policy_version 15840 (0.0018) [2023-09-30 12:16:13,917][118532] Updated weights for policy 0, policy_version 15840 (0.0017) [2023-09-30 12:16:16,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 8126464. Throughput: 0: 778.9, 1: 781.0. Samples: 2029754. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:16:16,947][117662] Avg episode reward: [(0, '7.560'), (1, '2.880')] [2023-09-30 12:16:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000015872_4063232.pth... [2023-09-30 12:16:16,959][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000015872_4063232.pth... [2023-09-30 12:16:16,995][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000012944_3313664.pth [2023-09-30 12:16:16,996][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000012944_3313664.pth [2023-09-30 12:16:21,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 8159232. Throughput: 0: 787.4, 1: 785.6. Samples: 2039560. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:16:21,947][117662] Avg episode reward: [(0, '7.620'), (1, '2.870')] [2023-09-30 12:16:26,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6212.2, 300 sec: 6262.0). Total num frames: 8187904. Throughput: 0: 781.4, 1: 780.8. Samples: 2043906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:26,948][117662] Avg episode reward: [(0, '7.430'), (1, '2.860')] [2023-09-30 12:16:27,012][118532] Updated weights for policy 0, policy_version 16000 (0.0015) [2023-09-30 12:16:27,012][118531] Updated weights for policy 1, policy_version 16000 (0.0016) [2023-09-30 12:16:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 8216576. Throughput: 0: 784.1, 1: 785.1. Samples: 2053354. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:31,948][117662] Avg episode reward: [(0, '7.660'), (1, '2.800')] [2023-09-30 12:16:31,957][118358] Saving new best policy, reward=7.660! [2023-09-30 12:16:36,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8249344. Throughput: 0: 779.3, 1: 779.5. Samples: 2062650. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:36,948][117662] Avg episode reward: [(0, '7.620'), (1, '2.780')] [2023-09-30 12:16:40,079][118532] Updated weights for policy 0, policy_version 16160 (0.0018) [2023-09-30 12:16:40,079][118531] Updated weights for policy 1, policy_version 16160 (0.0018) [2023-09-30 12:16:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8282112. Throughput: 0: 784.4, 1: 784.0. Samples: 2067496. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:41,948][117662] Avg episode reward: [(0, '7.440'), (1, '2.810')] [2023-09-30 12:16:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8314880. Throughput: 0: 782.6, 1: 779.3. Samples: 2076680. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:16:46,948][117662] Avg episode reward: [(0, '7.520'), (1, '2.810')] [2023-09-30 12:16:51,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.2). Total num frames: 8347648. Throughput: 0: 785.3, 1: 785.2. Samples: 2086156. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:16:51,947][117662] Avg episode reward: [(0, '7.860'), (1, '2.740')] [2023-09-30 12:16:51,948][118358] Saving new best policy, reward=7.860! [2023-09-30 12:16:53,232][118531] Updated weights for policy 1, policy_version 16320 (0.0018) [2023-09-30 12:16:53,232][118532] Updated weights for policy 0, policy_version 16320 (0.0018) [2023-09-30 12:16:56,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8372224. Throughput: 0: 784.4, 1: 784.4. Samples: 2090986. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:16:56,947][117662] Avg episode reward: [(0, '7.950'), (1, '2.780')] [2023-09-30 12:16:56,948][118358] Saving new best policy, reward=7.950! [2023-09-30 12:17:01,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8404992. Throughput: 0: 782.5, 1: 782.7. Samples: 2100185. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:17:01,948][117662] Avg episode reward: [(0, '7.730'), (1, '2.750')] [2023-09-30 12:17:06,306][118532] Updated weights for policy 0, policy_version 16480 (0.0018) [2023-09-30 12:17:06,306][118531] Updated weights for policy 1, policy_version 16480 (0.0016) [2023-09-30 12:17:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8437760. Throughput: 0: 775.9, 1: 777.2. Samples: 2109450. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:17:06,948][117662] Avg episode reward: [(0, '7.840'), (1, '2.780')] [2023-09-30 12:17:11,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 8470528. Throughput: 0: 779.7, 1: 779.8. 
Samples: 2114081. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:11,947][117662] Avg episode reward: [(0, '7.850'), (1, '2.770')]
[2023-09-30 12:17:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8503296. Throughput: 0: 783.1, 1: 781.5. Samples: 2123760. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:17:16,948][117662] Avg episode reward: [(0, '7.830'), (1, '2.760')]
[2023-09-30 12:17:19,592][118531] Updated weights for policy 1, policy_version 16640 (0.0018)
[2023-09-30 12:17:19,592][118532] Updated weights for policy 0, policy_version 16640 (0.0018)
[2023-09-30 12:17:21,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8527872. Throughput: 0: 777.9, 1: 778.3. Samples: 2132678. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:17:21,948][117662] Avg episode reward: [(0, '7.750'), (1, '2.770')]
[2023-09-30 12:17:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 8560640. Throughput: 0: 773.2, 1: 771.9. Samples: 2137024. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:26,948][117662] Avg episode reward: [(0, '7.790'), (1, '2.740')]
[2023-09-30 12:17:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8593408. Throughput: 0: 773.6, 1: 773.6. Samples: 2146305. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:31,948][117662] Avg episode reward: [(0, '7.970'), (1, '2.720')]
[2023-09-30 12:17:31,955][118358] Saving new best policy, reward=7.970!
[2023-09-30 12:17:32,973][118532] Updated weights for policy 0, policy_version 16800 (0.0019)
[2023-09-30 12:17:32,973][118531] Updated weights for policy 1, policy_version 16800 (0.0018)
[2023-09-30 12:17:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8626176. Throughput: 0: 775.2, 1: 775.2. Samples: 2155920. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:36,947][117662] Avg episode reward: [(0, '7.670'), (1, '2.680')]
[2023-09-30 12:17:41,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6262.0). Total num frames: 8658944. Throughput: 0: 774.2, 1: 773.7. Samples: 2160642. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:41,947][117662] Avg episode reward: [(0, '7.850'), (1, '2.690')]
[2023-09-30 12:17:45,853][118532] Updated weights for policy 0, policy_version 16960 (0.0018)
[2023-09-30 12:17:45,853][118531] Updated weights for policy 1, policy_version 16960 (0.0017)
[2023-09-30 12:17:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8683520. Throughput: 0: 777.8, 1: 778.1. Samples: 2170203. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:46,948][117662] Avg episode reward: [(0, '7.920'), (1, '2.690')]
[2023-09-30 12:17:51,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8716288. Throughput: 0: 773.9, 1: 774.1. Samples: 2179112. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:51,948][117662] Avg episode reward: [(0, '8.160'), (1, '2.700')]
[2023-09-30 12:17:51,950][118358] Saving new best policy, reward=8.160!
[2023-09-30 12:17:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8749056. Throughput: 0: 776.6, 1: 776.8. Samples: 2183980. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:17:56,948][117662] Avg episode reward: [(0, '8.420'), (1, '2.760')]
[2023-09-30 12:17:56,949][118358] Saving new best policy, reward=8.420!
[2023-09-30 12:17:58,995][118532] Updated weights for policy 0, policy_version 17120 (0.0017)
[2023-09-30 12:17:58,996][118531] Updated weights for policy 1, policy_version 17120 (0.0019)
[2023-09-30 12:18:01,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 8781824. Throughput: 0: 773.9, 1: 773.8. Samples: 2193408. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:01,947][117662] Avg episode reward: [(0, '8.540'), (1, '2.820')]
[2023-09-30 12:18:01,958][118358] Saving new best policy, reward=8.540!
[2023-09-30 12:18:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 8806400. Throughput: 0: 777.8, 1: 778.4. Samples: 2202709. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:18:06,948][117662] Avg episode reward: [(0, '8.370'), (1, '2.830')]
[2023-09-30 12:18:11,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 8839168. Throughput: 0: 781.1, 1: 780.9. Samples: 2207312. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:18:11,948][117662] Avg episode reward: [(0, '8.410'), (1, '2.820')]
[2023-09-30 12:18:12,421][118532] Updated weights for policy 0, policy_version 17280 (0.0018)
[2023-09-30 12:18:12,421][118531] Updated weights for policy 1, policy_version 17280 (0.0017)
[2023-09-30 12:18:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8871936. Throughput: 0: 780.4, 1: 780.7. Samples: 2216552. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:18:16,947][117662] Avg episode reward: [(0, '8.480'), (1, '2.790')]
[2023-09-30 12:18:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000017328_4435968.pth...
[2023-09-30 12:18:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000017328_4435968.pth...
[2023-09-30 12:18:16,994][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000014416_3690496.pth
[2023-09-30 12:18:16,995][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000014416_3690496.pth
[2023-09-30 12:18:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8904704. Throughput: 0: 780.7, 1: 780.5. Samples: 2226176. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:21,948][117662] Avg episode reward: [(0, '8.680'), (1, '2.770')]
[2023-09-30 12:18:21,949][118358] Saving new best policy, reward=8.680!
[2023-09-30 12:18:25,403][118531] Updated weights for policy 1, policy_version 17440 (0.0018)
[2023-09-30 12:18:25,404][118532] Updated weights for policy 0, policy_version 17440 (0.0018)
[2023-09-30 12:18:26,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8937472. Throughput: 0: 778.5, 1: 778.5. Samples: 2230704. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:26,948][117662] Avg episode reward: [(0, '8.850'), (1, '2.750')]
[2023-09-30 12:18:26,949][118358] Saving new best policy, reward=8.850!
[2023-09-30 12:18:31,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 8962048. Throughput: 0: 775.4, 1: 773.3. Samples: 2239896. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:31,947][117662] Avg episode reward: [(0, '9.270'), (1, '2.690')]
[2023-09-30 12:18:32,144][118358] Saving new best policy, reward=9.270!
[2023-09-30 12:18:36,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 8994816. Throughput: 0: 776.7, 1: 776.4. Samples: 2249001. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:36,947][117662] Avg episode reward: [(0, '9.590'), (1, '2.650')]
[2023-09-30 12:18:36,948][118358] Saving new best policy, reward=9.590!
[2023-09-30 12:18:38,784][118531] Updated weights for policy 1, policy_version 17600 (0.0015)
[2023-09-30 12:18:38,785][118532] Updated weights for policy 0, policy_version 17600 (0.0018)
[2023-09-30 12:18:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 9027584. Throughput: 0: 774.6, 1: 775.2. Samples: 2253719. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:41,947][117662] Avg episode reward: [(0, '9.560'), (1, '2.690')]
[2023-09-30 12:18:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9060352. Throughput: 0: 773.7, 1: 773.7. Samples: 2263040. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:46,947][117662] Avg episode reward: [(0, '9.610'), (1, '2.680')]
[2023-09-30 12:18:46,956][118358] Saving new best policy, reward=9.610!
[2023-09-30 12:18:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9084928. Throughput: 0: 774.5, 1: 773.9. Samples: 2272389. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:51,947][117662] Avg episode reward: [(0, '9.580'), (1, '2.700')]
[2023-09-30 12:18:52,061][118532] Updated weights for policy 0, policy_version 17760 (0.0017)
[2023-09-30 12:18:52,062][118531] Updated weights for policy 1, policy_version 17760 (0.0017)
[2023-09-30 12:18:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9117696. Throughput: 0: 774.0, 1: 776.3. Samples: 2277078. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:18:56,948][117662] Avg episode reward: [(0, '9.810'), (1, '2.740')]
[2023-09-30 12:18:56,949][118358] Saving new best policy, reward=9.810!
[2023-09-30 12:19:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9150464. Throughput: 0: 770.5, 1: 770.5. Samples: 2285900. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:01,947][117662] Avg episode reward: [(0, '9.860'), (1, '2.830')]
[2023-09-30 12:19:01,955][118358] Saving new best policy, reward=9.860!
[2023-09-30 12:19:05,265][118531] Updated weights for policy 1, policy_version 17920 (0.0019)
[2023-09-30 12:19:05,266][118532] Updated weights for policy 0, policy_version 17920 (0.0020)
[2023-09-30 12:19:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9183232. Throughput: 0: 772.3, 1: 773.6. Samples: 2295742. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:06,948][117662] Avg episode reward: [(0, '10.050'), (1, '2.810')]
[2023-09-30 12:19:06,949][118358] Saving new best policy, reward=10.050!
[2023-09-30 12:19:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9216000. Throughput: 0: 770.3, 1: 770.7. Samples: 2300050. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 12:19:11,947][117662] Avg episode reward: [(0, '10.240'), (1, '2.760')]
[2023-09-30 12:19:11,948][118358] Saving new best policy, reward=10.240!
[2023-09-30 12:19:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9240576. Throughput: 0: 774.7, 1: 777.7. Samples: 2309757. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 12:19:16,948][117662] Avg episode reward: [(0, '10.330'), (1, '2.740')]
[2023-09-30 12:19:17,048][118358] Saving new best policy, reward=10.330!
[2023-09-30 12:19:18,359][118531] Updated weights for policy 1, policy_version 18080 (0.0019)
[2023-09-30 12:19:18,359][118532] Updated weights for policy 0, policy_version 18080 (0.0019)
[2023-09-30 12:19:21,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9273344. Throughput: 0: 777.8, 1: 777.9. Samples: 2319008. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-30 12:19:21,948][117662] Avg episode reward: [(0, '10.320'), (1, '2.740')]
[2023-09-30 12:19:26,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9306112. Throughput: 0: 778.7, 1: 778.6. Samples: 2323795. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:26,948][117662] Avg episode reward: [(0, '10.470'), (1, '2.690')]
[2023-09-30 12:19:26,949][118358] Saving new best policy, reward=10.470!
[2023-09-30 12:19:31,317][118531] Updated weights for policy 1, policy_version 18240 (0.0018)
[2023-09-30 12:19:31,318][118532] Updated weights for policy 0, policy_version 18240 (0.0018)
[2023-09-30 12:19:31,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9338880. Throughput: 0: 779.8, 1: 780.0. Samples: 2333233. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:31,947][117662] Avg episode reward: [(0, '10.320'), (1, '2.650')]
[2023-09-30 12:19:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9371648. Throughput: 0: 782.8, 1: 783.4. Samples: 2342867. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:36,948][117662] Avg episode reward: [(0, '9.960'), (1, '2.680')]
[2023-09-30 12:19:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9404416. Throughput: 0: 777.4, 1: 776.8. Samples: 2347016. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:41,947][117662] Avg episode reward: [(0, '9.980'), (1, '2.730')]
[2023-09-30 12:19:44,568][118532] Updated weights for policy 0, policy_version 18400 (0.0018)
[2023-09-30 12:19:44,568][118531] Updated weights for policy 1, policy_version 18400 (0.0019)
[2023-09-30 12:19:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9428992. Throughput: 0: 786.2, 1: 786.1. Samples: 2356654. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:19:46,948][117662] Avg episode reward: [(0, '10.150'), (1, '2.750')]
[2023-09-30 12:19:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9461760. Throughput: 0: 778.6, 1: 777.6. Samples: 2365774. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:19:51,947][117662] Avg episode reward: [(0, '10.100'), (1, '2.730')]
[2023-09-30 12:19:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9494528. Throughput: 0: 785.5, 1: 785.3. Samples: 2370735. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:19:56,948][117662] Avg episode reward: [(0, '10.340'), (1, '2.720')]
[2023-09-30 12:19:57,658][118532] Updated weights for policy 0, policy_version 18560 (0.0016)
[2023-09-30 12:19:57,659][118531] Updated weights for policy 1, policy_version 18560 (0.0017)
[2023-09-30 12:20:01,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9527296. Throughput: 0: 779.3, 1: 778.0. Samples: 2379835. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:20:01,948][117662] Avg episode reward: [(0, '9.650'), (1, '2.790')]
[2023-09-30 12:20:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9560064. Throughput: 0: 784.9, 1: 785.8. Samples: 2389686. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:06,948][117662] Avg episode reward: [(0, '9.770'), (1, '2.820')]
[2023-09-30 12:20:10,631][118531] Updated weights for policy 1, policy_version 18720 (0.0018)
[2023-09-30 12:20:10,631][118532] Updated weights for policy 0, policy_version 18720 (0.0017)
[2023-09-30 12:20:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9592832. Throughput: 0: 781.7, 1: 780.9. Samples: 2394112. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:11,948][117662] Avg episode reward: [(0, '9.790'), (1, '2.810')]
[2023-09-30 12:20:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9617408. Throughput: 0: 782.5, 1: 784.0. Samples: 2403727. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:16,948][117662] Avg episode reward: [(0, '9.680'), (1, '2.880')]
[2023-09-30 12:20:17,145][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000018800_4812800.pth...
[2023-09-30 12:20:17,153][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000018800_4812800.pth...
[2023-09-30 12:20:17,173][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000015872_4063232.pth
[2023-09-30 12:20:17,182][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000015872_4063232.pth
[2023-09-30 12:20:21,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 9650176. Throughput: 0: 775.5, 1: 775.0. Samples: 2412642. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:20:21,947][117662] Avg episode reward: [(0, '9.960'), (1, '2.890')]
[2023-09-30 12:20:23,889][118531] Updated weights for policy 1, policy_version 18880 (0.0016)
[2023-09-30 12:20:23,889][118532] Updated weights for policy 0, policy_version 18880 (0.0016)
[2023-09-30 12:20:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9682944. Throughput: 0: 782.5, 1: 782.7. Samples: 2417452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:20:26,948][117662] Avg episode reward: [(0, '10.210'), (1, '2.900')]
[2023-09-30 12:20:31,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9715712. Throughput: 0: 780.4, 1: 780.2. Samples: 2426880. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:31,948][117662] Avg episode reward: [(0, '10.430'), (1, '2.890')]
[2023-09-30 12:20:36,947][117662] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 6234.3). Total num frames: 9744384. Throughput: 0: 783.8, 1: 783.9. Samples: 2436324. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:36,947][117662] Avg episode reward: [(0, '10.300'), (1, '2.900')]
[2023-09-30 12:20:36,991][118532] Updated weights for policy 0, policy_version 19040 (0.0017)
[2023-09-30 12:20:36,991][118531] Updated weights for policy 1, policy_version 19040 (0.0016)
[2023-09-30 12:20:41,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9773056. Throughput: 0: 782.6, 1: 782.2. Samples: 2441154. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:41,948][117662] Avg episode reward: [(0, '10.160'), (1, '2.890')]
[2023-09-30 12:20:46,947][117662] Fps is (10 sec: 6143.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9805824. Throughput: 0: 787.0, 1: 787.0. Samples: 2450662. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:46,948][117662] Avg episode reward: [(0, '10.080'), (1, '2.880')]
[2023-09-30 12:20:49,844][118532] Updated weights for policy 0, policy_version 19200 (0.0016)
[2023-09-30 12:20:49,844][118531] Updated weights for policy 1, policy_version 19200 (0.0018)
[2023-09-30 12:20:51,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9838592. Throughput: 0: 779.8, 1: 779.0. Samples: 2459833. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:51,948][117662] Avg episode reward: [(0, '10.060'), (1, '2.890')]
[2023-09-30 12:20:56,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9871360. Throughput: 0: 782.5, 1: 782.5. Samples: 2464536. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:20:56,947][117662] Avg episode reward: [(0, '10.240'), (1, '2.930')]
[2023-09-30 12:20:56,948][118438] Saving new best policy, reward=2.930!
[2023-09-30 12:21:01,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9904128. Throughput: 0: 781.4, 1: 779.8. Samples: 2473977. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:01,947][117662] Avg episode reward: [(0, '10.390'), (1, '2.910')]
[2023-09-30 12:21:03,176][118531] Updated weights for policy 1, policy_version 19360 (0.0019)
[2023-09-30 12:21:03,177][118532] Updated weights for policy 0, policy_version 19360 (0.0021)
[2023-09-30 12:21:06,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9928704. Throughput: 0: 784.6, 1: 784.4. Samples: 2483250. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:06,948][117662] Avg episode reward: [(0, '10.380'), (1, '2.850')]
[2023-09-30 12:21:11,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 9961472. Throughput: 0: 785.0, 1: 785.3. Samples: 2488115. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:11,947][117662] Avg episode reward: [(0, '10.270'), (1, '2.790')]
[2023-09-30 12:21:16,167][118532] Updated weights for policy 0, policy_version 19520 (0.0017)
[2023-09-30 12:21:16,167][118531] Updated weights for policy 1, policy_version 19520 (0.0018)
[2023-09-30 12:21:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 9994240. Throughput: 0: 781.9, 1: 782.0. Samples: 2497256. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:16,948][117662] Avg episode reward: [(0, '10.410'), (1, '2.820')]
[2023-09-30 12:21:21,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6234.3). Total num frames: 10027008. Throughput: 0: 780.7, 1: 782.3. Samples: 2506660. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:21,948][117662] Avg episode reward: [(0, '10.360'), (1, '2.820')]
[2023-09-30 12:21:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10051584. Throughput: 0: 774.3, 1: 774.4. Samples: 2510848. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:26,948][117662] Avg episode reward: [(0, '10.430'), (1, '2.800')]
[2023-09-30 12:21:29,670][118532] Updated weights for policy 0, policy_version 19680 (0.0017)
[2023-09-30 12:21:29,671][118531] Updated weights for policy 1, policy_version 19680 (0.0016)
[2023-09-30 12:21:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10084352. Throughput: 0: 773.4, 1: 773.9. Samples: 2520290. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:31,948][117662] Avg episode reward: [(0, '10.700'), (1, '2.840')]
[2023-09-30 12:21:31,957][118358] Saving new best policy, reward=10.700!
[2023-09-30 12:21:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6212.2, 300 sec: 6220.4). Total num frames: 10117120. Throughput: 0: 771.7, 1: 771.6. Samples: 2529280. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:36,948][117662] Avg episode reward: [(0, '10.690'), (1, '2.900')]
[2023-09-30 12:21:41,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10149888. Throughput: 0: 769.8, 1: 770.5. Samples: 2533850. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:21:41,948][117662] Avg episode reward: [(0, '10.940'), (1, '2.880')]
[2023-09-30 12:21:41,949][118358] Saving new best policy, reward=10.940!
[2023-09-30 12:21:43,079][118531] Updated weights for policy 1, policy_version 19840 (0.0018)
[2023-09-30 12:21:43,079][118532] Updated weights for policy 0, policy_version 19840 (0.0016)
[2023-09-30 12:21:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 10174464. Throughput: 0: 772.0, 1: 770.8. Samples: 2543403. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:21:46,948][117662] Avg episode reward: [(0, '11.130'), (1, '2.780')]
[2023-09-30 12:21:46,986][118358] Saving new best policy, reward=11.130!
[2023-09-30 12:21:51,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10207232. Throughput: 0: 769.3, 1: 769.2. Samples: 2552486. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:21:51,947][117662] Avg episode reward: [(0, '11.740'), (1, '2.770')]
[2023-09-30 12:21:51,948][118358] Saving new best policy, reward=11.740!
[2023-09-30 12:21:56,262][118532] Updated weights for policy 0, policy_version 20000 (0.0018)
[2023-09-30 12:21:56,263][118531] Updated weights for policy 1, policy_version 20000 (0.0018)
[2023-09-30 12:21:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10240000. Throughput: 0: 770.4, 1: 768.0. Samples: 2557343. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:21:56,948][117662] Avg episode reward: [(0, '12.060'), (1, '2.710')]
[2023-09-30 12:21:56,949][118358] Saving new best policy, reward=12.060!
[2023-09-30 12:22:01,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10272768. Throughput: 0: 765.5, 1: 765.4. Samples: 2566145. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:22:01,948][117662] Avg episode reward: [(0, '11.980'), (1, '2.650')]
[2023-09-30 12:22:06,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6206.5). Total num frames: 10301440. Throughput: 0: 766.2, 1: 767.2. Samples: 2575666. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:22:06,948][117662] Avg episode reward: [(0, '11.760'), (1, '2.630')]
[2023-09-30 12:22:09,678][118532] Updated weights for policy 0, policy_version 20160 (0.0017)
[2023-09-30 12:22:09,678][118531] Updated weights for policy 1, policy_version 20160 (0.0016)
[2023-09-30 12:22:11,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 10330112. Throughput: 0: 770.4, 1: 771.4. Samples: 2580228. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:22:11,947][117662] Avg episode reward: [(0, '11.690'), (1, '2.730')]
[2023-09-30 12:22:16,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10362880. Throughput: 0: 767.2, 1: 767.1. Samples: 2589335. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:22:16,948][117662] Avg episode reward: [(0, '11.410'), (1, '2.700')]
[2023-09-30 12:22:16,960][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000020240_5181440.pth...
[2023-09-30 12:22:16,960][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000020240_5181440.pth...
[2023-09-30 12:22:16,995][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000017328_4435968.pth
[2023-09-30 12:22:16,996][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000017328_4435968.pth
[2023-09-30 12:22:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10395648. Throughput: 0: 772.1, 1: 773.0. Samples: 2598812. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:22:21,947][117662] Avg episode reward: [(0, '11.340'), (1, '2.660')]
[2023-09-30 12:22:22,966][118532] Updated weights for policy 0, policy_version 20320 (0.0017)
[2023-09-30 12:22:22,967][118531] Updated weights for policy 1, policy_version 20320 (0.0016)
[2023-09-30 12:22:26,947][117662] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6206.5). Total num frames: 10424320. Throughput: 0: 768.7, 1: 768.2. Samples: 2603013. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:22:26,948][117662] Avg episode reward: [(0, '11.350'), (1, '2.650')]
[2023-09-30 12:22:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 10452992. Throughput: 0: 766.9, 1: 768.4. Samples: 2612491. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:22:31,947][117662] Avg episode reward: [(0, '11.310'), (1, '2.610')]
[2023-09-30 12:22:36,159][118532] Updated weights for policy 0, policy_version 20480 (0.0017)
[2023-09-30 12:22:36,159][118531] Updated weights for policy 1, policy_version 20480 (0.0017)
[2023-09-30 12:22:36,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 10485760. Throughput: 0: 769.0, 1: 769.0. Samples: 2621698. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:22:36,948][117662] Avg episode reward: [(0, '11.120'), (1, '2.590')]
[2023-09-30 12:22:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10518528. Throughput: 0: 768.0, 1: 770.6. Samples: 2626579. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:22:41,947][117662] Avg episode reward: [(0, '11.260'), (1, '2.580')]
[2023-09-30 12:22:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10551296. Throughput: 0: 773.8, 1: 774.0. Samples: 2635797. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:22:46,947][117662] Avg episode reward: [(0, '11.280'), (1, '2.620')]
[2023-09-30 12:22:49,163][118531] Updated weights for policy 1, policy_version 20640 (0.0017)
[2023-09-30 12:22:49,163][118532] Updated weights for policy 0, policy_version 20640 (0.0018)
[2023-09-30 12:22:51,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10584064. Throughput: 0: 776.4, 1: 773.5. Samples: 2645412. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:22:51,948][117662] Avg episode reward: [(0, '11.110'), (1, '2.640')]
[2023-09-30 12:22:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 10608640. Throughput: 0: 777.0, 1: 776.0. Samples: 2650112. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-30 12:22:56,948][117662] Avg episode reward: [(0, '11.370'), (1, '2.630')]
[2023-09-30 12:23:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10641408. Throughput: 0: 778.9, 1: 778.6. Samples: 2659423. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:01,947][117662] Avg episode reward: [(0, '11.520'), (1, '2.630')]
[2023-09-30 12:23:02,389][118531] Updated weights for policy 1, policy_version 20800 (0.0018)
[2023-09-30 12:23:02,391][118532] Updated weights for policy 0, policy_version 20800 (0.0018)
[2023-09-30 12:23:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 10674176. Throughput: 0: 776.6, 1: 775.6. Samples: 2668661. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:06,948][117662] Avg episode reward: [(0, '11.720'), (1, '2.630')]
[2023-09-30 12:23:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10706944. Throughput: 0: 781.7, 1: 781.8. Samples: 2673372. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:11,948][117662] Avg episode reward: [(0, '11.840'), (1, '2.640')]
[2023-09-30 12:23:15,457][118532] Updated weights for policy 0, policy_version 20960 (0.0018)
[2023-09-30 12:23:15,457][118531] Updated weights for policy 1, policy_version 20960 (0.0018)
[2023-09-30 12:23:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10739712. Throughput: 0: 782.3, 1: 781.9. Samples: 2682880. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:16,948][117662] Avg episode reward: [(0, '12.310'), (1, '2.610')]
[2023-09-30 12:23:16,957][118358] Saving new best policy, reward=12.310!
[2023-09-30 12:23:21,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10772480. Throughput: 0: 784.4, 1: 784.7. Samples: 2692309. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:21,948][117662] Avg episode reward: [(0, '11.640'), (1, '2.620')]
[2023-09-30 12:23:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 10797056. Throughput: 0: 784.7, 1: 784.4. Samples: 2697192. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:26,948][117662] Avg episode reward: [(0, '11.310'), (1, '2.620')]
[2023-09-30 12:23:28,562][118531] Updated weights for policy 1, policy_version 21120 (0.0019)
[2023-09-30 12:23:28,563][118532] Updated weights for policy 0, policy_version 21120 (0.0016)
[2023-09-30 12:23:31,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10829824. Throughput: 0: 782.3, 1: 782.3. Samples: 2706207. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:31,947][117662] Avg episode reward: [(0, '11.180'), (1, '2.580')]
[2023-09-30 12:23:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10862592. Throughput: 0: 780.3, 1: 780.2. Samples: 2715636. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:23:36,947][117662] Avg episode reward: [(0, '10.840'), (1, '2.590')]
[2023-09-30 12:23:41,800][118532] Updated weights for policy 0, policy_version 21280 (0.0019)
[2023-09-30 12:23:41,801][118531] Updated weights for policy 1, policy_version 21280 (0.0015)
[2023-09-30 12:23:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10895360. Throughput: 0: 776.5, 1: 776.5. Samples: 2719994. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:23:41,947][117662] Avg episode reward: [(0, '10.660'), (1, '2.570')]
[2023-09-30 12:23:46,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6234.2). Total num frames: 10924032. Throughput: 0: 781.1, 1: 781.3. Samples: 2729734. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:23:46,948][117662] Avg episode reward: [(0, '10.660'), (1, '2.590')]
[2023-09-30 12:23:51,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10952704. Throughput: 0: 777.9, 1: 777.4. Samples: 2738648. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:23:51,948][117662] Avg episode reward: [(0, '10.730'), (1, '2.600')]
[2023-09-30 12:23:54,985][118531] Updated weights for policy 1, policy_version 21440 (0.0017)
[2023-09-30 12:23:54,985][118532] Updated weights for policy 0, policy_version 21440 (0.0017)
[2023-09-30 12:23:56,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 10985472. Throughput: 0: 778.2, 1: 779.2. Samples: 2743451. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:23:56,948][117662] Avg episode reward: [(0, '10.650'), (1, '2.680')]
[2023-09-30 12:24:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11018240. Throughput: 0: 773.7, 1: 773.8. Samples: 2752517. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:01,948][117662] Avg episode reward: [(0, '10.690'), (1, '2.700')]
[2023-09-30 12:24:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11051008. Throughput: 0: 775.0, 1: 774.5. Samples: 2762035. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:06,948][117662] Avg episode reward: [(0, '10.590'), (1, '2.700')]
[2023-09-30 12:24:08,268][118532] Updated weights for policy 0, policy_version 21600 (0.0016)
[2023-09-30 12:24:08,268][118531] Updated weights for policy 1, policy_version 21600 (0.0018)
[2023-09-30 12:24:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 11075584. Throughput: 0: 773.4, 1: 772.7. Samples: 2766769. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:11,948][117662] Avg episode reward: [(0, '10.640'), (1, '2.630')]
[2023-09-30 12:24:16,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 11108352. Throughput: 0: 778.1, 1: 778.0. Samples: 2776231. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:16,948][117662] Avg episode reward: [(0, '10.790'), (1, '2.580')]
[2023-09-30 12:24:16,959][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000021696_5554176.pth...
[2023-09-30 12:24:16,959][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000021696_5554176.pth...
[2023-09-30 12:24:16,990][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000018800_4812800.pth
[2023-09-30 12:24:16,999][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000018800_4812800.pth
[2023-09-30 12:24:21,131][118531] Updated weights for policy 1, policy_version 21760 (0.0018)
[2023-09-30 12:24:21,132][118532] Updated weights for policy 0, policy_version 21760 (0.0018)
[2023-09-30 12:24:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 11141120. Throughput: 0: 777.2, 1: 777.5. Samples: 2785598. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:21,948][117662] Avg episode reward: [(0, '11.200'), (1, '2.520')]
[2023-09-30 12:24:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11173888. Throughput: 0: 784.2, 1: 784.5. Samples: 2790588. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:26,948][117662] Avg episode reward: [(0, '10.860'), (1, '2.520')]
[2023-09-30 12:24:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11206656. Throughput: 0: 776.9, 1: 776.2. Samples: 2799625. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:31,947][117662] Avg episode reward: [(0, '11.090'), (1, '2.500')]
[2023-09-30 12:24:34,351][118532] Updated weights for policy 0, policy_version 21920 (0.0017)
[2023-09-30 12:24:34,351][118531] Updated weights for policy 1, policy_version 21920 (0.0017)
[2023-09-30 12:24:36,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6206.5). Total num frames: 11235328. Throughput: 0: 781.7, 1: 782.9. Samples: 2809055. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:36,948][117662] Avg episode reward: [(0, '11.180'), (1, '2.520')]
[2023-09-30 12:24:41,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 11264000. Throughput: 0: 781.9, 1: 781.2. Samples: 2813793. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:24:41,947][117662] Avg episode reward: [(0, '11.330'), (1, '2.620')]
[2023-09-30 12:24:46,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 11296768. Throughput: 0: 783.1, 1: 783.2. Samples: 2822999. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:24:46,947][117662] Avg episode reward: [(0, '11.390'), (1, '2.670')]
[2023-09-30 12:24:47,489][118531] Updated weights for policy 1, policy_version 22080 (0.0018)
[2023-09-30 12:24:47,490][118532] Updated weights for policy 0, policy_version 22080 (0.0018)
[2023-09-30 12:24:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11329536. Throughput: 0: 781.6, 1: 781.7. Samples: 2832384. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:24:51,947][117662] Avg episode reward: [(0, '11.550'), (1, '2.660')]
[2023-09-30 12:24:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11362304. Throughput: 0: 781.1, 1: 780.7. Samples: 2837048. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:24:56,948][117662] Avg episode reward: [(0, '11.420'), (1, '2.630')]
[2023-09-30 12:25:00,513][118532] Updated weights for policy 0, policy_version 22240 (0.0017)
[2023-09-30 12:25:00,514][118531] Updated weights for policy 1, policy_version 22240 (0.0015)
[2023-09-30 12:25:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11395072. Throughput: 0: 783.3, 1: 783.2. Samples: 2846720. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:25:01,948][117662] Avg episode reward: [(0, '11.400'), (1, '2.610')]
[2023-09-30 12:25:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 11419648. Throughput: 0: 778.6, 1: 779.2. Samples: 2855699. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:25:06,947][117662] Avg episode reward: [(0, '11.420'), (1, '2.610')]
[2023-09-30 12:25:11,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 11452416. Throughput: 0: 774.4, 1: 774.4. Samples: 2860283. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:25:11,947][117662] Avg episode reward: [(0, '11.580'), (1, '2.630')]
[2023-09-30 12:25:13,887][118531] Updated weights for policy 1, policy_version 22400 (0.0018)
[2023-09-30 12:25:13,888][118532] Updated weights for policy 0, policy_version 22400 (0.0018)
[2023-09-30 12:25:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 11485184. Throughput: 0: 776.6, 1: 776.9. Samples: 2869532. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:25:16,947][117662] Avg episode reward: [(0, '11.610'), (1, '2.700')]
[2023-09-30 12:25:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 11517952. Throughput: 0: 780.1, 1: 779.5. Samples: 2879237.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:25:21,947][117662] Avg episode reward: [(0, '11.290'), (1, '2.760')] [2023-09-30 12:25:26,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 11542528. Throughput: 0: 775.7, 1: 775.2. Samples: 2883585. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:25:26,948][117662] Avg episode reward: [(0, '11.500'), (1, '2.820')] [2023-09-30 12:25:27,111][118532] Updated weights for policy 0, policy_version 22560 (0.0018) [2023-09-30 12:25:27,112][118531] Updated weights for policy 1, policy_version 22560 (0.0016) [2023-09-30 12:25:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6206.5). Total num frames: 11575296. Throughput: 0: 776.5, 1: 777.1. Samples: 2892912. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:25:31,947][117662] Avg episode reward: [(0, '11.530'), (1, '2.820')] [2023-09-30 12:25:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 11608064. Throughput: 0: 776.6, 1: 776.8. Samples: 2902289. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:25:36,948][117662] Avg episode reward: [(0, '11.560'), (1, '2.780')] [2023-09-30 12:25:40,005][118532] Updated weights for policy 0, policy_version 22720 (0.0016) [2023-09-30 12:25:40,005][118531] Updated weights for policy 1, policy_version 22720 (0.0015) [2023-09-30 12:25:41,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11640832. Throughput: 0: 779.1, 1: 780.3. Samples: 2907220. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:25:41,948][117662] Avg episode reward: [(0, '11.800'), (1, '2.710')] [2023-09-30 12:25:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11673600. Throughput: 0: 774.4, 1: 774.6. Samples: 2916421. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:25:46,948][117662] Avg episode reward: [(0, '12.050'), (1, '2.720')] [2023-09-30 12:25:51,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11706368. Throughput: 0: 781.5, 1: 780.7. Samples: 2925997. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-30 12:25:51,948][117662] Avg episode reward: [(0, '12.390'), (1, '2.700')] [2023-09-30 12:25:51,949][118358] Saving new best policy, reward=12.390! [2023-09-30 12:25:53,101][118532] Updated weights for policy 0, policy_version 22880 (0.0019) [2023-09-30 12:25:53,101][118531] Updated weights for policy 1, policy_version 22880 (0.0015) [2023-09-30 12:25:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11739136. Throughput: 0: 782.4, 1: 782.2. Samples: 2930690. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-30 12:25:56,948][117662] Avg episode reward: [(0, '12.330'), (1, '2.700')] [2023-09-30 12:26:01,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 11763712. Throughput: 0: 785.4, 1: 785.3. Samples: 2940217. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-30 12:26:01,948][117662] Avg episode reward: [(0, '12.530'), (1, '2.680')] [2023-09-30 12:26:01,956][118358] Saving new best policy, reward=12.530! [2023-09-30 12:26:06,110][118532] Updated weights for policy 0, policy_version 23040 (0.0019) [2023-09-30 12:26:06,111][118531] Updated weights for policy 1, policy_version 23040 (0.0018) [2023-09-30 12:26:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11796480. Throughput: 0: 780.0, 1: 780.6. Samples: 2949464. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:06,947][117662] Avg episode reward: [(0, '12.670'), (1, '2.650')] [2023-09-30 12:26:06,948][118358] Saving new best policy, reward=12.670! 
[2023-09-30 12:26:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11829248. Throughput: 0: 782.3, 1: 782.3. Samples: 2953994. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:11,948][117662] Avg episode reward: [(0, '12.510'), (1, '2.680')] [2023-09-30 12:26:16,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11862016. Throughput: 0: 784.2, 1: 783.4. Samples: 2963456. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:16,948][117662] Avg episode reward: [(0, '12.730'), (1, '2.650')] [2023-09-30 12:26:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000023168_5931008.pth... [2023-09-30 12:26:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000023168_5931008.pth... [2023-09-30 12:26:16,992][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000020240_5181440.pth [2023-09-30 12:26:16,994][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000020240_5181440.pth [2023-09-30 12:26:16,995][118358] Saving new best policy, reward=12.730! [2023-09-30 12:26:19,297][118532] Updated weights for policy 0, policy_version 23200 (0.0014) [2023-09-30 12:26:19,298][118531] Updated weights for policy 1, policy_version 23200 (0.0017) [2023-09-30 12:26:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 11894784. Throughput: 0: 786.5, 1: 787.4. Samples: 2973112. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:21,948][117662] Avg episode reward: [(0, '12.710'), (1, '2.750')] [2023-09-30 12:26:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11919360. Throughput: 0: 784.4, 1: 783.8. Samples: 2977792. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:26,948][117662] Avg episode reward: [(0, '12.910'), (1, '2.730')] [2023-09-30 12:26:27,063][118358] Saving new best policy, reward=12.910! [2023-09-30 12:26:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 11952128. Throughput: 0: 787.0, 1: 787.2. Samples: 2987261. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:31,948][117662] Avg episode reward: [(0, '12.880'), (1, '2.760')] [2023-09-30 12:26:32,306][118532] Updated weights for policy 0, policy_version 23360 (0.0015) [2023-09-30 12:26:32,307][118531] Updated weights for policy 1, policy_version 23360 (0.0018) [2023-09-30 12:26:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 11984896. Throughput: 0: 780.6, 1: 780.8. Samples: 2996258. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:36,947][117662] Avg episode reward: [(0, '12.750'), (1, '2.750')] [2023-09-30 12:26:41,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 12017664. Throughput: 0: 782.8, 1: 783.0. Samples: 3001153. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:41,947][117662] Avg episode reward: [(0, '12.770'), (1, '2.740')] [2023-09-30 12:26:45,455][118531] Updated weights for policy 1, policy_version 23520 (0.0016) [2023-09-30 12:26:45,456][118532] Updated weights for policy 0, policy_version 23520 (0.0016) [2023-09-30 12:26:46,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12050432. Throughput: 0: 781.7, 1: 781.5. Samples: 3010560. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:46,948][117662] Avg episode reward: [(0, '12.910'), (1, '2.770')] [2023-09-30 12:26:51,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12075008. Throughput: 0: 782.2, 1: 782.0. Samples: 3019850. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:51,948][117662] Avg episode reward: [(0, '12.800'), (1, '2.750')] [2023-09-30 12:26:56,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12107776. Throughput: 0: 785.1, 1: 786.1. Samples: 3024698. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:26:56,948][117662] Avg episode reward: [(0, '13.030'), (1, '2.730')] [2023-09-30 12:26:56,949][118358] Saving new best policy, reward=13.030! [2023-09-30 12:26:58,787][118531] Updated weights for policy 1, policy_version 23680 (0.0017) [2023-09-30 12:26:58,787][118532] Updated weights for policy 0, policy_version 23680 (0.0018) [2023-09-30 12:27:01,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6234.3). Total num frames: 12140544. Throughput: 0: 775.1, 1: 775.6. Samples: 3033238. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:01,947][117662] Avg episode reward: [(0, '13.220'), (1, '2.730')] [2023-09-30 12:27:01,955][118358] Saving new best policy, reward=13.220! [2023-09-30 12:27:06,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12173312. Throughput: 0: 772.6, 1: 772.0. Samples: 3042618. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:06,947][117662] Avg episode reward: [(0, '13.250'), (1, '2.760')] [2023-09-30 12:27:06,948][118358] Saving new best policy, reward=13.250! [2023-09-30 12:27:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12197888. Throughput: 0: 771.6, 1: 771.6. Samples: 3047237. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:11,947][117662] Avg episode reward: [(0, '13.530'), (1, '2.790')] [2023-09-30 12:27:11,948][118358] Saving new best policy, reward=13.530! 
[2023-09-30 12:27:12,213][118532] Updated weights for policy 0, policy_version 23840 (0.0017) [2023-09-30 12:27:12,213][118531] Updated weights for policy 1, policy_version 23840 (0.0016) [2023-09-30 12:27:16,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12230656. Throughput: 0: 771.9, 1: 772.2. Samples: 3056743. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:16,948][117662] Avg episode reward: [(0, '13.510'), (1, '2.780')] [2023-09-30 12:27:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6234.3). Total num frames: 12263424. Throughput: 0: 773.5, 1: 773.3. Samples: 3065862. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:21,947][117662] Avg episode reward: [(0, '13.750'), (1, '2.830')] [2023-09-30 12:27:21,948][118358] Saving new best policy, reward=13.750! [2023-09-30 12:27:25,319][118532] Updated weights for policy 0, policy_version 24000 (0.0019) [2023-09-30 12:27:25,319][118531] Updated weights for policy 1, policy_version 24000 (0.0018) [2023-09-30 12:27:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12296192. Throughput: 0: 769.6, 1: 769.8. Samples: 3070422. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:27:26,948][117662] Avg episode reward: [(0, '13.870'), (1, '2.850')] [2023-09-30 12:27:26,949][118358] Saving new best policy, reward=13.870! [2023-09-30 12:27:31,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12328960. Throughput: 0: 773.5, 1: 772.3. Samples: 3080119. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:27:31,948][117662] Avg episode reward: [(0, '14.040'), (1, '2.890')] [2023-09-30 12:27:31,959][118358] Saving new best policy, reward=14.040! [2023-09-30 12:27:36,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12353536. Throughput: 0: 768.4, 1: 768.5. Samples: 3089011. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:27:36,947][117662] Avg episode reward: [(0, '14.040'), (1, '2.920')] [2023-09-30 12:27:38,557][118532] Updated weights for policy 0, policy_version 24160 (0.0018) [2023-09-30 12:27:38,557][118531] Updated weights for policy 1, policy_version 24160 (0.0017) [2023-09-30 12:27:41,947][117662] Fps is (10 sec: 5734.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12386304. Throughput: 0: 769.8, 1: 768.6. Samples: 3093928. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-30 12:27:41,947][117662] Avg episode reward: [(0, '13.610'), (1, '2.910')] [2023-09-30 12:27:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12419072. Throughput: 0: 778.6, 1: 778.5. Samples: 3103311. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:46,947][117662] Avg episode reward: [(0, '13.390'), (1, '2.920')] [2023-09-30 12:27:51,582][118531] Updated weights for policy 1, policy_version 24320 (0.0015) [2023-09-30 12:27:51,582][118532] Updated weights for policy 0, policy_version 24320 (0.0019) [2023-09-30 12:27:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 12451840. Throughput: 0: 780.5, 1: 781.4. Samples: 3112904. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:51,947][117662] Avg episode reward: [(0, '13.280'), (1, '2.930')] [2023-09-30 12:27:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12484608. Throughput: 0: 776.3, 1: 776.6. Samples: 3117119. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:27:56,948][117662] Avg episode reward: [(0, '13.270'), (1, '2.900')] [2023-09-30 12:28:01,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12509184. Throughput: 0: 780.5, 1: 780.1. Samples: 3126973. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:01,948][117662] Avg episode reward: [(0, '13.330'), (1, '2.910')] [2023-09-30 12:28:04,698][118531] Updated weights for policy 1, policy_version 24480 (0.0017) [2023-09-30 12:28:04,698][118532] Updated weights for policy 0, policy_version 24480 (0.0017) [2023-09-30 12:28:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12541952. Throughput: 0: 779.9, 1: 780.5. Samples: 3136081. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:06,948][117662] Avg episode reward: [(0, '13.540'), (1, '2.870')] [2023-09-30 12:28:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 12574720. Throughput: 0: 783.8, 1: 784.4. Samples: 3140993. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:11,947][117662] Avg episode reward: [(0, '13.610'), (1, '2.880')] [2023-09-30 12:28:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 12607488. Throughput: 0: 781.3, 1: 782.6. Samples: 3150495. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:16,948][117662] Avg episode reward: [(0, '13.380'), (1, '2.850')] [2023-09-30 12:28:16,959][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000024624_6303744.pth... [2023-09-30 12:28:16,959][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000024624_6303744.pth... [2023-09-30 12:28:16,997][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000021696_5554176.pth [2023-09-30 12:28:17,001][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000021696_5554176.pth [2023-09-30 12:28:17,570][118531] Updated weights for policy 1, policy_version 24640 (0.0019) [2023-09-30 12:28:17,572][118532] Updated weights for policy 0, policy_version 24640 (0.0019) [2023-09-30 12:28:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). 
Total num frames: 12640256. Throughput: 0: 788.7, 1: 789.2. Samples: 3160019. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:21,947][117662] Avg episode reward: [(0, '13.240'), (1, '2.830')] [2023-09-30 12:28:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12673024. Throughput: 0: 780.2, 1: 780.5. Samples: 3164161. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:26,948][117662] Avg episode reward: [(0, '13.270'), (1, '2.800')] [2023-09-30 12:28:30,916][118532] Updated weights for policy 0, policy_version 24800 (0.0018) [2023-09-30 12:28:30,916][118531] Updated weights for policy 1, policy_version 24800 (0.0018) [2023-09-30 12:28:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12697600. Throughput: 0: 782.5, 1: 782.0. Samples: 3173714. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:31,947][117662] Avg episode reward: [(0, '12.940'), (1, '2.710')] [2023-09-30 12:28:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 12730368. Throughput: 0: 776.8, 1: 775.8. Samples: 3182771. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:36,948][117662] Avg episode reward: [(0, '12.950'), (1, '2.720')] [2023-09-30 12:28:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6234.3). Total num frames: 12763136. Throughput: 0: 782.5, 1: 782.6. Samples: 3187547. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:41,948][117662] Avg episode reward: [(0, '13.110'), (1, '2.700')] [2023-09-30 12:28:44,196][118532] Updated weights for policy 0, policy_version 24960 (0.0018) [2023-09-30 12:28:44,196][118531] Updated weights for policy 1, policy_version 24960 (0.0018) [2023-09-30 12:28:46,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 12795904. Throughput: 0: 777.1, 1: 777.1. Samples: 3196912. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:46,948][117662] Avg episode reward: [(0, '13.040'), (1, '2.720')] [2023-09-30 12:28:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12820480. Throughput: 0: 776.0, 1: 774.8. Samples: 3205870. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:28:51,948][117662] Avg episode reward: [(0, '12.910'), (1, '2.740')] [2023-09-30 12:28:56,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12853248. Throughput: 0: 775.3, 1: 774.7. Samples: 3210745. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:28:56,948][117662] Avg episode reward: [(0, '12.820'), (1, '2.790')] [2023-09-30 12:28:57,404][118532] Updated weights for policy 0, policy_version 25120 (0.0017) [2023-09-30 12:28:57,404][118531] Updated weights for policy 1, policy_version 25120 (0.0018) [2023-09-30 12:29:01,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 12886016. Throughput: 0: 769.9, 1: 770.0. Samples: 3219791. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:29:01,947][117662] Avg episode reward: [(0, '12.630'), (1, '2.810')] [2023-09-30 12:29:06,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 12918784. Throughput: 0: 769.1, 1: 768.1. Samples: 3229191. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-30 12:29:06,947][117662] Avg episode reward: [(0, '12.480'), (1, '2.820')] [2023-09-30 12:29:10,811][118531] Updated weights for policy 1, policy_version 25280 (0.0018) [2023-09-30 12:29:10,811][118532] Updated weights for policy 0, policy_version 25280 (0.0018) [2023-09-30 12:29:11,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12943360. Throughput: 0: 773.7, 1: 773.7. Samples: 3233792. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:29:11,948][117662] Avg episode reward: [(0, '12.110'), (1, '2.850')] [2023-09-30 12:29:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 12976128. Throughput: 0: 773.0, 1: 772.1. Samples: 3243245. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:29:16,947][117662] Avg episode reward: [(0, '11.800'), (1, '2.830')] [2023-09-30 12:29:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13008896. Throughput: 0: 771.8, 1: 771.6. Samples: 3252224. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:29:21,948][117662] Avg episode reward: [(0, '11.690'), (1, '2.810')] [2023-09-30 12:29:24,118][118532] Updated weights for policy 0, policy_version 25440 (0.0018) [2023-09-30 12:29:24,118][118531] Updated weights for policy 1, policy_version 25440 (0.0019) [2023-09-30 12:29:26,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13041664. Throughput: 0: 767.6, 1: 767.3. Samples: 3256620. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:29:26,947][117662] Avg episode reward: [(0, '11.710'), (1, '2.810')] [2023-09-30 12:29:31,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6234.2). Total num frames: 13074432. Throughput: 0: 771.6, 1: 772.1. Samples: 3266379. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:29:31,948][117662] Avg episode reward: [(0, '12.090'), (1, '2.830')] [2023-09-30 12:29:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13099008. Throughput: 0: 774.2, 1: 774.7. Samples: 3275571. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:29:36,947][117662] Avg episode reward: [(0, '12.060'), (1, '2.850')] [2023-09-30 12:29:37,174][118531] Updated weights for policy 1, policy_version 25600 (0.0017) [2023-09-30 12:29:37,174][118532] Updated weights for policy 0, policy_version 25600 (0.0017) [2023-09-30 12:29:41,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13131776. Throughput: 0: 772.2, 1: 772.5. Samples: 3280255. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-30 12:29:41,948][117662] Avg episode reward: [(0, '12.000'), (1, '2.830')] [2023-09-30 12:29:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13164544. Throughput: 0: 774.9, 1: 775.0. Samples: 3289537. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-30 12:29:46,947][117662] Avg episode reward: [(0, '12.140'), (1, '2.830')] [2023-09-30 12:29:50,431][118532] Updated weights for policy 0, policy_version 25760 (0.0019) [2023-09-30 12:29:50,431][118531] Updated weights for policy 1, policy_version 25760 (0.0017) [2023-09-30 12:29:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 13197312. Throughput: 0: 773.3, 1: 774.3. Samples: 3298833. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-30 12:29:51,948][117662] Avg episode reward: [(0, '12.070'), (1, '2.830')] [2023-09-30 12:29:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13221888. Throughput: 0: 773.7, 1: 773.7. Samples: 3303424. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-30 12:29:56,948][117662] Avg episode reward: [(0, '12.120'), (1, '2.850')] [2023-09-30 12:30:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13254656. Throughput: 0: 771.2, 1: 772.0. Samples: 3312686. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-30 12:30:01,947][117662] Avg episode reward: [(0, '12.170'), (1, '2.770')] [2023-09-30 12:30:03,642][118532] Updated weights for policy 0, policy_version 25920 (0.0020) [2023-09-30 12:30:03,643][118531] Updated weights for policy 1, policy_version 25920 (0.0018) [2023-09-30 12:30:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13287424. Throughput: 0: 775.6, 1: 775.7. Samples: 3322031. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-30 12:30:06,948][117662] Avg episode reward: [(0, '12.100'), (1, '2.760')] [2023-09-30 12:30:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 13320192. Throughput: 0: 779.8, 1: 780.4. Samples: 3326830. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-30 12:30:11,947][117662] Avg episode reward: [(0, '12.340'), (1, '2.680')] [2023-09-30 12:30:16,736][118532] Updated weights for policy 0, policy_version 26080 (0.0017) [2023-09-30 12:30:16,736][118531] Updated weights for policy 1, policy_version 26080 (0.0018) [2023-09-30 12:30:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 13352960. Throughput: 0: 776.1, 1: 775.4. Samples: 3336195. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-30 12:30:16,948][117662] Avg episode reward: [(0, '12.360'), (1, '2.700')] [2023-09-30 12:30:16,960][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000026080_6676480.pth... [2023-09-30 12:30:16,960][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000026080_6676480.pth... [2023-09-30 12:30:16,993][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000023168_5931008.pth [2023-09-30 12:30:16,996][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000023168_5931008.pth [2023-09-30 12:30:21,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). 
Total num frames: 13385728. Throughput: 0: 778.2, 1: 778.9. Samples: 3345641. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:30:21,948][117662] Avg episode reward: [(0, '12.140'), (1, '2.670')]
[2023-09-30 12:30:26,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13410304. Throughput: 0: 778.7, 1: 778.3. Samples: 3350321. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:30:26,948][117662] Avg episode reward: [(0, '12.040'), (1, '2.630')]
[2023-09-30 12:30:30,105][118531] Updated weights for policy 1, policy_version 26240 (0.0017)
[2023-09-30 12:30:30,105][118532] Updated weights for policy 0, policy_version 26240 (0.0017)
[2023-09-30 12:30:31,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13443072. Throughput: 0: 772.7, 1: 772.6. Samples: 3359077. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:30:31,947][117662] Avg episode reward: [(0, '11.900'), (1, '2.650')]
[2023-09-30 12:30:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 13475840. Throughput: 0: 775.2, 1: 774.0. Samples: 3368550. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:30:36,947][117662] Avg episode reward: [(0, '11.990'), (1, '2.690')]
[2023-09-30 12:30:41,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13500416. Throughput: 0: 773.7, 1: 773.7. Samples: 3373056. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:30:41,948][117662] Avg episode reward: [(0, '12.160'), (1, '2.730')]
[2023-09-30 12:30:43,458][118531] Updated weights for policy 1, policy_version 26400 (0.0019)
[2023-09-30 12:30:43,458][118532] Updated weights for policy 0, policy_version 26400 (0.0019)
[2023-09-30 12:30:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13533184. Throughput: 0: 768.7, 1: 769.9. Samples: 3381924. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:30:46,947][117662] Avg episode reward: [(0, '12.340'), (1, '2.740')]
[2023-09-30 12:30:51,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13565952. Throughput: 0: 768.6, 1: 769.5. Samples: 3391246. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:30:51,947][117662] Avg episode reward: [(0, '12.580'), (1, '2.740')]
[2023-09-30 12:30:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13590528. Throughput: 0: 764.3, 1: 763.8. Samples: 3395593. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:30:56,948][117662] Avg episode reward: [(0, '12.670'), (1, '2.760')]
[2023-09-30 12:30:57,083][118532] Updated weights for policy 0, policy_version 26560 (0.0017)
[2023-09-30 12:30:57,083][118531] Updated weights for policy 1, policy_version 26560 (0.0017)
[2023-09-30 12:31:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13623296. Throughput: 0: 763.5, 1: 763.9. Samples: 3404925. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:01,947][117662] Avg episode reward: [(0, '12.550'), (1, '2.770')]
[2023-09-30 12:31:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13656064. Throughput: 0: 763.4, 1: 763.1. Samples: 3414334. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:06,948][117662] Avg episode reward: [(0, '12.630'), (1, '2.780')]
[2023-09-30 12:31:10,048][118531] Updated weights for policy 1, policy_version 26720 (0.0018)
[2023-09-30 12:31:10,048][118532] Updated weights for policy 0, policy_version 26720 (0.0017)
[2023-09-30 12:31:11,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13688832. Throughput: 0: 765.5, 1: 765.4. Samples: 3419211. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:11,948][117662] Avg episode reward: [(0, '12.920'), (1, '2.770')]
[2023-09-30 12:31:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13721600. Throughput: 0: 769.9, 1: 769.7. Samples: 3428357. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:16,948][117662] Avg episode reward: [(0, '12.970'), (1, '2.810')]
[2023-09-30 12:31:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 13754368. Throughput: 0: 770.3, 1: 770.6. Samples: 3437888. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:21,947][117662] Avg episode reward: [(0, '13.260'), (1, '2.790')]
[2023-09-30 12:31:23,187][118532] Updated weights for policy 0, policy_version 26880 (0.0017)
[2023-09-30 12:31:23,188][118531] Updated weights for policy 1, policy_version 26880 (0.0015)
[2023-09-30 12:31:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13778944. Throughput: 0: 773.7, 1: 773.7. Samples: 3442688. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:26,948][117662] Avg episode reward: [(0, '13.500'), (1, '2.700')]
[2023-09-30 12:31:31,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13811712. Throughput: 0: 773.3, 1: 773.0. Samples: 3451509. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:31:31,948][117662] Avg episode reward: [(0, '13.530'), (1, '2.700')]
[2023-09-30 12:31:36,440][118532] Updated weights for policy 0, policy_version 27040 (0.0014)
[2023-09-30 12:31:36,440][118531] Updated weights for policy 1, policy_version 27040 (0.0017)
[2023-09-30 12:31:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13844480. Throughput: 0: 776.9, 1: 775.9. Samples: 3461120. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:31:36,947][117662] Avg episode reward: [(0, '13.320'), (1, '2.710')]
[2023-09-30 12:31:41,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 13877248. Throughput: 0: 778.5, 1: 778.8. Samples: 3465670. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:31:41,947][117662] Avg episode reward: [(0, '13.360'), (1, '2.690')]
[2023-09-30 12:31:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13901824. Throughput: 0: 780.2, 1: 779.7. Samples: 3475120. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:31:46,948][117662] Avg episode reward: [(0, '13.390'), (1, '2.680')]
[2023-09-30 12:31:49,606][118532] Updated weights for policy 0, policy_version 27200 (0.0018)
[2023-09-30 12:31:49,606][118531] Updated weights for policy 1, policy_version 27200 (0.0017)
[2023-09-30 12:31:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 13934592. Throughput: 0: 779.3, 1: 778.4. Samples: 3484430. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:51,947][117662] Avg episode reward: [(0, '13.620'), (1, '2.740')]
[2023-09-30 12:31:56,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 13967360. Throughput: 0: 779.7, 1: 779.7. Samples: 3489386. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:31:56,948][117662] Avg episode reward: [(0, '13.740'), (1, '2.790')]
[2023-09-30 12:32:01,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14000128. Throughput: 0: 779.9, 1: 779.7. Samples: 3498541. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:01,948][117662] Avg episode reward: [(0, '13.820'), (1, '2.760')]
[2023-09-30 12:32:02,662][118532] Updated weights for policy 0, policy_version 27360 (0.0014)
[2023-09-30 12:32:02,662][118531] Updated weights for policy 1, policy_version 27360 (0.0018)
[2023-09-30 12:32:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14032896. Throughput: 0: 781.5, 1: 781.5. Samples: 3508220. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:06,948][117662] Avg episode reward: [(0, '13.810'), (1, '2.750')]
[2023-09-30 12:32:11,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14065664. Throughput: 0: 774.5, 1: 774.8. Samples: 3512406. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:11,947][117662] Avg episode reward: [(0, '13.300'), (1, '2.770')]
[2023-09-30 12:32:15,837][118532] Updated weights for policy 0, policy_version 27520 (0.0019)
[2023-09-30 12:32:15,837][118531] Updated weights for policy 1, policy_version 27520 (0.0019)
[2023-09-30 12:32:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14090240. Throughput: 0: 782.2, 1: 782.8. Samples: 3521934. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:16,948][117662] Avg episode reward: [(0, '13.060'), (1, '2.730')]
[2023-09-30 12:32:17,089][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000027536_7049216.pth...
[2023-09-30 12:32:17,106][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000027536_7049216.pth...
[2023-09-30 12:32:17,118][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000024624_6303744.pth
[2023-09-30 12:32:17,134][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000024624_6303744.pth
[2023-09-30 12:32:21,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14123008. Throughput: 0: 778.1, 1: 778.4. Samples: 3531163. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:21,947][117662] Avg episode reward: [(0, '13.490'), (1, '2.700')]
[2023-09-30 12:32:26,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14155776. Throughput: 0: 774.7, 1: 775.8. Samples: 3535442. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:26,948][117662] Avg episode reward: [(0, '13.340'), (1, '2.710')]
[2023-09-30 12:32:29,351][118531] Updated weights for policy 1, policy_version 27680 (0.0018)
[2023-09-30 12:32:29,351][118532] Updated weights for policy 0, policy_version 27680 (0.0018)
[2023-09-30 12:32:31,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 14188544. Throughput: 0: 775.2, 1: 775.5. Samples: 3544902. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:31,947][117662] Avg episode reward: [(0, '13.510'), (1, '2.780')]
[2023-09-30 12:32:36,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14213120. Throughput: 0: 772.4, 1: 773.7. Samples: 3554008. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:36,947][117662] Avg episode reward: [(0, '13.750'), (1, '2.760')]
[2023-09-30 12:32:41,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14245888. Throughput: 0: 771.6, 1: 771.5. Samples: 3558823. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:32:41,947][117662] Avg episode reward: [(0, '13.760'), (1, '2.700')]
[2023-09-30 12:32:42,457][118532] Updated weights for policy 0, policy_version 27840 (0.0017)
[2023-09-30 12:32:42,458][118531] Updated weights for policy 1, policy_version 27840 (0.0017)
[2023-09-30 12:32:46,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14278656. Throughput: 0: 773.3, 1: 773.1. Samples: 3568131. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:32:46,948][117662] Avg episode reward: [(0, '13.580'), (1, '2.700')]
[2023-09-30 12:32:51,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14311424. Throughput: 0: 771.4, 1: 768.8. Samples: 3577529. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:32:51,948][117662] Avg episode reward: [(0, '12.960'), (1, '2.640')]
[2023-09-30 12:32:55,682][118531] Updated weights for policy 1, policy_version 28000 (0.0018)
[2023-09-30 12:32:55,682][118532] Updated weights for policy 0, policy_version 28000 (0.0017)
[2023-09-30 12:32:56,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14336000. Throughput: 0: 772.8, 1: 772.6. Samples: 3581952. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:32:56,948][117662] Avg episode reward: [(0, '13.060'), (1, '2.530')]
[2023-09-30 12:33:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14368768. Throughput: 0: 774.1, 1: 773.3. Samples: 3591570. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:33:01,948][117662] Avg episode reward: [(0, '13.070'), (1, '2.440')]
[2023-09-30 12:33:06,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14401536. Throughput: 0: 776.1, 1: 776.0. Samples: 3601008. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:06,947][117662] Avg episode reward: [(0, '13.110'), (1, '2.430')]
[2023-09-30 12:33:08,579][118531] Updated weights for policy 1, policy_version 28160 (0.0014)
[2023-09-30 12:33:08,580][118532] Updated weights for policy 0, policy_version 28160 (0.0019)
[2023-09-30 12:33:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14434304. Throughput: 0: 782.9, 1: 780.9. Samples: 3605811. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:11,947][117662] Avg episode reward: [(0, '13.280'), (1, '2.510')]
[2023-09-30 12:33:16,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14467072. Throughput: 0: 776.1, 1: 776.2. Samples: 3614756. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:16,948][117662] Avg episode reward: [(0, '13.470'), (1, '2.560')]
[2023-09-30 12:33:21,639][118532] Updated weights for policy 0, policy_version 28320 (0.0017)
[2023-09-30 12:33:21,639][118531] Updated weights for policy 1, policy_version 28320 (0.0018)
[2023-09-30 12:33:21,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14499840. Throughput: 0: 786.1, 1: 785.6. Samples: 3624735. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:21,947][117662] Avg episode reward: [(0, '13.390'), (1, '2.660')]
[2023-09-30 12:33:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14532608. Throughput: 0: 782.4, 1: 782.3. Samples: 3629234. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:33:26,947][117662] Avg episode reward: [(0, '13.530'), (1, '2.770')]
[2023-09-30 12:33:31,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6212.2, 300 sec: 6206.5). Total num frames: 14561280. Throughput: 0: 787.7, 1: 788.4. Samples: 3639059. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:33:31,948][117662] Avg episode reward: [(0, '13.150'), (1, '2.740')]
[2023-09-30 12:33:34,614][118532] Updated weights for policy 0, policy_version 28480 (0.0016)
[2023-09-30 12:33:34,614][118531] Updated weights for policy 1, policy_version 28480 (0.0016)
[2023-09-30 12:33:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14589952. Throughput: 0: 784.0, 1: 786.3. Samples: 3648194. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:33:36,948][117662] Avg episode reward: [(0, '12.970'), (1, '2.680')]
[2023-09-30 12:33:41,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 14622720. Throughput: 0: 789.4, 1: 789.1. Samples: 3652983. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:33:41,948][117662] Avg episode reward: [(0, '13.150'), (1, '2.630')]
[2023-09-30 12:33:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 14655488. Throughput: 0: 786.3, 1: 786.3. Samples: 3662337. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:46,947][117662] Avg episode reward: [(0, '13.150'), (1, '2.680')]
[2023-09-30 12:33:47,627][118531] Updated weights for policy 1, policy_version 28640 (0.0018)
[2023-09-30 12:33:47,627][118532] Updated weights for policy 0, policy_version 28640 (0.0016)
[2023-09-30 12:33:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14688256. Throughput: 0: 785.8, 1: 787.4. Samples: 3671805. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:51,948][117662] Avg episode reward: [(0, '12.780'), (1, '2.660')]
[2023-09-30 12:33:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6220.4). Total num frames: 14721024. Throughput: 0: 782.2, 1: 783.1. Samples: 3676251. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:33:56,948][117662] Avg episode reward: [(0, '12.570'), (1, '2.710')]
[2023-09-30 12:34:00,931][118532] Updated weights for policy 0, policy_version 28800 (0.0017)
[2023-09-30 12:34:00,932][118531] Updated weights for policy 1, policy_version 28800 (0.0017)
[2023-09-30 12:34:01,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 14745600. Throughput: 0: 788.2, 1: 787.5. Samples: 3685661. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:34:01,947][117662] Avg episode reward: [(0, '12.280'), (1, '2.750')]
[2023-09-30 12:34:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14778368. Throughput: 0: 779.4, 1: 779.2. Samples: 3694873. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:34:06,948][117662] Avg episode reward: [(0, '12.370'), (1, '2.790')]
[2023-09-30 12:34:11,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14811136. Throughput: 0: 781.0, 1: 780.3. Samples: 3699494. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:34:11,948][117662] Avg episode reward: [(0, '12.100'), (1, '2.840')]
[2023-09-30 12:34:14,068][118531] Updated weights for policy 1, policy_version 28960 (0.0017)
[2023-09-30 12:34:14,069][118532] Updated weights for policy 0, policy_version 28960 (0.0015)
[2023-09-30 12:34:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14843904. Throughput: 0: 776.5, 1: 776.2. Samples: 3708928. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:34:16,947][117662] Avg episode reward: [(0, '12.050'), (1, '2.830')]
[2023-09-30 12:34:16,957][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000028992_7421952.pth...
[2023-09-30 12:34:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000028992_7421952.pth...
[2023-09-30 12:34:16,994][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000026080_6676480.pth
[2023-09-30 12:34:16,996][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000026080_6676480.pth
[2023-09-30 12:34:21,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14868480. Throughput: 0: 777.5, 1: 778.5. Samples: 3718211. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:34:21,948][117662] Avg episode reward: [(0, '12.320'), (1, '2.880')]
[2023-09-30 12:34:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 14901248. Throughput: 0: 778.1, 1: 778.5. Samples: 3723028. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:34:26,948][117662] Avg episode reward: [(0, '12.220'), (1, '2.860')]
[2023-09-30 12:34:27,297][118531] Updated weights for policy 1, policy_version 29120 (0.0017)
[2023-09-30 12:34:27,297][118532] Updated weights for policy 0, policy_version 29120 (0.0019)
[2023-09-30 12:34:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 14934016. Throughput: 0: 776.9, 1: 777.1. Samples: 3732269. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:34:31,947][117662] Avg episode reward: [(0, '12.210'), (1, '2.840')]
[2023-09-30 12:34:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 14966784. Throughput: 0: 777.6, 1: 775.8. Samples: 3741708. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:34:36,948][117662] Avg episode reward: [(0, '12.030'), (1, '2.820')]
[2023-09-30 12:34:40,248][118531] Updated weights for policy 1, policy_version 29280 (0.0015)
[2023-09-30 12:34:40,248][118532] Updated weights for policy 0, policy_version 29280 (0.0018)
[2023-09-30 12:34:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 14999552. Throughput: 0: 779.3, 1: 778.7. Samples: 3746363. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-30 12:34:41,947][117662] Avg episode reward: [(0, '12.260'), (1, '2.830')]
[2023-09-30 12:34:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15032320. Throughput: 0: 781.7, 1: 782.1. Samples: 3756032. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:34:46,947][117662] Avg episode reward: [(0, '12.500'), (1, '2.840')]
[2023-09-30 12:34:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15056896. Throughput: 0: 779.2, 1: 780.4. Samples: 3765057. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:34:51,947][117662] Avg episode reward: [(0, '12.480'), (1, '2.810')]
[2023-09-30 12:34:53,653][118531] Updated weights for policy 1, policy_version 29440 (0.0016)
[2023-09-30 12:34:53,653][118532] Updated weights for policy 0, policy_version 29440 (0.0017)
[2023-09-30 12:34:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15089664. Throughput: 0: 777.6, 1: 779.0. Samples: 3769544. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:34:56,948][117662] Avg episode reward: [(0, '12.360'), (1, '2.820')]
[2023-09-30 12:35:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15122432. Throughput: 0: 774.0, 1: 774.2. Samples: 3778599. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:01,947][117662] Avg episode reward: [(0, '12.070'), (1, '2.900')]
[2023-09-30 12:35:06,744][118532] Updated weights for policy 0, policy_version 29600 (0.0017)
[2023-09-30 12:35:06,744][118531] Updated weights for policy 1, policy_version 29600 (0.0017)
[2023-09-30 12:35:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15155200. Throughput: 0: 780.9, 1: 779.6. Samples: 3788432. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:06,948][117662] Avg episode reward: [(0, '12.000'), (1, '2.870')]
[2023-09-30 12:35:11,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 15187968. Throughput: 0: 776.5, 1: 776.3. Samples: 3792904. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:11,947][117662] Avg episode reward: [(0, '11.950'), (1, '2.840')]
[2023-09-30 12:35:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 15212544. Throughput: 0: 779.2, 1: 779.1. Samples: 3802392. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:16,948][117662] Avg episode reward: [(0, '12.030'), (1, '2.800')]
[2023-09-30 12:35:19,970][118531] Updated weights for policy 1, policy_version 29760 (0.0015)
[2023-09-30 12:35:19,971][118532] Updated weights for policy 0, policy_version 29760 (0.0016)
[2023-09-30 12:35:21,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15245312. Throughput: 0: 773.6, 1: 773.6. Samples: 3811330. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:21,947][117662] Avg episode reward: [(0, '11.970'), (1, '2.830')]
[2023-09-30 12:35:26,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 15278080. Throughput: 0: 773.6, 1: 774.2. Samples: 3816014. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:26,947][117662] Avg episode reward: [(0, '12.360'), (1, '2.820')]
[2023-09-30 12:35:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15310848. Throughput: 0: 773.7, 1: 773.7. Samples: 3825664. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:31,947][117662] Avg episode reward: [(0, '12.610'), (1, '2.800')]
[2023-09-30 12:35:33,193][118531] Updated weights for policy 1, policy_version 29920 (0.0017)
[2023-09-30 12:35:33,193][118532] Updated weights for policy 0, policy_version 29920 (0.0018)
[2023-09-30 12:35:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15335424. Throughput: 0: 776.6, 1: 775.7. Samples: 3834910. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:36,948][117662] Avg episode reward: [(0, '12.520'), (1, '2.840')]
[2023-09-30 12:35:41,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15368192. Throughput: 0: 778.3, 1: 777.5. Samples: 3839556. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:41,947][117662] Avg episode reward: [(0, '12.920'), (1, '2.860')]
[2023-09-30 12:35:46,643][118532] Updated weights for policy 0, policy_version 30080 (0.0017)
[2023-09-30 12:35:46,644][118531] Updated weights for policy 1, policy_version 30080 (0.0018)
[2023-09-30 12:35:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15400960. Throughput: 0: 775.0, 1: 775.0. Samples: 3848351. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:46,947][117662] Avg episode reward: [(0, '13.060'), (1, '2.880')]
[2023-09-30 12:35:51,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15433728. Throughput: 0: 771.2, 1: 771.9. Samples: 3857873. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:51,948][117662] Avg episode reward: [(0, '13.260'), (1, '2.850')]
[2023-09-30 12:35:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15466496. Throughput: 0: 773.6, 1: 773.6. Samples: 3862529. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:35:56,948][117662] Avg episode reward: [(0, '13.390'), (1, '2.840')]
[2023-09-30 12:35:59,527][118531] Updated weights for policy 1, policy_version 30240 (0.0018)
[2023-09-30 12:35:59,527][118532] Updated weights for policy 0, policy_version 30240 (0.0017)
[2023-09-30 12:36:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15491072. Throughput: 0: 776.1, 1: 776.2. Samples: 3872245. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:36:01,947][117662] Avg episode reward: [(0, '13.850'), (1, '2.850')]
[2023-09-30 12:36:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15523840. Throughput: 0: 776.1, 1: 776.3. Samples: 3881187. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:36:06,947][117662] Avg episode reward: [(0, '13.860'), (1, '2.820')]
[2023-09-30 12:36:11,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15556608. Throughput: 0: 778.9, 1: 778.3. Samples: 3886086. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:36:11,948][117662] Avg episode reward: [(0, '13.920'), (1, '2.810')]
[2023-09-30 12:36:12,676][118531] Updated weights for policy 1, policy_version 30400 (0.0018)
[2023-09-30 12:36:12,676][118532] Updated weights for policy 0, policy_version 30400 (0.0017)
[2023-09-30 12:36:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15589376. Throughput: 0: 774.1, 1: 774.4. Samples: 3895349. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:36:16,947][117662] Avg episode reward: [(0, '14.090'), (1, '2.810')]
[2023-09-30 12:36:16,956][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000030448_7794688.pth...
[2023-09-30 12:36:16,956][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000030448_7794688.pth...
[2023-09-30 12:36:16,992][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000027536_7049216.pth
[2023-09-30 12:36:16,992][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000027536_7049216.pth
[2023-09-30 12:36:16,996][118358] Saving new best policy, reward=14.090!
[2023-09-30 12:36:21,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15622144. Throughput: 0: 780.0, 1: 778.4. Samples: 3905034. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-30 12:36:21,947][117662] Avg episode reward: [(0, '14.340'), (1, '2.790')]
[2023-09-30 12:36:21,948][118358] Saving new best policy, reward=14.340!
[2023-09-30 12:36:25,746][118531] Updated weights for policy 1, policy_version 30560 (0.0017)
[2023-09-30 12:36:25,747][118532] Updated weights for policy 0, policy_version 30560 (0.0017)
[2023-09-30 12:36:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15646720. Throughput: 0: 778.7, 1: 778.6. Samples: 3909632. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:36:26,948][117662] Avg episode reward: [(0, '14.720'), (1, '2.780')]
[2023-09-30 12:36:27,045][118358] Saving new best policy, reward=14.720!
[2023-09-30 12:36:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15679488. Throughput: 0: 787.2, 1: 787.4. Samples: 3919209. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:36:31,947][117662] Avg episode reward: [(0, '15.140'), (1, '2.780')]
[2023-09-30 12:36:31,955][118358] Saving new best policy, reward=15.140!
[2023-09-30 12:36:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15712256. Throughput: 0: 782.3, 1: 782.8. Samples: 3928305. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:36:36,948][117662] Avg episode reward: [(0, '14.880'), (1, '2.840')]
[2023-09-30 12:36:38,760][118531] Updated weights for policy 1, policy_version 30720 (0.0018)
[2023-09-30 12:36:38,760][118532] Updated weights for policy 0, policy_version 30720 (0.0017)
[2023-09-30 12:36:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15745024. Throughput: 0: 784.6, 1: 784.4. Samples: 3933136. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:36:41,947][117662] Avg episode reward: [(0, '14.680'), (1, '2.840')]
[2023-09-30 12:36:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15777792. Throughput: 0: 779.7, 1: 779.3. Samples: 3942400. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:36:46,948][117662] Avg episode reward: [(0, '14.670'), (1, '2.820')]
[2023-09-30 12:36:51,943][118531] Updated weights for policy 1, policy_version 30880 (0.0017)
[2023-09-30 12:36:51,943][118532] Updated weights for policy 0, policy_version 30880 (0.0019)
[2023-09-30 12:36:51,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15810560. Throughput: 0: 785.6, 1: 785.8. Samples: 3951904. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:36:51,948][117662] Avg episode reward: [(0, '14.890'), (1, '2.880')]
[2023-09-30 12:36:56,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15835136. Throughput: 0: 783.2, 1: 785.1. Samples: 3956661. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:36:56,948][117662] Avg episode reward: [(0, '14.800'), (1, '2.890')]
[2023-09-30 12:37:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15867904. Throughput: 0: 782.0, 1: 782.0. Samples: 3965730. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:37:01,947][117662] Avg episode reward: [(0, '14.590'), (1, '2.910')]
[2023-09-30 12:37:05,222][118531] Updated weights for policy 1, policy_version 31040 (0.0016)
[2023-09-30 12:37:05,222][118532] Updated weights for policy 0, policy_version 31040 (0.0017)
[2023-09-30 12:37:06,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 15900672. Throughput: 0: 778.3, 1: 779.9. Samples: 3975154. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-30 12:37:06,948][117662] Avg episode reward: [(0, '14.690'), (1, '2.910')]
[2023-09-30 12:37:11,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 15933440. Throughput: 0: 777.9, 1: 779.0. Samples: 3979692. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:37:11,948][117662] Avg episode reward: [(0, '14.760'), (1, '2.870')]
[2023-09-30 12:37:16,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15958016. Throughput: 0: 776.4, 1: 776.8. Samples: 3989105. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:37:16,947][117662] Avg episode reward: [(0, '14.820'), (1, '2.880')]
[2023-09-30 12:37:18,343][118532] Updated weights for policy 0, policy_version 31200 (0.0016)
[2023-09-30 12:37:18,344][118531] Updated weights for policy 1, policy_version 31200 (0.0016)
[2023-09-30 12:37:21,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 15990784. Throughput: 0: 776.9, 1: 776.6. Samples: 3998212. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:37:21,948][117662] Avg episode reward: [(0, '14.220'), (1, '2.850')]
[2023-09-30 12:37:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 16023552. Throughput: 0: 776.1, 1: 775.8. Samples: 4002971. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-30 12:37:26,948][117662] Avg episode reward: [(0, '13.990'), (1, '2.870')]
[2023-09-30 12:37:31,598][118532] Updated weights for policy 0, policy_version 31360 (0.0017)
[2023-09-30 12:37:31,598][118531] Updated weights for policy 1, policy_version 31360 (0.0017)
[2023-09-30 12:37:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16056320. Throughput: 0: 773.8, 1: 773.9. Samples: 4012046. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:37:31,947][117662] Avg episode reward: [(0, '13.590'), (1, '2.830')]
[2023-09-30 12:37:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16089088. Throughput: 0: 779.5, 1: 779.2. Samples: 4022047. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:37:36,948][117662] Avg episode reward: [(0, '13.170'), (1, '2.820')]
[2023-09-30 12:37:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16121856. Throughput: 0: 777.2, 1: 775.7. Samples: 4026544. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:37:41,948][117662] Avg episode reward: [(0, '13.040'), (1, '2.810')]
[2023-09-30 12:37:44,539][118532] Updated weights for policy 0, policy_version 31520 (0.0016)
[2023-09-30 12:37:44,539][118531] Updated weights for policy 1, policy_version 31520 (0.0018)
[2023-09-30 12:37:46,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16146432. Throughput: 0: 781.2, 1: 781.2. Samples: 4036037. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-30 12:37:46,947][117662] Avg episode reward: [(0, '12.880'), (1, '2.830')]
[2023-09-30 12:37:51,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 16179200. Throughput: 0: 778.1, 1: 777.9. Samples: 4045172. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:37:51,947][117662] Avg episode reward: [(0, '12.690'), (1, '2.830')]
[2023-09-30 12:37:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16211968. Throughput: 0: 782.2, 1: 780.6. Samples: 4050018. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:37:56,948][117662] Avg episode reward: [(0, '12.720'), (1, '2.870')]
[2023-09-30 12:37:57,651][118532] Updated weights for policy 0, policy_version 31680 (0.0017)
[2023-09-30 12:37:57,652][118531] Updated weights for policy 1, policy_version 31680 (0.0019)
[2023-09-30 12:38:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16244736. Throughput: 0: 779.5, 1: 779.0. Samples: 4059236. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:38:01,947][117662] Avg episode reward: [(0, '12.600'), (1, '2.810')]
[2023-09-30 12:38:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16277504. Throughput: 0: 785.2, 1: 785.5. Samples: 4068893. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:38:06,948][117662] Avg episode reward: [(0, '12.660'), (1, '2.860')]
[2023-09-30 12:38:10,632][118532] Updated weights for policy 0, policy_version 31840 (0.0017)
[2023-09-30 12:38:10,632][118531] Updated weights for policy 1, policy_version 31840 (0.0018)
[2023-09-30 12:38:11,947][117662] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6234.3). Total num frames: 16306176. Throughput: 0: 783.1, 1: 783.6. Samples: 4073475. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:38:11,948][117662] Avg episode reward: [(0, '12.620'), (1, '2.890')]
[2023-09-30 12:38:16,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 16334848. Throughput: 0: 788.8, 1: 788.7. Samples: 4083030. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:38:16,947][117662] Avg episode reward: [(0, '12.520'), (1, '2.800')]
[2023-09-30 12:38:16,956][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000031904_8167424.pth...
[2023-09-30 12:38:16,956][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000031904_8167424.pth...
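The `Saving .../checkpoint_*.pth` and `Removing .../checkpoint_*.pth` pairs above show a keep-latest rotation: each learner writes a new `checkpoint_<policy_version>_<frames>.pth` and deletes the oldest one, while "new best policy" checkpoints are tracked separately. A minimal sketch of that rotation pattern (not Sample Factory's actual implementation; `keep_n` and the helper name are assumptions):

```python
import re
from pathlib import Path

def rotate_checkpoints(ckpt_dir, keep_n=2):
    """Keep only the keep_n most recent checkpoint_*.pth files, ordered by policy_version."""
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    ckpts = sorted(
        (p for p in Path(ckpt_dir).glob("checkpoint_*.pth") if pattern.search(p.name)),
        key=lambda p: int(pattern.search(p.name).group(1)),  # sort by policy_version
    )
    removed = []
    for stale in ckpts[:-keep_n]:  # everything except the newest keep_n
        stale.unlink()
        removed.append(stale.name)
    return removed
```

In the log each learner (checkpoint_p0 and checkpoint_p1) applies this kind of policy to its own directory right after saving.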
[2023-09-30 12:38:16,991][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000028992_7421952.pth
[2023-09-30 12:38:16,993][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000028992_7421952.pth
[2023-09-30 12:38:21,947][117662] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 16367616. Throughput: 0: 781.3, 1: 780.8. Samples: 4092341. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:38:21,947][117662] Avg episode reward: [(0, '11.580'), (1, '2.750')]
[2023-09-30 12:38:23,647][118532] Updated weights for policy 0, policy_version 32000 (0.0017)
[2023-09-30 12:38:23,647][118531] Updated weights for policy 1, policy_version 32000 (0.0017)
[2023-09-30 12:38:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6234.3). Total num frames: 16400384. Throughput: 0: 786.1, 1: 785.4. Samples: 4097261. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:38:26,948][117662] Avg episode reward: [(0, '11.550'), (1, '2.710')]
[2023-09-30 12:38:31,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16433152. Throughput: 0: 782.7, 1: 782.4. Samples: 4106468. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:38:31,948][117662] Avg episode reward: [(0, '11.810'), (1, '2.720')]
[2023-09-30 12:38:36,883][118532] Updated weights for policy 0, policy_version 32160 (0.0018)
[2023-09-30 12:38:36,883][118531] Updated weights for policy 1, policy_version 32160 (0.0017)
[2023-09-30 12:38:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16465920. Throughput: 0: 784.5, 1: 784.0. Samples: 4115755. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:38:36,948][117662] Avg episode reward: [(0, '11.590'), (1, '2.660')]
[2023-09-30 12:38:41,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16490496. Throughput: 0: 783.1, 1: 784.3. Samples: 4120548. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:38:41,947][117662] Avg episode reward: [(0, '11.590'), (1, '2.640')]
[2023-09-30 12:38:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 16523264. Throughput: 0: 786.0, 1: 786.2. Samples: 4129981. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:38:46,948][117662] Avg episode reward: [(0, '10.910'), (1, '2.710')]
[2023-09-30 12:38:49,958][118531] Updated weights for policy 1, policy_version 32320 (0.0017)
[2023-09-30 12:38:49,960][118532] Updated weights for policy 0, policy_version 32320 (0.0018)
[2023-09-30 12:38:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 16556032. Throughput: 0: 779.4, 1: 778.7. Samples: 4139010. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:38:51,947][117662] Avg episode reward: [(0, '10.980'), (1, '2.730')]
[2023-09-30 12:38:56,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16588800. Throughput: 0: 779.8, 1: 779.7. Samples: 4143654. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:38:56,948][117662] Avg episode reward: [(0, '10.680'), (1, '2.740')]
[2023-09-30 12:39:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16621568. Throughput: 0: 779.8, 1: 781.3. Samples: 4153277. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:01,948][117662] Avg episode reward: [(0, '10.570'), (1, '2.790')]
[2023-09-30 12:39:03,149][118532] Updated weights for policy 0, policy_version 32480 (0.0018)
[2023-09-30 12:39:03,150][118531] Updated weights for policy 1, policy_version 32480 (0.0018)
[2023-09-30 12:39:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16646144. Throughput: 0: 780.2, 1: 781.3. Samples: 4162610. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:06,948][117662] Avg episode reward: [(0, '11.020'), (1, '2.750')]
[2023-09-30 12:39:11,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6212.3, 300 sec: 6220.4). Total num frames: 16678912. Throughput: 0: 780.0, 1: 781.1. Samples: 4167511. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:11,947][117662] Avg episode reward: [(0, '11.120'), (1, '2.770')]
[2023-09-30 12:39:16,304][118531] Updated weights for policy 1, policy_version 32640 (0.0017)
[2023-09-30 12:39:16,304][118532] Updated weights for policy 0, policy_version 32640 (0.0017)
[2023-09-30 12:39:16,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16711680. Throughput: 0: 776.4, 1: 776.6. Samples: 4176356. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:16,947][117662] Avg episode reward: [(0, '10.830'), (1, '2.770')]
[2023-09-30 12:39:21,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16744448. Throughput: 0: 780.4, 1: 780.5. Samples: 4185993. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:21,948][117662] Avg episode reward: [(0, '11.170'), (1, '2.770')]
[2023-09-30 12:39:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 16777216. Throughput: 0: 775.0, 1: 774.7. Samples: 4190284. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:26,948][117662] Avg episode reward: [(0, '11.570'), (1, '2.780')]
[2023-09-30 12:39:29,475][118531] Updated weights for policy 1, policy_version 32800 (0.0018)
[2023-09-30 12:39:29,475][118532] Updated weights for policy 0, policy_version 32800 (0.0018)
[2023-09-30 12:39:31,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16801792. Throughput: 0: 775.6, 1: 778.5. Samples: 4199916. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:31,947][117662] Avg episode reward: [(0, '11.540'), (1, '2.840')]
[2023-09-30 12:39:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16834560. Throughput: 0: 773.7, 1: 773.7. Samples: 4208640. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:36,948][117662] Avg episode reward: [(0, '11.270'), (1, '2.870')]
[2023-09-30 12:39:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 16867328. Throughput: 0: 771.7, 1: 772.4. Samples: 4213137. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:41,948][117662] Avg episode reward: [(0, '11.260'), (1, '2.820')]
[2023-09-30 12:39:43,153][118531] Updated weights for policy 1, policy_version 32960 (0.0014)
[2023-09-30 12:39:43,153][118532] Updated weights for policy 0, policy_version 32960 (0.0016)
[2023-09-30 12:39:46,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16891904. Throughput: 0: 768.7, 1: 769.0. Samples: 4222471. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:46,948][117662] Avg episode reward: [(0, '11.220'), (1, '2.850')]
[2023-09-30 12:39:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16924672. Throughput: 0: 762.5, 1: 761.9. Samples: 4231208. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:51,948][117662] Avg episode reward: [(0, '11.250'), (1, '2.850')]
[2023-09-30 12:39:56,524][118531] Updated weights for policy 1, policy_version 33120 (0.0017)
[2023-09-30 12:39:56,524][118532] Updated weights for policy 0, policy_version 33120 (0.0016)
[2023-09-30 12:39:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16957440. Throughput: 0: 762.1, 1: 761.8. Samples: 4236088. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:39:56,948][117662] Avg episode reward: [(0, '10.390'), (1, '2.830')]
[2023-09-30 12:40:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 16990208. Throughput: 0: 768.4, 1: 768.2. Samples: 4245504. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:40:01,948][117662] Avg episode reward: [(0, '10.250'), (1, '2.780')]
[2023-09-30 12:40:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17014784. Throughput: 0: 764.2, 1: 765.1. Samples: 4254812. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:40:06,948][117662] Avg episode reward: [(0, '10.540'), (1, '2.820')]
[2023-09-30 12:40:09,706][118532] Updated weights for policy 0, policy_version 33280 (0.0017)
[2023-09-30 12:40:09,706][118531] Updated weights for policy 1, policy_version 33280 (0.0018)
[2023-09-30 12:40:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17047552. Throughput: 0: 769.5, 1: 769.5. Samples: 4259542. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:40:11,948][117662] Avg episode reward: [(0, '11.240'), (1, '2.800')]
[2023-09-30 12:40:16,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17080320. Throughput: 0: 762.7, 1: 759.3. Samples: 4268406. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-30 12:40:16,947][117662] Avg episode reward: [(0, '11.090'), (1, '2.780')]
[2023-09-30 12:40:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000033360_8540160.pth...
[2023-09-30 12:40:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000033360_8540160.pth...
[2023-09-30 12:40:16,995][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000030448_7794688.pth
[2023-09-30 12:40:16,995][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000030448_7794688.pth
[2023-09-30 12:40:21,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17113088. Throughput: 0: 767.6, 1: 769.4. Samples: 4277803. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:40:21,947][117662] Avg episode reward: [(0, '11.130'), (1, '2.760')]
[2023-09-30 12:40:23,145][118531] Updated weights for policy 1, policy_version 33440 (0.0017)
[2023-09-30 12:40:23,145][118532] Updated weights for policy 0, policy_version 33440 (0.0016)
[2023-09-30 12:40:26,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 6192.6). Total num frames: 17137664. Throughput: 0: 769.0, 1: 769.0. Samples: 4282344. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:40:26,948][117662] Avg episode reward: [(0, '11.380'), (1, '2.820')]
[2023-09-30 12:40:31,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17170432. Throughput: 0: 767.7, 1: 766.4. Samples: 4291506. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:40:31,947][117662] Avg episode reward: [(0, '11.280'), (1, '2.830')]
[2023-09-30 12:40:36,529][118532] Updated weights for policy 0, policy_version 33600 (0.0018)
[2023-09-30 12:40:36,529][118531] Updated weights for policy 1, policy_version 33600 (0.0018)
[2023-09-30 12:40:36,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17203200. Throughput: 0: 773.3, 1: 773.1. Samples: 4300798. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:40:36,947][117662] Avg episode reward: [(0, '11.520'), (1, '2.800')]
[2023-09-30 12:40:41,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17235968. Throughput: 0: 764.7, 1: 764.6. Samples: 4304907. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:40:41,947][117662] Avg episode reward: [(0, '12.080'), (1, '2.800')]
[2023-09-30 12:40:46,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17260544. Throughput: 0: 769.1, 1: 770.0. Samples: 4314760. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:40:46,948][117662] Avg episode reward: [(0, '12.200'), (1, '2.780')]
[2023-09-30 12:40:49,794][118532] Updated weights for policy 0, policy_version 33760 (0.0017)
[2023-09-30 12:40:49,794][118531] Updated weights for policy 1, policy_version 33760 (0.0016)
[2023-09-30 12:40:51,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17293312. Throughput: 0: 765.0, 1: 764.1. Samples: 4323620. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:40:51,947][117662] Avg episode reward: [(0, '12.090'), (1, '2.810')]
[2023-09-30 12:40:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17326080. Throughput: 0: 763.8, 1: 763.7. Samples: 4328281. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:40:56,948][117662] Avg episode reward: [(0, '12.500'), (1, '2.850')]
[2023-09-30 12:41:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17358848. Throughput: 0: 769.6, 1: 769.5. Samples: 4337664. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:41:01,947][117662] Avg episode reward: [(0, '13.100'), (1, '2.840')]
[2023-09-30 12:41:03,174][118531] Updated weights for policy 1, policy_version 33920 (0.0014)
[2023-09-30 12:41:03,175][118532] Updated weights for policy 0, policy_version 33920 (0.0015)
[2023-09-30 12:41:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17383424. Throughput: 0: 768.8, 1: 767.4. Samples: 4346932. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:06,948][117662] Avg episode reward: [(0, '13.050'), (1, '2.830')]
[2023-09-30 12:41:11,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17416192. Throughput: 0: 773.3, 1: 772.6. Samples: 4351909. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:11,948][117662] Avg episode reward: [(0, '13.100'), (1, '2.830')]
[2023-09-30 12:41:16,028][118532] Updated weights for policy 0, policy_version 34080 (0.0015)
[2023-09-30 12:41:16,028][118531] Updated weights for policy 1, policy_version 34080 (0.0019)
[2023-09-30 12:41:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17448960. Throughput: 0: 776.1, 1: 774.8. Samples: 4361295. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:16,948][117662] Avg episode reward: [(0, '13.310'), (1, '2.870')]
[2023-09-30 12:41:21,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 17481728. Throughput: 0: 773.7, 1: 773.8. Samples: 4370435. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:21,947][117662] Avg episode reward: [(0, '13.250'), (1, '2.880')]
[2023-09-30 12:41:26,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17514496. Throughput: 0: 780.9, 1: 780.9. Samples: 4375186. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:26,947][117662] Avg episode reward: [(0, '13.060'), (1, '2.870')]
[2023-09-30 12:41:29,147][118532] Updated weights for policy 0, policy_version 34240 (0.0018)
[2023-09-30 12:41:29,147][118531] Updated weights for policy 1, policy_version 34240 (0.0016)
[2023-09-30 12:41:31,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17547264. Throughput: 0: 778.3, 1: 777.4. Samples: 4384768. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:31,947][117662] Avg episode reward: [(0, '13.270'), (1, '2.810')]
[2023-09-30 12:41:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17571840. Throughput: 0: 781.7, 1: 782.0. Samples: 4393986. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:36,948][117662] Avg episode reward: [(0, '13.430'), (1, '2.870')]
[2023-09-30 12:41:41,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 17604608. Throughput: 0: 784.5, 1: 785.2. Samples: 4398917. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:41,947][117662] Avg episode reward: [(0, '13.540'), (1, '2.860')]
[2023-09-30 12:41:42,241][118532] Updated weights for policy 0, policy_version 34400 (0.0018)
[2023-09-30 12:41:42,241][118531] Updated weights for policy 1, policy_version 34400 (0.0019)
[2023-09-30 12:41:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6192.6). Total num frames: 17637376. Throughput: 0: 782.3, 1: 782.9. Samples: 4408095. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:41:46,947][117662] Avg episode reward: [(0, '13.690'), (1, '2.840')]
[2023-09-30 12:41:51,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17670144. Throughput: 0: 784.6, 1: 784.4. Samples: 4417536. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:41:51,947][117662] Avg episode reward: [(0, '13.620'), (1, '2.860')]
[2023-09-30 12:41:55,290][118532] Updated weights for policy 0, policy_version 34560 (0.0018)
[2023-09-30 12:41:55,290][118531] Updated weights for policy 1, policy_version 34560 (0.0017)
[2023-09-30 12:41:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17702912. Throughput: 0: 780.4, 1: 781.1. Samples: 4422178. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:41:56,948][117662] Avg episode reward: [(0, '13.610'), (1, '2.810')]
[2023-09-30 12:42:01,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17735680. Throughput: 0: 783.8, 1: 784.6. Samples: 4431872. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:42:01,948][117662] Avg episode reward: [(0, '13.780'), (1, '2.840')]
[2023-09-30 12:42:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6220.4). Total num frames: 17768448. Throughput: 0: 789.3, 1: 789.2. Samples: 4441470. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:42:06,948][117662] Avg episode reward: [(0, '13.890'), (1, '2.790')]
[2023-09-30 12:42:08,164][118531] Updated weights for policy 1, policy_version 34720 (0.0015)
[2023-09-30 12:42:08,165][118532] Updated weights for policy 0, policy_version 34720 (0.0016)
[2023-09-30 12:42:11,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17793024. Throughput: 0: 789.2, 1: 789.1. Samples: 4446208. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-30 12:42:11,948][117662] Avg episode reward: [(0, '14.600'), (1, '2.740')]
[2023-09-30 12:42:16,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17825792. Throughput: 0: 785.9, 1: 785.5. Samples: 4455482. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:42:16,947][117662] Avg episode reward: [(0, '14.860'), (1, '2.760')]
[2023-09-30 12:42:16,955][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000034816_8912896.pth...
[2023-09-30 12:42:16,955][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000034816_8912896.pth...
[2023-09-30 12:42:16,988][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000031904_8167424.pth
[2023-09-30 12:42:16,990][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000031904_8167424.pth
[2023-09-30 12:42:21,305][118532] Updated weights for policy 0, policy_version 34880 (0.0018)
[2023-09-30 12:42:21,313][118531] Updated weights for policy 1, policy_version 34880 (0.0017)
[2023-09-30 12:42:21,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17858560. Throughput: 0: 785.1, 1: 785.2. Samples: 4464647. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:42:21,948][117662] Avg episode reward: [(0, '14.610'), (1, '2.750')]
[2023-09-30 12:42:26,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17891328. Throughput: 0: 782.8, 1: 781.4. Samples: 4469310. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:42:26,948][117662] Avg episode reward: [(0, '14.170'), (1, '2.740')]
[2023-09-30 12:42:31,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17924096. Throughput: 0: 786.9, 1: 787.3. Samples: 4478932. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:42:31,948][117662] Avg episode reward: [(0, '14.380'), (1, '2.730')]
[2023-09-30 12:42:34,576][118532] Updated weights for policy 0, policy_version 35040 (0.0017)
[2023-09-30 12:42:34,576][118531] Updated weights for policy 1, policy_version 35040 (0.0016)
[2023-09-30 12:42:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 17948672. Throughput: 0: 783.0, 1: 783.2. Samples: 4488016. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:42:36,948][117662] Avg episode reward: [(0, '14.380'), (1, '2.700')]
[2023-09-30 12:42:41,947][117662] Fps is (10 sec: 5734.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 17981440. Throughput: 0: 786.0, 1: 785.8. Samples: 4492906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:42:41,947][117662] Avg episode reward: [(0, '13.880'), (1, '2.740')]
[2023-09-30 12:42:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18014208. Throughput: 0: 778.0, 1: 778.0. Samples: 4501894. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:42:46,948][117662] Avg episode reward: [(0, '13.470'), (1, '2.740')]
[2023-09-30 12:42:47,668][118531] Updated weights for policy 1, policy_version 35200 (0.0014)
[2023-09-30 12:42:47,669][118532] Updated weights for policy 0, policy_version 35200 (0.0019)
[2023-09-30 12:42:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18046976. Throughput: 0: 778.0, 1: 778.0. Samples: 4511489. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:42:51,947][117662] Avg episode reward: [(0, '12.250'), (1, '2.700')]
[2023-09-30 12:42:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18079744. Throughput: 0: 774.4, 1: 774.8. Samples: 4515923. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:42:56,948][117662] Avg episode reward: [(0, '11.900'), (1, '2.700')]
[2023-09-30 12:43:00,889][118532] Updated weights for policy 0, policy_version 35360 (0.0018)
[2023-09-30 12:43:00,889][118531] Updated weights for policy 1, policy_version 35360 (0.0017)
[2023-09-30 12:43:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18104320. Throughput: 0: 776.7, 1: 777.3. Samples: 4525414. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:43:01,947][117662] Avg episode reward: [(0, '11.630'), (1, '2.670')]
[2023-09-30 12:43:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6206.5). Total num frames: 18137088. Throughput: 0: 776.8, 1: 776.8. Samples: 4534556. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:43:06,947][117662] Avg episode reward: [(0, '11.550'), (1, '2.690')]
[2023-09-30 12:43:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 18169856. Throughput: 0: 780.0, 1: 780.3. Samples: 4539527. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:43:11,947][117662] Avg episode reward: [(0, '11.720'), (1, '2.700')]
[2023-09-30 12:43:13,899][118531] Updated weights for policy 1, policy_version 35520 (0.0018)
[2023-09-30 12:43:13,899][118532] Updated weights for policy 0, policy_version 35520 (0.0017)
[2023-09-30 12:43:16,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18202624. Throughput: 0: 776.6, 1: 776.0. Samples: 4548800. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:43:16,948][117662] Avg episode reward: [(0, '11.840'), (1, '2.680')]
[2023-09-30 12:43:21,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18235392. Throughput: 0: 780.9, 1: 782.1. Samples: 4558351. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-30 12:43:21,948][117662] Avg episode reward: [(0, '11.860'), (1, '2.750')]
[2023-09-30 12:43:26,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18259968. Throughput: 0: 778.3, 1: 778.1. Samples: 4562944. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:43:26,948][117662] Avg episode reward: [(0, '11.920'), (1, '2.700')]
[2023-09-30 12:43:27,297][118532] Updated weights for policy 0, policy_version 35680 (0.0017)
[2023-09-30 12:43:27,297][118531] Updated weights for policy 1, policy_version 35680 (0.0017)
[2023-09-30 12:43:31,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18292736. Throughput: 0: 778.1, 1: 778.3. Samples: 4571931. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:43:31,947][117662] Avg episode reward: [(0, '11.700'), (1, '2.770')]
[2023-09-30 12:43:36,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 18325504. Throughput: 0: 776.2, 1: 776.6. Samples: 4581362. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:43:36,947][117662] Avg episode reward: [(0, '11.220'), (1, '2.790')]
[2023-09-30 12:43:40,406][118532] Updated weights for policy 0, policy_version 35840 (0.0015)
[2023-09-30 12:43:40,406][118531] Updated weights for policy 1, policy_version 35840 (0.0016)
[2023-09-30 12:43:41,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18358272. Throughput: 0: 776.6, 1: 776.3. Samples: 4585802. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:43:41,948][117662] Avg episode reward: [(0, '11.300'), (1, '2.790')]
[2023-09-30 12:43:46,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18382848. Throughput: 0: 778.9, 1: 778.3. Samples: 4595485. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:43:46,948][117662] Avg episode reward: [(0, '11.050'), (1, '2.750')]
[2023-09-30 12:43:51,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18415616. Throughput: 0: 780.2, 1: 780.5. Samples: 4604791. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:43:51,948][117662] Avg episode reward: [(0, '11.250'), (1, '2.720')]
[2023-09-30 12:43:53,423][118531] Updated weights for policy 1, policy_version 36000 (0.0016)
[2023-09-30 12:43:53,423][118532] Updated weights for policy 0, policy_version 36000 (0.0018)
[2023-09-30 12:43:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18448384. Throughput: 0: 779.6, 1: 780.5. Samples: 4609735. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:43:56,948][117662] Avg episode reward: [(0, '10.980'), (1, '2.740')]
[2023-09-30 12:44:01,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18481152. Throughput: 0: 781.7, 1: 782.0. Samples: 4619165. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:01,947][117662] Avg episode reward: [(0, '10.970'), (1, '2.730')]
[2023-09-30 12:44:06,267][118532] Updated weights for policy 0, policy_version 36160 (0.0018)
[2023-09-30 12:44:06,267][118531] Updated weights for policy 1, policy_version 36160 (0.0015)
[2023-09-30 12:44:06,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18513920. Throughput: 0: 780.3, 1: 779.2. Samples: 4628527. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:06,947][117662] Avg episode reward: [(0, '11.100'), (1, '2.710')]
[2023-09-30 12:44:11,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18546688. Throughput: 0: 782.0, 1: 783.0. Samples: 4633369. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:11,948][117662] Avg episode reward: [(0, '11.410'), (1, '2.710')]
[2023-09-30 12:44:16,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 18579456. Throughput: 0: 787.7, 1: 787.5. Samples: 4642816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:16,947][117662] Avg episode reward: [(0, '11.770'), (1, '2.800')]
[2023-09-30 12:44:16,955][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000036288_9289728.pth...
[2023-09-30 12:44:16,955][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000036288_9289728.pth...
[2023-09-30 12:44:16,990][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000033360_8540160.pth
[2023-09-30 12:44:16,992][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000033360_8540160.pth
[2023-09-30 12:44:19,366][118532] Updated weights for policy 0, policy_version 36320 (0.0019)
[2023-09-30 12:44:19,367][118531] Updated weights for policy 1, policy_version 36320 (0.0021)
[2023-09-30 12:44:21,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 18604032. Throughput: 0: 786.4, 1: 786.4. Samples: 4652139. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:21,948][117662] Avg episode reward: [(0, '12.000'), (1, '2.850')]
[2023-09-30 12:44:26,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18636800. Throughput: 0: 792.8, 1: 792.7. Samples: 4657150. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:26,948][117662] Avg episode reward: [(0, '11.980'), (1, '2.800')]
[2023-09-30 12:44:31,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18669568. Throughput: 0: 789.8, 1: 790.2. Samples: 4666585. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-30 12:44:31,947][117662] Avg episode reward: [(0, '11.890'), (1, '2.780')]
[2023-09-30 12:44:32,237][118532] Updated weights for policy 0, policy_version 36480 (0.0016)
[2023-09-30 12:44:32,238][118531] Updated weights for policy 1, policy_version 36480 (0.0017)
[2023-09-30 12:44:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18702336. Throughput: 0: 791.2, 1: 791.1. Samples: 4675996. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:44:36,948][117662] Avg episode reward: [(0, '12.210'), (1, '2.790')]
[2023-09-30 12:44:41,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18735104. Throughput: 0: 789.7, 1: 789.1. Samples: 4680780. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:44:41,948][117662] Avg episode reward: [(0, '12.490'), (1, '2.820')]
[2023-09-30 12:44:45,323][118531] Updated weights for policy 1, policy_version 36640 (0.0017)
[2023-09-30 12:44:45,324][118532] Updated weights for policy 0, policy_version 36640 (0.0018)
[2023-09-30 12:44:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6248.1). Total num frames: 18767872. Throughput: 0: 786.5, 1: 786.0. Samples: 4689925. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:44:46,948][117662] Avg episode reward: [(0, '12.420'), (1, '2.760')]
[2023-09-30 12:44:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6248.1). Total num frames: 18800640. Throughput: 0: 790.3, 1: 788.5. Samples: 4699573. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:44:51,948][117662] Avg episode reward: [(0, '13.010'), (1, '2.780')]
[2023-09-30 12:44:56,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 18825216. Throughput: 0: 788.1, 1: 787.2. Samples: 4704256. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-30 12:44:56,948][117662] Avg episode reward: [(0, '13.220'), (1, '2.820')]
[2023-09-30 12:44:58,382][118531] Updated weights for policy 1, policy_version 36800 (0.0016)
[2023-09-30 12:44:58,382][118532] Updated weights for policy 0, policy_version 36800 (0.0016)
[2023-09-30 12:45:01,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18857984. Throughput: 0: 787.5, 1: 787.8. Samples: 4713707. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:01,948][117662] Avg episode reward: [(0, '13.590'), (1, '2.820')]
[2023-09-30 12:45:06,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18890752. Throughput: 0: 785.6, 1: 785.5. Samples: 4722838. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:06,948][117662] Avg episode reward: [(0, '13.770'), (1, '2.790')]
[2023-09-30 12:45:11,436][118532] Updated weights for policy 0, policy_version 36960 (0.0019)
[2023-09-30 12:45:11,436][118531] Updated weights for policy 1, policy_version 36960 (0.0019)
[2023-09-30 12:45:11,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 18923520. Throughput: 0: 783.7, 1: 783.9. Samples: 4727691. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:11,947][117662] Avg episode reward: [(0, '13.850'), (1, '2.740')]
[2023-09-30 12:45:16,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18956288. Throughput: 0: 782.7, 1: 782.7. Samples: 4737026. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:16,948][117662] Avg episode reward: [(0, '13.590'), (1, '2.780')]
[2023-09-30 12:45:21,947][117662] Fps is (10 sec: 6553.3, 60 sec: 6417.0, 300 sec: 6275.9). Total num frames: 18989056. Throughput: 0: 787.5, 1: 787.8. Samples: 4746885. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:21,948][117662] Avg episode reward: [(0, '13.560'), (1, '2.760')]
[2023-09-30 12:45:24,343][118531] Updated weights for policy 1, policy_version 37120 (0.0017)
[2023-09-30 12:45:24,343][118532] Updated weights for policy 0, policy_version 37120 (0.0016)
[2023-09-30 12:45:26,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 19021824. Throughput: 0: 784.4, 1: 784.1. Samples: 4751362. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:26,947][117662] Avg episode reward: [(0, '13.570'), (1, '2.760')]
[2023-09-30 12:45:31,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19046400. Throughput: 0: 788.4, 1: 789.0. Samples: 4760905. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:31,948][117662] Avg episode reward: [(0, '13.360'), (1, '2.700')]
[2023-09-30 12:45:36,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19079168. Throughput: 0: 784.1, 1: 786.3. Samples: 4770242. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:36,948][117662] Avg episode reward: [(0, '13.590'), (1, '2.760')]
[2023-09-30 12:45:37,384][118531] Updated weights for policy 1, policy_version 37280 (0.0018)
[2023-09-30 12:45:37,384][118532] Updated weights for policy 0, policy_version 37280 (0.0018)
[2023-09-30 12:45:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19111936. Throughput: 0: 786.5, 1: 786.4. Samples: 4775039. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:41,948][117662] Avg episode reward: [(0, '13.650'), (1, '2.720')]
[2023-09-30 12:45:46,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19144704. Throughput: 0: 786.2, 1: 785.6. Samples: 4784437. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:46,948][117662] Avg episode reward: [(0, '13.600'), (1, '2.720')]
[2023-09-30 12:45:50,450][118531] Updated weights for policy 1, policy_version 37440 (0.0016)
[2023-09-30 12:45:50,450][118532] Updated weights for policy 0, policy_version 37440 (0.0018)
[2023-09-30 12:45:51,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19177472. Throughput: 0: 789.7, 1: 790.2. Samples: 4793933. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:45:51,948][117662] Avg episode reward: [(0, '13.540'), (1, '2.720')]
[2023-09-30 12:45:56,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 19210240. Throughput: 0: 786.4, 1: 786.3. Samples: 4798464.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:45:56,948][117662] Avg episode reward: [(0, '13.600'), (1, '2.770')] [2023-09-30 12:46:01,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19234816. Throughput: 0: 787.1, 1: 788.8. Samples: 4807942. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:46:01,947][117662] Avg episode reward: [(0, '13.250'), (1, '2.810')] [2023-09-30 12:46:03,591][118531] Updated weights for policy 1, policy_version 37600 (0.0015) [2023-09-30 12:46:03,593][118532] Updated weights for policy 0, policy_version 37600 (0.0017) [2023-09-30 12:46:06,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19267584. Throughput: 0: 781.0, 1: 780.5. Samples: 4817153. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:46:06,948][117662] Avg episode reward: [(0, '13.150'), (1, '2.780')] [2023-09-30 12:46:11,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19300352. Throughput: 0: 785.5, 1: 785.4. Samples: 4822053. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-30 12:46:11,948][117662] Avg episode reward: [(0, '13.110'), (1, '2.780')] [2023-09-30 12:46:16,740][118531] Updated weights for policy 1, policy_version 37760 (0.0015) [2023-09-30 12:46:16,741][118532] Updated weights for policy 0, policy_version 37760 (0.0017) [2023-09-30 12:46:16,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 19333120. Throughput: 0: 781.8, 1: 781.2. Samples: 4831238. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-30 12:46:16,947][117662] Avg episode reward: [(0, '13.020'), (1, '2.790')] [2023-09-30 12:46:16,958][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000037760_9666560.pth... [2023-09-30 12:46:16,958][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000037760_9666560.pth... 
[2023-09-30 12:46:16,994][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000034816_8912896.pth
[2023-09-30 12:46:16,996][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000034816_8912896.pth
[2023-09-30 12:46:21,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19357696. Throughput: 0: 777.5, 1: 777.4. Samples: 4840213. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:46:21,948][117662] Avg episode reward: [(0, '12.560'), (1, '2.800')]
[2023-09-30 12:46:26,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19390464. Throughput: 0: 778.6, 1: 778.7. Samples: 4845117. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:46:26,948][117662] Avg episode reward: [(0, '12.560'), (1, '2.790')]
[2023-09-30 12:46:30,327][118531] Updated weights for policy 1, policy_version 37920 (0.0017)
[2023-09-30 12:46:30,327][118532] Updated weights for policy 0, policy_version 37920 (0.0018)
[2023-09-30 12:46:31,947][117662] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 19423232. Throughput: 0: 770.2, 1: 770.5. Samples: 4853765. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:46:31,947][117662] Avg episode reward: [(0, '12.800'), (1, '2.780')]
[2023-09-30 12:46:36,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19447808. Throughput: 0: 768.7, 1: 768.2. Samples: 4863090. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-30 12:46:36,948][117662] Avg episode reward: [(0, '12.640'), (1, '2.780')]
[2023-09-30 12:46:41,947][117662] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19480576. Throughput: 0: 771.2, 1: 771.0. Samples: 4867860. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:46:41,948][117662] Avg episode reward: [(0, '12.700'), (1, '2.820')]
[2023-09-30 12:46:43,595][118531] Updated weights for policy 1, policy_version 38080 (0.0017)
[2023-09-30 12:46:43,595][118532] Updated weights for policy 0, policy_version 38080 (0.0018)
[2023-09-30 12:46:46,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19513344. Throughput: 0: 768.4, 1: 767.6. Samples: 4877061. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:46:46,947][117662] Avg episode reward: [(0, '12.670'), (1, '2.820')]
[2023-09-30 12:46:51,947][117662] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19546112. Throughput: 0: 770.8, 1: 770.8. Samples: 4886528. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:46:51,948][117662] Avg episode reward: [(0, '12.550'), (1, '2.740')]
[2023-09-30 12:46:56,911][118532] Updated weights for policy 0, policy_version 38240 (0.0017)
[2023-09-30 12:46:56,911][118531] Updated weights for policy 1, policy_version 38240 (0.0018)
[2023-09-30 12:46:56,947][117662] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19578880. Throughput: 0: 765.2, 1: 766.1. Samples: 4890961. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:46:56,948][117662] Avg episode reward: [(0, '12.410'), (1, '2.690')]
[2023-09-30 12:47:01,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 19603456. Throughput: 0: 767.5, 1: 767.5. Samples: 4900316. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-30 12:47:01,948][117662] Avg episode reward: [(0, '12.560'), (1, '2.670')]
[2023-09-30 12:47:06,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19636224. Throughput: 0: 771.2, 1: 770.9. Samples: 4909610. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:06,947][117662] Avg episode reward: [(0, '12.200'), (1, '2.670')]
[2023-09-30 12:47:09,891][118531] Updated weights for policy 1, policy_version 38400 (0.0017)
[2023-09-30 12:47:09,891][118532] Updated weights for policy 0, policy_version 38400 (0.0018)
[2023-09-30 12:47:11,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19668992. Throughput: 0: 771.8, 1: 770.7. Samples: 4914527. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:11,948][117662] Avg episode reward: [(0, '11.990'), (1, '2.650')]
[2023-09-30 12:47:16,947][117662] Fps is (10 sec: 6553.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19701760. Throughput: 0: 775.2, 1: 775.5. Samples: 4923546. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:16,948][117662] Avg episode reward: [(0, '11.340'), (1, '2.690')]
[2023-09-30 12:47:21,947][117662] Fps is (10 sec: 6553.9, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 19734528. Throughput: 0: 779.2, 1: 779.0. Samples: 4933210. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:21,947][117662] Avg episode reward: [(0, '11.130'), (1, '2.750')]
[2023-09-30 12:47:23,129][118532] Updated weights for policy 0, policy_version 38560 (0.0018)
[2023-09-30 12:47:23,129][118531] Updated weights for policy 1, policy_version 38560 (0.0016)
[2023-09-30 12:47:26,947][117662] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 19759104. Throughput: 0: 776.2, 1: 776.4. Samples: 4937728. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:26,948][117662] Avg episode reward: [(0, '11.060'), (1, '2.850')]
[2023-09-30 12:47:31,947][117662] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19791872. Throughput: 0: 776.4, 1: 775.0. Samples: 4946876. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:31,948][117662] Avg episode reward: [(0, '10.850'), (1, '2.870')]
[2023-09-30 12:47:36,378][118531] Updated weights for policy 1, policy_version 38720 (0.0016)
[2023-09-30 12:47:36,378][118532] Updated weights for policy 0, policy_version 38720 (0.0016)
[2023-09-30 12:47:36,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19824640. Throughput: 0: 773.7, 1: 773.7. Samples: 4956160. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:36,948][117662] Avg episode reward: [(0, '10.810'), (1, '2.870')]
[2023-09-30 12:47:41,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19857408. Throughput: 0: 777.1, 1: 776.2. Samples: 4960858. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:41,948][117662] Avg episode reward: [(0, '10.650'), (1, '2.890')]
[2023-09-30 12:47:46,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6212.2, 300 sec: 6234.2). Total num frames: 19886080. Throughput: 0: 776.6, 1: 776.9. Samples: 4970220. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:46,948][117662] Avg episode reward: [(0, '10.740'), (1, '2.850')]
[2023-09-30 12:47:49,666][118532] Updated weights for policy 0, policy_version 38880 (0.0018)
[2023-09-30 12:47:49,667][118531] Updated weights for policy 1, policy_version 38880 (0.0018)
[2023-09-30 12:47:51,947][117662] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 19914752. Throughput: 0: 774.6, 1: 774.7. Samples: 4979328. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:51,947][117662] Avg episode reward: [(0, '10.360'), (1, '2.800')]
[2023-09-30 12:47:56,947][117662] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19947520. Throughput: 0: 773.8, 1: 774.6. Samples: 4984202. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:47:56,948][117662] Avg episode reward: [(0, '10.580'), (1, '2.780')]
[2023-09-30 12:48:01,947][117662] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 19980288. Throughput: 0: 776.8, 1: 776.9. Samples: 4993462. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-30 12:48:01,947][117662] Avg episode reward: [(0, '10.550'), (1, '2.810')]
[2023-09-30 12:48:02,779][118532] Updated weights for policy 0, policy_version 39040 (0.0017)
[2023-09-30 12:48:02,779][118531] Updated weights for policy 1, policy_version 39040 (0.0018)
[2023-09-30 12:48:06,612][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000039088_10006528.pth...
[2023-09-30 12:48:06,612][118570] Stopping RolloutWorker_w4...
[2023-09-30 12:48:06,612][118566] Stopping RolloutWorker_w0...
[2023-09-30 12:48:06,612][118569] Stopping RolloutWorker_w3...
[2023-09-30 12:48:06,612][118573] Stopping RolloutWorker_w7...
[2023-09-30 12:48:06,612][118572] Stopping RolloutWorker_w6...
[2023-09-30 12:48:06,612][118571] Stopping RolloutWorker_w5...
[2023-09-30 12:48:06,612][118567] Stopping RolloutWorker_w2...
[2023-09-30 12:48:06,612][118534] Stopping RolloutWorker_w1...
[2023-09-30 12:48:06,613][118438] Stopping Batcher_1...
[2023-09-30 12:48:06,612][117662] Component RolloutWorker_w6 stopped!
[2023-09-30 12:48:06,613][118570] Loop rollout_proc4_evt_loop terminating...
[2023-09-30 12:48:06,613][118566] Loop rollout_proc0_evt_loop terminating...
[2023-09-30 12:48:06,613][118573] Loop rollout_proc7_evt_loop terminating...
[2023-09-30 12:48:06,613][118569] Loop rollout_proc3_evt_loop terminating...
[2023-09-30 12:48:06,613][118572] Loop rollout_proc6_evt_loop terminating...
[2023-09-30 12:48:06,613][118571] Loop rollout_proc5_evt_loop terminating...
[2023-09-30 12:48:06,613][118567] Loop rollout_proc2_evt_loop terminating...
[2023-09-30 12:48:06,613][118534] Loop rollout_proc1_evt_loop terminating...
[2023-09-30 12:48:06,613][117662] Component RolloutWorker_w4 stopped!
[2023-09-30 12:48:06,613][118438] Loop batcher_evt_loop terminating...
[2023-09-30 12:48:06,614][117662] Component RolloutWorker_w5 stopped!
[2023-09-30 12:48:06,615][117662] Component RolloutWorker_w0 stopped!
[2023-09-30 12:48:06,615][117662] Component RolloutWorker_w3 stopped!
[2023-09-30 12:48:06,616][117662] Component RolloutWorker_w7 stopped!
[2023-09-30 12:48:06,616][117662] Component RolloutWorker_w2 stopped!
[2023-09-30 12:48:06,616][117662] Component Batcher_0 stopped!
[2023-09-30 12:48:06,617][117662] Component RolloutWorker_w1 stopped!
[2023-09-30 12:48:06,617][117662] Component Batcher_1 stopped!
[2023-09-30 12:48:06,612][118358] Stopping Batcher_0...
[2023-09-30 12:48:06,632][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000039088_10006528.pth...
[2023-09-30 12:48:06,633][118358] Loop batcher_evt_loop terminating...
[2023-09-30 12:48:06,642][118358] Removing ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000036288_9289728.pth
[2023-09-30 12:48:06,647][118358] Saving ./train_atari/atari_kangaroo/checkpoint_p0/checkpoint_000039088_10006528.pth...
[2023-09-30 12:48:06,661][118438] Removing ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000036288_9289728.pth
[2023-09-30 12:48:06,666][118438] Saving ./train_atari/atari_kangaroo/checkpoint_p1/checkpoint_000039088_10006528.pth...
[2023-09-30 12:48:06,675][118531] Weights refcount: 2 0
[2023-09-30 12:48:06,676][118531] Stopping InferenceWorker_p1-w0...
[2023-09-30 12:48:06,676][118531] Loop inference_proc1-0_evt_loop terminating...
[2023-09-30 12:48:06,676][117662] Component InferenceWorker_p1-w0 stopped!
[2023-09-30 12:48:06,677][118532] Weights refcount: 2 0
[2023-09-30 12:48:06,679][118532] Stopping InferenceWorker_p0-w0...
[2023-09-30 12:48:06,679][118532] Loop inference_proc0-0_evt_loop terminating...
[2023-09-30 12:48:06,679][117662] Component InferenceWorker_p0-w0 stopped!
[2023-09-30 12:48:06,694][118358] Stopping LearnerWorker_p0...
[2023-09-30 12:48:06,694][118358] Loop learner_proc0_evt_loop terminating...
[2023-09-30 12:48:06,696][117662] Component LearnerWorker_p0 stopped!
[2023-09-30 12:48:06,701][118438] Stopping LearnerWorker_p1...
[2023-09-30 12:48:06,702][118438] Loop learner_proc1_evt_loop terminating...
[2023-09-30 12:48:06,702][117662] Component LearnerWorker_p1 stopped!
[2023-09-30 12:48:06,703][117662] Waiting for process learner_proc0 to stop...
[2023-09-30 12:48:07,427][117662] Waiting for process learner_proc1 to stop...
[2023-09-30 12:48:07,460][117662] Waiting for process inference_proc0-0 to join...
[2023-09-30 12:48:07,461][117662] Waiting for process inference_proc1-0 to join...
[2023-09-30 12:48:07,462][117662] Waiting for process rollout_proc0 to join...
[2023-09-30 12:48:07,463][117662] Waiting for process rollout_proc1 to join...
[2023-09-30 12:48:07,463][117662] Waiting for process rollout_proc2 to join...
[2023-09-30 12:48:07,464][117662] Waiting for process rollout_proc3 to join...
[2023-09-30 12:48:07,465][117662] Waiting for process rollout_proc4 to join...
[2023-09-30 12:48:07,465][117662] Waiting for process rollout_proc5 to join...
[2023-09-30 12:48:07,466][117662] Waiting for process rollout_proc6 to join...
[2023-09-30 12:48:07,466][117662] Waiting for process rollout_proc7 to join...
[2023-09-30 12:48:07,467][117662] Batcher 0 profile tree view:
batching: 21.3443, releasing_batches: 1.9377
[2023-09-30 12:48:07,468][117662] Batcher 1 profile tree view:
batching: 21.1202, releasing_batches: 1.8360
[2023-09-30 12:48:07,468][117662] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 658.4534
update_model: 38.4391
  weight_update: 0.0017
one_step: 0.0013
  handle_policy_step: 2320.6661
    deserialize: 68.4654, stack: 16.8971, obs_to_device_normalize: 566.8195, forward: 1119.9962, send_messages: 94.2306
    prepare_outputs: 308.4967
      to_cpu: 155.4851
[2023-09-30 12:48:07,468][117662] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 670.5431
update_model: 37.1166
  weight_update: 0.0018
one_step: 0.0011
  handle_policy_step: 2307.5057
    deserialize: 67.8320, stack: 16.5186, obs_to_device_normalize: 563.8064, forward: 1114.7108, send_messages: 93.7305
    prepare_outputs: 307.8769
      to_cpu: 156.2373
[2023-09-30 12:48:07,468][117662] Learner 0 profile tree view:
misc: 0.0153, prepare_batch: 32.4652
train: 458.9907
  epoch_init: 0.0906, minibatch_init: 3.1926, losses_postprocess: 61.9466, kl_divergence: 5.5852, after_optimizer: 22.1253
  calculate_losses: 46.1677
    losses_init: 0.0887, forward_head: 14.7300, bptt_initial: 0.4308, bptt: 0.4435, tail: 10.6797, advantages_returns: 3.1559, losses: 12.9529
  update: 315.6651
    clip: 163.8345
[2023-09-30 12:48:07,469][117662] Learner 1 profile tree view:
misc: 0.0172, prepare_batch: 32.0239
train: 457.9023
  epoch_init: 0.0896, minibatch_init: 3.2748, losses_postprocess: 61.5828, kl_divergence: 5.5455, after_optimizer: 23.0573
  calculate_losses: 46.1610
    losses_init: 0.0801, forward_head: 14.8156, bptt_initial: 0.4224, bptt: 0.4499, tail: 10.5664, advantages_returns: 3.1149, losses: 12.9911
  update: 313.9404
    clip: 162.5864
[2023-09-30 12:48:07,469][117662] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3991, enqueue_policy_requests: 42.5658, env_step: 1243.0677, overhead: 28.5127, complete_rollouts: 1.0526
save_policy_outputs: 54.0519
  split_output_tensors: 18.6834
[2023-09-30 12:48:07,469][117662] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3956, enqueue_policy_requests: 41.8960, env_step: 1174.1007, overhead: 27.4285, complete_rollouts: 1.0525
save_policy_outputs: 53.9552
  split_output_tensors: 18.8250
[2023-09-30 12:48:07,470][117662] Loop Runner_EvtLoop terminating...
[2023-09-30 12:48:07,470][117662] Runner profile tree view:
main_loop: 3226.2651
[2023-09-30 12:48:07,471][117662] Collected {0: 10006528, 1: 10006528}, FPS: 6203.2