[2023-10-05 17:03:07,462][23454] Saving configuration to ./train_atari/atari_bowling/config.json... [2023-10-05 17:03:07,779][23454] Rollout worker 0 uses device cpu [2023-10-05 17:03:07,780][23454] Rollout worker 1 uses device cpu [2023-10-05 17:03:07,781][23454] Rollout worker 2 uses device cpu [2023-10-05 17:03:07,781][23454] Rollout worker 3 uses device cpu [2023-10-05 17:03:07,782][23454] Rollout worker 4 uses device cpu [2023-10-05 17:03:07,782][23454] Rollout worker 5 uses device cpu [2023-10-05 17:03:07,783][23454] Rollout worker 6 uses device cpu [2023-10-05 17:03:07,783][23454] Rollout worker 7 uses device cpu [2023-10-05 17:03:07,784][23454] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2023-10-05 17:03:07,829][23454] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-10-05 17:03:07,829][23454] InferenceWorker_p0-w0: min num requests: 1 [2023-10-05 17:03:07,833][23454] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-10-05 17:03:07,833][23454] InferenceWorker_p1-w0: min num requests: 1 [2023-10-05 17:03:07,856][23454] Starting all processes... 
[2023-10-05 17:03:07,857][23454] Starting process learner_proc0 [2023-10-05 17:03:09,506][23454] Starting process learner_proc1 [2023-10-05 17:03:09,510][24064] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-10-05 17:03:09,510][24064] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-10-05 17:03:09,529][24064] Num visible devices: 1 [2023-10-05 17:03:09,545][24064] Starting seed is not provided [2023-10-05 17:03:09,546][24064] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-10-05 17:03:09,546][24064] Initializing actor-critic model on device cuda:0 [2023-10-05 17:03:09,546][24064] RunningMeanStd input shape: (4, 84, 84) [2023-10-05 17:03:09,547][24064] RunningMeanStd input shape: (1,) [2023-10-05 17:03:09,558][24064] ConvEncoder: input_channels=4 [2023-10-05 17:03:09,733][24064] Conv encoder output size: 512 [2023-10-05 17:03:09,735][24064] Created Actor Critic model with architecture: [2023-10-05 17:03:09,735][24064] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) 
(critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=6, bias=True) ) ) [2023-10-05 17:03:10,345][24064] Using optimizer [2023-10-05 17:03:10,346][24064] No checkpoints found [2023-10-05 17:03:10,346][24064] Did not load from checkpoint, starting from scratch! [2023-10-05 17:03:10,346][24064] Initialized policy 0 weights for model version 0 [2023-10-05 17:03:10,348][24064] LearnerWorker_p0 finished initialization! [2023-10-05 17:03:10,348][24064] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-10-05 17:03:11,227][23454] Starting all processes... [2023-10-05 17:03:11,230][24178] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-10-05 17:03:11,231][24178] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1 [2023-10-05 17:03:11,234][23454] Starting process inference_proc0-0 [2023-10-05 17:03:11,234][23454] Starting process inference_proc1-0 [2023-10-05 17:03:11,235][23454] Starting process rollout_proc0 [2023-10-05 17:03:11,235][23454] Starting process rollout_proc1 [2023-10-05 17:03:11,250][24178] Num visible devices: 1 [2023-10-05 17:03:11,235][23454] Starting process rollout_proc2 [2023-10-05 17:03:11,236][23454] Starting process rollout_proc3 [2023-10-05 17:03:11,239][23454] Starting process rollout_proc4 [2023-10-05 17:03:11,267][24178] Starting seed is not provided [2023-10-05 17:03:11,268][24178] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-10-05 17:03:11,268][24178] Initializing actor-critic model on device cuda:0 [2023-10-05 17:03:11,268][24178] RunningMeanStd input shape: (4, 84, 84) [2023-10-05 17:03:11,269][24178] RunningMeanStd input shape: (1,) [2023-10-05 17:03:11,240][23454] Starting process rollout_proc5 [2023-10-05 17:03:11,242][23454] Starting process rollout_proc6 [2023-10-05 17:03:11,245][23454] Starting process rollout_proc7 
[2023-10-05 17:03:11,282][24178] ConvEncoder: input_channels=4 [2023-10-05 17:03:11,590][24178] Conv encoder output size: 512 [2023-10-05 17:03:11,592][24178] Created Actor Critic model with architecture: [2023-10-05 17:03:11,592][24178] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=6, bias=True) ) ) [2023-10-05 17:03:12,347][24178] Using optimizer [2023-10-05 17:03:12,348][24178] No checkpoints found [2023-10-05 17:03:12,349][24178] Did not load from checkpoint, starting from scratch! [2023-10-05 17:03:12,349][24178] Initialized policy 1 weights for model version 0 [2023-10-05 17:03:12,351][24178] LearnerWorker_p1 finished initialization! 
[2023-10-05 17:03:12,352][24178] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-10-05 17:03:13,176][24500] Worker 7 uses CPU cores [28, 29, 30, 31] [2023-10-05 17:03:13,187][24498] Worker 4 uses CPU cores [16, 17, 18, 19] [2023-10-05 17:03:13,230][24497] Worker 5 uses CPU cores [20, 21, 22, 23] [2023-10-05 17:03:13,231][24499] Worker 6 uses CPU cores [24, 25, 26, 27] [2023-10-05 17:03:13,252][24493] Worker 1 uses CPU cores [4, 5, 6, 7] [2023-10-05 17:03:13,280][24496] Worker 3 uses CPU cores [12, 13, 14, 15] [2023-10-05 17:03:13,296][24456] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-10-05 17:03:13,296][24456] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-10-05 17:03:13,306][24460] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-10-05 17:03:13,306][24460] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1 [2023-10-05 17:03:13,311][24456] Num visible devices: 1 [2023-10-05 17:03:13,320][24494] Worker 2 uses CPU cores [8, 9, 10, 11] [2023-10-05 17:03:13,321][24460] Num visible devices: 1 [2023-10-05 17:03:13,364][24459] Worker 0 uses CPU cores [0, 1, 2, 3] [2023-10-05 17:03:13,652][23454] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan, 1: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-10-05 17:03:13,942][24456] RunningMeanStd input shape: (4, 84, 84) [2023-10-05 17:03:13,942][24456] RunningMeanStd input shape: (1,) [2023-10-05 17:03:13,954][24456] ConvEncoder: input_channels=4 [2023-10-05 17:03:13,963][24460] RunningMeanStd input shape: (4, 84, 84) [2023-10-05 17:03:13,963][24460] RunningMeanStd input shape: (1,) [2023-10-05 17:03:13,974][24460] ConvEncoder: input_channels=4 [2023-10-05 17:03:14,052][24456] Conv encoder output size: 512 [2023-10-05 17:03:14,058][23454] Inference worker 0-0 is ready! 
[2023-10-05 17:03:14,075][24460] Conv encoder output size: 512 [2023-10-05 17:03:14,080][23454] Inference worker 1-0 is ready! [2023-10-05 17:03:14,081][23454] All inference workers are ready! Signal rollout workers to start! [2023-10-05 17:03:14,530][24498] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,532][24500] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,536][24459] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,558][24496] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,665][24497] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,667][24493] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,677][24499] Decorrelating experience for 0 frames... [2023-10-05 17:03:14,686][24494] Decorrelating experience for 0 frames... [2023-10-05 17:03:18,652][23454] Fps is (10 sec: 1638.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 8192. Throughput: 0: 204.8, 1: 204.8. Samples: 2048. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:03:23,651][23454] Fps is (10 sec: 3276.9, 60 sec: 3276.9, 300 sec: 3276.9). Total num frames: 32768. Throughput: 0: 409.6, 1: 409.6. Samples: 8192. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:03:27,816][23454] Heartbeat connected on Batcher_0 [2023-10-05 17:03:27,819][23454] Heartbeat connected on LearnerWorker_p0 [2023-10-05 17:03:27,822][23454] Heartbeat connected on Batcher_1 [2023-10-05 17:03:27,825][23454] Heartbeat connected on LearnerWorker_p1 [2023-10-05 17:03:27,831][23454] Heartbeat connected on InferenceWorker_p0-w0 [2023-10-05 17:03:27,835][23454] Heartbeat connected on InferenceWorker_p1-w0 [2023-10-05 17:03:27,836][23454] Heartbeat connected on RolloutWorker_w0 [2023-10-05 17:03:27,839][23454] Heartbeat connected on RolloutWorker_w1 [2023-10-05 17:03:27,841][23454] Heartbeat connected on RolloutWorker_w2 [2023-10-05 17:03:27,844][23454] Heartbeat connected on RolloutWorker_w3 [2023-10-05 17:03:27,848][23454] Heartbeat connected on RolloutWorker_w4 [2023-10-05 17:03:27,850][23454] Heartbeat connected on RolloutWorker_w5 [2023-10-05 17:03:27,854][23454] Heartbeat connected on RolloutWorker_w6 [2023-10-05 17:03:27,856][23454] Heartbeat connected on RolloutWorker_w7 [2023-10-05 17:03:28,651][23454] Fps is (10 sec: 5734.6, 60 sec: 4369.1, 300 sec: 4369.1). Total num frames: 65536. Throughput: 0: 433.4, 1: 435.7. Samples: 13036. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:03:28,652][23454] Avg episode reward: [(0, '6.000'), (1, '6.500')] [2023-10-05 17:03:30,695][24460] Updated weights for policy 1, policy_version 160 (0.0015) [2023-10-05 17:03:30,695][24456] Updated weights for policy 0, policy_version 160 (0.0016) [2023-10-05 17:03:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 98304. Throughput: 0: 569.2, 1: 569.7. Samples: 22778. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:03:33,653][23454] Avg episode reward: [(0, '6.000'), (1, '6.500')] [2023-10-05 17:03:38,652][23454] Fps is (10 sec: 6553.5, 60 sec: 5242.9, 300 sec: 5242.9). Total num frames: 131072. Throughput: 0: 655.2, 1: 655.4. 
Samples: 32764. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:03:38,653][23454] Avg episode reward: [(0, '6.167'), (1, '6.500')] [2023-10-05 17:03:43,273][24456] Updated weights for policy 0, policy_version 320 (0.0018) [2023-10-05 17:03:43,273][24460] Updated weights for policy 1, policy_version 320 (0.0017) [2023-10-05 17:03:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 5461.3). Total num frames: 163840. Throughput: 0: 623.3, 1: 623.9. Samples: 37416. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:03:43,652][23454] Avg episode reward: [(0, '6.500'), (1, '7.500')] [2023-10-05 17:03:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 5617.4, 300 sec: 5617.4). Total num frames: 196608. Throughput: 0: 675.1, 1: 675.7. Samples: 47280. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:03:48,652][23454] Avg episode reward: [(0, '6.500'), (1, '7.500')] [2023-10-05 17:03:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 5734.4, 300 sec: 5734.4). Total num frames: 229376. Throughput: 0: 713.2, 1: 714.6. Samples: 57111. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:03:53,652][23454] Avg episode reward: [(0, '7.083'), (1, '7.500')] [2023-10-05 17:03:53,654][24064] Saving new best policy, reward=7.083! [2023-10-05 17:03:53,654][24178] Saving new best policy, reward=7.500! [2023-10-05 17:03:55,968][24456] Updated weights for policy 0, policy_version 480 (0.0018) [2023-10-05 17:03:55,968][24460] Updated weights for policy 1, policy_version 480 (0.0020) [2023-10-05 17:03:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 5825.5, 300 sec: 5825.5). Total num frames: 262144. Throughput: 0: 684.5, 1: 684.9. Samples: 61625. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:03:58,652][23454] Avg episode reward: [(0, '7.083'), (1, '7.500')] [2023-10-05 17:04:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 5898.3, 300 sec: 5898.3). Total num frames: 294912. Throughput: 0: 773.7, 1: 773.7. 
Samples: 71680. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:04:03,652][23454] Avg episode reward: [(0, '7.750'), (1, '7.750')] [2023-10-05 17:04:03,654][24064] Saving new best policy, reward=7.750! [2023-10-05 17:04:03,654][24178] Saving new best policy, reward=7.750! [2023-10-05 17:04:08,634][24456] Updated weights for policy 0, policy_version 640 (0.0020) [2023-10-05 17:04:08,634][24460] Updated weights for policy 1, policy_version 640 (0.0019) [2023-10-05 17:04:08,651][23454] Fps is (10 sec: 6553.5, 60 sec: 5957.8, 300 sec: 5957.8). Total num frames: 327680. Throughput: 0: 811.3, 1: 812.4. Samples: 81261. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:08,652][23454] Avg episode reward: [(0, '7.750'), (1, '7.750')] [2023-10-05 17:04:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 6007.5). Total num frames: 360448. Throughput: 0: 811.3, 1: 810.5. Samples: 86016. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:13,652][23454] Avg episode reward: [(0, '8.050'), (1, '8.050')] [2023-10-05 17:04:13,653][24064] Saving new best policy, reward=8.050! [2023-10-05 17:04:13,653][24178] Saving new best policy, reward=8.050! [2023-10-05 17:04:18,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 5923.5). Total num frames: 385024. Throughput: 0: 813.1, 1: 813.6. Samples: 95977. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:04:18,652][23454] Avg episode reward: [(0, '8.050'), (1, '8.050')] [2023-10-05 17:04:21,199][24460] Updated weights for policy 1, policy_version 800 (0.0016) [2023-10-05 17:04:21,199][24456] Updated weights for policy 0, policy_version 800 (0.0018) [2023-10-05 17:04:23,651][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 5968.5). Total num frames: 417792. Throughput: 0: 810.2, 1: 809.9. Samples: 105669. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:04:23,652][23454] Avg episode reward: [(0, '8.292'), (1, '8.333')] [2023-10-05 17:04:23,676][24064] Saving new best policy, reward=8.292! [2023-10-05 17:04:23,696][24178] Saving new best policy, reward=8.333! [2023-10-05 17:04:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6007.5). Total num frames: 450560. Throughput: 0: 813.3, 1: 812.9. Samples: 110592. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:28,652][23454] Avg episode reward: [(0, '8.292'), (1, '8.333')] [2023-10-05 17:04:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6041.6). Total num frames: 483328. Throughput: 0: 812.9, 1: 813.5. Samples: 120469. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:33,652][23454] Avg episode reward: [(0, '8.536'), (1, '8.571')] [2023-10-05 17:04:33,672][24178] Saving new best policy, reward=8.571! [2023-10-05 17:04:33,750][24064] Saving new best policy, reward=8.536! [2023-10-05 17:04:33,754][24460] Updated weights for policy 1, policy_version 960 (0.0017) [2023-10-05 17:04:33,754][24456] Updated weights for policy 0, policy_version 960 (0.0017) [2023-10-05 17:04:38,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6071.7). Total num frames: 516096. Throughput: 0: 807.9, 1: 808.4. Samples: 129844. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:38,653][23454] Avg episode reward: [(0, '8.536'), (1, '8.571')] [2023-10-05 17:04:43,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6098.5). Total num frames: 548864. Throughput: 0: 814.0, 1: 814.1. Samples: 134892. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:04:43,652][23454] Avg episode reward: [(0, '8.688'), (1, '8.719')] [2023-10-05 17:04:43,654][24064] Saving new best policy, reward=8.688! [2023-10-05 17:04:43,654][24178] Saving new best policy, reward=8.719! 
[2023-10-05 17:04:46,444][24456] Updated weights for policy 0, policy_version 1120 (0.0017) [2023-10-05 17:04:46,444][24460] Updated weights for policy 1, policy_version 1120 (0.0017) [2023-10-05 17:04:48,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6122.4). Total num frames: 581632. Throughput: 0: 810.2, 1: 810.6. Samples: 144618. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:48,652][23454] Avg episode reward: [(0, '8.688'), (1, '8.719')] [2023-10-05 17:04:53,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6144.0). Total num frames: 614400. Throughput: 0: 812.5, 1: 812.0. Samples: 154364. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:53,652][23454] Avg episode reward: [(0, '8.806'), (1, '8.861')] [2023-10-05 17:04:53,653][24064] Saving new best policy, reward=8.806! [2023-10-05 17:04:53,653][24178] Saving new best policy, reward=8.861! [2023-10-05 17:04:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6163.5). Total num frames: 647168. Throughput: 0: 816.6, 1: 817.2. Samples: 159538. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:04:58,652][23454] Avg episode reward: [(0, '8.806'), (1, '8.861')] [2023-10-05 17:04:58,978][24460] Updated weights for policy 1, policy_version 1280 (0.0017) [2023-10-05 17:04:58,978][24456] Updated weights for policy 0, policy_version 1280 (0.0017) [2023-10-05 17:05:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6181.2). Total num frames: 679936. Throughput: 0: 812.6, 1: 812.3. Samples: 169100. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:03,652][23454] Avg episode reward: [(0, '8.900'), (1, '8.975')] [2023-10-05 17:05:03,657][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000001328_339968.pth... [2023-10-05 17:05:03,657][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000001328_339968.pth... 
[2023-10-05 17:05:03,686][24064] Saving new best policy, reward=8.900! [2023-10-05 17:05:03,694][24178] Saving new best policy, reward=8.975! [2023-10-05 17:05:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6197.4). Total num frames: 712704. Throughput: 0: 811.4, 1: 812.4. Samples: 178742. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:05:08,652][23454] Avg episode reward: [(0, '8.900'), (1, '8.975')] [2023-10-05 17:05:11,657][24456] Updated weights for policy 0, policy_version 1440 (0.0016) [2023-10-05 17:05:11,658][24460] Updated weights for policy 1, policy_version 1440 (0.0015) [2023-10-05 17:05:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6212.3). Total num frames: 745472. Throughput: 0: 810.8, 1: 811.0. Samples: 183572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:05:13,652][23454] Avg episode reward: [(0, '9.000'), (1, '9.045')] [2023-10-05 17:05:13,653][24064] Saving new best policy, reward=9.000! [2023-10-05 17:05:13,653][24178] Saving new best policy, reward=9.045! [2023-10-05 17:05:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6225.9). Total num frames: 778240. Throughput: 0: 809.0, 1: 808.3. Samples: 193246. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:05:18,652][23454] Avg episode reward: [(0, '9.000'), (1, '9.045')] [2023-10-05 17:05:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6238.5). Total num frames: 811008. Throughput: 0: 811.3, 1: 810.4. Samples: 202819. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:23,652][23454] Avg episode reward: [(0, '9.043'), (1, '9.087')] [2023-10-05 17:05:23,654][24064] Saving new best policy, reward=9.043! [2023-10-05 17:05:23,654][24178] Saving new best policy, reward=9.087! 
[2023-10-05 17:05:24,298][24460] Updated weights for policy 1, policy_version 1600 (0.0015) [2023-10-05 17:05:24,298][24456] Updated weights for policy 0, policy_version 1600 (0.0019) [2023-10-05 17:05:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6250.2). Total num frames: 843776. Throughput: 0: 810.7, 1: 810.9. Samples: 207866. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:28,652][23454] Avg episode reward: [(0, '9.083'), (1, '9.125')] [2023-10-05 17:05:28,654][24064] Saving new best policy, reward=9.083! [2023-10-05 17:05:28,654][24178] Saving new best policy, reward=9.125! [2023-10-05 17:05:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6261.0). Total num frames: 876544. Throughput: 0: 811.3, 1: 811.4. Samples: 217638. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:05:33,653][23454] Avg episode reward: [(0, '9.080'), (1, '9.125')] [2023-10-05 17:05:36,901][24460] Updated weights for policy 1, policy_version 1760 (0.0017) [2023-10-05 17:05:36,901][24456] Updated weights for policy 0, policy_version 1760 (0.0017) [2023-10-05 17:05:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6271.1). Total num frames: 909312. Throughput: 0: 811.0, 1: 810.4. Samples: 227328. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:38,652][23454] Avg episode reward: [(0, '9.115'), (1, '9.173')] [2023-10-05 17:05:38,653][24064] Saving new best policy, reward=9.115! [2023-10-05 17:05:38,653][24178] Saving new best policy, reward=9.173! [2023-10-05 17:05:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6280.5). Total num frames: 942080. Throughput: 0: 805.0, 1: 804.9. Samples: 231982. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:05:43,653][23454] Avg episode reward: [(0, '9.115'), (1, '9.173')] [2023-10-05 17:05:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6289.4). Total num frames: 974848. Throughput: 0: 807.7, 1: 807.9. 
Samples: 241800. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:48,652][23454] Avg episode reward: [(0, '9.179'), (1, '9.232')] [2023-10-05 17:05:48,655][24064] Saving new best policy, reward=9.179! [2023-10-05 17:05:48,655][24178] Saving new best policy, reward=9.232! [2023-10-05 17:05:49,525][24456] Updated weights for policy 0, policy_version 1920 (0.0015) [2023-10-05 17:05:49,526][24460] Updated weights for policy 1, policy_version 1920 (0.0017) [2023-10-05 17:05:53,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6297.6). Total num frames: 1007616. Throughput: 0: 811.5, 1: 810.1. Samples: 251715. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:53,653][23454] Avg episode reward: [(0, '9.179'), (1, '9.232')] [2023-10-05 17:05:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6305.4). Total num frames: 1040384. Throughput: 0: 806.0, 1: 806.3. Samples: 256127. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:05:58,652][23454] Avg episode reward: [(0, '9.200'), (1, '9.267')] [2023-10-05 17:05:58,653][24178] Saving new best policy, reward=9.267! [2023-10-05 17:05:58,652][24064] Saving new best policy, reward=9.200! [2023-10-05 17:06:02,273][24460] Updated weights for policy 1, policy_version 2080 (0.0016) [2023-10-05 17:06:02,273][24456] Updated weights for policy 0, policy_version 2080 (0.0017) [2023-10-05 17:06:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6312.7). Total num frames: 1073152. Throughput: 0: 811.2, 1: 810.8. Samples: 266240. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:06:03,653][23454] Avg episode reward: [(0, '9.200'), (1, '9.267')] [2023-10-05 17:06:08,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.0, 300 sec: 6272.7). Total num frames: 1097728. Throughput: 0: 810.8, 1: 810.6. Samples: 275779. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:06:08,653][23454] Avg episode reward: [(0, '9.250'), (1, '9.312')] [2023-10-05 17:06:08,666][24064] Saving new best policy, reward=9.250! [2023-10-05 17:06:08,674][24178] Saving new best policy, reward=9.312! [2023-10-05 17:06:13,651][23454] Fps is (10 sec: 5734.6, 60 sec: 6417.1, 300 sec: 6280.5). Total num frames: 1130496. Throughput: 0: 808.1, 1: 807.6. Samples: 280572. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:06:13,652][23454] Avg episode reward: [(0, '9.250'), (1, '9.312')] [2023-10-05 17:06:15,139][24460] Updated weights for policy 1, policy_version 2240 (0.0017) [2023-10-05 17:06:15,139][24456] Updated weights for policy 0, policy_version 2240 (0.0016) [2023-10-05 17:06:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6287.9). Total num frames: 1163264. Throughput: 0: 805.9, 1: 805.5. Samples: 290152. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:06:18,652][23454] Avg episode reward: [(0, '9.294'), (1, '9.353')] [2023-10-05 17:06:18,656][24064] Saving new best policy, reward=9.294! [2023-10-05 17:06:18,657][24178] Saving new best policy, reward=9.353! [2023-10-05 17:06:23,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6294.9). Total num frames: 1196032. Throughput: 0: 805.2, 1: 806.0. Samples: 299830. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:06:23,653][23454] Avg episode reward: [(0, '9.294'), (1, '9.353')] [2023-10-05 17:06:27,635][24456] Updated weights for policy 0, policy_version 2400 (0.0020) [2023-10-05 17:06:27,635][24460] Updated weights for policy 1, policy_version 2400 (0.0018) [2023-10-05 17:06:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6301.5). Total num frames: 1228800. Throughput: 0: 812.0, 1: 811.9. Samples: 305059. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:06:28,652][23454] Avg episode reward: [(0, '9.319'), (1, '9.389')] [2023-10-05 17:06:28,653][24064] Saving new best policy, reward=9.319! [2023-10-05 17:06:28,653][24178] Saving new best policy, reward=9.389! [2023-10-05 17:06:33,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6307.8). Total num frames: 1261568. Throughput: 0: 810.6, 1: 809.4. Samples: 314701. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:06:33,652][23454] Avg episode reward: [(0, '9.319'), (1, '9.389')] [2023-10-05 17:06:38,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6313.8). Total num frames: 1294336. Throughput: 0: 807.3, 1: 807.6. Samples: 324388. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:06:38,653][23454] Avg episode reward: [(0, '9.316'), (1, '9.408')] [2023-10-05 17:06:38,654][24178] Saving new best policy, reward=9.408! [2023-10-05 17:06:40,193][24460] Updated weights for policy 1, policy_version 2560 (0.0018) [2023-10-05 17:06:40,193][24456] Updated weights for policy 0, policy_version 2560 (0.0016) [2023-10-05 17:06:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6319.5). Total num frames: 1327104. Throughput: 0: 814.6, 1: 814.8. Samples: 329452. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:06:43,652][23454] Avg episode reward: [(0, '9.316'), (1, '9.408')] [2023-10-05 17:06:48,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6325.0). Total num frames: 1359872. Throughput: 0: 810.6, 1: 811.4. Samples: 339233. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:06:48,652][23454] Avg episode reward: [(0, '9.350'), (1, '9.438')] [2023-10-05 17:06:48,656][24064] Saving new best policy, reward=9.350! [2023-10-05 17:06:48,657][24178] Saving new best policy, reward=9.438! 
[2023-10-05 17:06:52,841][24460] Updated weights for policy 1, policy_version 2720 (0.0019) [2023-10-05 17:06:52,841][24456] Updated weights for policy 0, policy_version 2720 (0.0016) [2023-10-05 17:06:53,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6330.2). Total num frames: 1392640. Throughput: 0: 809.0, 1: 809.4. Samples: 348607. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:06:53,653][23454] Avg episode reward: [(0, '9.350'), (1, '9.438')] [2023-10-05 17:06:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6335.2). Total num frames: 1425408. Throughput: 0: 812.2, 1: 812.3. Samples: 353675. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:06:58,652][23454] Avg episode reward: [(0, '9.381'), (1, '9.438')] [2023-10-05 17:06:58,653][24064] Saving new best policy, reward=9.381! [2023-10-05 17:07:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6339.9). Total num frames: 1458176. Throughput: 0: 812.1, 1: 812.2. Samples: 363243. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:03,652][23454] Avg episode reward: [(0, '9.381'), (1, '9.464')] [2023-10-05 17:07:03,657][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000002848_729088.pth... [2023-10-05 17:07:03,657][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000002848_729088.pth... [2023-10-05 17:07:03,685][24178] Saving new best policy, reward=9.464! [2023-10-05 17:07:05,506][24460] Updated weights for policy 1, policy_version 2880 (0.0018) [2023-10-05 17:07:05,508][24456] Updated weights for policy 0, policy_version 2880 (0.0019) [2023-10-05 17:07:08,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6344.4). Total num frames: 1490944. Throughput: 0: 811.0, 1: 810.6. Samples: 372803. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:08,653][23454] Avg episode reward: [(0, '9.398'), (1, '9.464')] [2023-10-05 17:07:08,654][24064] Saving new best policy, reward=9.398! [2023-10-05 17:07:13,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6348.8). Total num frames: 1523712. Throughput: 0: 808.4, 1: 808.3. Samples: 377809. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:13,652][23454] Avg episode reward: [(0, '9.398'), (1, '9.477')] [2023-10-05 17:07:13,653][24178] Saving new best policy, reward=9.477! [2023-10-05 17:07:18,366][24456] Updated weights for policy 0, policy_version 3040 (0.0015) [2023-10-05 17:07:18,366][24460] Updated weights for policy 1, policy_version 3040 (0.0016) [2023-10-05 17:07:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6353.0). Total num frames: 1556480. Throughput: 0: 804.0, 1: 804.8. Samples: 387095. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:18,652][23454] Avg episode reward: [(0, '9.393'), (1, '9.477')] [2023-10-05 17:07:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6357.0). Total num frames: 1589248. Throughput: 0: 807.4, 1: 808.6. Samples: 397112. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:07:23,652][23454] Avg episode reward: [(0, '9.413'), (1, '9.500')] [2023-10-05 17:07:23,652][24064] Saving new best policy, reward=9.413! [2023-10-05 17:07:23,653][24178] Saving new best policy, reward=9.500! [2023-10-05 17:07:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6360.8). Total num frames: 1622016. Throughput: 0: 803.9, 1: 803.8. Samples: 401801. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:28,653][23454] Avg episode reward: [(0, '9.413'), (1, '9.500')] [2023-10-05 17:07:30,874][24460] Updated weights for policy 1, policy_version 3200 (0.0019) [2023-10-05 17:07:30,874][24456] Updated weights for policy 0, policy_version 3200 (0.0018) [2023-10-05 17:07:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6364.6). Total num frames: 1654784. Throughput: 0: 805.0, 1: 804.2. Samples: 411649. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:07:33,653][23454] Avg episode reward: [(0, '9.438'), (1, '9.510')] [2023-10-05 17:07:33,656][24064] Saving new best policy, reward=9.438! [2023-10-05 17:07:33,656][24178] Saving new best policy, reward=9.510! [2023-10-05 17:07:38,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6368.1). Total num frames: 1687552. Throughput: 0: 809.1, 1: 808.5. Samples: 421402. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:38,653][23454] Avg episode reward: [(0, '9.438'), (1, '9.510')] [2023-10-05 17:07:43,565][24456] Updated weights for policy 0, policy_version 3360 (0.0016) [2023-10-05 17:07:43,565][24460] Updated weights for policy 1, policy_version 3360 (0.0019) [2023-10-05 17:07:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6371.6). Total num frames: 1720320. Throughput: 0: 803.7, 1: 803.6. Samples: 426001. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:07:43,653][23454] Avg episode reward: [(0, '9.450'), (1, '9.530')] [2023-10-05 17:07:43,654][24064] Saving new best policy, reward=9.450! [2023-10-05 17:07:43,654][24178] Saving new best policy, reward=9.530! [2023-10-05 17:07:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6374.9). Total num frames: 1753088. Throughput: 0: 810.4, 1: 810.8. Samples: 436197. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:48,652][23454] Avg episode reward: [(0, '9.450'), (1, '9.530')] [2023-10-05 17:07:53,651][23454] Fps is (10 sec: 6144.1, 60 sec: 6485.4, 300 sec: 6363.4). Total num frames: 1781760. Throughput: 0: 811.8, 1: 810.6. Samples: 445811. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:53,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.660')] [2023-10-05 17:07:53,654][24064] Saving new best policy, reward=9.590! [2023-10-05 17:07:53,657][24178] Saving new best policy, reward=9.660! [2023-10-05 17:07:56,224][24456] Updated weights for policy 0, policy_version 3520 (0.0017) [2023-10-05 17:07:56,224][24460] Updated weights for policy 1, policy_version 3520 (0.0016) [2023-10-05 17:07:58,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6352.4). Total num frames: 1810432. Throughput: 0: 808.5, 1: 808.2. Samples: 450560. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:07:58,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.660')] [2023-10-05 17:08:03,652][23454] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 6355.9). Total num frames: 1843200. Throughput: 0: 812.9, 1: 813.3. Samples: 460274. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:08:03,652][23454] Avg episode reward: [(0, '9.710'), (1, '9.710')] [2023-10-05 17:08:03,788][24178] Saving new best policy, reward=9.710! [2023-10-05 17:08:03,793][24064] Saving new best policy, reward=9.710! [2023-10-05 17:08:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 1875968. Throughput: 0: 809.3, 1: 809.1. Samples: 469942. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:08:08,652][23454] Avg episode reward: [(0, '9.710'), (1, '9.710')] [2023-10-05 17:08:08,842][24456] Updated weights for policy 0, policy_version 3680 (0.0016) [2023-10-05 17:08:08,844][24460] Updated weights for policy 1, policy_version 3680 (0.0017) [2023-10-05 17:08:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 1908736. Throughput: 0: 814.1, 1: 814.0. Samples: 475067. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:08:13,653][23454] Avg episode reward: [(0, '9.750'), (1, '9.770')] [2023-10-05 17:08:13,654][24064] Saving new best policy, reward=9.750! [2023-10-05 17:08:13,824][24178] Saving new best policy, reward=9.770! [2023-10-05 17:08:18,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 1941504. Throughput: 0: 810.5, 1: 811.8. Samples: 484654. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:08:18,652][23454] Avg episode reward: [(0, '9.750'), (1, '9.800')] [2023-10-05 17:08:18,663][24178] Saving new best policy, reward=9.800! [2023-10-05 17:08:21,478][24460] Updated weights for policy 1, policy_version 3840 (0.0018) [2023-10-05 17:08:21,479][24456] Updated weights for policy 0, policy_version 3840 (0.0015) [2023-10-05 17:08:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 1974272. Throughput: 0: 808.3, 1: 808.8. Samples: 494170. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:08:23,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.820')] [2023-10-05 17:08:23,654][24178] Saving new best policy, reward=9.820! [2023-10-05 17:08:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2007040. Throughput: 0: 814.1, 1: 814.8. Samples: 499304. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:08:28,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.850')] [2023-10-05 17:08:28,654][24178] Saving new best policy, reward=9.850! [2023-10-05 17:08:33,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2039808. Throughput: 0: 808.7, 1: 807.3. Samples: 508915. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:08:33,652][23454] Avg episode reward: [(0, '9.770'), (1, '9.850')] [2023-10-05 17:08:33,663][24064] Saving new best policy, reward=9.770! [2023-10-05 17:08:34,202][24460] Updated weights for policy 1, policy_version 4000 (0.0016) [2023-10-05 17:08:34,202][24456] Updated weights for policy 0, policy_version 4000 (0.0018) [2023-10-05 17:08:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2072576. Throughput: 0: 803.3, 1: 804.1. Samples: 518144. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:08:38,653][23454] Avg episode reward: [(0, '9.770'), (1, '9.870')] [2023-10-05 17:08:38,654][24178] Saving new best policy, reward=9.870! [2023-10-05 17:08:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2105344. Throughput: 0: 802.3, 1: 802.4. Samples: 522772. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:08:43,652][23454] Avg episode reward: [(0, '9.760'), (1, '9.870')] [2023-10-05 17:08:47,160][24456] Updated weights for policy 0, policy_version 4160 (0.0017) [2023-10-05 17:08:47,161][24460] Updated weights for policy 1, policy_version 4160 (0.0019) [2023-10-05 17:08:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2138112. Throughput: 0: 802.6, 1: 802.0. Samples: 532480. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:08:48,652][23454] Avg episode reward: [(0, '9.760'), (1, '9.860')] [2023-10-05 17:08:53,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6470.3). 
Total num frames: 2170880. Throughput: 0: 804.9, 1: 805.0. Samples: 542391. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:08:53,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.860')] [2023-10-05 17:08:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 2203648. Throughput: 0: 799.9, 1: 800.0. Samples: 547062. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:08:58,652][23454] Avg episode reward: [(0, '9.730'), (1, '9.830')] [2023-10-05 17:08:59,720][24456] Updated weights for policy 0, policy_version 4320 (0.0017) [2023-10-05 17:08:59,720][24460] Updated weights for policy 1, policy_version 4320 (0.0015) [2023-10-05 17:09:03,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 2236416. Throughput: 0: 805.1, 1: 803.8. Samples: 557056. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:03,653][23454] Avg episode reward: [(0, '9.740'), (1, '9.830')] [2023-10-05 17:09:03,662][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000004368_1118208.pth... [2023-10-05 17:09:03,662][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000004368_1118208.pth... [2023-10-05 17:09:03,694][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000001328_339968.pth [2023-10-05 17:09:03,700][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000001328_339968.pth [2023-10-05 17:09:08,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 2269184. Throughput: 0: 807.6, 1: 805.6. Samples: 566765. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:08,653][23454] Avg episode reward: [(0, '9.720'), (1, '9.760')] [2023-10-05 17:09:12,461][24456] Updated weights for policy 0, policy_version 4480 (0.0017) [2023-10-05 17:09:12,462][24460] Updated weights for policy 1, policy_version 4480 (0.0018) [2023-10-05 17:09:13,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2293760. Throughput: 0: 801.4, 1: 800.6. Samples: 571392. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:09:13,652][23454] Avg episode reward: [(0, '9.710'), (1, '9.760')] [2023-10-05 17:09:18,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2326528. Throughput: 0: 802.3, 1: 803.6. Samples: 581179. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:18,653][23454] Avg episode reward: [(0, '9.700'), (1, '9.730')] [2023-10-05 17:09:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2359296. Throughput: 0: 806.2, 1: 808.0. Samples: 590782. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:09:23,652][23454] Avg episode reward: [(0, '9.700'), (1, '9.730')] [2023-10-05 17:09:25,130][24456] Updated weights for policy 0, policy_version 4640 (0.0019) [2023-10-05 17:09:25,130][24460] Updated weights for policy 1, policy_version 4640 (0.0018) [2023-10-05 17:09:28,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2392064. Throughput: 0: 811.3, 1: 811.3. Samples: 595787. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:28,653][23454] Avg episode reward: [(0, '9.690'), (1, '9.680')] [2023-10-05 17:09:33,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2424832. Throughput: 0: 808.1, 1: 807.9. Samples: 605199. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:09:33,652][23454] Avg episode reward: [(0, '9.690'), (1, '9.680')] [2023-10-05 17:09:37,867][24460] Updated weights for policy 1, policy_version 4800 (0.0014) [2023-10-05 17:09:37,868][24456] Updated weights for policy 0, policy_version 4800 (0.0016) [2023-10-05 17:09:38,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2457600. Throughput: 0: 804.7, 1: 804.6. Samples: 614806. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:09:38,652][23454] Avg episode reward: [(0, '9.680'), (1, '9.650')] [2023-10-05 17:09:43,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2490368. Throughput: 0: 808.4, 1: 808.6. Samples: 619830. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:43,652][23454] Avg episode reward: [(0, '9.680'), (1, '9.650')] [2023-10-05 17:09:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2523136. Throughput: 0: 803.0, 1: 803.5. Samples: 629351. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:09:48,652][23454] Avg episode reward: [(0, '9.640'), (1, '9.640')] [2023-10-05 17:09:50,627][24460] Updated weights for policy 1, policy_version 4960 (0.0018) [2023-10-05 17:09:50,627][24456] Updated weights for policy 0, policy_version 4960 (0.0018) [2023-10-05 17:09:53,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 2555904. Throughput: 0: 801.7, 1: 803.0. Samples: 638979. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:53,653][23454] Avg episode reward: [(0, '9.640'), (1, '9.640')] [2023-10-05 17:09:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2588672. Throughput: 0: 806.3, 1: 807.1. Samples: 643996. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:09:58,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.630')] [2023-10-05 17:10:03,382][24456] Updated weights for policy 0, policy_version 5120 (0.0015) [2023-10-05 17:10:03,382][24460] Updated weights for policy 1, policy_version 5120 (0.0019) [2023-10-05 17:10:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2621440. Throughput: 0: 801.7, 1: 801.2. Samples: 653312. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:10:03,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.640')] [2023-10-05 17:10:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 2654208. Throughput: 0: 806.9, 1: 805.2. Samples: 663329. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:08,652][23454] Avg episode reward: [(0, '9.630'), (1, '9.640')] [2023-10-05 17:10:13,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 2686976. Throughput: 0: 801.8, 1: 802.0. Samples: 667958. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:13,653][23454] Avg episode reward: [(0, '9.630'), (1, '9.590')] [2023-10-05 17:10:15,890][24456] Updated weights for policy 0, policy_version 5280 (0.0018) [2023-10-05 17:10:15,891][24460] Updated weights for policy 1, policy_version 5280 (0.0016) [2023-10-05 17:10:18,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 2719744. Throughput: 0: 807.6, 1: 807.8. Samples: 677889. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:10:18,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.590')] [2023-10-05 17:10:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 2752512. Throughput: 0: 807.8, 1: 808.7. Samples: 687549. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:23,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.580')] [2023-10-05 17:10:28,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 2777088. Throughput: 0: 804.7, 1: 804.0. Samples: 692225. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:28,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.580')] [2023-10-05 17:10:28,770][24456] Updated weights for policy 0, policy_version 5440 (0.0020) [2023-10-05 17:10:28,771][24460] Updated weights for policy 1, policy_version 5440 (0.0018) [2023-10-05 17:10:33,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 2809856. Throughput: 0: 807.6, 1: 807.2. Samples: 702019. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:33,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.560')] [2023-10-05 17:10:38,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 2842624. Throughput: 0: 808.4, 1: 810.2. Samples: 711813. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:38,653][23454] Avg episode reward: [(0, '9.620'), (1, '9.560')] [2023-10-05 17:10:41,292][24456] Updated weights for policy 0, policy_version 5600 (0.0014) [2023-10-05 17:10:41,292][24460] Updated weights for policy 1, policy_version 5600 (0.0018) [2023-10-05 17:10:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 2875392. Throughput: 0: 809.3, 1: 808.6. Samples: 716800. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:10:43,652][23454] Avg episode reward: [(0, '9.620'), (1, '9.530')] [2023-10-05 17:10:48,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 2908160. Throughput: 0: 808.1, 1: 807.4. Samples: 726009. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:10:48,652][23454] Avg episode reward: [(0, '9.610'), (1, '9.530')] [2023-10-05 17:10:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 2940928. Throughput: 0: 801.4, 1: 801.8. Samples: 735476. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:10:53,652][23454] Avg episode reward: [(0, '9.620'), (1, '9.510')] [2023-10-05 17:10:54,176][24456] Updated weights for policy 0, policy_version 5760 (0.0017) [2023-10-05 17:10:54,177][24460] Updated weights for policy 1, policy_version 5760 (0.0019) [2023-10-05 17:10:58,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 2973696. Throughput: 0: 807.6, 1: 807.4. Samples: 740637. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:10:58,652][23454] Avg episode reward: [(0, '9.630'), (1, '9.510')] [2023-10-05 17:11:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 3006464. Throughput: 0: 802.4, 1: 802.6. Samples: 750116. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:11:03,653][23454] Avg episode reward: [(0, '9.630'), (1, '9.460')] [2023-10-05 17:11:03,663][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000005872_1503232.pth... [2023-10-05 17:11:03,663][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000005872_1503232.pth... [2023-10-05 17:11:03,700][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000002848_729088.pth [2023-10-05 17:11:03,705][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000002848_729088.pth [2023-10-05 17:11:06,828][24460] Updated weights for policy 1, policy_version 5920 (0.0016) [2023-10-05 17:11:06,829][24456] Updated weights for policy 0, policy_version 5920 (0.0017) [2023-10-05 17:11:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 3039232. 
Throughput: 0: 803.7, 1: 802.2. Samples: 759813. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:11:08,652][23454] Avg episode reward: [(0, '9.630'), (1, '9.460')] [2023-10-05 17:11:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 3072000. Throughput: 0: 805.8, 1: 806.2. Samples: 764766. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:11:13,652][23454] Avg episode reward: [(0, '9.610'), (1, '9.440')] [2023-10-05 17:11:18,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 3104768. Throughput: 0: 802.4, 1: 803.0. Samples: 774261. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:11:18,652][23454] Avg episode reward: [(0, '9.610'), (1, '9.440')] [2023-10-05 17:11:19,518][24460] Updated weights for policy 1, policy_version 6080 (0.0016) [2023-10-05 17:11:19,518][24456] Updated weights for policy 0, policy_version 6080 (0.0019) [2023-10-05 17:11:23,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 3137536. Throughput: 0: 806.9, 1: 804.7. Samples: 784338. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:11:23,653][23454] Avg episode reward: [(0, '9.600'), (1, '9.410')] [2023-10-05 17:11:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3170304. Throughput: 0: 799.7, 1: 799.8. Samples: 788778. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:11:28,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.410')] [2023-10-05 17:11:32,223][24456] Updated weights for policy 0, policy_version 6240 (0.0015) [2023-10-05 17:11:32,223][24460] Updated weights for policy 1, policy_version 6240 (0.0017) [2023-10-05 17:11:33,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3203072. Throughput: 0: 807.5, 1: 808.3. Samples: 798720. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:11:33,652][23454] Avg episode reward: [(0, '9.580'), (1, '9.390')] [2023-10-05 17:11:38,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3235840. Throughput: 0: 811.9, 1: 812.4. Samples: 808567. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:11:38,653][23454] Avg episode reward: [(0, '9.580'), (1, '9.390')] [2023-10-05 17:11:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3268608. Throughput: 0: 805.5, 1: 805.8. Samples: 813145. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:11:43,652][23454] Avg episode reward: [(0, '9.580'), (1, '9.380')] [2023-10-05 17:11:44,815][24456] Updated weights for policy 0, policy_version 6400 (0.0018) [2023-10-05 17:11:44,816][24460] Updated weights for policy 1, policy_version 6400 (0.0019) [2023-10-05 17:11:48,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3301376. Throughput: 0: 812.4, 1: 813.0. Samples: 823259. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:11:48,652][23454] Avg episode reward: [(0, '9.570'), (1, '9.370')] [2023-10-05 17:11:53,651][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3325952. Throughput: 0: 809.9, 1: 811.0. Samples: 832755. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:11:53,652][23454] Avg episode reward: [(0, '9.550'), (1, '9.370')] [2023-10-05 17:11:57,514][24460] Updated weights for policy 1, policy_version 6560 (0.0017) [2023-10-05 17:11:57,514][24456] Updated weights for policy 0, policy_version 6560 (0.0018) [2023-10-05 17:11:58,651][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3358720. Throughput: 0: 809.9, 1: 809.4. Samples: 837632. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:11:58,652][23454] Avg episode reward: [(0, '9.550'), (1, '9.380')] [2023-10-05 17:12:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3391488. Throughput: 0: 810.9, 1: 811.6. Samples: 847274. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:03,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.380')] [2023-10-05 17:12:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3424256. Throughput: 0: 806.1, 1: 807.2. Samples: 856935. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:08,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.360')] [2023-10-05 17:12:10,108][24456] Updated weights for policy 0, policy_version 6720 (0.0016) [2023-10-05 17:12:10,110][24460] Updated weights for policy 1, policy_version 6720 (0.0018) [2023-10-05 17:12:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3457024. Throughput: 0: 814.3, 1: 814.8. Samples: 862086. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:12:13,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.360')] [2023-10-05 17:12:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3489792. Throughput: 0: 809.3, 1: 810.1. Samples: 871591. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:18,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.370')] [2023-10-05 17:12:22,736][24460] Updated weights for policy 1, policy_version 6880 (0.0018) [2023-10-05 17:12:22,736][24456] Updated weights for policy 0, policy_version 6880 (0.0016) [2023-10-05 17:12:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3522560. Throughput: 0: 808.5, 1: 808.0. Samples: 881311. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:23,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.370')] [2023-10-05 17:12:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3555328. Throughput: 0: 815.1, 1: 814.9. Samples: 886493. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:12:28,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.380')] [2023-10-05 17:12:33,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 3588096. Throughput: 0: 807.8, 1: 807.3. Samples: 895938. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:12:33,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.380')] [2023-10-05 17:12:35,331][24456] Updated weights for policy 0, policy_version 7040 (0.0015) [2023-10-05 17:12:35,332][24460] Updated weights for policy 1, policy_version 7040 (0.0018) [2023-10-05 17:12:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3620864. Throughput: 0: 809.0, 1: 808.5. Samples: 905540. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:38,653][23454] Avg episode reward: [(0, '9.540'), (1, '9.380')] [2023-10-05 17:12:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 3653632. Throughput: 0: 809.4, 1: 809.9. Samples: 910502. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:43,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.380')] [2023-10-05 17:12:47,994][24456] Updated weights for policy 0, policy_version 7200 (0.0017) [2023-10-05 17:12:47,996][24460] Updated weights for policy 1, policy_version 7200 (0.0017) [2023-10-05 17:12:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6456.4). Total num frames: 3686400. Throughput: 0: 810.8, 1: 810.0. Samples: 920209. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:12:48,653][23454] Avg episode reward: [(0, '9.520'), (1, '9.390')] [2023-10-05 17:12:53,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3719168. Throughput: 0: 810.7, 1: 810.4. Samples: 929883. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:12:53,653][23454] Avg episode reward: [(0, '9.480'), (1, '9.390')] [2023-10-05 17:12:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3751936. Throughput: 0: 810.9, 1: 810.8. Samples: 935059. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:12:58,652][23454] Avg episode reward: [(0, '9.470'), (1, '9.360')] [2023-10-05 17:13:00,520][24456] Updated weights for policy 0, policy_version 7360 (0.0017) [2023-10-05 17:13:00,520][24460] Updated weights for policy 1, policy_version 7360 (0.0017) [2023-10-05 17:13:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3784704. Throughput: 0: 812.9, 1: 812.3. Samples: 944724. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:13:03,652][23454] Avg episode reward: [(0, '9.480'), (1, '9.360')] [2023-10-05 17:13:03,661][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000007392_1892352.pth... [2023-10-05 17:13:03,662][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000007392_1892352.pth... [2023-10-05 17:13:03,698][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000004368_1118208.pth [2023-10-05 17:13:03,699][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000004368_1118208.pth [2023-10-05 17:13:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3817472. Throughput: 0: 812.4, 1: 812.4. Samples: 954427. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:13:08,652][23454] Avg episode reward: [(0, '9.470'), (1, '9.370')] [2023-10-05 17:13:13,167][24460] Updated weights for policy 1, policy_version 7520 (0.0019) [2023-10-05 17:13:13,168][24456] Updated weights for policy 0, policy_version 7520 (0.0017) [2023-10-05 17:13:13,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3850240. Throughput: 0: 808.5, 1: 808.6. Samples: 959263. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:13:13,652][23454] Avg episode reward: [(0, '9.490'), (1, '9.370')] [2023-10-05 17:13:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3883008. Throughput: 0: 810.0, 1: 810.3. Samples: 968854. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:13:18,652][23454] Avg episode reward: [(0, '9.500'), (1, '9.380')] [2023-10-05 17:13:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3915776. Throughput: 0: 816.0, 1: 815.2. Samples: 978944. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:13:23,652][23454] Avg episode reward: [(0, '9.500'), (1, '9.390')] [2023-10-05 17:13:25,859][24460] Updated weights for policy 1, policy_version 7680 (0.0017) [2023-10-05 17:13:25,861][24456] Updated weights for policy 0, policy_version 7680 (0.0018) [2023-10-05 17:13:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3948544. Throughput: 0: 810.6, 1: 810.5. Samples: 983452. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:28,652][23454] Avg episode reward: [(0, '9.500'), (1, '9.390')] [2023-10-05 17:13:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 3981312. Throughput: 0: 812.1, 1: 811.6. Samples: 993275. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:33,653][23454] Avg episode reward: [(0, '9.520'), (1, '9.450')] [2023-10-05 17:13:38,547][24460] Updated weights for policy 1, policy_version 7840 (0.0017) [2023-10-05 17:13:38,547][24456] Updated weights for policy 0, policy_version 7840 (0.0017) [2023-10-05 17:13:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4014080. Throughput: 0: 812.4, 1: 812.8. Samples: 1003019. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:38,652][23454] Avg episode reward: [(0, '9.520'), (1, '9.450')] [2023-10-05 17:13:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4046848. Throughput: 0: 806.4, 1: 806.0. Samples: 1007617. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:43,653][23454] Avg episode reward: [(0, '9.530'), (1, '9.460')] [2023-10-05 17:13:48,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4079616. Throughput: 0: 810.8, 1: 811.3. Samples: 1017717. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:48,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.460')] [2023-10-05 17:13:51,168][24460] Updated weights for policy 1, policy_version 8000 (0.0017) [2023-10-05 17:13:51,168][24456] Updated weights for policy 0, policy_version 8000 (0.0016) [2023-10-05 17:13:53,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 4104192. Throughput: 0: 809.8, 1: 810.0. Samples: 1027317. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:53,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.490')] [2023-10-05 17:13:58,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 4136960. Throughput: 0: 810.5, 1: 810.1. Samples: 1032192. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:13:58,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.490')] [2023-10-05 17:14:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 4169728. Throughput: 0: 810.6, 1: 811.9. Samples: 1041865. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:14:03,653][23454] Avg episode reward: [(0, '9.520'), (1, '9.510')] [2023-10-05 17:14:03,908][24456] Updated weights for policy 0, policy_version 8160 (0.0017) [2023-10-05 17:14:03,908][24460] Updated weights for policy 1, policy_version 8160 (0.0016) [2023-10-05 17:14:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4202496. Throughput: 0: 804.2, 1: 804.9. Samples: 1051352. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:14:08,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.510')] [2023-10-05 17:14:13,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 4235264. Throughput: 0: 810.8, 1: 810.9. Samples: 1056426. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:14:13,653][23454] Avg episode reward: [(0, '9.530'), (1, '9.520')] [2023-10-05 17:14:16,550][24460] Updated weights for policy 1, policy_version 8320 (0.0016) [2023-10-05 17:14:16,551][24456] Updated weights for policy 0, policy_version 8320 (0.0017) [2023-10-05 17:14:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 4268032. Throughput: 0: 807.7, 1: 808.3. Samples: 1065994. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:14:18,653][23454] Avg episode reward: [(0, '9.520'), (1, '9.520')] [2023-10-05 17:14:23,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4300800. Throughput: 0: 806.5, 1: 806.0. Samples: 1075585. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:14:23,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.530')] [2023-10-05 17:14:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4333568. Throughput: 0: 810.2, 1: 811.3. Samples: 1080583. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:14:28,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.530')] [2023-10-05 17:14:29,183][24460] Updated weights for policy 1, policy_version 8480 (0.0016) [2023-10-05 17:14:29,183][24456] Updated weights for policy 0, policy_version 8480 (0.0017) [2023-10-05 17:14:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4366336. Throughput: 0: 806.2, 1: 806.4. Samples: 1090287. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:14:33,653][23454] Avg episode reward: [(0, '9.520'), (1, '9.580')] [2023-10-05 17:14:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 4399104. Throughput: 0: 809.0, 1: 808.9. Samples: 1100119. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:14:38,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.580')] [2023-10-05 17:14:41,627][24460] Updated weights for policy 1, policy_version 8640 (0.0016) [2023-10-05 17:14:41,627][24456] Updated weights for policy 0, policy_version 8640 (0.0018) [2023-10-05 17:14:43,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4431872. Throughput: 0: 811.2, 1: 812.0. Samples: 1105235. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:14:43,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.590')] [2023-10-05 17:14:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4464640. Throughput: 0: 813.8, 1: 812.5. Samples: 1115049. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:14:48,652][23454] Avg episode reward: [(0, '9.500'), (1, '9.600')] [2023-10-05 17:14:53,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4497408. Throughput: 0: 813.8, 1: 813.4. Samples: 1124576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:14:53,653][23454] Avg episode reward: [(0, '9.500'), (1, '9.610')] [2023-10-05 17:14:54,211][24460] Updated weights for policy 1, policy_version 8800 (0.0017) [2023-10-05 17:14:54,211][24456] Updated weights for policy 0, policy_version 8800 (0.0018) [2023-10-05 17:14:58,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4530176. Throughput: 0: 812.4, 1: 812.4. Samples: 1129544. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:14:58,652][23454] Avg episode reward: [(0, '9.520'), (1, '9.610')] [2023-10-05 17:15:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4562944. Throughput: 0: 808.1, 1: 807.6. Samples: 1138701. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:15:03,653][23454] Avg episode reward: [(0, '9.510'), (1, '9.610')] [2023-10-05 17:15:03,663][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000008912_2281472.pth... [2023-10-05 17:15:03,663][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000008912_2281472.pth... [2023-10-05 17:15:03,698][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000005872_1503232.pth [2023-10-05 17:15:03,701][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000005872_1503232.pth [2023-10-05 17:15:07,058][24460] Updated weights for policy 1, policy_version 8960 (0.0018) [2023-10-05 17:15:07,059][24456] Updated weights for policy 0, policy_version 8960 (0.0018) [2023-10-05 17:15:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4595712. 
Throughput: 0: 813.3, 1: 813.7. Samples: 1148800. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:15:08,652][23454] Avg episode reward: [(0, '9.490'), (1, '9.630')] [2023-10-05 17:15:13,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4628480. Throughput: 0: 809.7, 1: 809.1. Samples: 1153426. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:15:13,652][23454] Avg episode reward: [(0, '9.490'), (1, '9.630')] [2023-10-05 17:15:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 4661248. Throughput: 0: 811.2, 1: 810.5. Samples: 1163265. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:15:18,652][23454] Avg episode reward: [(0, '9.490'), (1, '9.650')] [2023-10-05 17:15:19,704][24460] Updated weights for policy 1, policy_version 9120 (0.0017) [2023-10-05 17:15:19,705][24456] Updated weights for policy 0, policy_version 9120 (0.0018) [2023-10-05 17:15:23,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4694016. Throughput: 0: 811.2, 1: 810.8. Samples: 1173107. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:15:23,652][23454] Avg episode reward: [(0, '9.500'), (1, '9.650')] [2023-10-05 17:15:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4726784. Throughput: 0: 805.3, 1: 805.1. Samples: 1177701. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:15:28,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.710')] [2023-10-05 17:15:32,319][24456] Updated weights for policy 0, policy_version 9280 (0.0019) [2023-10-05 17:15:32,319][24460] Updated weights for policy 1, policy_version 9280 (0.0018) [2023-10-05 17:15:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4759552. Throughput: 0: 808.9, 1: 808.4. Samples: 1187825. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:15:33,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.710')] [2023-10-05 17:15:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4792320. Throughput: 0: 809.4, 1: 810.3. Samples: 1197465. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:15:38,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.720')] [2023-10-05 17:15:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4825088. Throughput: 0: 807.3, 1: 806.7. Samples: 1202176. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:15:43,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.720')] [2023-10-05 17:15:44,866][24460] Updated weights for policy 1, policy_version 9440 (0.0017) [2023-10-05 17:15:44,867][24456] Updated weights for policy 0, policy_version 9440 (0.0018) [2023-10-05 17:15:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4857856. Throughput: 0: 817.3, 1: 817.5. Samples: 1212267. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:15:48,652][23454] Avg episode reward: [(0, '9.520'), (1, '9.730')] [2023-10-05 17:15:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 4890624. Throughput: 0: 814.1, 1: 813.5. Samples: 1222043. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:15:53,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.730')] [2023-10-05 17:15:57,401][24456] Updated weights for policy 0, policy_version 9600 (0.0017) [2023-10-05 17:15:57,402][24460] Updated weights for policy 1, policy_version 9600 (0.0017) [2023-10-05 17:15:58,651][23454] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 4919296. Throughput: 0: 815.0, 1: 814.4. Samples: 1226752. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:15:58,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.760')] [2023-10-05 17:16:03,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 4947968. Throughput: 0: 816.4, 1: 815.6. Samples: 1236702. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:16:03,653][23454] Avg episode reward: [(0, '9.510'), (1, '9.760')] [2023-10-05 17:16:08,651][23454] Fps is (10 sec: 6144.0, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 4980736. Throughput: 0: 813.4, 1: 813.8. Samples: 1246328. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:16:08,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.770')] [2023-10-05 17:16:09,982][24456] Updated weights for policy 0, policy_version 9760 (0.0016) [2023-10-05 17:16:09,983][24460] Updated weights for policy 1, policy_version 9760 (0.0017) [2023-10-05 17:16:13,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 5013504. Throughput: 0: 818.4, 1: 817.8. Samples: 1251328. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:16:13,653][23454] Avg episode reward: [(0, '9.510'), (1, '9.770')] [2023-10-05 17:16:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5046272. Throughput: 0: 813.4, 1: 814.5. Samples: 1261082. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:16:18,652][23454] Avg episode reward: [(0, '9.520'), (1, '9.770')] [2023-10-05 17:16:22,567][24456] Updated weights for policy 0, policy_version 9920 (0.0015) [2023-10-05 17:16:22,568][24460] Updated weights for policy 1, policy_version 9920 (0.0017) [2023-10-05 17:16:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5079040. Throughput: 0: 814.6, 1: 814.1. Samples: 1270757. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:16:23,652][23454] Avg episode reward: [(0, '9.490'), (1, '9.760')] [2023-10-05 17:16:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5111808. Throughput: 0: 818.9, 1: 819.2. Samples: 1275891. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:16:28,652][23454] Avg episode reward: [(0, '9.490'), (1, '9.770')] [2023-10-05 17:16:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5144576. Throughput: 0: 814.4, 1: 815.5. Samples: 1285613. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:16:33,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.780')] [2023-10-05 17:16:35,251][24456] Updated weights for policy 0, policy_version 10080 (0.0016) [2023-10-05 17:16:35,251][24460] Updated weights for policy 1, policy_version 10080 (0.0018) [2023-10-05 17:16:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5177344. Throughput: 0: 808.0, 1: 808.5. Samples: 1294785. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:16:38,652][23454] Avg episode reward: [(0, '9.510'), (1, '9.770')] [2023-10-05 17:16:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 5210112. Throughput: 0: 811.8, 1: 812.1. Samples: 1299826. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:16:43,653][23454] Avg episode reward: [(0, '9.510'), (1, '9.770')] [2023-10-05 17:16:47,880][24460] Updated weights for policy 1, policy_version 10240 (0.0017) [2023-10-05 17:16:47,881][24456] Updated weights for policy 0, policy_version 10240 (0.0017) [2023-10-05 17:16:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 5242880. Throughput: 0: 808.7, 1: 810.5. Samples: 1309569. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:16:48,652][23454] Avg episode reward: [(0, '9.530'), (1, '9.770')] [2023-10-05 17:16:53,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 5275648. Throughput: 0: 807.0, 1: 806.7. Samples: 1318945. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:16:53,653][23454] Avg episode reward: [(0, '9.540'), (1, '9.770')] [2023-10-05 17:16:58,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 5308416. Throughput: 0: 807.6, 1: 808.3. Samples: 1324043. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:16:58,652][23454] Avg episode reward: [(0, '9.540'), (1, '9.770')] [2023-10-05 17:17:00,650][24460] Updated weights for policy 1, policy_version 10400 (0.0018) [2023-10-05 17:17:00,650][24456] Updated weights for policy 0, policy_version 10400 (0.0018) [2023-10-05 17:17:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5341184. Throughput: 0: 806.1, 1: 805.5. Samples: 1333602. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:17:03,652][23454] Avg episode reward: [(0, '9.560'), (1, '9.770')] [2023-10-05 17:17:03,662][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000010432_2670592.pth... [2023-10-05 17:17:03,662][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000010432_2670592.pth... [2023-10-05 17:17:03,698][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000007392_1892352.pth [2023-10-05 17:17:03,701][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000007392_1892352.pth [2023-10-05 17:17:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5373952. Throughput: 0: 808.5, 1: 807.7. Samples: 1343488. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:17:08,652][23454] Avg episode reward: [(0, '9.580'), (1, '9.770')] [2023-10-05 17:17:13,261][24460] Updated weights for policy 1, policy_version 10560 (0.0017) [2023-10-05 17:17:13,263][24456] Updated weights for policy 0, policy_version 10560 (0.0018) [2023-10-05 17:17:13,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5406720. Throughput: 0: 803.1, 1: 803.7. Samples: 1348198. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:13,653][23454] Avg episode reward: [(0, '9.600'), (1, '9.760')] [2023-10-05 17:17:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5439488. Throughput: 0: 803.0, 1: 801.8. Samples: 1357828. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:18,653][23454] Avg episode reward: [(0, '9.610'), (1, '9.760')] [2023-10-05 17:17:23,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5472256. Throughput: 0: 812.6, 1: 813.2. Samples: 1367949. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:23,652][23454] Avg episode reward: [(0, '9.610'), (1, '9.790')] [2023-10-05 17:17:25,898][24460] Updated weights for policy 1, policy_version 10720 (0.0017) [2023-10-05 17:17:25,898][24456] Updated weights for policy 0, policy_version 10720 (0.0018) [2023-10-05 17:17:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5505024. Throughput: 0: 807.4, 1: 807.3. Samples: 1372485. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:28,652][23454] Avg episode reward: [(0, '9.600'), (1, '9.790')] [2023-10-05 17:17:33,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5537792. Throughput: 0: 809.7, 1: 808.7. Samples: 1382400. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:33,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.790')] [2023-10-05 17:17:38,504][24460] Updated weights for policy 1, policy_version 10880 (0.0019) [2023-10-05 17:17:38,505][24456] Updated weights for policy 0, policy_version 10880 (0.0019) [2023-10-05 17:17:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5570560. Throughput: 0: 814.5, 1: 814.2. Samples: 1392239. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:38,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.790')] [2023-10-05 17:17:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5603328. Throughput: 0: 809.4, 1: 809.3. Samples: 1396885. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:43,652][23454] Avg episode reward: [(0, '9.600'), (1, '9.800')] [2023-10-05 17:17:48,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5636096. Throughput: 0: 815.5, 1: 815.1. Samples: 1406976. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:48,652][23454] Avg episode reward: [(0, '9.610'), (1, '9.800')] [2023-10-05 17:17:51,053][24460] Updated weights for policy 1, policy_version 11040 (0.0017) [2023-10-05 17:17:51,054][24456] Updated weights for policy 0, policy_version 11040 (0.0018) [2023-10-05 17:17:53,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5668864. Throughput: 0: 814.0, 1: 814.5. Samples: 1416771. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:17:53,653][23454] Avg episode reward: [(0, '9.610'), (1, '9.800')] [2023-10-05 17:17:58,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 5701632. Throughput: 0: 814.0, 1: 813.6. Samples: 1421439. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:17:58,653][23454] Avg episode reward: [(0, '9.620'), (1, '9.810')] [2023-10-05 17:18:03,652][23454] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 5730304. Throughput: 0: 817.8, 1: 819.0. Samples: 1431484. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:18:03,653][23454] Avg episode reward: [(0, '9.610'), (1, '9.820')] [2023-10-05 17:18:03,654][24456] Updated weights for policy 0, policy_version 11200 (0.0017) [2023-10-05 17:18:03,654][24460] Updated weights for policy 1, policy_version 11200 (0.0018) [2023-10-05 17:18:08,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 5758976. Throughput: 0: 810.7, 1: 810.0. Samples: 1440881. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:18:08,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.830')] [2023-10-05 17:18:13,651][23454] Fps is (10 sec: 6144.2, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5791744. Throughput: 0: 815.7, 1: 815.5. Samples: 1445888. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:18:13,652][23454] Avg episode reward: [(0, '9.590'), (1, '9.830')] [2023-10-05 17:18:16,276][24456] Updated weights for policy 0, policy_version 11360 (0.0018) [2023-10-05 17:18:16,276][24460] Updated weights for policy 1, policy_version 11360 (0.0018) [2023-10-05 17:18:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5824512. Throughput: 0: 813.9, 1: 814.4. Samples: 1455672. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:18:18,652][23454] Avg episode reward: [(0, '9.600'), (1, '9.840')] [2023-10-05 17:18:23,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 5857280. Throughput: 0: 813.4, 1: 813.7. Samples: 1465456. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:18:23,653][23454] Avg episode reward: [(0, '9.610'), (1, '9.850')] [2023-10-05 17:18:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 5890048. Throughput: 0: 817.8, 1: 817.3. Samples: 1470464. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:18:28,652][23454] Avg episode reward: [(0, '9.630'), (1, '9.860')] [2023-10-05 17:18:28,848][24460] Updated weights for policy 1, policy_version 11520 (0.0017) [2023-10-05 17:18:28,848][24456] Updated weights for policy 0, policy_version 11520 (0.0016) [2023-10-05 17:18:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5922816. Throughput: 0: 810.0, 1: 810.6. Samples: 1479900. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:18:33,652][23454] Avg episode reward: [(0, '9.640'), (1, '9.860')] [2023-10-05 17:18:38,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 5955584. Throughput: 0: 807.4, 1: 807.7. Samples: 1489448. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:18:38,652][23454] Avg episode reward: [(0, '9.650'), (1, '9.850')] [2023-10-05 17:18:41,602][24456] Updated weights for policy 0, policy_version 11680 (0.0018) [2023-10-05 17:18:41,602][24460] Updated weights for policy 1, policy_version 11680 (0.0017) [2023-10-05 17:18:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 5988352. Throughput: 0: 810.7, 1: 811.4. Samples: 1494434. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:18:43,652][23454] Avg episode reward: [(0, '9.660'), (1, '9.850')] [2023-10-05 17:18:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 6021120. Throughput: 0: 807.7, 1: 806.7. Samples: 1504131. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:18:48,652][23454] Avg episode reward: [(0, '9.660'), (1, '9.850')] [2023-10-05 17:18:53,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 6053888. Throughput: 0: 811.4, 1: 811.1. Samples: 1513893. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:18:53,653][23454] Avg episode reward: [(0, '9.680'), (1, '9.850')] [2023-10-05 17:18:54,073][24460] Updated weights for policy 1, policy_version 11840 (0.0019) [2023-10-05 17:18:54,073][24456] Updated weights for policy 0, policy_version 11840 (0.0018) [2023-10-05 17:18:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 6086656. Throughput: 0: 813.5, 1: 813.3. Samples: 1519092. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:18:58,652][23454] Avg episode reward: [(0, '9.680'), (1, '9.850')] [2023-10-05 17:19:03,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6485.4, 300 sec: 6498.1). Total num frames: 6119424. Throughput: 0: 811.2, 1: 811.3. Samples: 1528683. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:19:03,652][23454] Avg episode reward: [(0, '9.680'), (1, '9.850')] [2023-10-05 17:19:03,661][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000011952_3059712.pth... [2023-10-05 17:19:03,661][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000011952_3059712.pth... [2023-10-05 17:19:03,696][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000008912_2281472.pth [2023-10-05 17:19:03,698][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000008912_2281472.pth [2023-10-05 17:19:06,584][24456] Updated weights for policy 0, policy_version 12000 (0.0017) [2023-10-05 17:19:06,584][24460] Updated weights for policy 1, policy_version 12000 (0.0017) [2023-10-05 17:19:08,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6152192. 
Throughput: 0: 810.2, 1: 810.4. Samples: 1538386. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:19:08,653][23454] Avg episode reward: [(0, '9.690'), (1, '9.840')] [2023-10-05 17:19:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6184960. Throughput: 0: 810.9, 1: 811.3. Samples: 1543465. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:19:13,652][23454] Avg episode reward: [(0, '9.700'), (1, '9.840')] [2023-10-05 17:19:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6217728. Throughput: 0: 814.6, 1: 814.7. Samples: 1553220. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:19:18,652][23454] Avg episode reward: [(0, '9.700'), (1, '9.840')] [2023-10-05 17:19:19,206][24460] Updated weights for policy 1, policy_version 12160 (0.0018) [2023-10-05 17:19:19,206][24456] Updated weights for policy 0, policy_version 12160 (0.0017) [2023-10-05 17:19:23,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6250496. Throughput: 0: 815.4, 1: 815.0. Samples: 1562816. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:19:23,652][23454] Avg episode reward: [(0, '9.720'), (1, '9.840')] [2023-10-05 17:19:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6283264. Throughput: 0: 816.2, 1: 816.0. Samples: 1567885. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:19:28,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.850')] [2023-10-05 17:19:31,730][24460] Updated weights for policy 1, policy_version 12320 (0.0017) [2023-10-05 17:19:31,730][24456] Updated weights for policy 0, policy_version 12320 (0.0017) [2023-10-05 17:19:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6316032. Throughput: 0: 815.8, 1: 816.6. Samples: 1577590. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:19:33,653][23454] Avg episode reward: [(0, '9.730'), (1, '9.850')] [2023-10-05 17:19:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6348800. Throughput: 0: 814.8, 1: 814.5. Samples: 1587209. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:19:38,652][23454] Avg episode reward: [(0, '9.730'), (1, '9.840')] [2023-10-05 17:19:43,652][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6381568. Throughput: 0: 809.2, 1: 809.0. Samples: 1591909. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:19:43,653][23454] Avg episode reward: [(0, '9.740'), (1, '9.840')] [2023-10-05 17:19:44,766][24456] Updated weights for policy 0, policy_version 12480 (0.0018) [2023-10-05 17:19:44,766][24460] Updated weights for policy 1, policy_version 12480 (0.0019) [2023-10-05 17:19:48,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6414336. Throughput: 0: 809.8, 1: 809.2. Samples: 1601536. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:19:48,653][23454] Avg episode reward: [(0, '9.740'), (1, '9.840')] [2023-10-05 17:19:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6447104. Throughput: 0: 810.6, 1: 810.5. Samples: 1611336. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:19:53,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.840')] [2023-10-05 17:19:57,279][24460] Updated weights for policy 1, policy_version 12640 (0.0017) [2023-10-05 17:19:57,279][24456] Updated weights for policy 0, policy_version 12640 (0.0017) [2023-10-05 17:19:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6479872. Throughput: 0: 805.5, 1: 805.6. Samples: 1615962. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:19:58,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.840')] [2023-10-05 17:20:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6512640. Throughput: 0: 810.1, 1: 809.6. Samples: 1626106. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:20:03,652][23454] Avg episode reward: [(0, '9.740'), (1, '9.840')] [2023-10-05 17:20:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6545408. Throughput: 0: 812.2, 1: 812.4. Samples: 1635922. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:08,652][23454] Avg episode reward: [(0, '9.750'), (1, '9.840')] [2023-10-05 17:20:09,762][24456] Updated weights for policy 0, policy_version 12800 (0.0017) [2023-10-05 17:20:09,763][24460] Updated weights for policy 1, policy_version 12800 (0.0018) [2023-10-05 17:20:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6578176. Throughput: 0: 807.1, 1: 806.7. Samples: 1640506. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:13,652][23454] Avg episode reward: [(0, '9.760'), (1, '9.860')] [2023-10-05 17:20:18,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6610944. Throughput: 0: 810.8, 1: 810.5. Samples: 1650549. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:20:18,652][23454] Avg episode reward: [(0, '9.770'), (1, '9.860')] [2023-10-05 17:20:22,385][24460] Updated weights for policy 1, policy_version 12960 (0.0014) [2023-10-05 17:20:22,385][24456] Updated weights for policy 0, policy_version 12960 (0.0016) [2023-10-05 17:20:23,651][23454] Fps is (10 sec: 6144.0, 60 sec: 6485.4, 300 sec: 6484.2). Total num frames: 6639616. Throughput: 0: 811.6, 1: 811.4. Samples: 1660241. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:20:23,652][23454] Avg episode reward: [(0, '9.780'), (1, '9.860')] [2023-10-05 17:20:23,652][24064] Saving new best policy, reward=9.780! [2023-10-05 17:20:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6676480. Throughput: 0: 812.2, 1: 812.6. Samples: 1665025. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:28,652][23454] Avg episode reward: [(0, '9.800'), (1, '9.860')] [2023-10-05 17:20:28,653][24064] Saving new best policy, reward=9.800! [2023-10-05 17:20:33,652][23454] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6709248. Throughput: 0: 817.6, 1: 818.4. Samples: 1675156. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:33,653][23454] Avg episode reward: [(0, '9.820'), (1, '9.850')] [2023-10-05 17:20:33,664][24064] Saving new best policy, reward=9.820! [2023-10-05 17:20:34,821][24460] Updated weights for policy 1, policy_version 13120 (0.0018) [2023-10-05 17:20:34,821][24456] Updated weights for policy 0, policy_version 13120 (0.0017) [2023-10-05 17:20:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6742016. Throughput: 0: 816.5, 1: 816.4. Samples: 1684817. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:38,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.850')] [2023-10-05 17:20:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6774784. Throughput: 0: 818.4, 1: 818.0. Samples: 1689600. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:43,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.840')] [2023-10-05 17:20:43,654][24064] Saving new best policy, reward=9.850! 
[2023-10-05 17:20:47,385][24456] Updated weights for policy 0, policy_version 13280 (0.0018) [2023-10-05 17:20:47,386][24460] Updated weights for policy 1, policy_version 13280 (0.0019) [2023-10-05 17:20:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 6807552. Throughput: 0: 817.2, 1: 817.4. Samples: 1699660. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:20:48,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.840')] [2023-10-05 17:20:53,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 6832128. Throughput: 0: 815.8, 1: 815.6. Samples: 1709338. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:20:53,653][23454] Avg episode reward: [(0, '9.850'), (1, '9.830')] [2023-10-05 17:20:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 6873088. Throughput: 0: 818.8, 1: 818.3. Samples: 1714177. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:20:58,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.830')] [2023-10-05 17:20:59,863][24456] Updated weights for policy 0, policy_version 13440 (0.0016) [2023-10-05 17:20:59,863][24460] Updated weights for policy 1, policy_version 13440 (0.0016) [2023-10-05 17:21:03,652][23454] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 6905856. Throughput: 0: 819.4, 1: 819.1. Samples: 1724280. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:03,653][23454] Avg episode reward: [(0, '9.840'), (1, '9.830')] [2023-10-05 17:21:03,664][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000013488_3452928.pth... [2023-10-05 17:21:03,664][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000013488_3452928.pth... 
[2023-10-05 17:21:03,703][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000010432_2670592.pth [2023-10-05 17:21:03,704][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000010432_2670592.pth [2023-10-05 17:21:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 6938624. Throughput: 0: 819.3, 1: 819.9. Samples: 1734003. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:08,652][23454] Avg episode reward: [(0, '9.820'), (1, '9.830')] [2023-10-05 17:21:12,456][24460] Updated weights for policy 1, policy_version 13600 (0.0017) [2023-10-05 17:21:12,456][24456] Updated weights for policy 0, policy_version 13600 (0.0016) [2023-10-05 17:21:13,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 6963200. Throughput: 0: 819.2, 1: 819.2. Samples: 1738752. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:13,652][23454] Avg episode reward: [(0, '9.820'), (1, '9.830')] [2023-10-05 17:21:18,651][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 6995968. Throughput: 0: 817.2, 1: 816.8. Samples: 1748686. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:18,652][23454] Avg episode reward: [(0, '9.800'), (1, '9.820')] [2023-10-05 17:21:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 7028736. Throughput: 0: 818.2, 1: 818.0. Samples: 1758446. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:21:23,652][23454] Avg episode reward: [(0, '9.800'), (1, '9.830')] [2023-10-05 17:21:24,941][24456] Updated weights for policy 0, policy_version 13760 (0.0019) [2023-10-05 17:21:24,942][24460] Updated weights for policy 1, policy_version 13760 (0.0019) [2023-10-05 17:21:28,651][23454] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 7065600. Throughput: 0: 819.2, 1: 819.2. Samples: 1763328. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:21:28,652][23454] Avg episode reward: [(0, '9.790'), (1, '9.830')] [2023-10-05 17:21:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7094272. Throughput: 0: 817.3, 1: 818.8. Samples: 1773284. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:21:33,652][23454] Avg episode reward: [(0, '9.800'), (1, '9.830')] [2023-10-05 17:21:37,526][24456] Updated weights for policy 0, policy_version 13920 (0.0015) [2023-10-05 17:21:37,526][24460] Updated weights for policy 1, policy_version 13920 (0.0015) [2023-10-05 17:21:38,651][23454] Fps is (10 sec: 6144.1, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7127040. Throughput: 0: 816.7, 1: 816.8. Samples: 1782843. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:21:38,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.840')] [2023-10-05 17:21:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7159808. Throughput: 0: 819.2, 1: 819.2. Samples: 1787904. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:43,652][23454] Avg episode reward: [(0, '9.800'), (1, '9.840')] [2023-10-05 17:21:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7192576. Throughput: 0: 816.8, 1: 816.0. Samples: 1797759. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:48,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.840')] [2023-10-05 17:21:50,090][24460] Updated weights for policy 1, policy_version 14080 (0.0018) [2023-10-05 17:21:50,090][24456] Updated weights for policy 0, policy_version 14080 (0.0018) [2023-10-05 17:21:53,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7225344. Throughput: 0: 813.9, 1: 814.0. Samples: 1807257. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:53,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.840')] [2023-10-05 17:21:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7258112. Throughput: 0: 818.4, 1: 819.2. Samples: 1812444. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:21:58,653][23454] Avg episode reward: [(0, '9.810'), (1, '9.850')] [2023-10-05 17:22:02,596][24460] Updated weights for policy 1, policy_version 14240 (0.0018) [2023-10-05 17:22:02,596][24456] Updated weights for policy 0, policy_version 14240 (0.0016) [2023-10-05 17:22:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7290880. Throughput: 0: 815.9, 1: 815.6. Samples: 1822106. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:22:03,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.850')] [2023-10-05 17:22:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7323648. Throughput: 0: 814.4, 1: 814.7. Samples: 1831754. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:22:08,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.850')] [2023-10-05 17:22:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7356416. Throughput: 0: 815.5, 1: 816.4. Samples: 1836764. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:22:13,653][23454] Avg episode reward: [(0, '9.810'), (1, '9.850')] [2023-10-05 17:22:15,431][24456] Updated weights for policy 0, policy_version 14400 (0.0018) [2023-10-05 17:22:15,432][24460] Updated weights for policy 1, policy_version 14400 (0.0017) [2023-10-05 17:22:18,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7389184. Throughput: 0: 809.3, 1: 807.6. Samples: 1846044. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:22:18,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.860')] [2023-10-05 17:22:23,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7421952. Throughput: 0: 809.3, 1: 809.2. Samples: 1855677. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:22:23,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:22:27,935][24460] Updated weights for policy 1, policy_version 14560 (0.0017) [2023-10-05 17:22:27,935][24456] Updated weights for policy 0, policy_version 14560 (0.0017) [2023-10-05 17:22:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 7454720. Throughput: 0: 810.2, 1: 810.6. Samples: 1860841. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:22:28,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.850')] [2023-10-05 17:22:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7487488. Throughput: 0: 807.8, 1: 808.8. Samples: 1870507. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:22:33,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.850')] [2023-10-05 17:22:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7520256. Throughput: 0: 809.8, 1: 809.7. Samples: 1880133. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:22:38,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.860')] [2023-10-05 17:22:40,591][24456] Updated weights for policy 0, policy_version 14720 (0.0017) [2023-10-05 17:22:40,591][24460] Updated weights for policy 1, policy_version 14720 (0.0017) [2023-10-05 17:22:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7553024. Throughput: 0: 807.5, 1: 807.1. Samples: 1885099. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:22:43,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.860')] [2023-10-05 17:22:48,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7585792. Throughput: 0: 806.7, 1: 807.2. Samples: 1894728. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:22:48,653][23454] Avg episode reward: [(0, '9.850'), (1, '9.860')] [2023-10-05 17:22:53,253][24460] Updated weights for policy 1, policy_version 14880 (0.0017) [2023-10-05 17:22:53,253][24456] Updated weights for policy 0, policy_version 14880 (0.0017) [2023-10-05 17:22:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 7618560. Throughput: 0: 810.2, 1: 809.5. Samples: 1904640. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:22:53,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.860')] [2023-10-05 17:22:58,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6511.9). Total num frames: 7651328. Throughput: 0: 807.4, 1: 806.9. Samples: 1909407. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:22:58,653][23454] Avg episode reward: [(0, '9.850'), (1, '9.870')] [2023-10-05 17:23:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 7684096. Throughput: 0: 811.6, 1: 812.0. Samples: 1919104. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:23:03,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.870')] [2023-10-05 17:23:03,663][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000015008_3842048.pth... [2023-10-05 17:23:03,663][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000015008_3842048.pth... 
[2023-10-05 17:23:03,694][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000011952_3059712.pth [2023-10-05 17:23:03,696][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000011952_3059712.pth [2023-10-05 17:23:05,787][24456] Updated weights for policy 0, policy_version 15040 (0.0018) [2023-10-05 17:23:05,787][24460] Updated weights for policy 1, policy_version 15040 (0.0018) [2023-10-05 17:23:08,652][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 7716864. Throughput: 0: 816.3, 1: 816.4. Samples: 1929150. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:23:08,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.870')] [2023-10-05 17:23:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 7749632. Throughput: 0: 810.6, 1: 810.9. Samples: 1933808. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:23:13,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.870')] [2023-10-05 17:23:18,472][24460] Updated weights for policy 1, policy_version 15200 (0.0017) [2023-10-05 17:23:18,472][24456] Updated weights for policy 0, policy_version 15200 (0.0017) [2023-10-05 17:23:18,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 7782400. Throughput: 0: 812.0, 1: 811.2. Samples: 1943552. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:23:18,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.870')] [2023-10-05 17:23:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 7815168. Throughput: 0: 814.0, 1: 814.3. Samples: 1953403. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:23:23,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:23:28,651][23454] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 7843840. Throughput: 0: 809.0, 1: 808.6. Samples: 1957889. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:23:28,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:23:31,148][24456] Updated weights for policy 0, policy_version 15360 (0.0017) [2023-10-05 17:23:31,148][24460] Updated weights for policy 1, policy_version 15360 (0.0019) [2023-10-05 17:23:33,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 7872512. Throughput: 0: 812.2, 1: 811.1. Samples: 1967777. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:23:33,653][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:23:38,652][23454] Fps is (10 sec: 6144.0, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 7905280. Throughput: 0: 808.2, 1: 809.6. Samples: 1977442. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:23:38,653][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:23:43,652][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 7938048. Throughput: 0: 811.8, 1: 811.1. Samples: 1982434. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:23:43,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:23:43,851][24460] Updated weights for policy 1, policy_version 15520 (0.0016) [2023-10-05 17:23:43,851][24456] Updated weights for policy 0, policy_version 15520 (0.0019) [2023-10-05 17:23:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 7970816. Throughput: 0: 812.3, 1: 812.2. Samples: 1992205. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:23:48,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.860')] [2023-10-05 17:23:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8003584. Throughput: 0: 804.6, 1: 804.7. Samples: 2001570. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:23:53,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.860')] [2023-10-05 17:23:56,586][24456] Updated weights for policy 0, policy_version 15680 (0.0016) [2023-10-05 17:23:56,587][24460] Updated weights for policy 1, policy_version 15680 (0.0017) [2023-10-05 17:23:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8036352. Throughput: 0: 808.8, 1: 807.4. Samples: 2006534. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:23:58,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.860')] [2023-10-05 17:24:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8069120. Throughput: 0: 805.0, 1: 805.3. Samples: 2016013. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:24:03,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.870')] [2023-10-05 17:24:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8101888. Throughput: 0: 804.0, 1: 803.9. Samples: 2025760. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:24:08,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.870')] [2023-10-05 17:24:09,179][24460] Updated weights for policy 1, policy_version 15840 (0.0016) [2023-10-05 17:24:09,179][24456] Updated weights for policy 0, policy_version 15840 (0.0016) [2023-10-05 17:24:13,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8134656. Throughput: 0: 810.7, 1: 811.0. Samples: 2030865. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:13,652][23454] Avg episode reward: [(0, '9.830'), (1, '9.830')] [2023-10-05 17:24:18,651][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 8167424. Throughput: 0: 809.4, 1: 810.3. Samples: 2040666. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:18,652][23454] Avg episode reward: [(0, '9.820'), (1, '9.830')] [2023-10-05 17:24:21,785][24456] Updated weights for policy 0, policy_version 16000 (0.0016) [2023-10-05 17:24:21,785][24460] Updated weights for policy 1, policy_version 16000 (0.0015) [2023-10-05 17:24:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8200192. Throughput: 0: 808.2, 1: 807.4. Samples: 2050145. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:24:23,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.840')] [2023-10-05 17:24:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 8232960. Throughput: 0: 809.6, 1: 810.3. Samples: 2055328. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:24:28,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.840')] [2023-10-05 17:24:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8265728. Throughput: 0: 808.4, 1: 808.4. Samples: 2064962. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:24:33,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.840')] [2023-10-05 17:24:34,269][24456] Updated weights for policy 0, policy_version 16160 (0.0019) [2023-10-05 17:24:34,269][24460] Updated weights for policy 1, policy_version 16160 (0.0019) [2023-10-05 17:24:38,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8298496. Throughput: 0: 813.0, 1: 812.7. Samples: 2074726. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:38,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.840')] [2023-10-05 17:24:43,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8331264. Throughput: 0: 813.2, 1: 814.3. Samples: 2079771. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:43,653][23454] Avg episode reward: [(0, '9.810'), (1, '9.840')] [2023-10-05 17:24:46,810][24460] Updated weights for policy 1, policy_version 16320 (0.0016) [2023-10-05 17:24:46,810][24456] Updated weights for policy 0, policy_version 16320 (0.0017) [2023-10-05 17:24:48,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8364032. Throughput: 0: 816.2, 1: 816.4. Samples: 2089481. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:48,653][23454] Avg episode reward: [(0, '9.810'), (1, '9.830')] [2023-10-05 17:24:53,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8396800. Throughput: 0: 816.3, 1: 815.7. Samples: 2099200. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:53,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.830')] [2023-10-05 17:24:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8429568. Throughput: 0: 812.8, 1: 813.9. Samples: 2104068. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:24:58,652][23454] Avg episode reward: [(0, '9.820'), (1, '9.850')] [2023-10-05 17:24:59,671][24460] Updated weights for policy 1, policy_version 16480 (0.0016) [2023-10-05 17:24:59,672][24456] Updated weights for policy 0, policy_version 16480 (0.0018) [2023-10-05 17:25:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8462336. Throughput: 0: 809.9, 1: 809.5. Samples: 2113536. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:25:03,652][23454] Avg episode reward: [(0, '9.820'), (1, '9.850')] [2023-10-05 17:25:03,661][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000016528_4231168.pth... [2023-10-05 17:25:03,661][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000016528_4231168.pth... 
[2023-10-05 17:25:03,690][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000013488_3452928.pth [2023-10-05 17:25:03,700][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000013488_3452928.pth [2023-10-05 17:25:08,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8495104. Throughput: 0: 813.7, 1: 814.0. Samples: 2123393. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:25:08,653][23454] Avg episode reward: [(0, '9.820'), (1, '9.860')] [2023-10-05 17:25:12,194][24460] Updated weights for policy 1, policy_version 16640 (0.0017) [2023-10-05 17:25:12,195][24456] Updated weights for policy 0, policy_version 16640 (0.0017) [2023-10-05 17:25:13,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8527872. Throughput: 0: 808.3, 1: 808.5. Samples: 2128085. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:25:13,653][23454] Avg episode reward: [(0, '9.810'), (1, '9.860')] [2023-10-05 17:25:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6511.9). Total num frames: 8560640. Throughput: 0: 813.0, 1: 812.5. Samples: 2138112. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:25:18,652][23454] Avg episode reward: [(0, '9.810'), (1, '9.870')] [2023-10-05 17:25:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8593408. Throughput: 0: 812.3, 1: 812.4. Samples: 2147839. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:25:23,652][23454] Avg episode reward: [(0, '9.820'), (1, '9.870')] [2023-10-05 17:25:24,797][24456] Updated weights for policy 0, policy_version 16800 (0.0017) [2023-10-05 17:25:24,797][24460] Updated weights for policy 1, policy_version 16800 (0.0015) [2023-10-05 17:25:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8626176. Throughput: 0: 808.2, 1: 808.5. Samples: 2152518. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:25:28,652][23454] Avg episode reward: [(0, '9.840'), (1, '9.870')] [2023-10-05 17:25:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8658944. Throughput: 0: 812.5, 1: 813.2. Samples: 2162635. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:25:33,653][23454] Avg episode reward: [(0, '9.840'), (1, '9.870')] [2023-10-05 17:25:37,316][24456] Updated weights for policy 0, policy_version 16960 (0.0016) [2023-10-05 17:25:37,316][24460] Updated weights for policy 1, policy_version 16960 (0.0016) [2023-10-05 17:25:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8691712. Throughput: 0: 812.9, 1: 813.6. Samples: 2172391. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:25:38,652][23454] Avg episode reward: [(0, '9.850'), (1, '9.870')] [2023-10-05 17:25:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 8724480. Throughput: 0: 811.3, 1: 810.0. Samples: 2177025. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:25:43,653][23454] Avg episode reward: [(0, '9.860'), (1, '9.880')] [2023-10-05 17:25:43,654][24178] Saving new best policy, reward=9.880! [2023-10-05 17:25:43,654][24064] Saving new best policy, reward=9.860! [2023-10-05 17:25:48,652][23454] Fps is (10 sec: 6143.7, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 8753152. Throughput: 0: 816.7, 1: 817.4. Samples: 2187073. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:25:48,653][23454] Avg episode reward: [(0, '9.870'), (1, '9.880')] [2023-10-05 17:25:48,665][24064] Saving new best policy, reward=9.870! [2023-10-05 17:25:49,974][24460] Updated weights for policy 1, policy_version 17120 (0.0017) [2023-10-05 17:25:49,975][24456] Updated weights for policy 0, policy_version 17120 (0.0013) [2023-10-05 17:25:53,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6470.3). 
Total num frames: 8781824. Throughput: 0: 814.0, 1: 813.7. Samples: 2196640. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:25:53,652][23454] Avg episode reward: [(0, '9.870'), (1, '9.880')] [2023-10-05 17:25:58,652][23454] Fps is (10 sec: 6144.1, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 8814592. Throughput: 0: 817.1, 1: 816.6. Samples: 2201600. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:25:58,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.880')] [2023-10-05 17:25:58,721][24064] Saving new best policy, reward=9.890! [2023-10-05 17:26:02,504][24456] Updated weights for policy 0, policy_version 17280 (0.0015) [2023-10-05 17:26:02,505][24460] Updated weights for policy 1, policy_version 17280 (0.0017) [2023-10-05 17:26:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 8847360. Throughput: 0: 814.5, 1: 814.6. Samples: 2211421. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:26:03,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:26:03,727][24064] Saving new best policy, reward=9.900! [2023-10-05 17:26:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8880128. Throughput: 0: 815.4, 1: 814.1. Samples: 2221168. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:26:08,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:26:13,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 8912896. Throughput: 0: 816.6, 1: 816.4. Samples: 2226003. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:26:13,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.850')] [2023-10-05 17:26:15,133][24460] Updated weights for policy 1, policy_version 17440 (0.0019) [2023-10-05 17:26:15,133][24456] Updated weights for policy 0, policy_version 17440 (0.0018) [2023-10-05 17:26:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 8945664. Throughput: 0: 813.0, 1: 812.0. Samples: 2235758. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:26:18,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.850')] [2023-10-05 17:26:18,665][24064] Saving new best policy, reward=9.910! [2023-10-05 17:26:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 8978432. Throughput: 0: 810.6, 1: 810.6. Samples: 2245347. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:26:23,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.850')] [2023-10-05 17:26:23,653][24064] Saving new best policy, reward=9.920! [2023-10-05 17:26:27,631][24460] Updated weights for policy 1, policy_version 17600 (0.0016) [2023-10-05 17:26:27,632][24456] Updated weights for policy 0, policy_version 17600 (0.0019) [2023-10-05 17:26:28,651][23454] Fps is (10 sec: 6553.9, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9011200. Throughput: 0: 816.4, 1: 816.6. Samples: 2250508. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:26:28,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.850')] [2023-10-05 17:26:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9043968. Throughput: 0: 811.6, 1: 811.5. Samples: 2260112. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:26:33,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.850')] [2023-10-05 17:26:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9076736. Throughput: 0: 814.1, 1: 813.9. Samples: 2269900. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:26:38,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.850')] [2023-10-05 17:26:40,255][24460] Updated weights for policy 1, policy_version 17760 (0.0018) [2023-10-05 17:26:40,256][24456] Updated weights for policy 0, policy_version 17760 (0.0016) [2023-10-05 17:26:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9109504. Throughput: 0: 814.5, 1: 815.0. Samples: 2274926. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:26:43,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.820')] [2023-10-05 17:26:48,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 9142272. Throughput: 0: 813.8, 1: 814.5. Samples: 2284698. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:26:48,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.820')] [2023-10-05 17:26:52,925][24456] Updated weights for policy 0, policy_version 17920 (0.0017) [2023-10-05 17:26:52,926][24460] Updated weights for policy 1, policy_version 17920 (0.0018) [2023-10-05 17:26:53,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9175040. Throughput: 0: 809.1, 1: 810.2. Samples: 2294038. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:26:53,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.810')] [2023-10-05 17:26:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9207808. Throughput: 0: 812.5, 1: 812.7. Samples: 2299140. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:26:58,652][23454] Avg episode reward: [(0, '9.930'), (1, '9.820')] [2023-10-05 17:26:58,653][24064] Saving new best policy, reward=9.930! [2023-10-05 17:27:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9240576. Throughput: 0: 813.0, 1: 813.0. Samples: 2308932. 
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:27:03,653][23454] Avg episode reward: [(0, '9.930'), (1, '9.810')] [2023-10-05 17:27:03,667][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000018048_4620288.pth... [2023-10-05 17:27:03,667][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000018048_4620288.pth... [2023-10-05 17:27:03,703][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000015008_3842048.pth [2023-10-05 17:27:03,703][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000015008_3842048.pth [2023-10-05 17:27:05,501][24460] Updated weights for policy 1, policy_version 18080 (0.0017) [2023-10-05 17:27:05,501][24456] Updated weights for policy 0, policy_version 18080 (0.0016) [2023-10-05 17:27:08,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9273344. Throughput: 0: 811.5, 1: 811.4. Samples: 2318377. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:27:08,653][23454] Avg episode reward: [(0, '9.930'), (1, '9.810')] [2023-10-05 17:27:13,651][23454] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9306112. Throughput: 0: 810.6, 1: 810.8. Samples: 2323469. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:27:13,652][23454] Avg episode reward: [(0, '9.930'), (1, '9.810')] [2023-10-05 17:27:18,115][24460] Updated weights for policy 1, policy_version 18240 (0.0019) [2023-10-05 17:27:18,116][24456] Updated weights for policy 0, policy_version 18240 (0.0018) [2023-10-05 17:27:18,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9338880. Throughput: 0: 810.6, 1: 810.7. Samples: 2333070. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:27:18,653][23454] Avg episode reward: [(0, '9.930'), (1, '9.810')] [2023-10-05 17:27:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9371648. 
Throughput: 0: 811.4, 1: 811.1. Samples: 2342913. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:27:23,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.800')] [2023-10-05 17:27:28,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9404416. Throughput: 0: 810.5, 1: 810.4. Samples: 2347866. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:27:28,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.790')] [2023-10-05 17:27:30,667][24456] Updated weights for policy 0, policy_version 18400 (0.0017) [2023-10-05 17:27:30,667][24460] Updated weights for policy 1, policy_version 18400 (0.0017) [2023-10-05 17:27:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9437184. Throughput: 0: 809.5, 1: 809.2. Samples: 2357540. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:27:33,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.780')] [2023-10-05 17:27:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9469952. Throughput: 0: 815.8, 1: 815.5. Samples: 2367449. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:27:38,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.780')] [2023-10-05 17:27:43,289][24456] Updated weights for policy 0, policy_version 18560 (0.0018) [2023-10-05 17:27:43,289][24460] Updated weights for policy 1, policy_version 18560 (0.0016) [2023-10-05 17:27:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9502720. Throughput: 0: 810.3, 1: 809.7. Samples: 2372039. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:27:43,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.780')] [2023-10-05 17:27:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9535488. Throughput: 0: 810.0, 1: 809.9. Samples: 2381826. 
Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:27:48,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.790')] [2023-10-05 17:27:53,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9568256. Throughput: 0: 817.7, 1: 816.7. Samples: 2391923. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:27:53,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.790')] [2023-10-05 17:27:55,893][24460] Updated weights for policy 1, policy_version 18720 (0.0016) [2023-10-05 17:27:55,893][24456] Updated weights for policy 0, policy_version 18720 (0.0018) [2023-10-05 17:27:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9601024. Throughput: 0: 811.6, 1: 811.6. Samples: 2396513. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:27:58,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.800')] [2023-10-05 17:28:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 9633792. Throughput: 0: 815.1, 1: 814.5. Samples: 2406400. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:28:03,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.800')] [2023-10-05 17:28:08,652][23454] Fps is (10 sec: 6143.8, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 9662464. Throughput: 0: 811.7, 1: 814.0. Samples: 2416070. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:28:08,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.810')] [2023-10-05 17:28:08,684][24456] Updated weights for policy 0, policy_version 18880 (0.0017) [2023-10-05 17:28:08,684][24460] Updated weights for policy 1, policy_version 18880 (0.0016) [2023-10-05 17:28:13,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 9691136. Throughput: 0: 809.9, 1: 809.4. Samples: 2420736. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:28:13,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.810')] [2023-10-05 17:28:18,652][23454] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 9723904. Throughput: 0: 805.1, 1: 805.5. Samples: 2430014. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:28:18,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.810')] [2023-10-05 17:28:21,594][24456] Updated weights for policy 0, policy_version 19040 (0.0018) [2023-10-05 17:28:21,594][24460] Updated weights for policy 1, policy_version 19040 (0.0017) [2023-10-05 17:28:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 9756672. Throughput: 0: 802.0, 1: 802.5. Samples: 2439648. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:28:23,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.810')] [2023-10-05 17:28:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 9789440. Throughput: 0: 807.4, 1: 808.0. Samples: 2444730. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:28:28,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.810')] [2023-10-05 17:28:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9822208. Throughput: 0: 806.5, 1: 806.7. Samples: 2454418. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:28:33,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.810')] [2023-10-05 17:28:34,191][24456] Updated weights for policy 0, policy_version 19200 (0.0017) [2023-10-05 17:28:34,191][24460] Updated weights for policy 1, policy_version 19200 (0.0018) [2023-10-05 17:28:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9854976. Throughput: 0: 799.3, 1: 800.4. Samples: 2463910. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:28:38,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.820')] [2023-10-05 17:28:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 9887744. Throughput: 0: 803.5, 1: 803.7. Samples: 2468837. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:28:43,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.850')] [2023-10-05 17:28:46,767][24456] Updated weights for policy 0, policy_version 19360 (0.0018) [2023-10-05 17:28:46,767][24460] Updated weights for policy 1, policy_version 19360 (0.0017) [2023-10-05 17:28:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9920512. Throughput: 0: 802.6, 1: 803.4. Samples: 2478668. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:28:48,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.850')] [2023-10-05 17:28:53,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 9953280. Throughput: 0: 804.0, 1: 801.6. Samples: 2488321. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:28:53,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.850')] [2023-10-05 17:28:58,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 9986048. Throughput: 0: 807.2, 1: 807.7. Samples: 2493404. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:28:58,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.850')] [2023-10-05 17:28:59,356][24456] Updated weights for policy 0, policy_version 19520 (0.0017) [2023-10-05 17:28:59,356][24460] Updated weights for policy 1, policy_version 19520 (0.0018) [2023-10-05 17:29:03,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 10018816. Throughput: 0: 810.4, 1: 810.1. Samples: 2502939. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:29:03,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.850')] [2023-10-05 17:29:03,661][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000019568_5009408.pth... [2023-10-05 17:29:03,662][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000019568_5009408.pth... [2023-10-05 17:29:03,699][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000016528_4231168.pth [2023-10-05 17:29:03,700][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000016528_4231168.pth [2023-10-05 17:29:08,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 10051584. Throughput: 0: 814.1, 1: 813.7. Samples: 2512900. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-10-05 17:29:08,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.840')] [2023-10-05 17:29:11,862][24456] Updated weights for policy 0, policy_version 19680 (0.0018) [2023-10-05 17:29:11,862][24460] Updated weights for policy 1, policy_version 19680 (0.0018) [2023-10-05 17:29:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10084352. Throughput: 0: 812.7, 1: 812.6. Samples: 2517868. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:13,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.840')] [2023-10-05 17:29:18,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10117120. Throughput: 0: 812.9, 1: 813.4. Samples: 2527598. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:18,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.850')] [2023-10-05 17:29:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10149888. Throughput: 0: 817.6, 1: 817.1. Samples: 2537472. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:23,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.850')] [2023-10-05 17:29:24,501][24460] Updated weights for policy 1, policy_version 19840 (0.0018) [2023-10-05 17:29:24,501][24456] Updated weights for policy 0, policy_version 19840 (0.0018) [2023-10-05 17:29:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10182656. Throughput: 0: 815.5, 1: 815.3. Samples: 2542226. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:29:28,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.850')] [2023-10-05 17:29:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10215424. Throughput: 0: 813.5, 1: 813.2. Samples: 2551873. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:29:33,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:29:37,079][24460] Updated weights for policy 1, policy_version 20000 (0.0016) [2023-10-05 17:29:37,079][24456] Updated weights for policy 0, policy_version 20000 (0.0017) [2023-10-05 17:29:38,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10248192. Throughput: 0: 818.6, 1: 818.8. Samples: 2562004. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:38,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:29:43,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10280960. Throughput: 0: 813.7, 1: 813.4. Samples: 2566626. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:43,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:29:48,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10313728. Throughput: 0: 816.3, 1: 815.8. Samples: 2576384. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:48,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:29:49,750][24460] Updated weights for policy 1, policy_version 20160 (0.0017) [2023-10-05 17:29:49,751][24456] Updated weights for policy 0, policy_version 20160 (0.0019) [2023-10-05 17:29:53,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10346496. Throughput: 0: 812.3, 1: 812.7. Samples: 2586023. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:53,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.860')] [2023-10-05 17:29:58,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10379264. Throughput: 0: 809.7, 1: 809.2. Samples: 2590721. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:29:58,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.860')] [2023-10-05 17:30:02,388][24456] Updated weights for policy 0, policy_version 20320 (0.0016) [2023-10-05 17:30:02,389][24460] Updated weights for policy 1, policy_version 20320 (0.0017) [2023-10-05 17:30:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10412032. Throughput: 0: 813.7, 1: 812.9. Samples: 2600794. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:03,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.850')] [2023-10-05 17:30:08,651][23454] Fps is (10 sec: 6144.1, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 10440704. Throughput: 0: 811.6, 1: 811.8. Samples: 2610526. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:08,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.840')] [2023-10-05 17:30:13,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 10469376. Throughput: 0: 812.1, 1: 811.7. Samples: 2615296. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:13,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.840')] [2023-10-05 17:30:14,969][24456] Updated weights for policy 0, policy_version 20480 (0.0018) [2023-10-05 17:30:14,969][24460] Updated weights for policy 1, policy_version 20480 (0.0018) [2023-10-05 17:30:18,652][23454] Fps is (10 sec: 6143.9, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 10502144. Throughput: 0: 815.0, 1: 814.5. Samples: 2625200. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:18,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.840')] [2023-10-05 17:30:23,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 10534912. Throughput: 0: 810.3, 1: 810.0. Samples: 2634920. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:23,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.840')] [2023-10-05 17:30:27,473][24460] Updated weights for policy 1, policy_version 20640 (0.0017) [2023-10-05 17:30:27,473][24456] Updated weights for policy 0, policy_version 20640 (0.0018) [2023-10-05 17:30:28,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 10567680. Throughput: 0: 814.0, 1: 813.7. Samples: 2639872. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:28,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.860')] [2023-10-05 17:30:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 10600448. Throughput: 0: 814.9, 1: 815.4. Samples: 2649748. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:30:33,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.860')] [2023-10-05 17:30:38,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 10633216. Throughput: 0: 813.5, 1: 813.9. Samples: 2659254. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:30:38,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.860')] [2023-10-05 17:30:40,207][24460] Updated weights for policy 1, policy_version 20800 (0.0017) [2023-10-05 17:30:40,207][24456] Updated weights for policy 0, policy_version 20800 (0.0017) [2023-10-05 17:30:43,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 10665984. Throughput: 0: 815.6, 1: 816.0. Samples: 2664146. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:30:43,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.870')] [2023-10-05 17:30:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 10698752. Throughput: 0: 811.0, 1: 811.6. Samples: 2673808. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:30:48,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.870')] [2023-10-05 17:30:52,736][24460] Updated weights for policy 1, policy_version 20960 (0.0016) [2023-10-05 17:30:52,737][24456] Updated weights for policy 0, policy_version 20960 (0.0017) [2023-10-05 17:30:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 10731520. Throughput: 0: 811.3, 1: 811.5. Samples: 2683552. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:30:53,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.870')] [2023-10-05 17:30:58,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 10764288. Throughput: 0: 815.3, 1: 814.4. Samples: 2688631. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:30:58,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.870')] [2023-10-05 17:31:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 10797056. Throughput: 0: 811.3, 1: 811.7. Samples: 2698237. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:31:03,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.860')] [2023-10-05 17:31:03,660][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000021088_5398528.pth... [2023-10-05 17:31:03,660][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000021088_5398528.pth... [2023-10-05 17:31:03,696][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000018048_4620288.pth [2023-10-05 17:31:03,697][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000018048_4620288.pth [2023-10-05 17:31:05,334][24456] Updated weights for policy 0, policy_version 21120 (0.0018) [2023-10-05 17:31:05,335][24460] Updated weights for policy 1, policy_version 21120 (0.0018) [2023-10-05 17:31:08,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6485.4, 300 sec: 6498.1). Total num frames: 10829824. Throughput: 0: 810.6, 1: 811.1. Samples: 2707896. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:31:08,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.860')] [2023-10-05 17:31:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10862592. Throughput: 0: 811.1, 1: 811.8. Samples: 2712900. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:31:13,652][23454] Avg episode reward: [(0, '9.880'), (1, '9.890')] [2023-10-05 17:31:13,653][24178] Saving new best policy, reward=9.890! [2023-10-05 17:31:17,942][24456] Updated weights for policy 0, policy_version 21280 (0.0016) [2023-10-05 17:31:17,942][24460] Updated weights for policy 1, policy_version 21280 (0.0015) [2023-10-05 17:31:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10895360. Throughput: 0: 809.0, 1: 809.0. Samples: 2722559. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:31:18,653][23454] Avg episode reward: [(0, '9.880'), (1, '9.900')] [2023-10-05 17:31:18,665][24178] Saving new best policy, reward=9.900! [2023-10-05 17:31:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10928128. Throughput: 0: 809.1, 1: 808.2. Samples: 2732034. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:31:23,652][23454] Avg episode reward: [(0, '9.870'), (1, '9.900')] [2023-10-05 17:31:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10960896. Throughput: 0: 808.8, 1: 808.9. Samples: 2736942. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:31:28,652][23454] Avg episode reward: [(0, '9.860'), (1, '9.910')] [2023-10-05 17:31:28,653][24178] Saving new best policy, reward=9.910! [2023-10-05 17:31:30,726][24456] Updated weights for policy 0, policy_version 21440 (0.0017) [2023-10-05 17:31:30,727][24460] Updated weights for policy 1, policy_version 21440 (0.0016) [2023-10-05 17:31:33,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 10993664. Throughput: 0: 808.4, 1: 808.5. Samples: 2746570. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:31:33,652][23454] Avg episode reward: [(0, '9.860'), (1, '9.910')] [2023-10-05 17:31:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11026432. Throughput: 0: 812.0, 1: 810.9. Samples: 2756579. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:31:38,652][23454] Avg episode reward: [(0, '9.860'), (1, '9.910')] [2023-10-05 17:31:43,633][24456] Updated weights for policy 0, policy_version 21600 (0.0017) [2023-10-05 17:31:43,634][24460] Updated weights for policy 1, policy_version 21600 (0.0018) [2023-10-05 17:31:43,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11059200. Throughput: 0: 802.6, 1: 803.8. 
Samples: 2760920. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:31:43,653][23454] Avg episode reward: [(0, '9.860'), (1, '9.910')] [2023-10-05 17:31:48,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 11083776. Throughput: 0: 803.9, 1: 804.7. Samples: 2770628. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:31:48,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.910')] [2023-10-05 17:31:53,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11116544. Throughput: 0: 803.5, 1: 803.7. Samples: 2780219. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:31:53,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.930')] [2023-10-05 17:31:53,751][24178] Saving new best policy, reward=9.930! [2023-10-05 17:31:56,322][24460] Updated weights for policy 1, policy_version 21760 (0.0017) [2023-10-05 17:31:56,324][24456] Updated weights for policy 0, policy_version 21760 (0.0018) [2023-10-05 17:31:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11149312. Throughput: 0: 803.7, 1: 803.5. Samples: 2785225. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:31:58,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.930')] [2023-10-05 17:32:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 11182080. Throughput: 0: 805.0, 1: 805.0. Samples: 2795007. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:32:03,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.940')] [2023-10-05 17:32:03,768][24178] Saving new best policy, reward=9.940! [2023-10-05 17:32:08,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 11214848. Throughput: 0: 807.7, 1: 808.1. Samples: 2804747. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:32:08,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.940')] [2023-10-05 17:32:08,786][24460] Updated weights for policy 1, policy_version 21920 (0.0018) [2023-10-05 17:32:08,786][24456] Updated weights for policy 0, policy_version 21920 (0.0017) [2023-10-05 17:32:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11247616. Throughput: 0: 809.2, 1: 808.7. Samples: 2809749. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:32:13,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.940')] [2023-10-05 17:32:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11280384. Throughput: 0: 810.1, 1: 810.2. Samples: 2819482. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:32:18,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.940')] [2023-10-05 17:32:21,454][24456] Updated weights for policy 0, policy_version 22080 (0.0018) [2023-10-05 17:32:21,455][24460] Updated weights for policy 1, policy_version 22080 (0.0017) [2023-10-05 17:32:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11313152. Throughput: 0: 804.0, 1: 805.2. Samples: 2828993. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:32:23,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.940')] [2023-10-05 17:32:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11345920. Throughput: 0: 813.7, 1: 814.0. Samples: 2834168. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:32:28,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.940')] [2023-10-05 17:32:33,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11378688. Throughput: 0: 813.7, 1: 813.1. Samples: 2843834. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:32:33,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.940')] [2023-10-05 17:32:34,022][24456] Updated weights for policy 0, policy_version 22240 (0.0017) [2023-10-05 17:32:34,022][24460] Updated weights for policy 1, policy_version 22240 (0.0018) [2023-10-05 17:32:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11411456. Throughput: 0: 813.9, 1: 813.5. Samples: 2853452. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:32:38,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.940')] [2023-10-05 17:32:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11444224. Throughput: 0: 813.5, 1: 813.4. Samples: 2858438. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:32:43,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.930')] [2023-10-05 17:32:46,792][24456] Updated weights for policy 0, policy_version 22400 (0.0016) [2023-10-05 17:32:46,792][24460] Updated weights for policy 1, policy_version 22400 (0.0018) [2023-10-05 17:32:48,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 11476992. Throughput: 0: 807.9, 1: 808.2. Samples: 2867732. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:32:48,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.930')] [2023-10-05 17:32:53,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 11509760. Throughput: 0: 807.9, 1: 807.5. Samples: 2877440. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:32:53,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.930')] [2023-10-05 17:32:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 11542528. Throughput: 0: 807.0, 1: 807.5. Samples: 2882399. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:32:58,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.930')] [2023-10-05 17:32:59,447][24460] Updated weights for policy 1, policy_version 22560 (0.0018) [2023-10-05 17:32:59,447][24456] Updated weights for policy 0, policy_version 22560 (0.0018) [2023-10-05 17:33:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6484.2). Total num frames: 11575296. Throughput: 0: 805.9, 1: 805.7. Samples: 2892003. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:03,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.930')] [2023-10-05 17:33:03,665][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000022608_5787648.pth... [2023-10-05 17:33:03,664][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000022608_5787648.pth... [2023-10-05 17:33:03,701][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000019568_5009408.pth [2023-10-05 17:33:03,705][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000019568_5009408.pth [2023-10-05 17:33:08,652][23454] Fps is (10 sec: 6553.3, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11608064. Throughput: 0: 811.6, 1: 811.1. Samples: 2902016. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:08,653][23454] Avg episode reward: [(0, '9.920'), (1, '9.930')] [2023-10-05 17:33:11,938][24456] Updated weights for policy 0, policy_version 22720 (0.0018) [2023-10-05 17:33:11,938][24460] Updated weights for policy 1, policy_version 22720 (0.0018) [2023-10-05 17:33:13,651][23454] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11640832. Throughput: 0: 807.8, 1: 807.7. Samples: 2906868. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:13,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.930')] [2023-10-05 17:33:18,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11673600. 
Throughput: 0: 808.7, 1: 808.5. Samples: 2916607. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:18,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.920')] [2023-10-05 17:33:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11706368. Throughput: 0: 811.3, 1: 812.4. Samples: 2926520. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:23,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.920')] [2023-10-05 17:33:24,569][24456] Updated weights for policy 0, policy_version 22880 (0.0018) [2023-10-05 17:33:24,569][24460] Updated weights for policy 1, policy_version 22880 (0.0017) [2023-10-05 17:33:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11739136. Throughput: 0: 808.9, 1: 809.0. Samples: 2931241. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:28,652][23454] Avg episode reward: [(0, '9.930'), (1, '9.920')] [2023-10-05 17:33:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11771904. Throughput: 0: 813.8, 1: 813.0. Samples: 2940937. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:33,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.920')] [2023-10-05 17:33:33,664][24064] Saving new best policy, reward=9.940! [2023-10-05 17:33:37,149][24460] Updated weights for policy 1, policy_version 23040 (0.0016) [2023-10-05 17:33:37,150][24456] Updated weights for policy 0, policy_version 23040 (0.0018) [2023-10-05 17:33:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11804672. Throughput: 0: 816.6, 1: 817.1. Samples: 2950956. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:38,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:33:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11837440. Throughput: 0: 810.5, 1: 810.6. Samples: 2955346. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:43,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:33:48,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11870208. Throughput: 0: 816.9, 1: 816.4. Samples: 2965504. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:48,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:33:49,777][24456] Updated weights for policy 0, policy_version 23200 (0.0018) [2023-10-05 17:33:49,777][24460] Updated weights for policy 1, policy_version 23200 (0.0020) [2023-10-05 17:33:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 11902976. Throughput: 0: 812.8, 1: 813.8. Samples: 2975211. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:53,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:33:58,651][23454] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11927552. Throughput: 0: 811.2, 1: 811.0. Samples: 2979866. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:33:58,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:34:02,445][24456] Updated weights for policy 0, policy_version 23360 (0.0015) [2023-10-05 17:34:02,446][24460] Updated weights for policy 1, policy_version 23360 (0.0016) [2023-10-05 17:34:03,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11960320. Throughput: 0: 812.7, 1: 812.8. Samples: 2989754. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:03,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:34:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 11993088. Throughput: 0: 811.7, 1: 810.9. Samples: 2999538. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:08,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:34:13,651][23454] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 12029952. Throughput: 0: 813.3, 1: 812.8. Samples: 3004416. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:13,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:34:14,928][24460] Updated weights for policy 1, policy_version 23520 (0.0018) [2023-10-05 17:34:14,929][24456] Updated weights for policy 0, policy_version 23520 (0.0017) [2023-10-05 17:34:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 12058624. Throughput: 0: 815.9, 1: 816.4. Samples: 3014390. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:34:18,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.930')] [2023-10-05 17:34:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 12095488. Throughput: 0: 813.1, 1: 813.0. Samples: 3024130. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:34:23,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.940')] [2023-10-05 17:34:23,668][24064] Saving new best policy, reward=9.950! [2023-10-05 17:34:27,529][24456] Updated weights for policy 0, policy_version 23680 (0.0018) [2023-10-05 17:34:27,529][24460] Updated weights for policy 1, policy_version 23680 (0.0017) [2023-10-05 17:34:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12124160. Throughput: 0: 818.6, 1: 818.0. Samples: 3028992. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:34:28,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.940')] [2023-10-05 17:34:33,652][23454] Fps is (10 sec: 6143.9, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12156928. Throughput: 0: 810.2, 1: 811.5. Samples: 3038481. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:33,653][23454] Avg episode reward: [(0, '9.950'), (1, '9.950')] [2023-10-05 17:34:33,664][24178] Saving new best policy, reward=9.950! [2023-10-05 17:34:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12189696. Throughput: 0: 806.1, 1: 805.6. Samples: 3047738. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:38,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.950')] [2023-10-05 17:34:40,433][24456] Updated weights for policy 0, policy_version 23840 (0.0020) [2023-10-05 17:34:40,433][24460] Updated weights for policy 1, policy_version 23840 (0.0019) [2023-10-05 17:34:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12222464. Throughput: 0: 809.6, 1: 810.5. Samples: 3052767. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:43,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.950')] [2023-10-05 17:34:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12255232. Throughput: 0: 806.5, 1: 806.2. Samples: 3062325. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:48,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.950')] [2023-10-05 17:34:53,050][24460] Updated weights for policy 1, policy_version 24000 (0.0017) [2023-10-05 17:34:53,050][24456] Updated weights for policy 0, policy_version 24000 (0.0016) [2023-10-05 17:34:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12288000. Throughput: 0: 805.6, 1: 805.3. Samples: 3072032. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:53,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.940')] [2023-10-05 17:34:53,654][24064] Saving new best policy, reward=9.960! [2023-10-05 17:34:58,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 12320768. Throughput: 0: 808.3, 1: 808.6. 
Samples: 3077178. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:34:58,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.940')] [2023-10-05 17:35:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6484.2). Total num frames: 12353536. Throughput: 0: 804.4, 1: 804.6. Samples: 3086796. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:03,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.940')] [2023-10-05 17:35:03,660][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000024128_6176768.pth... [2023-10-05 17:35:03,660][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000024128_6176768.pth... [2023-10-05 17:35:03,690][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000021088_5398528.pth [2023-10-05 17:35:03,692][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000021088_5398528.pth [2023-10-05 17:35:05,720][24460] Updated weights for policy 1, policy_version 24160 (0.0017) [2023-10-05 17:35:05,720][24456] Updated weights for policy 0, policy_version 24160 (0.0016) [2023-10-05 17:35:08,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12386304. Throughput: 0: 804.6, 1: 804.7. Samples: 3096551. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:35:08,653][23454] Avg episode reward: [(0, '9.960'), (1, '9.950')] [2023-10-05 17:35:13,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 12419072. Throughput: 0: 799.8, 1: 800.4. Samples: 3101001. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:35:13,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.950')] [2023-10-05 17:35:18,455][24456] Updated weights for policy 0, policy_version 24320 (0.0017) [2023-10-05 17:35:18,455][24460] Updated weights for policy 1, policy_version 24320 (0.0017) [2023-10-05 17:35:18,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). 
Total num frames: 12451840. Throughput: 0: 805.4, 1: 804.2. Samples: 3110912. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:35:18,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.950')] [2023-10-05 17:35:23,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 12484608. Throughput: 0: 812.4, 1: 812.8. Samples: 3120872. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:23,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.950')] [2023-10-05 17:35:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12517376. Throughput: 0: 807.9, 1: 807.1. Samples: 3125443. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:28,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.950')] [2023-10-05 17:35:30,995][24456] Updated weights for policy 0, policy_version 24480 (0.0015) [2023-10-05 17:35:30,995][24460] Updated weights for policy 1, policy_version 24480 (0.0014) [2023-10-05 17:35:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12550144. Throughput: 0: 813.1, 1: 812.8. Samples: 3135488. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:33,653][23454] Avg episode reward: [(0, '9.960'), (1, '9.960')] [2023-10-05 17:35:33,664][24178] Saving new best policy, reward=9.960! [2023-10-05 17:35:38,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12582912. Throughput: 0: 815.1, 1: 815.6. Samples: 3145415. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:38,653][23454] Avg episode reward: [(0, '9.960'), (1, '9.960')] [2023-10-05 17:35:43,610][24456] Updated weights for policy 0, policy_version 24640 (0.0019) [2023-10-05 17:35:43,610][24460] Updated weights for policy 1, policy_version 24640 (0.0016) [2023-10-05 17:35:43,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12615680. Throughput: 0: 808.0, 1: 808.4. 
Samples: 3149915. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:43,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.960')] [2023-10-05 17:35:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12648448. Throughput: 0: 811.9, 1: 812.3. Samples: 3159885. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:48,652][23454] Avg episode reward: [(0, '9.970'), (1, '9.960')] [2023-10-05 17:35:48,659][24064] Saving new best policy, reward=9.970! [2023-10-05 17:35:53,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 12681216. Throughput: 0: 812.5, 1: 812.5. Samples: 3169679. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:53,653][23454] Avg episode reward: [(0, '9.980'), (1, '9.960')] [2023-10-05 17:35:53,654][24064] Saving new best policy, reward=9.980! [2023-10-05 17:35:56,079][24456] Updated weights for policy 0, policy_version 24800 (0.0015) [2023-10-05 17:35:56,080][24460] Updated weights for policy 1, policy_version 24800 (0.0017) [2023-10-05 17:35:58,652][23454] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 12709888. Throughput: 0: 815.9, 1: 815.4. Samples: 3174410. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:35:58,653][23454] Avg episode reward: [(0, '9.980'), (1, '9.960')] [2023-10-05 17:36:03,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 12738560. Throughput: 0: 814.4, 1: 813.3. Samples: 3184161. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:03,653][23454] Avg episode reward: [(0, '9.980'), (1, '9.960')] [2023-10-05 17:36:08,651][23454] Fps is (10 sec: 6144.1, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12771328. Throughput: 0: 809.8, 1: 809.0. Samples: 3193718. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:08,652][23454] Avg episode reward: [(0, '9.980'), (1, '9.960')] [2023-10-05 17:36:08,873][24456] Updated weights for policy 0, policy_version 24960 (0.0019) [2023-10-05 17:36:08,874][24460] Updated weights for policy 1, policy_version 24960 (0.0017) [2023-10-05 17:36:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12804096. Throughput: 0: 815.4, 1: 813.7. Samples: 3198752. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:13,652][23454] Avg episode reward: [(0, '9.970'), (1, '9.960')] [2023-10-05 17:36:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 12836864. Throughput: 0: 808.4, 1: 808.8. Samples: 3208260. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:18,653][23454] Avg episode reward: [(0, '9.970'), (1, '9.960')] [2023-10-05 17:36:21,508][24460] Updated weights for policy 1, policy_version 25120 (0.0016) [2023-10-05 17:36:21,508][24456] Updated weights for policy 0, policy_version 25120 (0.0017) [2023-10-05 17:36:23,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 12869632. Throughput: 0: 806.2, 1: 805.6. Samples: 3217947. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:23,652][23454] Avg episode reward: [(0, '9.960'), (1, '9.960')] [2023-10-05 17:36:28,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 12902400. Throughput: 0: 811.6, 1: 812.8. Samples: 3223015. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:28,653][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:36:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12935168. Throughput: 0: 809.0, 1: 808.5. Samples: 3232673. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:36:33,653][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:36:34,070][24460] Updated weights for policy 1, policy_version 25280 (0.0014) [2023-10-05 17:36:34,071][24456] Updated weights for policy 0, policy_version 25280 (0.0017) [2023-10-05 17:36:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 12967936. Throughput: 0: 809.4, 1: 809.7. Samples: 3242537. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:36:38,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:36:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 13000704. Throughput: 0: 813.0, 1: 813.8. Samples: 3247616. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:36:43,653][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:36:46,624][24456] Updated weights for policy 0, policy_version 25440 (0.0016) [2023-10-05 17:36:46,624][24460] Updated weights for policy 1, policy_version 25440 (0.0016) [2023-10-05 17:36:48,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 13033472. Throughput: 0: 811.0, 1: 812.7. Samples: 3257226. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:36:48,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:36:53,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 13066240. Throughput: 0: 811.6, 1: 812.2. Samples: 3266787. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:53,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.960')] [2023-10-05 17:36:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6485.4, 300 sec: 6498.1). Total num frames: 13099008. Throughput: 0: 812.5, 1: 814.3. Samples: 3271958. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:36:58,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:36:59,146][24456] Updated weights for policy 0, policy_version 25600 (0.0017) [2023-10-05 17:36:59,147][24460] Updated weights for policy 1, policy_version 25600 (0.0017) [2023-10-05 17:37:03,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13131776. Throughput: 0: 815.4, 1: 815.0. Samples: 3281628. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:03,653][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:37:03,665][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000025648_6565888.pth... [2023-10-05 17:37:03,666][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000025648_6565888.pth... [2023-10-05 17:37:03,701][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000022608_5787648.pth [2023-10-05 17:37:03,701][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000022608_5787648.pth [2023-10-05 17:37:08,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13164544. Throughput: 0: 813.5, 1: 813.5. Samples: 3291163. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:08,652][23454] Avg episode reward: [(0, '9.950'), (1, '9.960')] [2023-10-05 17:37:11,857][24460] Updated weights for policy 1, policy_version 25760 (0.0019) [2023-10-05 17:37:11,857][24456] Updated weights for policy 0, policy_version 25760 (0.0019) [2023-10-05 17:37:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13197312. Throughput: 0: 814.5, 1: 812.8. Samples: 3296241. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:13,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.970')] [2023-10-05 17:37:13,654][24178] Saving new best policy, reward=9.970! 
[2023-10-05 17:37:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13230080. Throughput: 0: 813.1, 1: 813.0. Samples: 3305847. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:18,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.970')] [2023-10-05 17:37:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13262848. Throughput: 0: 813.5, 1: 812.6. Samples: 3315713. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:23,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.970')] [2023-10-05 17:37:24,390][24460] Updated weights for policy 1, policy_version 25920 (0.0018) [2023-10-05 17:37:24,390][24456] Updated weights for policy 0, policy_version 25920 (0.0017) [2023-10-05 17:37:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13295616. Throughput: 0: 811.8, 1: 811.6. Samples: 3320667. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:37:28,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.970')] [2023-10-05 17:37:33,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13328384. Throughput: 0: 811.7, 1: 811.7. Samples: 3330279. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:37:33,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.970')] [2023-10-05 17:37:36,940][24456] Updated weights for policy 0, policy_version 26080 (0.0018) [2023-10-05 17:37:36,941][24460] Updated weights for policy 1, policy_version 26080 (0.0019) [2023-10-05 17:37:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13361152. Throughput: 0: 817.0, 1: 816.4. Samples: 3340288. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:37:38,652][23454] Avg episode reward: [(0, '9.940'), (1, '9.970')] [2023-10-05 17:37:43,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13393920. 
Throughput: 0: 812.4, 1: 811.6. Samples: 3345037. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:43,653][23454] Avg episode reward: [(0, '9.940'), (1, '9.980')] [2023-10-05 17:37:43,654][24178] Saving new best policy, reward=9.980! [2023-10-05 17:37:48,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13426688. Throughput: 0: 811.1, 1: 811.0. Samples: 3354624. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:48,652][23454] Avg episode reward: [(0, '9.920'), (1, '9.980')] [2023-10-05 17:37:49,686][24460] Updated weights for policy 1, policy_version 26240 (0.0018) [2023-10-05 17:37:49,686][24456] Updated weights for policy 0, policy_version 26240 (0.0018) [2023-10-05 17:37:53,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13459456. Throughput: 0: 816.1, 1: 816.5. Samples: 3364631. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:53,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:37:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13492224. Throughput: 0: 811.0, 1: 811.2. Samples: 3369236. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:37:58,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:38:02,264][24456] Updated weights for policy 0, policy_version 26400 (0.0019) [2023-10-05 17:38:02,264][24460] Updated weights for policy 1, policy_version 26400 (0.0019) [2023-10-05 17:38:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13524992. Throughput: 0: 815.3, 1: 814.7. Samples: 3379200. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:38:03,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:38:08,652][23454] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 13553664. Throughput: 0: 812.1, 1: 811.1. Samples: 3388756. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:38:08,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:38:13,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 13582336. Throughput: 0: 809.9, 1: 809.4. Samples: 3393536. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:38:13,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:38:14,929][24456] Updated weights for policy 0, policy_version 26560 (0.0016) [2023-10-05 17:38:14,929][24460] Updated weights for policy 1, policy_version 26560 (0.0017) [2023-10-05 17:38:18,652][23454] Fps is (10 sec: 6144.0, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 13615104. Throughput: 0: 813.5, 1: 813.2. Samples: 3403479. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:38:18,653][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:38:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 13647872. Throughput: 0: 808.4, 1: 809.6. Samples: 3413097. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:23,652][23454] Avg episode reward: [(0, '9.910'), (1, '9.980')] [2023-10-05 17:38:27,534][24460] Updated weights for policy 1, policy_version 26720 (0.0018) [2023-10-05 17:38:27,534][24456] Updated weights for policy 0, policy_version 26720 (0.0018) [2023-10-05 17:38:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 13680640. Throughput: 0: 811.9, 1: 812.0. Samples: 3418112. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:28,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.980')] [2023-10-05 17:38:33,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 13713408. Throughput: 0: 814.1, 1: 815.1. Samples: 3427936. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:33,653][23454] Avg episode reward: [(0, '9.900'), (1, '9.980')] [2023-10-05 17:38:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 13746176. Throughput: 0: 808.1, 1: 808.0. Samples: 3437356. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:38,652][23454] Avg episode reward: [(0, '9.900'), (1, '9.980')] [2023-10-05 17:38:40,152][24456] Updated weights for policy 0, policy_version 26880 (0.0017) [2023-10-05 17:38:40,153][24460] Updated weights for policy 1, policy_version 26880 (0.0018) [2023-10-05 17:38:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 13778944. Throughput: 0: 813.9, 1: 814.0. Samples: 3442493. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:43,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.980')] [2023-10-05 17:38:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 13811712. Throughput: 0: 811.4, 1: 811.7. Samples: 3452238. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:48,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.980')] [2023-10-05 17:38:52,660][24456] Updated weights for policy 0, policy_version 27040 (0.0016) [2023-10-05 17:38:52,660][24460] Updated weights for policy 1, policy_version 27040 (0.0019) [2023-10-05 17:38:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 13844480. Throughput: 0: 812.2, 1: 813.3. Samples: 3461907. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:38:53,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.980')] [2023-10-05 17:38:58,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 13877248. Throughput: 0: 817.2, 1: 817.4. Samples: 3467093. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:38:58,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.980')] [2023-10-05 17:39:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 13910016. Throughput: 0: 812.8, 1: 813.8. Samples: 3476678. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:39:03,653][23454] Avg episode reward: [(0, '9.890'), (1, '9.990')] [2023-10-05 17:39:03,662][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000027168_6955008.pth... [2023-10-05 17:39:03,662][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000027168_6955008.pth... [2023-10-05 17:39:03,698][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000024128_6176768.pth [2023-10-05 17:39:03,698][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000024128_6176768.pth [2023-10-05 17:39:03,702][24178] Saving new best policy, reward=9.990! [2023-10-05 17:39:05,315][24456] Updated weights for policy 0, policy_version 27200 (0.0017) [2023-10-05 17:39:05,315][24460] Updated weights for policy 1, policy_version 27200 (0.0018) [2023-10-05 17:39:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6484.2). Total num frames: 13942784. Throughput: 0: 811.7, 1: 811.2. Samples: 3486127. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:39:08,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.990')] [2023-10-05 17:39:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 13975552. Throughput: 0: 812.3, 1: 811.9. Samples: 3491200. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:39:13,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.990')] [2023-10-05 17:39:18,020][24460] Updated weights for policy 1, policy_version 27360 (0.0017) [2023-10-05 17:39:18,020][24456] Updated weights for policy 0, policy_version 27360 (0.0016) [2023-10-05 17:39:18,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6484.2). Total num frames: 14008320. Throughput: 0: 809.4, 1: 808.7. Samples: 3500748. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:39:18,652][23454] Avg episode reward: [(0, '9.890'), (1, '9.990')] [2023-10-05 17:39:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14041088. Throughput: 0: 812.0, 1: 811.8. Samples: 3510426. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:39:23,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:23,652][24178] Saving new best policy, reward=10.000! [2023-10-05 17:39:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14073856. Throughput: 0: 811.9, 1: 812.0. Samples: 3515566. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:39:28,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:30,479][24456] Updated weights for policy 0, policy_version 27520 (0.0017) [2023-10-05 17:39:30,480][24460] Updated weights for policy 1, policy_version 27520 (0.0016) [2023-10-05 17:39:33,651][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14106624. Throughput: 0: 812.0, 1: 812.3. Samples: 3525331. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:39:33,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14139392. Throughput: 0: 810.5, 1: 810.5. Samples: 3534852. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:39:38,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:43,206][24456] Updated weights for policy 0, policy_version 27680 (0.0016) [2023-10-05 17:39:43,206][24460] Updated weights for policy 1, policy_version 27680 (0.0017) [2023-10-05 17:39:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14172160. Throughput: 0: 806.8, 1: 807.9. Samples: 3539757. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:39:43,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14204928. Throughput: 0: 809.5, 1: 808.7. Samples: 3549496. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:39:48,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14237696. Throughput: 0: 814.8, 1: 814.1. Samples: 3559425. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:39:53,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:39:55,639][24456] Updated weights for policy 0, policy_version 27840 (0.0017) [2023-10-05 17:39:55,639][24460] Updated weights for policy 1, policy_version 27840 (0.0017) [2023-10-05 17:39:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14270464. Throughput: 0: 812.7, 1: 813.8. Samples: 3564389. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:39:58,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14303232. Throughput: 0: 814.1, 1: 814.3. Samples: 3574030. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:40:03,653][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:08,416][24456] Updated weights for policy 0, policy_version 28000 (0.0018) [2023-10-05 17:40:08,416][24460] Updated weights for policy 1, policy_version 28000 (0.0017) [2023-10-05 17:40:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14336000. Throughput: 0: 817.4, 1: 815.2. Samples: 3583893. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:40:08,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:13,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14368768. Throughput: 0: 808.5, 1: 808.7. Samples: 3588340. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:40:13,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14401536. Throughput: 0: 811.2, 1: 810.9. Samples: 3598325. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:40:18,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:21,051][24456] Updated weights for policy 0, policy_version 28160 (0.0017) [2023-10-05 17:40:21,051][24460] Updated weights for policy 1, policy_version 28160 (0.0015) [2023-10-05 17:40:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14434304. Throughput: 0: 812.7, 1: 813.1. Samples: 3608011. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:40:23,653][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14467072. Throughput: 0: 810.9, 1: 809.9. Samples: 3612692. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:40:28,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:40:33,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 14491648. Throughput: 0: 812.3, 1: 813.1. Samples: 3622639. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:40:33,653][23454] Avg episode reward: [(0, '9.900'), (1, '10.000')] [2023-10-05 17:40:33,686][24456] Updated weights for policy 0, policy_version 28320 (0.0016) [2023-10-05 17:40:33,686][24460] Updated weights for policy 1, policy_version 28320 (0.0018) [2023-10-05 17:40:38,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 14524416. Throughput: 0: 810.2, 1: 810.8. Samples: 3632369. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:40:38,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:40:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 14557184. Throughput: 0: 809.9, 1: 809.2. Samples: 3637248. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:40:43,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:40:46,195][24460] Updated weights for policy 1, policy_version 28480 (0.0016) [2023-10-05 17:40:46,195][24456] Updated weights for policy 0, policy_version 28480 (0.0018) [2023-10-05 17:40:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 14589952. Throughput: 0: 813.0, 1: 813.4. Samples: 3647215. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:40:48,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:40:53,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 14622720. Throughput: 0: 811.1, 1: 812.4. Samples: 3656954. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:40:53,653][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:40:58,651][23454] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 14659584. Throughput: 0: 816.8, 1: 816.2. Samples: 3661824. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:40:58,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:40:58,677][24456] Updated weights for policy 0, policy_version 28640 (0.0018) [2023-10-05 17:40:58,677][24460] Updated weights for policy 1, policy_version 28640 (0.0016) [2023-10-05 17:41:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 14688256. Throughput: 0: 814.5, 1: 815.2. Samples: 3671660. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:41:03,653][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:41:03,741][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000028704_7348224.pth... [2023-10-05 17:41:03,767][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000025648_6565888.pth [2023-10-05 17:41:03,773][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000028704_7348224.pth... [2023-10-05 17:41:03,801][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000025648_6565888.pth [2023-10-05 17:41:08,651][23454] Fps is (10 sec: 6144.0, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 14721024. Throughput: 0: 813.8, 1: 814.4. Samples: 3681280. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:41:08,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:41:11,365][24460] Updated weights for policy 1, policy_version 28800 (0.0019) [2023-10-05 17:41:11,366][24456] Updated weights for policy 0, policy_version 28800 (0.0017) [2023-10-05 17:41:13,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 14753792. 
Throughput: 0: 818.2, 1: 817.6. Samples: 3686305. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:41:13,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:41:18,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 14786560. Throughput: 0: 815.5, 1: 814.6. Samples: 3695995. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:41:18,653][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:41:23,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 14819328. Throughput: 0: 812.9, 1: 812.5. Samples: 3705514. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:41:23,653][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:41:23,980][24456] Updated weights for policy 0, policy_version 28960 (0.0016) [2023-10-05 17:41:23,980][24460] Updated weights for policy 1, policy_version 28960 (0.0017) [2023-10-05 17:41:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 14852096. Throughput: 0: 815.1, 1: 815.1. Samples: 3710607. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:41:28,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:41:33,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14884864. Throughput: 0: 813.6, 1: 813.2. Samples: 3720420. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:41:33,652][23454] Avg episode reward: [(0, '9.950'), (1, '10.000')] [2023-10-05 17:41:36,441][24456] Updated weights for policy 0, policy_version 29120 (0.0017) [2023-10-05 17:41:36,441][24460] Updated weights for policy 1, policy_version 29120 (0.0017) [2023-10-05 17:41:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14917632. Throughput: 0: 811.5, 1: 812.0. Samples: 3730010. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:41:38,652][23454] Avg episode reward: [(0, '9.950'), (1, '10.000')] [2023-10-05 17:41:43,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14950400. Throughput: 0: 814.5, 1: 815.2. Samples: 3735163. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:41:43,653][23454] Avg episode reward: [(0, '9.950'), (1, '10.000')] [2023-10-05 17:41:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 14983168. Throughput: 0: 815.8, 1: 814.9. Samples: 3745038. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-10-05 17:41:48,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:41:48,947][24460] Updated weights for policy 1, policy_version 29280 (0.0018) [2023-10-05 17:41:48,948][24456] Updated weights for policy 0, policy_version 29280 (0.0019) [2023-10-05 17:41:53,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15015936. Throughput: 0: 814.7, 1: 814.2. Samples: 3754578. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:41:53,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:41:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6498.1). Total num frames: 15048704. Throughput: 0: 815.7, 1: 816.9. Samples: 3759775. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:41:58,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:42:01,486][24456] Updated weights for policy 0, policy_version 29440 (0.0017) [2023-10-05 17:42:01,486][24460] Updated weights for policy 1, policy_version 29440 (0.0018) [2023-10-05 17:42:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15081472. Throughput: 0: 816.7, 1: 816.5. Samples: 3769489. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:42:03,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:42:08,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15114240. Throughput: 0: 818.3, 1: 819.3. Samples: 3779204. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:42:08,653][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:42:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15147008. Throughput: 0: 817.6, 1: 817.2. Samples: 3784171. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:42:13,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:14,123][24456] Updated weights for policy 0, policy_version 29600 (0.0018) [2023-10-05 17:42:14,123][24460] Updated weights for policy 1, policy_version 29600 (0.0017) [2023-10-05 17:42:18,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15179776. Throughput: 0: 814.3, 1: 813.2. Samples: 3793658. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:42:18,653][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15212544. Throughput: 0: 812.8, 1: 812.6. Samples: 3803150. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:42:23,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:26,824][24460] Updated weights for policy 1, policy_version 29760 (0.0018) [2023-10-05 17:42:26,824][24456] Updated weights for policy 0, policy_version 29760 (0.0020) [2023-10-05 17:42:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15245312. Throughput: 0: 812.1, 1: 811.6. Samples: 3808229. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:42:28,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15278080. Throughput: 0: 808.0, 1: 808.3. Samples: 3817768. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:42:33,653][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15310848. Throughput: 0: 812.9, 1: 812.3. Samples: 3827712. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:42:38,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:39,455][24460] Updated weights for policy 1, policy_version 29920 (0.0019) [2023-10-05 17:42:39,456][24456] Updated weights for policy 0, policy_version 29920 (0.0017) [2023-10-05 17:42:43,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15343616. Throughput: 0: 808.6, 1: 808.5. Samples: 3832545. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:42:43,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:42:48,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15376384. Throughput: 0: 809.4, 1: 810.0. Samples: 3842362. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-10-05 17:42:48,653][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:42:51,944][24456] Updated weights for policy 0, policy_version 30080 (0.0017) [2023-10-05 17:42:51,945][24460] Updated weights for policy 1, policy_version 30080 (0.0015) [2023-10-05 17:42:53,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15409152. Throughput: 0: 812.6, 1: 811.4. Samples: 3852288. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:42:53,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:42:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 15441920. Throughput: 0: 807.9, 1: 808.9. Samples: 3856927. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:42:58,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:43:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6511.9). Total num frames: 15474688. Throughput: 0: 810.4, 1: 811.1. Samples: 3866628. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:43:03,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:43:03,661][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000030224_7737344.pth... [2023-10-05 17:43:03,661][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000030224_7737344.pth... [2023-10-05 17:43:03,698][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000027168_6955008.pth [2023-10-05 17:43:03,699][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000027168_6955008.pth [2023-10-05 17:43:04,681][24460] Updated weights for policy 1, policy_version 30240 (0.0016) [2023-10-05 17:43:04,681][24456] Updated weights for policy 0, policy_version 30240 (0.0017) [2023-10-05 17:43:08,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15507456. Throughput: 0: 815.4, 1: 815.8. Samples: 3876557. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:43:08,653][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:43:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15540224. Throughput: 0: 808.6, 1: 808.7. Samples: 3881008. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:43:13,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:43:17,233][24456] Updated weights for policy 0, policy_version 30400 (0.0018) [2023-10-05 17:43:17,233][24460] Updated weights for policy 1, policy_version 30400 (0.0017) [2023-10-05 17:43:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15572992. Throughput: 0: 816.1, 1: 815.8. Samples: 3891200. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:43:18,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:43:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15605760. Throughput: 0: 816.2, 1: 816.6. Samples: 3901188. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:43:23,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:43:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15638528. Throughput: 0: 814.9, 1: 814.9. Samples: 3905887. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:43:28,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:43:29,639][24456] Updated weights for policy 0, policy_version 30560 (0.0015) [2023-10-05 17:43:29,640][24460] Updated weights for policy 1, policy_version 30560 (0.0016) [2023-10-05 17:43:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15671296. Throughput: 0: 816.2, 1: 815.2. Samples: 3915776. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:43:33,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:43:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15704064. Throughput: 0: 817.2, 1: 817.6. Samples: 3925851. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:43:38,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:43:42,143][24460] Updated weights for policy 1, policy_version 30720 (0.0018) [2023-10-05 17:43:42,143][24456] Updated weights for policy 0, policy_version 30720 (0.0017) [2023-10-05 17:43:43,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15736832. Throughput: 0: 817.2, 1: 817.1. Samples: 3930470. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:43:43,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:43:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15769600. Throughput: 0: 819.2, 1: 819.1. Samples: 3940352. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:43:48,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:43:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15802368. Throughput: 0: 815.5, 1: 815.2. Samples: 3949936. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:43:53,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:43:54,868][24460] Updated weights for policy 1, policy_version 30880 (0.0016) [2023-10-05 17:43:54,868][24456] Updated weights for policy 0, policy_version 30880 (0.0017) [2023-10-05 17:43:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15835136. Throughput: 0: 818.9, 1: 818.5. Samples: 3954690. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:43:58,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:44:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15867904. Throughput: 0: 817.4, 1: 817.3. Samples: 3964759. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:44:03,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:44:07,364][24460] Updated weights for policy 1, policy_version 31040 (0.0017) [2023-10-05 17:44:07,364][24456] Updated weights for policy 0, policy_version 31040 (0.0015) [2023-10-05 17:44:08,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15900672. Throughput: 0: 814.9, 1: 815.0. Samples: 3974537. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-10-05 17:44:08,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:44:13,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 15933440. Throughput: 0: 815.7, 1: 815.0. Samples: 3979269. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:44:13,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:18,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 15958016. Throughput: 0: 816.1, 1: 816.6. Samples: 3989249. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:44:18,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:20,028][24456] Updated weights for policy 0, policy_version 31200 (0.0016) [2023-10-05 17:44:20,029][24460] Updated weights for policy 1, policy_version 31200 (0.0017) [2023-10-05 17:44:23,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 15990784. Throughput: 0: 810.4, 1: 809.8. Samples: 3998758. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:44:23,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16023552. Throughput: 0: 813.8, 1: 812.4. Samples: 4003646. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:44:28,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:32,754][24456] Updated weights for policy 0, policy_version 31360 (0.0019) [2023-10-05 17:44:32,754][24460] Updated weights for policy 1, policy_version 31360 (0.0019) [2023-10-05 17:44:33,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16056320. Throughput: 0: 809.3, 1: 809.9. Samples: 4013218. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:44:33,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16089088. Throughput: 0: 807.4, 1: 807.6. Samples: 4022608. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:44:38,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16121856. Throughput: 0: 809.7, 1: 810.0. Samples: 4027576. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:44:43,653][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:45,516][24456] Updated weights for policy 0, policy_version 31520 (0.0018) [2023-10-05 17:44:45,516][24460] Updated weights for policy 1, policy_version 31520 (0.0017) [2023-10-05 17:44:48,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16154624. Throughput: 0: 805.0, 1: 805.6. Samples: 4037237. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:44:48,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16187392. Throughput: 0: 803.8, 1: 803.3. Samples: 4046857. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:44:53,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:44:58,227][24456] Updated weights for policy 0, policy_version 31680 (0.0017) [2023-10-05 17:44:58,228][24460] Updated weights for policy 1, policy_version 31680 (0.0018) [2023-10-05 17:44:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16220160. Throughput: 0: 805.2, 1: 805.9. Samples: 4051771. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:44:58,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:03,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16252928. Throughput: 0: 800.5, 1: 800.6. Samples: 4061302. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:45:03,653][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:03,665][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000031744_8126464.pth... [2023-10-05 17:45:03,665][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000031744_8126464.pth... [2023-10-05 17:45:03,703][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000028704_7348224.pth [2023-10-05 17:45:03,703][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000028704_7348224.pth [2023-10-05 17:45:08,651][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16285696. Throughput: 0: 807.3, 1: 807.5. Samples: 4071424. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:45:08,653][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:10,698][24460] Updated weights for policy 1, policy_version 31840 (0.0017) [2023-10-05 17:45:10,698][24456] Updated weights for policy 0, policy_version 31840 (0.0016) [2023-10-05 17:45:13,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16318464. 
Throughput: 0: 806.2, 1: 807.9. Samples: 4076283. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-10-05 17:45:13,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 16351232. Throughput: 0: 808.4, 1: 808.3. Samples: 4085968. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:45:18,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:23,139][24456] Updated weights for policy 0, policy_version 32000 (0.0015) [2023-10-05 17:45:23,140][24460] Updated weights for policy 1, policy_version 32000 (0.0018) [2023-10-05 17:45:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 16384000. Throughput: 0: 815.7, 1: 815.3. Samples: 4096001. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:45:23,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:28,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 16416768. Throughput: 0: 813.9, 1: 814.0. Samples: 4100833. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:45:28,653][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:33,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 16449536. Throughput: 0: 812.8, 1: 813.0. Samples: 4110399. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-10-05 17:45:33,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:35,816][24456] Updated weights for policy 0, policy_version 32160 (0.0016) [2023-10-05 17:45:35,817][24460] Updated weights for policy 1, policy_version 32160 (0.0016) [2023-10-05 17:45:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 16482304. Throughput: 0: 818.0, 1: 819.0. Samples: 4120522. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:45:38,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 16515072. Throughput: 0: 815.7, 1: 815.4. Samples: 4125172. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:45:43,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:48,621][24460] Updated weights for policy 1, policy_version 32320 (0.0015) [2023-10-05 17:45:48,621][24456] Updated weights for policy 0, policy_version 32320 (0.0019) [2023-10-05 17:45:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 16547840. Throughput: 0: 818.2, 1: 817.6. Samples: 4134912. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:45:48,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:53,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 16572416. Throughput: 0: 808.4, 1: 809.5. Samples: 4144230. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:45:53,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:45:58,651][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16605184. Throughput: 0: 809.3, 1: 809.7. Samples: 4149141. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-10-05 17:45:58,653][23454] Avg episode reward: [(0, '9.850'), (1, '10.000')] [2023-10-05 17:46:01,458][24456] Updated weights for policy 0, policy_version 32480 (0.0018) [2023-10-05 17:46:01,459][24460] Updated weights for policy 1, policy_version 32480 (0.0018) [2023-10-05 17:46:03,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16637952. Throughput: 0: 807.7, 1: 807.2. Samples: 4158641. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:46:03,652][23454] Avg episode reward: [(0, '9.850'), (1, '10.000')] [2023-10-05 17:46:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16670720. Throughput: 0: 803.9, 1: 803.5. Samples: 4168334. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:46:08,652][23454] Avg episode reward: [(0, '9.860'), (1, '10.000')] [2023-10-05 17:46:13,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16703488. Throughput: 0: 806.9, 1: 807.6. Samples: 4173485. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:46:13,653][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:46:14,010][24460] Updated weights for policy 1, policy_version 32640 (0.0018) [2023-10-05 17:46:14,010][24456] Updated weights for policy 0, policy_version 32640 (0.0017) [2023-10-05 17:46:18,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16736256. Throughput: 0: 806.9, 1: 807.0. Samples: 4183024. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:46:18,653][23454] Avg episode reward: [(0, '9.880'), (1, '10.000')] [2023-10-05 17:46:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16769024. Throughput: 0: 801.6, 1: 801.2. Samples: 4192646. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:46:23,652][23454] Avg episode reward: [(0, '9.880'), (1, '10.000')] [2023-10-05 17:46:26,599][24456] Updated weights for policy 0, policy_version 32800 (0.0017) [2023-10-05 17:46:26,599][24460] Updated weights for policy 1, policy_version 32800 (0.0017) [2023-10-05 17:46:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16801792. Throughput: 0: 806.5, 1: 806.8. Samples: 4197770. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:46:28,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:46:33,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16834560. Throughput: 0: 805.1, 1: 806.1. Samples: 4207413. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:46:33,653][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:46:38,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16867328. Throughput: 0: 808.7, 1: 808.1. Samples: 4216987. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:46:38,653][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:46:39,221][24456] Updated weights for policy 0, policy_version 32960 (0.0018) [2023-10-05 17:46:39,221][24460] Updated weights for policy 1, policy_version 32960 (0.0016) [2023-10-05 17:46:43,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 16900096. Throughput: 0: 811.1, 1: 810.1. Samples: 4222096. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:46:43,652][23454] Avg episode reward: [(0, '9.880'), (1, '10.000')] [2023-10-05 17:46:48,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 16932864. Throughput: 0: 808.6, 1: 809.4. Samples: 4231451. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:46:48,652][23454] Avg episode reward: [(0, '9.880'), (1, '10.000')] [2023-10-05 17:46:52,132][24456] Updated weights for policy 0, policy_version 33120 (0.0018) [2023-10-05 17:46:52,132][24460] Updated weights for policy 1, policy_version 33120 (0.0017) [2023-10-05 17:46:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 16965632. Throughput: 0: 810.1, 1: 810.8. Samples: 4241271. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:46:53,652][23454] Avg episode reward: [(0, '9.880'), (1, '10.000')] [2023-10-05 17:46:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 16998400. Throughput: 0: 802.6, 1: 802.2. Samples: 4245700. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:46:58,652][23454] Avg episode reward: [(0, '9.880'), (1, '10.000')] [2023-10-05 17:47:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17031168. Throughput: 0: 808.3, 1: 807.7. Samples: 4255744. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:03,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:47:03,659][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000033264_8515584.pth... [2023-10-05 17:47:03,660][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000033264_8515584.pth... [2023-10-05 17:47:03,694][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000030224_7737344.pth [2023-10-05 17:47:03,695][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000030224_7737344.pth [2023-10-05 17:47:04,782][24460] Updated weights for policy 1, policy_version 33280 (0.0016) [2023-10-05 17:47:04,782][24456] Updated weights for policy 0, policy_version 33280 (0.0018) [2023-10-05 17:47:08,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17063936. Throughput: 0: 809.2, 1: 809.9. Samples: 4265508. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:08,652][23454] Avg episode reward: [(0, '9.890'), (1, '10.000')] [2023-10-05 17:47:13,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17096704. Throughput: 0: 803.9, 1: 803.3. Samples: 4270093. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:13,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:47:17,427][24460] Updated weights for policy 1, policy_version 33440 (0.0017) [2023-10-05 17:47:17,427][24456] Updated weights for policy 0, policy_version 33440 (0.0017) [2023-10-05 17:47:18,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17121280. Throughput: 0: 807.1, 1: 806.9. Samples: 4280043. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:18,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:47:23,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17154048. Throughput: 0: 808.4, 1: 808.7. Samples: 4289756. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:23,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:47:28,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17186816. Throughput: 0: 806.3, 1: 806.2. Samples: 4294656. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:47:28,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:47:29,955][24460] Updated weights for policy 1, policy_version 33600 (0.0017) [2023-10-05 17:47:29,955][24456] Updated weights for policy 0, policy_version 33600 (0.0018) [2023-10-05 17:47:33,651][23454] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6484.2). Total num frames: 17223680. Throughput: 0: 813.6, 1: 812.8. Samples: 4304636. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:47:33,653][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:47:38,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17252352. Throughput: 0: 812.8, 1: 811.7. Samples: 4314376. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:47:38,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:47:42,492][24460] Updated weights for policy 1, policy_version 33760 (0.0016) [2023-10-05 17:47:42,493][24456] Updated weights for policy 0, policy_version 33760 (0.0018) [2023-10-05 17:47:43,652][23454] Fps is (10 sec: 6143.9, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 17285120. Throughput: 0: 817.4, 1: 816.6. Samples: 4319232. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:47:43,653][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:47:48,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17317888. Throughput: 0: 813.4, 1: 814.7. Samples: 4329009. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:47:48,653][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:47:53,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17350656. Throughput: 0: 814.3, 1: 813.5. Samples: 4338762. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:53,652][23454] Avg episode reward: [(0, '9.960'), (1, '10.000')] [2023-10-05 17:47:55,010][24456] Updated weights for policy 0, policy_version 33920 (0.0019) [2023-10-05 17:47:55,010][24460] Updated weights for policy 1, policy_version 33920 (0.0015) [2023-10-05 17:47:58,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17383424. Throughput: 0: 819.1, 1: 818.8. Samples: 4343798. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:47:58,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17416192. Throughput: 0: 812.9, 1: 812.0. Samples: 4353163. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:48:03,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:07,805][24460] Updated weights for policy 1, policy_version 34080 (0.0017) [2023-10-05 17:48:07,806][24456] Updated weights for policy 0, policy_version 34080 (0.0018) [2023-10-05 17:48:08,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17448960. Throughput: 0: 811.3, 1: 810.7. Samples: 4362743. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:48:08,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:13,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 17481728. Throughput: 0: 810.7, 1: 812.8. Samples: 4367712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:48:13,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17514496. Throughput: 0: 806.9, 1: 807.4. Samples: 4377279. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:48:18,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:20,474][24456] Updated weights for policy 0, policy_version 34240 (0.0015) [2023-10-05 17:48:20,474][24460] Updated weights for policy 1, policy_version 34240 (0.0015) [2023-10-05 17:48:23,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17547264. Throughput: 0: 806.9, 1: 808.4. Samples: 4387065. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:48:23,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17580032. Throughput: 0: 810.0, 1: 810.2. Samples: 4392138. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:48:28,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:32,978][24456] Updated weights for policy 0, policy_version 34400 (0.0017) [2023-10-05 17:48:32,978][24460] Updated weights for policy 1, policy_version 34400 (0.0018) [2023-10-05 17:48:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6470.3). Total num frames: 17612800. Throughput: 0: 809.7, 1: 808.7. Samples: 4401839. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-10-05 17:48:33,653][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:48:38,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17645568. Throughput: 0: 808.9, 1: 809.2. Samples: 4411579. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:48:38,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:48:43,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17678336. Throughput: 0: 809.6, 1: 810.7. Samples: 4416713. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:48:43,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:48:45,424][24460] Updated weights for policy 1, policy_version 34560 (0.0016) [2023-10-05 17:48:45,424][24456] Updated weights for policy 0, policy_version 34560 (0.0017) [2023-10-05 17:48:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17711104. Throughput: 0: 815.0, 1: 815.6. Samples: 4426539. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:48:48,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:48:53,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17743872. Throughput: 0: 815.0, 1: 815.5. Samples: 4436113. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:48:53,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:48:57,953][24460] Updated weights for policy 1, policy_version 34720 (0.0016) [2023-10-05 17:48:57,954][24456] Updated weights for policy 0, policy_version 34720 (0.0017) [2023-10-05 17:48:58,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17776640. Throughput: 0: 817.8, 1: 816.1. Samples: 4441237. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:48:58,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17809408. Throughput: 0: 819.5, 1: 818.8. Samples: 4451003. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:49:03,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:03,660][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000034784_8904704.pth... [2023-10-05 17:49:03,660][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000034784_8904704.pth... [2023-10-05 17:49:03,695][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000031744_8126464.pth [2023-10-05 17:49:03,696][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000031744_8126464.pth [2023-10-05 17:49:08,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 17842176. Throughput: 0: 816.9, 1: 816.8. Samples: 4460582. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:49:08,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:10,668][24460] Updated weights for policy 1, policy_version 34880 (0.0019) [2023-10-05 17:49:10,668][24456] Updated weights for policy 0, policy_version 34880 (0.0019) [2023-10-05 17:49:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17874944. 
Throughput: 0: 814.4, 1: 814.7. Samples: 4465446. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:49:13,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:18,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17907712. Throughput: 0: 814.6, 1: 815.0. Samples: 4475170. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:49:18,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:23,164][24456] Updated weights for policy 0, policy_version 35040 (0.0018) [2023-10-05 17:49:23,164][24460] Updated weights for policy 1, policy_version 35040 (0.0017) [2023-10-05 17:49:23,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17940480. Throughput: 0: 817.5, 1: 816.8. Samples: 4485120. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:23,653][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:28,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 17973248. Throughput: 0: 815.8, 1: 814.5. Samples: 4490077. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:28,653][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:33,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18006016. Throughput: 0: 812.9, 1: 812.8. Samples: 4499695. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:33,652][23454] Avg episode reward: [(0, '9.980'), (1, '10.000')] [2023-10-05 17:49:35,677][24460] Updated weights for policy 1, policy_version 35200 (0.0018) [2023-10-05 17:49:35,677][24456] Updated weights for policy 0, policy_version 35200 (0.0015) [2023-10-05 17:49:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18038784. Throughput: 0: 817.9, 1: 817.3. Samples: 4509696. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:38,652][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:49:43,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18071552. Throughput: 0: 814.7, 1: 814.8. Samples: 4514568. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:43,653][23454] Avg episode reward: [(0, '9.970'), (1, '10.000')] [2023-10-05 17:49:48,202][24456] Updated weights for policy 0, policy_version 35360 (0.0017) [2023-10-05 17:49:48,202][24460] Updated weights for policy 1, policy_version 35360 (0.0016) [2023-10-05 17:49:48,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18104320. Throughput: 0: 814.0, 1: 814.8. Samples: 4524301. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:48,652][23454] Avg episode reward: [(0, '9.960'), (1, '10.000')] [2023-10-05 17:49:53,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18137088. Throughput: 0: 819.0, 1: 818.6. Samples: 4534272. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:53,652][23454] Avg episode reward: [(0, '9.960'), (1, '10.000')] [2023-10-05 17:49:58,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18169856. Throughput: 0: 819.2, 1: 819.0. Samples: 4539163. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:49:58,653][23454] Avg episode reward: [(0, '9.960'), (1, '10.000')] [2023-10-05 17:50:00,729][24456] Updated weights for policy 0, policy_version 35520 (0.0017) [2023-10-05 17:50:00,731][24460] Updated weights for policy 1, policy_version 35520 (0.0019) [2023-10-05 17:50:03,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18202624. Throughput: 0: 818.5, 1: 818.1. Samples: 4548818. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:50:03,653][23454] Avg episode reward: [(0, '9.950'), (1, '10.000')] [2023-10-05 17:50:08,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18235392. Throughput: 0: 817.4, 1: 816.8. Samples: 4558657. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:50:08,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:50:13,547][24456] Updated weights for policy 0, policy_version 35680 (0.0018) [2023-10-05 17:50:13,547][24460] Updated weights for policy 1, policy_version 35680 (0.0019) [2023-10-05 17:50:13,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18268160. Throughput: 0: 809.5, 1: 810.0. Samples: 4562954. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:50:13,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:50:18,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18300928. Throughput: 0: 816.2, 1: 816.0. Samples: 4573145. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:50:18,652][23454] Avg episode reward: [(0, '9.930'), (1, '10.000')] [2023-10-05 17:50:23,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18333696. Throughput: 0: 812.4, 1: 812.8. Samples: 4582829. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:50:23,652][23454] Avg episode reward: [(0, '9.940'), (1, '10.000')] [2023-10-05 17:50:26,120][24456] Updated weights for policy 0, policy_version 35840 (0.0016) [2023-10-05 17:50:26,120][24460] Updated weights for policy 1, policy_version 35840 (0.0014) [2023-10-05 17:50:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18366464. Throughput: 0: 810.8, 1: 810.4. Samples: 4587524. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-10-05 17:50:28,652][23454] Avg episode reward: [(0, '9.920'), (1, '10.000')] [2023-10-05 17:50:33,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 18391040. Throughput: 0: 811.5, 1: 810.5. Samples: 4597291. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:50:33,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:50:38,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 18423808. Throughput: 0: 805.0, 1: 806.6. Samples: 4606791. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:50:38,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:50:38,958][24460] Updated weights for policy 1, policy_version 36000 (0.0017) [2023-10-05 17:50:38,958][24456] Updated weights for policy 0, policy_version 36000 (0.0018) [2023-10-05 17:50:43,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 18456576. Throughput: 0: 806.4, 1: 806.8. Samples: 4611755. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:50:43,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:50:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18489344. Throughput: 0: 808.4, 1: 809.1. Samples: 4621605. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:50:48,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:50:51,386][24456] Updated weights for policy 0, policy_version 36160 (0.0015) [2023-10-05 17:50:51,386][24460] Updated weights for policy 1, policy_version 36160 (0.0018) [2023-10-05 17:50:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18522112. Throughput: 0: 808.2, 1: 809.2. Samples: 4631440. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-10-05 17:50:53,652][23454] Avg episode reward: [(0, '9.900'), (1, '10.000')] [2023-10-05 17:50:58,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18554880. Throughput: 0: 818.7, 1: 818.9. Samples: 4636649. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:50:58,652][23454] Avg episode reward: [(0, '9.900'), (1, '10.000')] [2023-10-05 17:51:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18587648. Throughput: 0: 810.6, 1: 811.3. Samples: 4646131. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:03,652][23454] Avg episode reward: [(0, '9.910'), (1, '10.000')] [2023-10-05 17:51:03,661][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000036304_9293824.pth... [2023-10-05 17:51:03,661][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000036304_9293824.pth... [2023-10-05 17:51:03,697][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000033264_8515584.pth [2023-10-05 17:51:03,698][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000033264_8515584.pth [2023-10-05 17:51:03,942][24456] Updated weights for policy 0, policy_version 36320 (0.0017) [2023-10-05 17:51:03,942][24460] Updated weights for policy 1, policy_version 36320 (0.0018) [2023-10-05 17:51:08,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 18620416. Throughput: 0: 812.0, 1: 812.1. Samples: 4655912. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:08,653][23454] Avg episode reward: [(0, '9.900'), (1, '10.000')] [2023-10-05 17:51:13,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18653184. Throughput: 0: 816.0, 1: 816.9. Samples: 4661004. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:13,653][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:51:16,442][24456] Updated weights for policy 0, policy_version 36480 (0.0018) [2023-10-05 17:51:16,442][24460] Updated weights for policy 1, policy_version 36480 (0.0017) [2023-10-05 17:51:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18685952. Throughput: 0: 814.8, 1: 815.7. Samples: 4670667. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:51:18,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:51:23,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18718720. Throughput: 0: 816.0, 1: 815.6. Samples: 4680212. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:51:23,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:51:28,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 18751488. Throughput: 0: 815.0, 1: 815.0. Samples: 4685105. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:51:28,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:51:29,302][24456] Updated weights for policy 0, policy_version 36640 (0.0017) [2023-10-05 17:51:29,302][24460] Updated weights for policy 1, policy_version 36640 (0.0016) [2023-10-05 17:51:33,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18784256. Throughput: 0: 808.2, 1: 807.8. Samples: 4694327. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:51:33,652][23454] Avg episode reward: [(0, '9.870'), (1, '10.000')] [2023-10-05 17:51:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18817024. Throughput: 0: 809.3, 1: 808.9. Samples: 4704256. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-10-05 17:51:38,652][23454] Avg episode reward: [(0, '9.850'), (1, '10.000')] [2023-10-05 17:51:41,996][24456] Updated weights for policy 0, policy_version 36800 (0.0015) [2023-10-05 17:51:41,996][24460] Updated weights for policy 1, policy_version 36800 (0.0016) [2023-10-05 17:51:43,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18849792. Throughput: 0: 804.2, 1: 804.0. Samples: 4709017. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:43,653][23454] Avg episode reward: [(0, '9.850'), (1, '10.000')] [2023-10-05 17:51:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18882560. Throughput: 0: 808.7, 1: 808.3. Samples: 4718897. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:48,652][23454] Avg episode reward: [(0, '9.840'), (1, '10.000')] [2023-10-05 17:51:53,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18915328. Throughput: 0: 810.5, 1: 810.0. Samples: 4728832. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:53,652][23454] Avg episode reward: [(0, '9.840'), (1, '10.000')] [2023-10-05 17:51:54,505][24460] Updated weights for policy 1, policy_version 36960 (0.0018) [2023-10-05 17:51:54,505][24456] Updated weights for policy 0, policy_version 36960 (0.0018) [2023-10-05 17:51:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18948096. Throughput: 0: 805.5, 1: 804.4. Samples: 4733452. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:51:58,652][23454] Avg episode reward: [(0, '9.840'), (1, '10.000')] [2023-10-05 17:52:03,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 18980864. Throughput: 0: 803.8, 1: 803.0. Samples: 4742975. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:52:03,653][23454] Avg episode reward: [(0, '9.840'), (1, '10.000')] [2023-10-05 17:52:07,351][24460] Updated weights for policy 1, policy_version 37120 (0.0018) [2023-10-05 17:52:07,351][24456] Updated weights for policy 0, policy_version 37120 (0.0017) [2023-10-05 17:52:08,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6498.1). Total num frames: 19013632. Throughput: 0: 807.0, 1: 806.0. Samples: 4752797. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:52:08,652][23454] Avg episode reward: [(0, '9.840'), (1, '10.000')] [2023-10-05 17:52:13,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 19046400. Throughput: 0: 805.3, 1: 805.0. Samples: 4757568. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:52:13,652][23454] Avg episode reward: [(0, '9.840'), (1, '10.000')] [2023-10-05 17:52:18,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6525.8). Total num frames: 19079168. Throughput: 0: 814.4, 1: 814.7. Samples: 4767637. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:52:18,652][23454] Avg episode reward: [(0, '9.810'), (1, '10.000')] [2023-10-05 17:52:19,856][24456] Updated weights for policy 0, policy_version 37280 (0.0017) [2023-10-05 17:52:19,856][24460] Updated weights for policy 1, policy_version 37280 (0.0018) [2023-10-05 17:52:23,651][23454] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 19107840. Throughput: 0: 810.3, 1: 811.7. Samples: 4777244. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:52:23,652][23454] Avg episode reward: [(0, '9.800'), (1, '10.000')] [2023-10-05 17:52:28,651][23454] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6484.2). Total num frames: 19136512. Throughput: 0: 811.8, 1: 811.8. Samples: 4782080. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-10-05 17:52:28,652][23454] Avg episode reward: [(0, '9.770'), (1, '10.000')] [2023-10-05 17:52:32,715][24456] Updated weights for policy 0, policy_version 37440 (0.0018) [2023-10-05 17:52:32,715][24460] Updated weights for policy 1, policy_version 37440 (0.0018) [2023-10-05 17:52:33,651][23454] Fps is (10 sec: 6144.1, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19169280. Throughput: 0: 807.4, 1: 806.9. Samples: 4791541. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:52:33,652][23454] Avg episode reward: [(0, '9.770'), (1, '10.000')] [2023-10-05 17:52:38,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19202048. Throughput: 0: 802.6, 1: 803.2. Samples: 4801092. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:52:38,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:52:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19234816. Throughput: 0: 808.7, 1: 809.1. Samples: 4806253. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:52:43,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:52:45,204][24456] Updated weights for policy 0, policy_version 37600 (0.0019) [2023-10-05 17:52:45,204][24460] Updated weights for policy 1, policy_version 37600 (0.0019) [2023-10-05 17:52:48,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19267584. Throughput: 0: 811.9, 1: 813.0. Samples: 4816097. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:52:48,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:52:53,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19300352. Throughput: 0: 806.1, 1: 806.8. Samples: 4825378. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-10-05 17:52:53,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:52:58,092][24456] Updated weights for policy 0, policy_version 37760 (0.0016) [2023-10-05 17:52:58,092][24460] Updated weights for policy 1, policy_version 37760 (0.0017) [2023-10-05 17:52:58,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19333120. Throughput: 0: 808.9, 1: 808.6. Samples: 4830359. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:52:58,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:53:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19365888. Throughput: 0: 798.5, 1: 798.3. Samples: 4839494. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:53:03,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:53:03,661][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000037824_9682944.pth... [2023-10-05 17:53:03,661][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000037824_9682944.pth... [2023-10-05 17:53:03,697][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000034784_8904704.pth [2023-10-05 17:53:03,698][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000034784_8904704.pth [2023-10-05 17:53:08,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6498.1). Total num frames: 19398656. Throughput: 0: 803.2, 1: 802.9. Samples: 4849519. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:53:08,652][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:53:10,916][24456] Updated weights for policy 0, policy_version 37920 (0.0018) [2023-10-05 17:53:10,916][24460] Updated weights for policy 1, policy_version 37920 (0.0017) [2023-10-05 17:53:13,652][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19431424. 
Throughput: 0: 799.5, 1: 800.0. Samples: 4854060. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-10-05 17:53:13,653][23454] Avg episode reward: [(0, '9.750'), (1, '10.000')] [2023-10-05 17:53:18,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 19464192. Throughput: 0: 805.1, 1: 805.1. Samples: 4864000. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:18,652][23454] Avg episode reward: [(0, '9.740'), (1, '10.000')] [2023-10-05 17:53:23,652][23454] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6470.3). Total num frames: 19488768. Throughput: 0: 803.9, 1: 804.0. Samples: 4873445. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:23,653][23454] Avg episode reward: [(0, '9.740'), (1, '10.000')] [2023-10-05 17:53:23,695][24456] Updated weights for policy 0, policy_version 38080 (0.0015) [2023-10-05 17:53:23,696][24460] Updated weights for policy 1, policy_version 38080 (0.0016) [2023-10-05 17:53:28,652][23454] Fps is (10 sec: 5734.3, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 19521536. Throughput: 0: 801.1, 1: 800.7. Samples: 4878336. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:28,653][23454] Avg episode reward: [(0, '9.710'), (1, '10.000')] [2023-10-05 17:53:33,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 19554304. Throughput: 0: 802.9, 1: 802.6. Samples: 4888345. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:33,652][23454] Avg episode reward: [(0, '9.670'), (1, '10.000')] [2023-10-05 17:53:36,159][24460] Updated weights for policy 1, policy_version 38240 (0.0018) [2023-10-05 17:53:36,160][24456] Updated weights for policy 0, policy_version 38240 (0.0018) [2023-10-05 17:53:38,651][23454] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19587072. Throughput: 0: 805.8, 1: 805.8. Samples: 4897900. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:38,652][23454] Avg episode reward: [(0, '9.660'), (1, '10.000')] [2023-10-05 17:53:43,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 19619840. Throughput: 0: 806.2, 1: 806.0. Samples: 4902912. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:43,652][23454] Avg episode reward: [(0, '9.570'), (1, '10.000')] [2023-10-05 17:53:48,651][23454] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19652608. Throughput: 0: 807.6, 1: 807.5. Samples: 4912174. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:48,652][23454] Avg episode reward: [(0, '9.570'), (1, '10.000')] [2023-10-05 17:53:49,065][24460] Updated weights for policy 1, policy_version 38400 (0.0018) [2023-10-05 17:53:49,065][24456] Updated weights for policy 0, policy_version 38400 (0.0018) [2023-10-05 17:53:53,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19685376. Throughput: 0: 803.1, 1: 802.4. Samples: 4921766. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:53,652][23454] Avg episode reward: [(0, '9.500'), (1, '10.000')] [2023-10-05 17:53:58,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 19718144. Throughput: 0: 808.6, 1: 808.8. Samples: 4926843. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:53:58,652][23454] Avg episode reward: [(0, '9.490'), (1, '10.000')] [2023-10-05 17:54:01,677][24460] Updated weights for policy 1, policy_version 38560 (0.0018) [2023-10-05 17:54:01,679][24456] Updated weights for policy 0, policy_version 38560 (0.0018) [2023-10-05 17:54:03,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19750912. Throughput: 0: 804.8, 1: 805.6. Samples: 4936467. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:54:03,652][23454] Avg episode reward: [(0, '9.470'), (1, '10.000')] [2023-10-05 17:54:08,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19783680. Throughput: 0: 809.0, 1: 808.9. Samples: 4946250. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:54:08,653][23454] Avg episode reward: [(0, '9.470'), (1, '10.000')] [2023-10-05 17:54:13,652][23454] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19816448. Throughput: 0: 810.8, 1: 811.4. Samples: 4951333. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:54:13,653][23454] Avg episode reward: [(0, '9.460'), (1, '10.000')] [2023-10-05 17:54:14,152][24456] Updated weights for policy 0, policy_version 38720 (0.0019) [2023-10-05 17:54:14,152][24460] Updated weights for policy 1, policy_version 38720 (0.0019) [2023-10-05 17:54:18,652][23454] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 19849216. Throughput: 0: 808.3, 1: 808.3. Samples: 4961093. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:54:18,653][23454] Avg episode reward: [(0, '9.460'), (1, '10.000')] [2023-10-05 17:54:23,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 19881984. Throughput: 0: 807.6, 1: 807.2. Samples: 4970564. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:54:23,652][23454] Avg episode reward: [(0, '9.450'), (1, '10.000')] [2023-10-05 17:54:26,753][24456] Updated weights for policy 0, policy_version 38880 (0.0018) [2023-10-05 17:54:26,753][24460] Updated weights for policy 1, policy_version 38880 (0.0019) [2023-10-05 17:54:28,651][23454] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 19914752. Throughput: 0: 808.2, 1: 809.1. Samples: 4975691. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:54:28,652][23454] Avg episode reward: [(0, '9.450'), (1, '10.000')] [2023-10-05 17:54:33,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 19947520. Throughput: 0: 812.8, 1: 812.9. Samples: 4985330. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-10-05 17:54:33,652][23454] Avg episode reward: [(0, '9.450'), (1, '9.990')] [2023-10-05 17:54:38,651][23454] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 19980288. Throughput: 0: 814.8, 1: 814.8. Samples: 4995102. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-10-05 17:54:38,652][23454] Avg episode reward: [(0, '9.450'), (1, '9.990')] [2023-10-05 17:54:39,274][24460] Updated weights for policy 1, policy_version 39040 (0.0017) [2023-10-05 17:54:39,274][24456] Updated weights for policy 0, policy_version 39040 (0.0013) [2023-10-05 17:54:42,954][24499] Stopping RolloutWorker_w6... [2023-10-05 17:54:42,954][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000039088_10006528.pth... [2023-10-05 17:54:42,954][24178] Stopping Batcher_1... [2023-10-05 17:54:42,954][24494] Stopping RolloutWorker_w2... [2023-10-05 17:54:42,954][24497] Stopping RolloutWorker_w5... [2023-10-05 17:54:42,954][24498] Stopping RolloutWorker_w4... [2023-10-05 17:54:42,954][24496] Stopping RolloutWorker_w3... [2023-10-05 17:54:42,955][24178] Loop batcher_evt_loop terminating... [2023-10-05 17:54:42,954][24493] Stopping RolloutWorker_w1... [2023-10-05 17:54:42,954][24500] Stopping RolloutWorker_w7... [2023-10-05 17:54:42,955][24499] Loop rollout_proc6_evt_loop terminating... [2023-10-05 17:54:42,955][24494] Loop rollout_proc2_evt_loop terminating... [2023-10-05 17:54:42,954][23454] Component RolloutWorker_w6 stopped! [2023-10-05 17:54:42,955][24459] Stopping RolloutWorker_w0... [2023-10-05 17:54:42,955][24497] Loop rollout_proc5_evt_loop terminating... 
[2023-10-05 17:54:42,955][24498] Loop rollout_proc4_evt_loop terminating... [2023-10-05 17:54:42,955][24500] Loop rollout_proc7_evt_loop terminating... [2023-10-05 17:54:42,955][24496] Loop rollout_proc3_evt_loop terminating... [2023-10-05 17:54:42,955][24493] Loop rollout_proc1_evt_loop terminating... [2023-10-05 17:54:42,955][23454] Component RolloutWorker_w5 stopped! [2023-10-05 17:54:42,955][24459] Loop rollout_proc0_evt_loop terminating... [2023-10-05 17:54:42,956][23454] Component RolloutWorker_w4 stopped! [2023-10-05 17:54:42,957][23454] Component RolloutWorker_w2 stopped! [2023-10-05 17:54:42,957][23454] Component RolloutWorker_w3 stopped! [2023-10-05 17:54:42,958][23454] Component Batcher_0 stopped! [2023-10-05 17:54:42,958][23454] Component RolloutWorker_w1 stopped! [2023-10-05 17:54:42,959][23454] Component Batcher_1 stopped! [2023-10-05 17:54:42,959][23454] Component RolloutWorker_w7 stopped! [2023-10-05 17:54:42,960][23454] Component RolloutWorker_w0 stopped! [2023-10-05 17:54:42,954][24064] Stopping Batcher_0... [2023-10-05 17:54:42,976][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000039088_10006528.pth... [2023-10-05 17:54:42,975][24064] Loop batcher_evt_loop terminating... [2023-10-05 17:54:42,984][24064] Removing ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000036304_9293824.pth [2023-10-05 17:54:42,989][24064] Saving ./train_atari/atari_bowling/checkpoint_p0/checkpoint_000039088_10006528.pth... [2023-10-05 17:54:43,004][24178] Removing ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000036304_9293824.pth [2023-10-05 17:54:43,008][24178] Saving ./train_atari/atari_bowling/checkpoint_p1/checkpoint_000039088_10006528.pth... [2023-10-05 17:54:43,012][24460] Weights refcount: 2 0 [2023-10-05 17:54:43,014][24460] Stopping InferenceWorker_p1-w0... [2023-10-05 17:54:43,014][24460] Loop inference_proc1-0_evt_loop terminating... [2023-10-05 17:54:43,014][23454] Component InferenceWorker_p1-w0 stopped! 
[2023-10-05 17:54:43,024][24456] Weights refcount: 2 0 [2023-10-05 17:54:43,027][24456] Stopping InferenceWorker_p0-w0... [2023-10-05 17:54:43,028][24456] Loop inference_proc0-0_evt_loop terminating... [2023-10-05 17:54:43,027][23454] Component InferenceWorker_p0-w0 stopped! [2023-10-05 17:54:43,031][24064] Stopping LearnerWorker_p0... [2023-10-05 17:54:43,031][24064] Loop learner_proc0_evt_loop terminating... [2023-10-05 17:54:43,033][23454] Component LearnerWorker_p0 stopped! [2023-10-05 17:54:43,044][24178] Stopping LearnerWorker_p1... [2023-10-05 17:54:43,045][24178] Loop learner_proc1_evt_loop terminating... [2023-10-05 17:54:43,045][23454] Component LearnerWorker_p1 stopped! [2023-10-05 17:54:43,046][23454] Waiting for process learner_proc0 to stop... [2023-10-05 17:54:43,795][23454] Waiting for process learner_proc1 to stop... [2023-10-05 17:54:43,796][23454] Waiting for process inference_proc0-0 to join... [2023-10-05 17:54:43,797][23454] Waiting for process inference_proc1-0 to join... [2023-10-05 17:54:43,798][23454] Waiting for process rollout_proc0 to join... [2023-10-05 17:54:43,798][23454] Waiting for process rollout_proc1 to join... [2023-10-05 17:54:43,799][23454] Waiting for process rollout_proc2 to join... [2023-10-05 17:54:43,800][23454] Waiting for process rollout_proc3 to join... [2023-10-05 17:54:43,800][23454] Waiting for process rollout_proc4 to join... [2023-10-05 17:54:43,801][23454] Waiting for process rollout_proc5 to join... [2023-10-05 17:54:43,802][23454] Waiting for process rollout_proc6 to join... [2023-10-05 17:54:43,802][23454] Waiting for process rollout_proc7 to join... 
[2023-10-05 17:54:43,803][23454] Batcher 0 profile tree view:
batching: 20.5628, releasing_batches: 1.7012
[2023-10-05 17:54:43,803][23454] Batcher 1 profile tree view:
batching: 20.6519, releasing_batches: 1.7455
[2023-10-05 17:54:43,804][23454] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 614.9114
update_model: 36.6475
  weight_update: 0.0013
one_step: 0.0012
  handle_policy_step: 2239.7632
    deserialize: 67.3790, stack: 15.9218, obs_to_device_normalize: 546.3019, forward: 1076.7973, send_messages: 92.9225
    prepare_outputs: 297.5656
      to_cpu: 148.7433
[2023-10-05 17:54:43,804][23454] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0052
  wait_policy_total: 621.0520
update_model: 36.3491
  weight_update: 0.0016
one_step: 0.0011
  handle_policy_step: 2233.0267
    deserialize: 65.8974, stack: 15.5758, obs_to_device_normalize: 547.0974, forward: 1076.3050, send_messages: 93.9574
    prepare_outputs: 294.6347
      to_cpu: 147.2074
[2023-10-05 17:54:43,804][23454] Learner 0 profile tree view:
misc: 0.0155, prepare_batch: 31.9728
train: 456.0648
  epoch_init: 0.0882, minibatch_init: 3.1463, losses_postprocess: 61.6094, kl_divergence: 5.5061, after_optimizer: 23.1630
  calculate_losses: 44.6268
    losses_init: 0.0773, forward_head: 13.6827, bptt_initial: 0.4190, bptt: 0.4367, tail: 10.4284, advantages_returns: 3.0961, losses: 12.8262
  update: 313.7543
    clip: 162.5697
[2023-10-05 17:54:43,805][23454] Learner 1 profile tree view:
misc: 0.0150, prepare_batch: 31.8764
train: 457.7747
  epoch_init: 0.0900, minibatch_init: 3.2366, losses_postprocess: 61.2567, kl_divergence: 5.4862, after_optimizer: 23.1678
  calculate_losses: 45.4842
    losses_init: 0.0716, forward_head: 14.5619, bptt_initial: 0.4254, bptt: 0.4462, tail: 10.4866, advantages_returns: 3.1052, losses: 12.7949
  update: 314.9894
    clip: 164.6309
[2023-10-05 17:54:43,805][23454] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3975, enqueue_policy_requests: 42.6381, env_step: 1003.1330, overhead: 28.1995, complete_rollouts: 1.0925
save_policy_outputs: 54.8173
  split_output_tensors: 19.0835
[2023-10-05 17:54:43,805][23454] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4092, enqueue_policy_requests: 42.7551, env_step: 968.8520, overhead: 28.1730, complete_rollouts: 1.0829
save_policy_outputs: 54.7264
  split_output_tensors: 18.9349
[2023-10-05 17:54:43,806][23454] Loop Runner_EvtLoop terminating...
[2023-10-05 17:54:43,806][23454] Runner profile tree view:
main_loop: 3095.9497
[2023-10-05 17:54:43,806][23454] Collected {0: 10006528, 1: 10006528}, FPS: 6464.3