[2023-09-22 13:08:04,739][126655] Saving configuration to ./train_atari/Asterix/config.json...
[2023-09-22 13:08:05,005][126655] Rollout worker 0 uses device cpu
[2023-09-22 13:08:05,006][126655] Rollout worker 1 uses device cpu
[2023-09-22 13:08:05,006][126655] Rollout worker 2 uses device cpu
[2023-09-22 13:08:05,006][126655] Rollout worker 3 uses device cpu
[2023-09-22 13:08:05,006][126655] Rollout worker 4 uses device cpu
[2023-09-22 13:08:05,007][126655] Rollout worker 5 uses device cpu
[2023-09-22 13:08:05,007][126655] Rollout worker 6 uses device cpu
[2023-09-22 13:08:05,007][126655] Rollout worker 7 uses device cpu
[2023-09-22 13:08:05,007][126655] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-22 13:08:05,061][126655] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:08:05,061][126655] InferenceWorker_p0-w0: min num requests: 2
[2023-09-22 13:08:05,085][126655] Starting all processes...
[2023-09-22 13:08:05,086][126655] Starting process learner_proc0
[2023-09-22 13:08:06,741][126655] Starting all processes...
[2023-09-22 13:08:06,744][127055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:08:06,744][127055] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-22 13:08:06,748][126655] Starting process inference_proc0-0
[2023-09-22 13:08:06,749][126655] Starting process rollout_proc0
[2023-09-22 13:08:06,749][126655] Starting process rollout_proc1
[2023-09-22 13:08:06,749][126655] Starting process rollout_proc2
[2023-09-22 13:08:06,753][126655] Starting process rollout_proc3
[2023-09-22 13:08:06,754][126655] Starting process rollout_proc4
[2023-09-22 13:08:06,755][126655] Starting process rollout_proc5
[2023-09-22 13:08:06,758][126655] Starting process rollout_proc6
[2023-09-22 13:08:06,761][126655] Starting process rollout_proc7
[2023-09-22 13:08:06,782][127055] Num visible devices: 1
[2023-09-22 13:08:06,824][127055] Starting seed is not provided
[2023-09-22 13:08:06,824][127055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:08:06,825][127055] Initializing actor-critic model on device cuda:0
[2023-09-22 13:08:06,825][127055] RunningMeanStd input shape: (4, 84, 84)
[2023-09-22 13:08:06,826][127055] RunningMeanStd input shape: (1,)
[2023-09-22 13:08:06,845][127055] ConvEncoder: input_channels=4
[2023-09-22 13:08:07,201][127055] Conv encoder output size: 512
[2023-09-22 13:08:07,203][127055] Created Actor Critic model with architecture:
[2023-09-22 13:08:07,203][127055] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=9, bias=True)
  )
)
[2023-09-22 13:08:07,786][127055] Using optimizer
[2023-09-22 13:08:07,787][127055] No checkpoints found
[2023-09-22 13:08:07,788][127055] Did not load from checkpoint, starting from scratch!
[2023-09-22 13:08:07,788][127055] Initialized policy 0 weights for model version 0
[2023-09-22 13:08:07,791][127055] LearnerWorker_p0 finished initialization!
[2023-09-22 13:08:07,792][127055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:08:08,634][127232] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-22 13:08:08,647][127226] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-22 13:08:08,673][127230] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-22 13:08:08,674][127231] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-22 13:08:08,681][127229] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-22 13:08:08,685][127225] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:08:08,685][127225] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-22 13:08:08,686][127233] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-22 13:08:08,706][127225] Num visible devices: 1
[2023-09-22 13:08:08,719][127228] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-22 13:08:08,720][127227] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-22 13:08:09,401][127225] RunningMeanStd input shape: (4, 84, 84)
[2023-09-22 13:08:09,401][127225] RunningMeanStd input shape: (1,)
[2023-09-22 13:08:09,413][127225] ConvEncoder: input_channels=4
[2023-09-22 13:08:09,515][127225] Conv encoder output size: 512
[2023-09-22 13:08:09,521][126655] Inference worker 0-0 is ready!
[2023-09-22 13:08:09,522][126655] All inference workers are ready! Signal rollout workers to start!
[2023-09-22 13:08:09,971][127226] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,979][127228] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,981][127232] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,981][127227] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,984][127229] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,985][127230] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,988][127233] Decorrelating experience for 0 frames...
[2023-09-22 13:08:09,989][127231] Decorrelating experience for 0 frames...
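The "Worker N uses CPU cores [...]" lines above follow a simple pattern: 8 rollout workers on a 32-core machine, each pinned to a contiguous block of 4 cores. A minimal sketch of that mapping (the function name and defaults are illustrative, not Sample Factory's actual API):

```python
def cores_for_worker(worker_idx, num_workers=8, num_cores=32):
    """Assign a rollout worker a contiguous block of CPU cores.

    With 32 cores and 8 workers, worker i gets cores
    [4*i, 4*i+1, 4*i+2, 4*i+3], matching the log lines above.
    """
    per_worker = num_cores // num_workers  # 4 cores per worker here
    start = worker_idx * per_worker
    return list(range(start, start + per_worker))


if __name__ == "__main__":
    for i in range(8):
        print(f"Worker {i} uses CPU cores {cores_for_worker(i)}")
```

In practice the affinity itself would be applied with something like `os.sched_setaffinity(0, cores)` inside each worker process; pinning workers to disjoint core sets avoids the rollout processes contending for the same cores.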
[2023-09-22 13:08:10,740][126655] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-09-22 13:08:15,740][126655] Fps is (10 sec: 1638.3, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 8192. Throughput: 0: 614.4. Samples: 3072. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:08:15,741][126655] Avg episode reward: [(0, '1.719')]
[2023-09-22 13:08:20,740][126655] Fps is (10 sec: 3276.7, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 32768. Throughput: 0: 615.4. Samples: 6154. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:08:20,741][126655] Avg episode reward: [(0, '1.420')]
[2023-09-22 13:08:21,805][127225] Updated weights for policy 0, policy_version 160 (0.0018)
[2023-09-22 13:08:25,055][126655] Heartbeat connected on Batcher_0
[2023-09-22 13:08:25,065][126655] Heartbeat connected on RolloutWorker_w0
[2023-09-22 13:08:25,067][126655] Heartbeat connected on RolloutWorker_w1
[2023-09-22 13:08:25,070][126655] Heartbeat connected on RolloutWorker_w2
[2023-09-22 13:08:25,073][126655] Heartbeat connected on RolloutWorker_w3
[2023-09-22 13:08:25,075][126655] Heartbeat connected on RolloutWorker_w4
[2023-09-22 13:08:25,078][126655] Heartbeat connected on RolloutWorker_w5
[2023-09-22 13:08:25,081][126655] Heartbeat connected on RolloutWorker_w6
[2023-09-22 13:08:25,084][126655] Heartbeat connected on RolloutWorker_w7
[2023-09-22 13:08:25,098][126655] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-22 13:08:25,187][126655] Heartbeat connected on LearnerWorker_p0
[2023-09-22 13:08:25,740][126655] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3822.9). Total num frames: 57344. Throughput: 0: 915.4. Samples: 13731. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-22 13:08:25,741][126655] Avg episode reward: [(0, '1.370')]
[2023-09-22 13:08:30,146][127225] Updated weights for policy 0, policy_version 320 (0.0016)
[2023-09-22 13:08:30,740][126655] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 81920. Throughput: 0: 1058.7. Samples: 21174. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 13:08:30,741][126655] Avg episode reward: [(0, '1.560')]
[2023-09-22 13:08:35,740][126655] Fps is (10 sec: 4915.2, 60 sec: 4259.8, 300 sec: 4259.8). Total num frames: 106496. Throughput: 0: 983.0. Samples: 24576. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:08:35,740][126655] Avg episode reward: [(0, '1.550')]
[2023-09-22 13:08:35,741][127055] Saving new best policy, reward=1.550!
[2023-09-22 13:08:38,581][127225] Updated weights for policy 0, policy_version 480 (0.0017)
[2023-09-22 13:08:40,740][126655] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 4369.1). Total num frames: 131072. Throughput: 0: 1058.1. Samples: 31744. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-22 13:08:40,741][126655] Avg episode reward: [(0, '1.490')]
[2023-09-22 13:08:42,835][126655] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 126655], exiting...
[2023-09-22 13:08:42,836][126655] Runner profile tree view: main_loop: 37.7510
[2023-09-22 13:08:42,836][127226] Stopping RolloutWorker_w1...
[2023-09-22 13:08:42,836][127231] Stopping RolloutWorker_w6...
[2023-09-22 13:08:42,836][127230] Stopping RolloutWorker_w4...
[2023-09-22 13:08:42,837][126655] Collected {0: 139264}, FPS: 3689.0
[2023-09-22 13:08:42,836][127228] Stopping RolloutWorker_w3...
[2023-09-22 13:08:42,836][127227] Stopping RolloutWorker_w2...
[2023-09-22 13:08:42,836][127232] Stopping RolloutWorker_w5...
[2023-09-22 13:08:42,836][127229] Stopping RolloutWorker_w0...
[2023-09-22 13:08:42,837][127226] Loop rollout_proc1_evt_loop terminating...
[2023-09-22 13:08:42,837][127233] Stopping RolloutWorker_w7...
[2023-09-22 13:08:42,837][127231] Loop rollout_proc6_evt_loop terminating...
[2023-09-22 13:08:42,837][127230] Loop rollout_proc4_evt_loop terminating...
[2023-09-22 13:08:42,837][127055] Stopping Batcher_0...
[2023-09-22 13:08:42,837][127227] Loop rollout_proc2_evt_loop terminating...
[2023-09-22 13:08:42,837][127229] Loop rollout_proc0_evt_loop terminating...
[2023-09-22 13:08:42,837][127228] Loop rollout_proc3_evt_loop terminating...
[2023-09-22 13:08:42,837][127232] Loop rollout_proc5_evt_loop terminating...
[2023-09-22 13:08:42,837][127233] Loop rollout_proc7_evt_loop terminating...
[2023-09-22 13:08:42,838][127055] Loop batcher_evt_loop terminating...
[2023-09-22 13:08:42,903][127225] Weights refcount: 2 0
[2023-09-22 13:08:42,905][127225] Stopping InferenceWorker_p0-w0...
[2023-09-22 13:08:42,905][127225] Loop inference_proc0-0_evt_loop terminating...
[2023-09-22 13:08:43,016][127055] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000000560_143360.pth...
[2023-09-22 13:08:43,044][127055] Stopping LearnerWorker_p0...
[2023-09-22 13:08:43,044][127055] Loop learner_proc0_evt_loop terminating...
[2023-09-22 13:09:45,112][05066] Saving configuration to ./train_atari/Asterix/config.json...
[2023-09-22 13:09:45,436][05066] Rollout worker 0 uses device cpu
[2023-09-22 13:09:45,437][05066] Rollout worker 1 uses device cpu
[2023-09-22 13:09:45,438][05066] Rollout worker 2 uses device cpu
[2023-09-22 13:09:45,439][05066] Rollout worker 3 uses device cpu
[2023-09-22 13:09:45,440][05066] Rollout worker 4 uses device cpu
[2023-09-22 13:09:45,440][05066] Rollout worker 5 uses device cpu
[2023-09-22 13:09:45,441][05066] Rollout worker 6 uses device cpu
[2023-09-22 13:09:45,442][05066] Rollout worker 7 uses device cpu
[2023-09-22 13:09:45,442][05066] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-22 13:09:45,509][05066] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:09:45,509][05066] InferenceWorker_p0-w0: min num requests: 1
[2023-09-22 13:09:45,512][05066] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-22 13:09:45,513][05066] InferenceWorker_p1-w0: min num requests: 1
[2023-09-22 13:09:45,537][05066] Starting all processes...
[2023-09-22 13:09:45,537][05066] Starting process learner_proc0
[2023-09-22 13:09:47,231][05066] Starting process learner_proc1
[2023-09-22 13:09:47,235][06078] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:09:47,235][06078] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-22 13:09:47,276][06078] Num visible devices: 1
[2023-09-22 13:09:47,305][06078] Starting seed is not provided
[2023-09-22 13:09:47,305][06078] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:09:47,305][06078] Initializing actor-critic model on device cuda:0
[2023-09-22 13:09:47,305][06078] RunningMeanStd input shape: (4, 84, 84)
[2023-09-22 13:09:47,306][06078] RunningMeanStd input shape: (1,)
[2023-09-22 13:09:47,325][06078] ConvEncoder: input_channels=4
[2023-09-22 13:09:47,489][06078] Conv encoder output size: 512
[2023-09-22 13:09:47,490][06078] Created Actor Critic model with architecture:
[2023-09-22 13:09:47,491][06078] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=9, bias=True)
  )
)
[2023-09-22 13:09:48,046][06078] Using optimizer
[2023-09-22 13:09:48,046][06078] Loading state from checkpoint ./train_atari/Asterix/checkpoint_p0/checkpoint_000000560_143360.pth...
[2023-09-22 13:09:48,063][06078] Loading model from checkpoint
[2023-09-22 13:09:48,066][06078] Loaded experiment state at self.train_step=560, self.env_steps=143360
[2023-09-22 13:09:48,066][06078] Initialized policy 0 weights for model version 560
[2023-09-22 13:09:48,068][06078] LearnerWorker_p0 finished initialization!
[2023-09-22 13:09:48,068][06078] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:09:48,870][05066] Starting all processes...
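The checkpoint written at shutdown and reloaded here encodes both training counters in its filename: `checkpoint_000000560_143360.pth` is train step 560 and 143360 env steps, which matches the "Loaded experiment state at self.train_step=560, self.env_steps=143360" line (in this run the ratio is 256 env frames per train step). A small parser for that naming scheme (the helper itself is illustrative, not part of Sample Factory):

```python
import re


def parse_checkpoint_name(path):
    """Extract (train_step, env_steps) from a checkpoint filename
    of the form 'checkpoint_000000560_143360.pth'."""
    m = re.search(r"checkpoint_(\d+)_(\d+)\.pth$", path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path}")
    return int(m.group(1)), int(m.group(2))


train_step, env_steps = parse_checkpoint_name(
    "./train_atari/Asterix/checkpoint_p0/checkpoint_000000560_143360.pth"
)
print(train_step, env_steps)  # 560 143360
```

The same 256:1 ratio holds for the later checkpoints in this log (1824 × 256 = 466944 and 1264 × 256 = 323584), consistent with a fixed number of env frames consumed per training step.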
[2023-09-22 13:09:48,873][06278] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-22 13:09:48,873][06278] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-22 13:09:48,876][05066] Starting process inference_proc0-0
[2023-09-22 13:09:48,876][05066] Starting process inference_proc1-0
[2023-09-22 13:09:48,876][05066] Starting process rollout_proc0
[2023-09-22 13:09:48,911][06278] Num visible devices: 1
[2023-09-22 13:09:48,876][05066] Starting process rollout_proc1
[2023-09-22 13:09:48,877][05066] Starting process rollout_proc2
[2023-09-22 13:09:48,951][06278] Starting seed is not provided
[2023-09-22 13:09:48,951][06278] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-22 13:09:48,951][06278] Initializing actor-critic model on device cuda:0
[2023-09-22 13:09:48,952][06278] RunningMeanStd input shape: (4, 84, 84)
[2023-09-22 13:09:48,953][06278] RunningMeanStd input shape: (1,)
[2023-09-22 13:09:48,877][05066] Starting process rollout_proc3
[2023-09-22 13:09:48,879][05066] Starting process rollout_proc4
[2023-09-22 13:09:48,880][05066] Starting process rollout_proc5
[2023-09-22 13:09:48,882][05066] Starting process rollout_proc6
[2023-09-22 13:09:48,932][05066] Starting process rollout_proc7
[2023-09-22 13:09:48,972][06278] ConvEncoder: input_channels=4
[2023-09-22 13:09:49,256][06278] Conv encoder output size: 512
[2023-09-22 13:09:49,259][06278] Created Actor Critic model with architecture:
[2023-09-22 13:09:49,259][06278] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=9, bias=True)
  )
)
[2023-09-22 13:09:49,851][06278] Using optimizer
[2023-09-22 13:09:49,852][06278] No checkpoints found
[2023-09-22 13:09:49,852][06278] Did not load from checkpoint, starting from scratch!
[2023-09-22 13:09:49,852][06278] Initialized policy 1 weights for model version 0
[2023-09-22 13:09:49,854][06278] LearnerWorker_p1 finished initialization!
[2023-09-22 13:09:49,854][06278] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-22 13:09:50,874][06607] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-22 13:09:50,886][06493] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-22 13:09:50,887][06493] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-22 13:09:50,929][06493] Num visible devices: 1
[2023-09-22 13:09:50,947][06567] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-22 13:09:50,947][06567] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-22 13:09:50,988][06567] Num visible devices: 1
[2023-09-22 13:09:51,028][06601] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-22 13:09:51,067][06599] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-22 13:09:51,068][06600] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-22 13:09:51,075][06609] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-22 13:09:51,088][06571] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-22 13:09:51,159][06611] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-22 13:09:51,189][05066] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 143360. Throughput: 0: nan, 1: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-09-22 13:09:51,192][06590] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-22 13:09:51,559][06493] RunningMeanStd input shape: (4, 84, 84)
[2023-09-22 13:09:51,560][06493] RunningMeanStd input shape: (1,)
[2023-09-22 13:09:51,571][06493] ConvEncoder: input_channels=4
[2023-09-22 13:09:51,599][06567] RunningMeanStd input shape: (4, 84, 84)
[2023-09-22 13:09:51,599][06567] RunningMeanStd input shape: (1,)
[2023-09-22 13:09:51,611][06567] ConvEncoder: input_channels=4
[2023-09-22 13:09:51,686][06493] Conv encoder output size: 512
[2023-09-22 13:09:51,692][05066] Inference worker 1-0 is ready!
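Lines such as "Using GPUs [0] for process 1 (actually maps to GPUs [1])" reflect how CUDA_VISIBLE_DEVICES works: each child process only sees the devices listed in its own environment variable, so its local `cuda:0` can be a different physical GPU per process. A hedged sketch of that local-to-global index mapping (standalone illustration, not Sample Factory code):

```python
import os


def visible_to_physical(local_idx, env=None):
    """Map a local CUDA device index (what a process sees as cuda:N)
    to the physical GPU index, using CUDA_VISIBLE_DEVICES.

    With CUDA_VISIBLE_DEVICES='1', the process's cuda:0 is physical GPU 1,
    matching the '(actually maps to GPUs [1])' lines in the log."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if not visible:
        return local_idx  # no restriction: local index == physical index
    mapping = [int(x) for x in visible.split(",")]
    return mapping[local_idx]


# Learner process 1 was started with CUDA_VISIBLE_DEVICES='1':
print(visible_to_physical(0, {"CUDA_VISIBLE_DEVICES": "1"}))  # 1
```

This is also why both learners report "Initializing actor-critic model on device cuda:0" and "Num visible devices: 1" even though they run on different physical GPUs.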
[2023-09-22 13:09:51,719][06567] Conv encoder output size: 512
[2023-09-22 13:09:51,725][05066] Inference worker 0-0 is ready!
[2023-09-22 13:09:51,725][05066] All inference workers are ready! Signal rollout workers to start!
[2023-09-22 13:09:52,196][06609] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,207][06600] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,210][06611] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,262][06590] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,294][06607] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,317][06601] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,333][06571] Decorrelating experience for 0 frames...
[2023-09-22 13:09:52,444][06599] Decorrelating experience for 0 frames...
[2023-09-22 13:09:56,154][05066] Fps is (10 sec: 1650.2, 60 sec: 1650.2, 300 sec: 1650.2). Total num frames: 151552. Throughput: 0: 206.3, 1: 206.3. Samples: 2048. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:09:56,154][05066] Avg episode reward: [(0, '2.000'), (1, '1.556')]
[2023-09-22 13:09:56,157][06078] Saving new best policy, reward=2.000!
[2023-09-22 13:10:01,154][05066] Fps is (10 sec: 2466.3, 60 sec: 2466.3, 300 sec: 2466.3). Total num frames: 167936. Throughput: 0: 387.5, 1: 384.1. Samples: 7688. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-22 13:10:01,155][05066] Avg episode reward: [(0, '1.792'), (1, '1.479')]
[2023-09-22 13:10:05,496][05066] Heartbeat connected on Batcher_0
[2023-09-22 13:10:05,499][05066] Heartbeat connected on LearnerWorker_p0
[2023-09-22 13:10:05,502][05066] Heartbeat connected on Batcher_1
[2023-09-22 13:10:05,504][05066] Heartbeat connected on LearnerWorker_p1
[2023-09-22 13:10:05,511][05066] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-22 13:10:05,515][05066] Heartbeat connected on InferenceWorker_p1-w0
[2023-09-22 13:10:05,517][05066] Heartbeat connected on RolloutWorker_w0
[2023-09-22 13:10:05,519][05066] Heartbeat connected on RolloutWorker_w1
[2023-09-22 13:10:05,522][05066] Heartbeat connected on RolloutWorker_w2
[2023-09-22 13:10:05,525][05066] Heartbeat connected on RolloutWorker_w3
[2023-09-22 13:10:05,528][05066] Heartbeat connected on RolloutWorker_w4
[2023-09-22 13:10:05,530][05066] Heartbeat connected on RolloutWorker_w5
[2023-09-22 13:10:05,534][05066] Heartbeat connected on RolloutWorker_w6
[2023-09-22 13:10:05,536][05066] Heartbeat connected on RolloutWorker_w7
[2023-09-22 13:10:06,154][05066] Fps is (10 sec: 4915.0, 60 sec: 3831.9, 300 sec: 3831.9). Total num frames: 200704. Throughput: 0: 410.6, 1: 410.6. Samples: 12288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:10:06,155][05066] Avg episode reward: [(0, '1.690'), (1, '1.414')]
[2023-09-22 13:10:09,106][06567] Updated weights for policy 0, policy_version 720 (0.0017)
[2023-09-22 13:10:09,107][06493] Updated weights for policy 1, policy_version 160 (0.0017)
[2023-09-22 13:10:11,154][05066] Fps is (10 sec: 6553.7, 60 sec: 4513.6, 300 sec: 4513.6). Total num frames: 233472. Throughput: 0: 542.1, 1: 541.4. Samples: 21631. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:10:11,154][05066] Avg episode reward: [(0, '1.550'), (1, '1.250')]
[2023-09-22 13:10:16,154][05066] Fps is (10 sec: 6553.7, 60 sec: 4922.2, 300 sec: 4922.2). Total num frames: 266240. Throughput: 0: 621.1, 1: 620.0. Samples: 30983. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:16,155][05066] Avg episode reward: [(0, '1.570'), (1, '1.490')]
[2023-09-22 13:10:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 5194.4, 300 sec: 5194.4). Total num frames: 299008. Throughput: 0: 596.2, 1: 595.0. Samples: 35693. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:21,155][05066] Avg episode reward: [(0, '1.690'), (1, '1.700')]
[2023-09-22 13:10:22,191][06493] Updated weights for policy 1, policy_version 320 (0.0014)
[2023-09-22 13:10:22,192][06567] Updated weights for policy 0, policy_version 880 (0.0015)
[2023-09-22 13:10:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 5388.8, 300 sec: 5388.8). Total num frames: 331776. Throughput: 0: 644.3, 1: 644.3. Samples: 45057. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:10:26,155][05066] Avg episode reward: [(0, '1.620'), (1, '1.870')]
[2023-09-22 13:10:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 5329.5, 300 sec: 5329.5). Total num frames: 356352. Throughput: 0: 684.4, 1: 683.2. Samples: 54657. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:31,155][05066] Avg episode reward: [(0, '1.790'), (1, '1.750')]
[2023-09-22 13:10:31,304][06278] Saving new best policy, reward=1.750!
[2023-09-22 13:10:35,240][06567] Updated weights for policy 0, policy_version 1040 (0.0016)
[2023-09-22 13:10:35,241][06493] Updated weights for policy 1, policy_version 480 (0.0019)
[2023-09-22 13:10:36,154][05066] Fps is (10 sec: 5734.5, 60 sec: 5465.7, 300 sec: 5465.7). Total num frames: 389120. Throughput: 0: 660.4, 1: 660.4. Samples: 59390. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:36,154][05066] Avg episode reward: [(0, '1.840'), (1, '1.620')]
[2023-09-22 13:10:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 5574.5, 300 sec: 5574.5). Total num frames: 421888. Throughput: 0: 737.2, 1: 736.3. Samples: 68358. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:41,154][05066] Avg episode reward: [(0, '1.600'), (1, '1.530')]
[2023-09-22 13:10:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 5663.6, 300 sec: 5663.6). Total num frames: 454656. Throughput: 0: 778.9, 1: 778.8. Samples: 77787. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:46,154][05066] Avg episode reward: [(0, '1.780'), (1, '1.780')]
[2023-09-22 13:10:46,155][06278] Saving new best policy, reward=1.780!
[2023-09-22 13:10:48,505][06567] Updated weights for policy 0, policy_version 1200 (0.0015)
[2023-09-22 13:10:48,506][06493] Updated weights for policy 1, policy_version 640 (0.0017)
[2023-09-22 13:10:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 5601.2, 300 sec: 5601.2). Total num frames: 479232. Throughput: 0: 778.3, 1: 777.2. Samples: 82285. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:51,154][05066] Avg episode reward: [(0, '1.930'), (1, '1.600')]
[2023-09-22 13:10:56,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6007.4, 300 sec: 5674.5). Total num frames: 512000. Throughput: 0: 778.9, 1: 778.4. Samples: 91710. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:10:56,155][05066] Avg episode reward: [(0, '1.830'), (1, '1.490')]
[2023-09-22 13:11:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5737.3). Total num frames: 544768. Throughput: 0: 776.8, 1: 776.7. Samples: 100892. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:11:01,154][05066] Avg episode reward: [(0, '1.840'), (1, '1.540')]
[2023-09-22 13:11:01,759][06493] Updated weights for policy 1, policy_version 800 (0.0014)
[2023-09-22 13:11:01,759][06567] Updated weights for policy 0, policy_version 1360 (0.0018)
[2023-09-22 13:11:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 5791.8). Total num frames: 577536. Throughput: 0: 776.1, 1: 776.4. Samples: 105553. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:11:06,154][05066] Avg episode reward: [(0, '1.880'), (1, '1.320')]
[2023-09-22 13:11:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 5839.4). Total num frames: 610304. Throughput: 0: 773.7, 1: 773.7. Samples: 114688. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:11:11,155][05066] Avg episode reward: [(0, '2.050'), (1, '1.500')]
[2023-09-22 13:11:11,166][06078] Saving new best policy, reward=2.050!
[2023-09-22 13:11:15,167][06567] Updated weights for policy 0, policy_version 1520 (0.0015)
[2023-09-22 13:11:15,168][06493] Updated weights for policy 1, policy_version 960 (0.0015)
[2023-09-22 13:11:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 5785.0). Total num frames: 634880. Throughput: 0: 768.8, 1: 769.0. Samples: 123861. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:11:16,154][05066] Avg episode reward: [(0, '1.940'), (1, '1.850')]
[2023-09-22 13:11:16,155][06278] Saving new best policy, reward=1.850!
[2023-09-22 13:11:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 5827.7). Total num frames: 667648. Throughput: 0: 767.8, 1: 765.4. Samples: 128386. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-22 13:11:21,154][05066] Avg episode reward: [(0, '1.730'), (1, '1.830')]
[2023-09-22 13:11:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 5865.9). Total num frames: 700416. Throughput: 0: 769.3, 1: 769.4. Samples: 137601. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:11:26,154][05066] Avg episode reward: [(0, '1.920'), (1, '1.840')]
[2023-09-22 13:11:28,417][06567] Updated weights for policy 0, policy_version 1680 (0.0018)
[2023-09-22 13:11:28,417][06493] Updated weights for policy 1, policy_version 1120 (0.0019)
[2023-09-22 13:11:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 5900.3). Total num frames: 733184. Throughput: 0: 772.6, 1: 772.3. Samples: 147309. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:11:31,155][05066] Avg episode reward: [(0, '2.030'), (1, '1.740')]
[2023-09-22 13:11:36,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 5853.4). Total num frames: 757760. Throughput: 0: 770.6, 1: 770.8. Samples: 151647. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:11:36,155][05066] Avg episode reward: [(0, '1.930'), (1, '1.510')]
[2023-09-22 13:11:41,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 5885.2). Total num frames: 790528. Throughput: 0: 773.0, 1: 772.9. Samples: 161276. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-22 13:11:41,155][05066] Avg episode reward: [(0, '2.030'), (1, '1.830')]
[2023-09-22 13:11:41,167][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000001264_323584.pth...
[2023-09-22 13:11:41,168][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000001824_466944.pth...
[2023-09-22 13:11:41,652][06493] Updated weights for policy 1, policy_version 1280 (0.0017)
[2023-09-22 13:11:41,652][06567] Updated weights for policy 0, policy_version 1840 (0.0017)
[2023-09-22 13:11:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 5914.3). Total num frames: 823296. Throughput: 0: 769.4, 1: 769.3. Samples: 170135. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:11:46,155][05066] Avg episode reward: [(0, '2.170'), (1, '1.990')]
[2023-09-22 13:11:46,156][06078] Saving new best policy, reward=2.170!
[2023-09-22 13:11:46,156][06278] Saving new best policy, reward=1.990!
[2023-09-22 13:11:51,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 5940.9). Total num frames: 856064. Throughput: 0: 772.4, 1: 772.2. Samples: 175060. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:11:51,155][05066] Avg episode reward: [(0, '2.160'), (1, '2.380')]
[2023-09-22 13:11:51,156][06278] Saving new best policy, reward=2.380!
[2023-09-22 13:11:54,921][06493] Updated weights for policy 1, policy_version 1440 (0.0015)
[2023-09-22 13:11:54,921][06567] Updated weights for policy 0, policy_version 2000 (0.0016)
[2023-09-22 13:11:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 5899.9). Total num frames: 880640. Throughput: 0: 773.7, 1: 773.6. Samples: 184318. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:11:56,155][05066] Avg episode reward: [(0, '2.290'), (1, '2.410')]
[2023-09-22 13:11:56,207][06078] Saving new best policy, reward=2.290!
[2023-09-22 13:11:56,234][06278] Saving new best policy, reward=2.410!
[2023-09-22 13:12:01,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 5925.1). Total num frames: 913408. Throughput: 0: 776.6, 1: 776.2. Samples: 193737. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:12:01,155][05066] Avg episode reward: [(0, '2.470'), (1, '2.230')]
[2023-09-22 13:12:01,156][06078] Saving new best policy, reward=2.470!
[2023-09-22 13:12:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 5948.3). Total num frames: 946176. Throughput: 0: 778.8, 1: 780.5. Samples: 198552. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-22 13:12:06,155][05066] Avg episode reward: [(0, '2.220'), (1, '1.930')]
[2023-09-22 13:12:07,884][06567] Updated weights for policy 0, policy_version 2160 (0.0015)
[2023-09-22 13:12:07,885][06493] Updated weights for policy 1, policy_version 1600 (0.0018)
[2023-09-22 13:12:11,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 5970.0). Total num frames: 978944. Throughput: 0: 778.6, 1: 779.0. Samples: 207691. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:12:11,154][05066] Avg episode reward: [(0, '2.310'), (1, '2.090')]
[2023-09-22 13:12:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5990.1). Total num frames: 1011712. Throughput: 0: 774.8, 1: 775.9. Samples: 217089. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:12:16,155][05066] Avg episode reward: [(0, '2.190'), (1, '2.360')]
[2023-09-22 13:12:20,913][06493] Updated weights for policy 1, policy_version 1760 (0.0016)
[2023-09-22 13:12:20,914][06567] Updated weights for policy 0, policy_version 2320 (0.0017)
[2023-09-22 13:12:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6008.9). Total num frames: 1044480. Throughput: 0: 780.0, 1: 780.9. Samples: 221884. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:12:21,154][05066] Avg episode reward: [(0, '2.590'), (1, '2.620')]
[2023-09-22 13:12:21,155][06078] Saving new best policy, reward=2.590!
[2023-09-22 13:12:21,155][06278] Saving new best policy, reward=2.620!
[2023-09-22 13:12:26,155][05066] Fps is (10 sec: 6143.4, 60 sec: 6212.2, 300 sec: 6000.0). Total num frames: 1073152. Throughput: 0: 779.0, 1: 779.9. Samples: 231424. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-22 13:12:26,157][05066] Avg episode reward: [(0, '2.510'), (1, '2.630')]
[2023-09-22 13:12:26,169][06278] Saving new best policy, reward=2.630!
[2023-09-22 13:12:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 5991.7). Total num frames: 1101824. Throughput: 0: 783.8, 1: 783.9. Samples: 240683. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:12:31,154][05066] Avg episode reward: [(0, '2.740'), (1, '2.950')]
[2023-09-22 13:12:31,155][06078] Saving new best policy, reward=2.740!
[2023-09-22 13:12:31,155][06278] Saving new best policy, reward=2.950!
[2023-09-22 13:12:34,058][06567] Updated weights for policy 0, policy_version 2480 (0.0017) [2023-09-22 13:12:34,058][06493] Updated weights for policy 1, policy_version 1920 (0.0017) [2023-09-22 13:12:36,154][05066] Fps is (10 sec: 6144.8, 60 sec: 6280.6, 300 sec: 6008.8). Total num frames: 1134592. Throughput: 0: 783.2, 1: 783.3. Samples: 245552. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:12:36,154][05066] Avg episode reward: [(0, '2.680'), (1, '2.890')] [2023-09-22 13:12:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6024.8). Total num frames: 1167360. Throughput: 0: 784.5, 1: 784.4. Samples: 254918. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 13:12:41,154][05066] Avg episode reward: [(0, '2.570'), (1, '3.200')] [2023-09-22 13:12:41,164][06278] Saving new best policy, reward=3.200! [2023-09-22 13:12:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6039.9). Total num frames: 1200128. Throughput: 0: 782.4, 1: 783.4. Samples: 264196. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:12:46,154][05066] Avg episode reward: [(0, '2.720'), (1, '3.260')] [2023-09-22 13:12:46,155][06278] Saving new best policy, reward=3.260! [2023-09-22 13:12:47,147][06493] Updated weights for policy 1, policy_version 2080 (0.0016) [2023-09-22 13:12:47,148][06567] Updated weights for policy 0, policy_version 2640 (0.0015) [2023-09-22 13:12:51,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6008.6). Total num frames: 1224704. Throughput: 0: 780.3, 1: 780.2. Samples: 268775. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:12:51,155][05066] Avg episode reward: [(0, '3.050'), (1, '3.340')] [2023-09-22 13:12:51,169][06078] Saving new best policy, reward=3.050! [2023-09-22 13:12:51,173][06278] Saving new best policy, reward=3.340! [2023-09-22 13:12:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6023.4). Total num frames: 1257472. Throughput: 0: 783.0, 1: 784.2. 
Samples: 278213. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:12:56,154][05066] Avg episode reward: [(0, '2.750'), (1, '3.450')] [2023-09-22 13:12:56,163][06278] Saving new best policy, reward=3.450! [2023-09-22 13:13:00,470][06567] Updated weights for policy 0, policy_version 2800 (0.0017) [2023-09-22 13:13:00,470][06493] Updated weights for policy 1, policy_version 2240 (0.0017) [2023-09-22 13:13:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6037.3). Total num frames: 1290240. Throughput: 0: 779.3, 1: 778.3. Samples: 287180. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:13:01,155][05066] Avg episode reward: [(0, '3.000'), (1, '3.630')] [2023-09-22 13:13:01,156][06278] Saving new best policy, reward=3.630! [2023-09-22 13:13:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6050.6). Total num frames: 1323008. Throughput: 0: 778.6, 1: 777.7. Samples: 291920. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:13:06,155][05066] Avg episode reward: [(0, '3.060'), (1, '3.620')] [2023-09-22 13:13:06,156][06078] Saving new best policy, reward=3.060! [2023-09-22 13:13:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6063.2). Total num frames: 1355776. Throughput: 0: 773.8, 1: 773.7. Samples: 301062. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:13:11,155][05066] Avg episode reward: [(0, '3.430'), (1, '3.660')] [2023-09-22 13:13:11,166][06078] Saving new best policy, reward=3.430! [2023-09-22 13:13:11,166][06278] Saving new best policy, reward=3.660! [2023-09-22 13:13:13,794][06567] Updated weights for policy 0, policy_version 2960 (0.0016) [2023-09-22 13:13:13,795][06493] Updated weights for policy 1, policy_version 2400 (0.0017) [2023-09-22 13:13:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6035.2). Total num frames: 1380352. Throughput: 0: 775.5, 1: 775.7. Samples: 310485. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:13:16,155][05066] Avg episode reward: [(0, '3.280'), (1, '4.040')] [2023-09-22 13:13:16,355][06278] Saving new best policy, reward=4.040! [2023-09-22 13:13:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6047.5). Total num frames: 1413120. Throughput: 0: 775.1, 1: 775.0. Samples: 315309. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:13:21,154][05066] Avg episode reward: [(0, '3.440'), (1, '4.150')] [2023-09-22 13:13:21,155][06078] Saving new best policy, reward=3.440! [2023-09-22 13:13:21,155][06278] Saving new best policy, reward=4.150! [2023-09-22 13:13:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6212.4, 300 sec: 6059.3). Total num frames: 1445888. Throughput: 0: 775.6, 1: 775.4. Samples: 324714. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:13:26,155][05066] Avg episode reward: [(0, '3.490'), (1, '3.950')] [2023-09-22 13:13:26,163][06078] Saving new best policy, reward=3.490! [2023-09-22 13:13:26,764][06493] Updated weights for policy 1, policy_version 2560 (0.0017) [2023-09-22 13:13:26,764][06567] Updated weights for policy 0, policy_version 3120 (0.0018) [2023-09-22 13:13:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6070.5). Total num frames: 1478656. Throughput: 0: 776.1, 1: 775.3. Samples: 334008. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:13:31,154][05066] Avg episode reward: [(0, '3.500'), (1, '3.490')] [2023-09-22 13:13:31,155][06078] Saving new best policy, reward=3.500! [2023-09-22 13:13:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6081.2). Total num frames: 1511424. Throughput: 0: 778.7, 1: 779.4. Samples: 338888. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:13:36,154][05066] Avg episode reward: [(0, '3.690'), (1, '3.410')] [2023-09-22 13:13:36,155][06078] Saving new best policy, reward=3.690! 
[2023-09-22 13:13:39,664][06567] Updated weights for policy 0, policy_version 3280 (0.0018) [2023-09-22 13:13:39,664][06493] Updated weights for policy 1, policy_version 2720 (0.0016) [2023-09-22 13:13:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6091.5). Total num frames: 1544192. Throughput: 0: 778.8, 1: 777.2. Samples: 348233. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:13:41,155][05066] Avg episode reward: [(0, '3.990'), (1, '3.560')] [2023-09-22 13:13:41,165][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000002736_700416.pth... [2023-09-22 13:13:41,166][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000003296_843776.pth... [2023-09-22 13:13:41,200][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000000560_143360.pth [2023-09-22 13:13:41,203][06078] Saving new best policy, reward=3.990! [2023-09-22 13:13:46,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6212.2, 300 sec: 6083.9). Total num frames: 1572864. Throughput: 0: 786.2, 1: 787.2. Samples: 357983. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:13:46,155][05066] Avg episode reward: [(0, '3.890'), (1, '3.530')] [2023-09-22 13:13:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6076.6). Total num frames: 1601536. Throughput: 0: 783.7, 1: 784.6. Samples: 362496. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:13:51,155][05066] Avg episode reward: [(0, '3.520'), (1, '3.750')] [2023-09-22 13:13:52,871][06493] Updated weights for policy 1, policy_version 2880 (0.0015) [2023-09-22 13:13:52,871][06567] Updated weights for policy 0, policy_version 3440 (0.0016) [2023-09-22 13:13:56,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6086.4). Total num frames: 1634304. Throughput: 0: 783.4, 1: 783.2. Samples: 371557. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:13:56,154][05066] Avg episode reward: [(0, '3.580'), (1, '3.850')] [2023-09-22 13:14:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6095.7). Total num frames: 1667072. Throughput: 0: 782.4, 1: 783.0. Samples: 380928. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:14:01,154][05066] Avg episode reward: [(0, '3.520'), (1, '4.300')] [2023-09-22 13:14:01,155][06278] Saving new best policy, reward=4.300! [2023-09-22 13:14:06,101][06493] Updated weights for policy 1, policy_version 3040 (0.0016) [2023-09-22 13:14:06,102][06567] Updated weights for policy 0, policy_version 3600 (0.0014) [2023-09-22 13:14:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6104.7). Total num frames: 1699840. Throughput: 0: 778.6, 1: 778.9. Samples: 385396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 13:14:06,154][05066] Avg episode reward: [(0, '3.540'), (1, '4.450')] [2023-09-22 13:14:06,155][06278] Saving new best policy, reward=4.450! [2023-09-22 13:14:11,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6081.8). Total num frames: 1724416. Throughput: 0: 783.7, 1: 783.2. Samples: 395222. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:14:11,155][05066] Avg episode reward: [(0, '4.010'), (1, '4.160')] [2023-09-22 13:14:11,300][06078] Saving new best policy, reward=4.010! [2023-09-22 13:14:16,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6090.7). Total num frames: 1757184. Throughput: 0: 781.4, 1: 782.4. Samples: 404379. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:14:16,155][05066] Avg episode reward: [(0, '4.030'), (1, '3.910')] [2023-09-22 13:14:16,156][06078] Saving new best policy, reward=4.030! 
[2023-09-22 13:14:19,101][06567] Updated weights for policy 0, policy_version 3760 (0.0019) [2023-09-22 13:14:19,101][06493] Updated weights for policy 1, policy_version 3200 (0.0020) [2023-09-22 13:14:21,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6099.3). Total num frames: 1789952. Throughput: 0: 782.5, 1: 782.1. Samples: 409293. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:14:21,154][05066] Avg episode reward: [(0, '4.020'), (1, '4.200')] [2023-09-22 13:14:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6107.6). Total num frames: 1822720. Throughput: 0: 777.8, 1: 778.6. Samples: 418267. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:14:26,154][05066] Avg episode reward: [(0, '3.460'), (1, '4.530')] [2023-09-22 13:14:26,161][06278] Saving new best policy, reward=4.530! [2023-09-22 13:14:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6115.5). Total num frames: 1855488. Throughput: 0: 776.9, 1: 775.5. Samples: 427842. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 13:14:31,155][05066] Avg episode reward: [(0, '3.280'), (1, '4.910')] [2023-09-22 13:14:31,156][06278] Saving new best policy, reward=4.910! [2023-09-22 13:14:32,406][06567] Updated weights for policy 0, policy_version 3920 (0.0016) [2023-09-22 13:14:32,406][06493] Updated weights for policy 1, policy_version 3360 (0.0015) [2023-09-22 13:14:36,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6094.5). Total num frames: 1880064. Throughput: 0: 773.8, 1: 773.7. Samples: 432133. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:14:36,154][05066] Avg episode reward: [(0, '3.470'), (1, '4.110')] [2023-09-22 13:14:41,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6102.4). Total num frames: 1912832. Throughput: 0: 779.2, 1: 779.1. Samples: 441680. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:14:41,155][05066] Avg episode reward: [(0, '3.610'), (1, '3.840')] [2023-09-22 13:14:45,586][06567] Updated weights for policy 0, policy_version 4080 (0.0020) [2023-09-22 13:14:45,593][06493] Updated weights for policy 1, policy_version 3520 (0.0018) [2023-09-22 13:14:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6212.3, 300 sec: 6110.0). Total num frames: 1945600. Throughput: 0: 776.3, 1: 775.7. Samples: 450770. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 13:14:46,155][05066] Avg episode reward: [(0, '3.820'), (1, '3.890')] [2023-09-22 13:14:51,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 1978368. Throughput: 0: 778.6, 1: 778.2. Samples: 455456. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:14:51,154][05066] Avg episode reward: [(0, '4.090'), (1, '4.150')] [2023-09-22 13:14:51,155][06078] Saving new best policy, reward=4.090! [2023-09-22 13:14:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2011136. Throughput: 0: 773.7, 1: 774.6. Samples: 464896. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:14:56,155][05066] Avg episode reward: [(0, '4.040'), (1, '4.570')] [2023-09-22 13:14:58,708][06493] Updated weights for policy 1, policy_version 3680 (0.0017) [2023-09-22 13:14:58,709][06567] Updated weights for policy 0, policy_version 4240 (0.0018) [2023-09-22 13:15:01,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2035712. Throughput: 0: 775.8, 1: 774.8. Samples: 474155. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:01,154][05066] Avg episode reward: [(0, '3.940'), (1, '4.580')] [2023-09-22 13:15:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2068480. Throughput: 0: 773.4, 1: 772.3. Samples: 478848. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:15:06,155][05066] Avg episode reward: [(0, '3.770'), (1, '4.620')] [2023-09-22 13:15:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 2101248. Throughput: 0: 775.8, 1: 775.0. Samples: 488055. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:11,154][05066] Avg episode reward: [(0, '3.600'), (1, '4.850')] [2023-09-22 13:15:12,021][06567] Updated weights for policy 0, policy_version 4400 (0.0013) [2023-09-22 13:15:12,021][06493] Updated weights for policy 1, policy_version 3840 (0.0016) [2023-09-22 13:15:16,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 2134016. Throughput: 0: 774.6, 1: 776.0. Samples: 497617. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:16,154][05066] Avg episode reward: [(0, '3.610'), (1, '4.830')] [2023-09-22 13:15:21,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6192.6). Total num frames: 2158592. Throughput: 0: 775.6, 1: 775.1. Samples: 501912. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:21,155][05066] Avg episode reward: [(0, '3.700'), (1, '5.600')] [2023-09-22 13:15:21,156][06278] Saving new best policy, reward=5.600! [2023-09-22 13:15:25,313][06493] Updated weights for policy 1, policy_version 4000 (0.0015) [2023-09-22 13:15:25,313][06567] Updated weights for policy 0, policy_version 4560 (0.0017) [2023-09-22 13:15:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2191360. Throughput: 0: 773.9, 1: 772.8. Samples: 511281. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:26,154][05066] Avg episode reward: [(0, '3.910'), (1, '5.150')] [2023-09-22 13:15:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2224128. Throughput: 0: 775.0, 1: 774.9. Samples: 520514. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:15:31,154][05066] Avg episode reward: [(0, '3.800'), (1, '5.060')] [2023-09-22 13:15:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2256896. Throughput: 0: 777.9, 1: 777.3. Samples: 525439. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:36,155][05066] Avg episode reward: [(0, '3.770'), (1, '4.280')] [2023-09-22 13:15:38,363][06493] Updated weights for policy 1, policy_version 4160 (0.0017) [2023-09-22 13:15:38,363][06567] Updated weights for policy 0, policy_version 4720 (0.0018) [2023-09-22 13:15:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 2289664. Throughput: 0: 773.7, 1: 773.7. Samples: 534532. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:41,154][05066] Avg episode reward: [(0, '3.920'), (1, '4.280')] [2023-09-22 13:15:41,161][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000004192_1073152.pth... [2023-09-22 13:15:41,162][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000004752_1216512.pth... [2023-09-22 13:15:41,190][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000001264_323584.pth [2023-09-22 13:15:41,197][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000001824_466944.pth [2023-09-22 13:15:46,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2314240. Throughput: 0: 777.6, 1: 777.8. Samples: 544149. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:46,155][05066] Avg episode reward: [(0, '4.200'), (1, '4.330')] [2023-09-22 13:15:46,312][06078] Saving new best policy, reward=4.200! [2023-09-22 13:15:51,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2347008. Throughput: 0: 777.2, 1: 778.7. Samples: 548864. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:15:51,155][05066] Avg episode reward: [(0, '4.320'), (1, '4.290')] [2023-09-22 13:15:51,156][06078] Saving new best policy, reward=4.320! [2023-09-22 13:15:51,534][06567] Updated weights for policy 0, policy_version 4880 (0.0015) [2023-09-22 13:15:51,534][06493] Updated weights for policy 1, policy_version 4320 (0.0017) [2023-09-22 13:15:56,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2379776. Throughput: 0: 779.1, 1: 779.3. Samples: 558184. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:15:56,154][05066] Avg episode reward: [(0, '4.370'), (1, '4.300')] [2023-09-22 13:15:56,164][06078] Saving new best policy, reward=4.370! [2023-09-22 13:16:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2412544. Throughput: 0: 777.0, 1: 776.6. Samples: 567529. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:01,154][05066] Avg episode reward: [(0, '4.470'), (1, '4.250')] [2023-09-22 13:16:01,155][06078] Saving new best policy, reward=4.470! [2023-09-22 13:16:04,490][06567] Updated weights for policy 0, policy_version 5040 (0.0015) [2023-09-22 13:16:04,491][06493] Updated weights for policy 1, policy_version 4480 (0.0017) [2023-09-22 13:16:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2445312. Throughput: 0: 783.2, 1: 782.9. Samples: 572388. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:16:06,154][05066] Avg episode reward: [(0, '4.470'), (1, '4.500')] [2023-09-22 13:16:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2478080. Throughput: 0: 781.0, 1: 782.4. Samples: 581632. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:11,155][05066] Avg episode reward: [(0, '4.760'), (1, '4.800')] [2023-09-22 13:16:11,165][06078] Saving new best policy, reward=4.760! 
[2023-09-22 13:16:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2502656. Throughput: 0: 786.7, 1: 786.4. Samples: 591304. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:16,154][05066] Avg episode reward: [(0, '4.440'), (1, '4.690')] [2023-09-22 13:16:17,544][06493] Updated weights for policy 1, policy_version 4640 (0.0018) [2023-09-22 13:16:17,544][06567] Updated weights for policy 0, policy_version 5200 (0.0014) [2023-09-22 13:16:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 2535424. Throughput: 0: 783.0, 1: 784.4. Samples: 595968. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:16:21,154][05066] Avg episode reward: [(0, '5.230'), (1, '4.560')] [2023-09-22 13:16:21,155][06078] Saving new best policy, reward=5.230! [2023-09-22 13:16:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2568192. Throughput: 0: 788.0, 1: 786.7. Samples: 605394. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:16:26,154][05066] Avg episode reward: [(0, '4.990'), (1, '4.440')] [2023-09-22 13:16:30,933][06493] Updated weights for policy 1, policy_version 4800 (0.0014) [2023-09-22 13:16:30,933][06567] Updated weights for policy 0, policy_version 5360 (0.0015) [2023-09-22 13:16:31,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2600960. Throughput: 0: 780.2, 1: 780.9. Samples: 614400. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:31,155][05066] Avg episode reward: [(0, '4.770'), (1, '4.750')] [2023-09-22 13:16:36,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2625536. Throughput: 0: 775.5, 1: 774.9. Samples: 618634. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:36,155][05066] Avg episode reward: [(0, '4.430'), (1, '4.730')] [2023-09-22 13:16:41,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2658304. Throughput: 0: 778.4, 1: 778.1. Samples: 628227. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:41,155][05066] Avg episode reward: [(0, '4.240'), (1, '4.390')] [2023-09-22 13:16:44,082][06567] Updated weights for policy 0, policy_version 5520 (0.0018) [2023-09-22 13:16:44,082][06493] Updated weights for policy 1, policy_version 4960 (0.0016) [2023-09-22 13:16:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2691072. Throughput: 0: 779.3, 1: 778.5. Samples: 637629. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:16:46,155][05066] Avg episode reward: [(0, '4.770'), (1, '4.350')] [2023-09-22 13:16:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2723840. Throughput: 0: 779.1, 1: 780.4. Samples: 642565. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 13:16:51,155][05066] Avg episode reward: [(0, '5.280'), (1, '4.290')] [2023-09-22 13:16:51,156][06078] Saving new best policy, reward=5.280! [2023-09-22 13:16:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2756608. Throughput: 0: 779.9, 1: 779.4. Samples: 651798. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 13:16:56,155][05066] Avg episode reward: [(0, '5.350'), (1, '4.530')] [2023-09-22 13:16:56,165][06078] Saving new best policy, reward=5.350! [2023-09-22 13:16:57,014][06567] Updated weights for policy 0, policy_version 5680 (0.0019) [2023-09-22 13:16:57,015][06493] Updated weights for policy 1, policy_version 5120 (0.0019) [2023-09-22 13:17:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2789376. Throughput: 0: 779.4, 1: 780.3. Samples: 661491. 
Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:17:01,155][05066] Avg episode reward: [(0, '5.180'), (1, '4.330')] [2023-09-22 13:17:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2822144. Throughput: 0: 777.7, 1: 776.9. Samples: 665928. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:17:06,155][05066] Avg episode reward: [(0, '5.170'), (1, '4.260')] [2023-09-22 13:17:10,090][06567] Updated weights for policy 0, policy_version 5840 (0.0016) [2023-09-22 13:17:10,091][06493] Updated weights for policy 1, policy_version 5280 (0.0017) [2023-09-22 13:17:11,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2846720. Throughput: 0: 779.6, 1: 779.4. Samples: 675546. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:17:11,154][05066] Avg episode reward: [(0, '5.160'), (1, '4.280')] [2023-09-22 13:17:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 2879488. Throughput: 0: 779.0, 1: 778.4. Samples: 684486. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:17:16,155][05066] Avg episode reward: [(0, '5.610'), (1, '4.410')] [2023-09-22 13:17:16,156][06078] Saving new best policy, reward=5.610! [2023-09-22 13:17:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6234.3). Total num frames: 2912256. Throughput: 0: 784.4, 1: 784.5. Samples: 689235. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:17:21,155][05066] Avg episode reward: [(0, '5.570'), (1, '4.570')] [2023-09-22 13:17:23,396][06493] Updated weights for policy 1, policy_version 5440 (0.0017) [2023-09-22 13:17:23,396][06567] Updated weights for policy 0, policy_version 6000 (0.0015) [2023-09-22 13:17:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 2945024. Throughput: 0: 779.8, 1: 780.0. Samples: 698420. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:17:26,154][05066] Avg episode reward: [(0, '5.770'), (1, '4.850')] [2023-09-22 13:17:26,163][06078] Saving new best policy, reward=5.770! [2023-09-22 13:17:31,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 2969600. Throughput: 0: 780.6, 1: 781.5. Samples: 707922. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:17:31,154][05066] Avg episode reward: [(0, '5.200'), (1, '5.000')] [2023-09-22 13:17:36,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3002368. Throughput: 0: 778.9, 1: 778.4. Samples: 712646. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:17:36,155][05066] Avg episode reward: [(0, '4.910'), (1, '5.080')] [2023-09-22 13:17:36,698][06493] Updated weights for policy 1, policy_version 5600 (0.0018) [2023-09-22 13:17:36,698][06567] Updated weights for policy 0, policy_version 6160 (0.0017) [2023-09-22 13:17:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 3035136. Throughput: 0: 774.6, 1: 774.8. Samples: 721525. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:17:41,154][05066] Avg episode reward: [(0, '4.960'), (1, '6.100')] [2023-09-22 13:17:41,163][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000005648_1445888.pth... [2023-09-22 13:17:41,163][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000006208_1589248.pth... [2023-09-22 13:17:41,198][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000002736_700416.pth [2023-09-22 13:17:41,201][06278] Saving new best policy, reward=6.100! [2023-09-22 13:17:41,210][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000003296_843776.pth [2023-09-22 13:17:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3067904. Throughput: 0: 773.7, 1: 773.9. Samples: 731136. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:17:46,155][05066] Avg episode reward: [(0, '5.540'), (1, '6.390')] [2023-09-22 13:17:46,156][06278] Saving new best policy, reward=6.390! [2023-09-22 13:17:49,871][06493] Updated weights for policy 1, policy_version 5760 (0.0016) [2023-09-22 13:17:49,871][06567] Updated weights for policy 0, policy_version 6320 (0.0015) [2023-09-22 13:17:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3092480. Throughput: 0: 774.4, 1: 774.2. Samples: 735615. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 13:17:51,154][05066] Avg episode reward: [(0, '5.260'), (1, '6.040')] [2023-09-22 13:17:56,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3125248. Throughput: 0: 774.3, 1: 775.6. Samples: 745289. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 13:17:56,155][05066] Avg episode reward: [(0, '5.670'), (1, '4.750')] [2023-09-22 13:18:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3158016. Throughput: 0: 780.2, 1: 779.9. Samples: 754691. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:18:01,155][05066] Avg episode reward: [(0, '5.980'), (1, '4.800')] [2023-09-22 13:18:01,156][06078] Saving new best policy, reward=5.980! [2023-09-22 13:18:02,840][06567] Updated weights for policy 0, policy_version 6480 (0.0018) [2023-09-22 13:18:02,840][06493] Updated weights for policy 1, policy_version 5920 (0.0015) [2023-09-22 13:18:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3190784. Throughput: 0: 780.2, 1: 780.7. Samples: 759479. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:18:06,155][05066] Avg episode reward: [(0, '5.790'), (1, '5.380')] [2023-09-22 13:18:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3223552. Throughput: 0: 780.6, 1: 780.3. Samples: 768664. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:11,155][05066] Avg episode reward: [(0, '5.660'), (1, '5.890')]
[2023-09-22 13:18:15,766][06493] Updated weights for policy 1, policy_version 6080 (0.0016)
[2023-09-22 13:18:15,766][06567] Updated weights for policy 0, policy_version 6640 (0.0013)
[2023-09-22 13:18:16,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 3256320. Throughput: 0: 781.2, 1: 781.5. Samples: 778241. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:16,154][05066] Avg episode reward: [(0, '4.930'), (1, '6.400')]
[2023-09-22 13:18:16,155][06278] Saving new best policy, reward=6.400!
[2023-09-22 13:18:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3289088. Throughput: 0: 781.0, 1: 780.4. Samples: 782906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:21,155][05066] Avg episode reward: [(0, '5.390'), (1, '6.110')]
[2023-09-22 13:18:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3313664. Throughput: 0: 786.8, 1: 786.5. Samples: 792323. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:26,154][05066] Avg episode reward: [(0, '5.550'), (1, '6.500')]
[2023-09-22 13:18:26,161][06278] Saving new best policy, reward=6.500!
[2023-09-22 13:18:28,962][06493] Updated weights for policy 1, policy_version 6240 (0.0016)
[2023-09-22 13:18:28,963][06567] Updated weights for policy 0, policy_version 6800 (0.0017)
[2023-09-22 13:18:31,153][05066] Fps is (10 sec: 5734.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3346432. Throughput: 0: 784.3, 1: 783.6. Samples: 801690. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:31,154][05066] Avg episode reward: [(0, '5.280'), (1, '6.340')]
[2023-09-22 13:18:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3379200. Throughput: 0: 785.1, 1: 786.9. Samples: 806355. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:18:36,155][05066] Avg episode reward: [(0, '4.960'), (1, '6.240')]
[2023-09-22 13:18:41,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6234.2). Total num frames: 3411968. Throughput: 0: 777.7, 1: 777.3. Samples: 815265. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:18:41,155][05066] Avg episode reward: [(0, '4.820'), (1, '6.190')]
[2023-09-22 13:18:42,171][06493] Updated weights for policy 1, policy_version 6400 (0.0015)
[2023-09-22 13:18:42,172][06567] Updated weights for policy 0, policy_version 6960 (0.0013)
[2023-09-22 13:18:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3444736. Throughput: 0: 782.0, 1: 783.4. Samples: 825133. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:46,155][05066] Avg episode reward: [(0, '5.120'), (1, '5.590')]
[2023-09-22 13:18:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3469312. Throughput: 0: 778.1, 1: 777.6. Samples: 829484. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:51,154][05066] Avg episode reward: [(0, '4.900'), (1, '6.180')]
[2023-09-22 13:18:55,202][06493] Updated weights for policy 1, policy_version 6560 (0.0017)
[2023-09-22 13:18:55,202][06567] Updated weights for policy 0, policy_version 7120 (0.0019)
[2023-09-22 13:18:56,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 3502080. Throughput: 0: 782.8, 1: 783.3. Samples: 839137. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:18:56,154][05066] Avg episode reward: [(0, '4.890'), (1, '6.560')]
[2023-09-22 13:18:56,164][06278] Saving new best policy, reward=6.560!
[2023-09-22 13:19:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3534848. Throughput: 0: 776.4, 1: 775.7. Samples: 848087. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:19:01,154][05066] Avg episode reward: [(0, '5.000'), (1, '7.120')]
[2023-09-22 13:19:01,155][06278] Saving new best policy, reward=7.120!
[2023-09-22 13:19:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3567616. Throughput: 0: 778.0, 1: 777.8. Samples: 852918. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-22 13:19:06,155][05066] Avg episode reward: [(0, '5.490'), (1, '7.710')]
[2023-09-22 13:19:06,156][06278] Saving new best policy, reward=7.710!
[2023-09-22 13:19:08,465][06493] Updated weights for policy 1, policy_version 6720 (0.0016)
[2023-09-22 13:19:08,465][06567] Updated weights for policy 0, policy_version 7280 (0.0018)
[2023-09-22 13:19:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3600384. Throughput: 0: 776.2, 1: 776.8. Samples: 862209. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:19:11,155][05066] Avg episode reward: [(0, '5.640'), (1, '7.030')]
[2023-09-22 13:19:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3624960. Throughput: 0: 779.5, 1: 779.4. Samples: 871842. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:19:16,155][05066] Avg episode reward: [(0, '5.410'), (1, '7.080')]
[2023-09-22 13:19:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3657728. Throughput: 0: 780.3, 1: 779.5. Samples: 876544. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:19:21,154][05066] Avg episode reward: [(0, '5.370'), (1, '6.760')]
[2023-09-22 13:19:21,524][06567] Updated weights for policy 0, policy_version 7440 (0.0016)
[2023-09-22 13:19:21,524][06493] Updated weights for policy 1, policy_version 6880 (0.0017)
[2023-09-22 13:19:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3690496. Throughput: 0: 782.2, 1: 782.1. Samples: 885658. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:19:26,154][05066] Avg episode reward: [(0, '5.500'), (1, '6.750')]
[2023-09-22 13:19:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3723264. Throughput: 0: 776.2, 1: 775.8. Samples: 894976. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:19:31,155][05066] Avg episode reward: [(0, '5.690'), (1, '7.200')]
[2023-09-22 13:19:34,803][06567] Updated weights for policy 0, policy_version 7600 (0.0019)
[2023-09-22 13:19:34,803][06493] Updated weights for policy 1, policy_version 7040 (0.0018)
[2023-09-22 13:19:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3756032. Throughput: 0: 777.0, 1: 776.8. Samples: 899406. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:19:36,155][05066] Avg episode reward: [(0, '5.400'), (1, '7.060')]
[2023-09-22 13:19:41,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3780608. Throughput: 0: 778.4, 1: 778.3. Samples: 909190. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:19:41,154][05066] Avg episode reward: [(0, '5.560'), (1, '7.500')]
[2023-09-22 13:19:41,237][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000007680_1966080.pth...
[2023-09-22 13:19:41,238][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000007120_1822720.pth...
[2023-09-22 13:19:41,267][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000004752_1216512.pth
[2023-09-22 13:19:41,267][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000004192_1073152.pth
[2023-09-22 13:19:46,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3813376. Throughput: 0: 781.6, 1: 781.5. Samples: 918427. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:19:46,154][05066] Avg episode reward: [(0, '5.330'), (1, '7.500')]
[2023-09-22 13:19:47,792][06567] Updated weights for policy 0, policy_version 7760 (0.0017)
[2023-09-22 13:19:47,793][06493] Updated weights for policy 1, policy_version 7200 (0.0017)
[2023-09-22 13:19:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 3846144. Throughput: 0: 783.2, 1: 783.1. Samples: 923401. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 13:19:51,154][05066] Avg episode reward: [(0, '5.720'), (1, '7.200')]
[2023-09-22 13:19:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 3878912. Throughput: 0: 778.0, 1: 777.0. Samples: 932183. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 13:19:56,154][05066] Avg episode reward: [(0, '5.820'), (1, '7.260')]
[2023-09-22 13:20:01,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3903488. Throughput: 0: 775.1, 1: 775.2. Samples: 941607. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-22 13:20:01,155][05066] Avg episode reward: [(0, '6.140'), (1, '7.050')]
[2023-09-22 13:20:01,238][06078] Saving new best policy, reward=6.140!
[2023-09-22 13:20:01,240][06567] Updated weights for policy 0, policy_version 7920 (0.0015)
[2023-09-22 13:20:01,240][06493] Updated weights for policy 1, policy_version 7360 (0.0018)
[2023-09-22 13:20:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3936256. Throughput: 0: 773.7, 1: 773.7. Samples: 946176. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:06,154][05066] Avg episode reward: [(0, '6.280'), (1, '6.610')]
[2023-09-22 13:20:06,155][06078] Saving new best policy, reward=6.280!
[2023-09-22 13:20:11,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 3969024. Throughput: 0: 778.9, 1: 779.3. Samples: 955777. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:11,154][05066] Avg episode reward: [(0, '5.770'), (1, '6.750')]
[2023-09-22 13:20:14,158][06567] Updated weights for policy 0, policy_version 8080 (0.0017)
[2023-09-22 13:20:14,158][06493] Updated weights for policy 1, policy_version 7520 (0.0018)
[2023-09-22 13:20:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 4001792. Throughput: 0: 779.7, 1: 779.2. Samples: 965126. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:16,154][05066] Avg episode reward: [(0, '5.780'), (1, '7.260')]
[2023-09-22 13:20:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4034560. Throughput: 0: 785.6, 1: 785.5. Samples: 970105. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:20:21,155][05066] Avg episode reward: [(0, '5.820'), (1, '7.520')]
[2023-09-22 13:20:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4067328. Throughput: 0: 778.4, 1: 778.3. Samples: 979241. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:20:26,154][05066] Avg episode reward: [(0, '6.350'), (1, '7.760')]
[2023-09-22 13:20:26,161][06278] Saving new best policy, reward=7.760!
[2023-09-22 13:20:26,161][06078] Saving new best policy, reward=6.350!
[2023-09-22 13:20:27,162][06567] Updated weights for policy 0, policy_version 8240 (0.0018)
[2023-09-22 13:20:27,163][06493] Updated weights for policy 1, policy_version 7680 (0.0018)
[2023-09-22 13:20:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4100096. Throughput: 0: 783.3, 1: 783.7. Samples: 988942. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:31,155][05066] Avg episode reward: [(0, '6.550'), (1, '7.550')]
[2023-09-22 13:20:31,156][06078] Saving new best policy, reward=6.550!
[2023-09-22 13:20:36,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4124672. Throughput: 0: 776.0, 1: 777.0. Samples: 993285. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:36,154][05066] Avg episode reward: [(0, '6.560'), (1, '7.490')]
[2023-09-22 13:20:36,155][06078] Saving new best policy, reward=6.560!
[2023-09-22 13:20:40,377][06567] Updated weights for policy 0, policy_version 8400 (0.0018)
[2023-09-22 13:20:40,377][06493] Updated weights for policy 1, policy_version 7840 (0.0018)
[2023-09-22 13:20:41,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4157440. Throughput: 0: 782.7, 1: 783.7. Samples: 1002674. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:41,155][05066] Avg episode reward: [(0, '6.630'), (1, '7.690')]
[2023-09-22 13:20:41,164][06078] Saving new best policy, reward=6.630!
[2023-09-22 13:20:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4190208. Throughput: 0: 782.6, 1: 782.9. Samples: 1012053. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:46,155][05066] Avg episode reward: [(0, '6.600'), (1, '7.640')]
[2023-09-22 13:20:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4222976. Throughput: 0: 783.9, 1: 783.9. Samples: 1016726. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:20:51,155][05066] Avg episode reward: [(0, '6.820'), (1, '7.730')]
[2023-09-22 13:20:51,156][06078] Saving new best policy, reward=6.820!
[2023-09-22 13:20:53,569][06493] Updated weights for policy 1, policy_version 8000 (0.0016)
[2023-09-22 13:20:53,569][06567] Updated weights for policy 0, policy_version 8560 (0.0016)
[2023-09-22 13:20:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4247552. Throughput: 0: 780.6, 1: 781.0. Samples: 1026048. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:20:56,155][05066] Avg episode reward: [(0, '6.080'), (1, '8.020')]
[2023-09-22 13:20:56,171][06278] Saving new best policy, reward=8.020!
[2023-09-22 13:21:01,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4280320. Throughput: 0: 781.8, 1: 781.5. Samples: 1035476. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:01,154][05066] Avg episode reward: [(0, '5.620'), (1, '8.390')]
[2023-09-22 13:21:01,155][06278] Saving new best policy, reward=8.390!
[2023-09-22 13:21:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4313088. Throughput: 0: 778.5, 1: 778.1. Samples: 1040152. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:06,154][05066] Avg episode reward: [(0, '5.270'), (1, '8.230')]
[2023-09-22 13:21:06,658][06493] Updated weights for policy 1, policy_version 8160 (0.0015)
[2023-09-22 13:21:06,658][06567] Updated weights for policy 0, policy_version 8720 (0.0014)
[2023-09-22 13:21:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4345856. Throughput: 0: 780.5, 1: 780.7. Samples: 1049494. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:21:11,154][05066] Avg episode reward: [(0, '5.470'), (1, '8.450')]
[2023-09-22 13:21:11,162][06278] Saving new best policy, reward=8.450!
[2023-09-22 13:21:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4378624. Throughput: 0: 776.1, 1: 776.6. Samples: 1058816. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:16,154][05066] Avg episode reward: [(0, '5.400'), (1, '7.550')]
[2023-09-22 13:21:19,730][06567] Updated weights for policy 0, policy_version 8880 (0.0016)
[2023-09-22 13:21:19,731][06493] Updated weights for policy 1, policy_version 8320 (0.0013)
[2023-09-22 13:21:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4411392. Throughput: 0: 779.8, 1: 779.1. Samples: 1063436. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:21,155][05066] Avg episode reward: [(0, '5.180'), (1, '6.710')]
[2023-09-22 13:21:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4435968. Throughput: 0: 783.1, 1: 783.1. Samples: 1073152. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:26,155][05066] Avg episode reward: [(0, '5.530'), (1, '6.690')]
[2023-09-22 13:21:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 4468736. Throughput: 0: 784.2, 1: 784.0. Samples: 1082623. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:31,155][05066] Avg episode reward: [(0, '5.490'), (1, '7.250')]
[2023-09-22 13:21:32,644][06567] Updated weights for policy 0, policy_version 9040 (0.0016)
[2023-09-22 13:21:32,645][06493] Updated weights for policy 1, policy_version 8480 (0.0018)
[2023-09-22 13:21:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4501504. Throughput: 0: 786.3, 1: 786.2. Samples: 1087488. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:21:36,154][05066] Avg episode reward: [(0, '5.680'), (1, '7.390')]
[2023-09-22 13:21:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4534272. Throughput: 0: 787.2, 1: 787.1. Samples: 1096892. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-22 13:21:41,154][05066] Avg episode reward: [(0, '6.150'), (1, '8.080')]
[2023-09-22 13:21:41,162][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000008576_2195456.pth...
[2023-09-22 13:21:41,162][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000009136_2338816.pth...
[2023-09-22 13:21:41,197][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000005648_1445888.pth
[2023-09-22 13:21:41,198][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000006208_1589248.pth
[2023-09-22 13:21:45,757][06567] Updated weights for policy 0, policy_version 9200 (0.0016)
[2023-09-22 13:21:45,757][06493] Updated weights for policy 1, policy_version 8640 (0.0016)
[2023-09-22 13:21:46,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4567040. Throughput: 0: 782.4, 1: 783.1. Samples: 1105925. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0)
[2023-09-22 13:21:46,155][05066] Avg episode reward: [(0, '6.280'), (1, '8.140')]
[2023-09-22 13:21:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 4591616. Throughput: 0: 781.3, 1: 782.0. Samples: 1110500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:21:51,155][05066] Avg episode reward: [(0, '6.520'), (1, '7.700')]
[2023-09-22 13:21:56,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6220.4). Total num frames: 4624384. Throughput: 0: 783.4, 1: 783.2. Samples: 1119987. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:21:56,154][05066] Avg episode reward: [(0, '6.820'), (1, '8.480')]
[2023-09-22 13:21:56,162][06278] Saving new best policy, reward=8.480!
[2023-09-22 13:21:58,997][06493] Updated weights for policy 1, policy_version 8800 (0.0015)
[2023-09-22 13:21:58,997][06567] Updated weights for policy 0, policy_version 9360 (0.0016)
[2023-09-22 13:22:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6220.4). Total num frames: 4657152. Throughput: 0: 783.1, 1: 782.5. Samples: 1129268. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:01,154][05066] Avg episode reward: [(0, '6.650'), (1, '8.860')]
[2023-09-22 13:22:01,155][06278] Saving new best policy, reward=8.860!
[2023-09-22 13:22:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4689920. Throughput: 0: 786.2, 1: 786.2. Samples: 1134192. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:06,154][05066] Avg episode reward: [(0, '6.830'), (1, '8.860')]
[2023-09-22 13:22:06,155][06078] Saving new best policy, reward=6.830!
[2023-09-22 13:22:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4722688. Throughput: 0: 783.6, 1: 783.0. Samples: 1143646. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:11,155][05066] Avg episode reward: [(0, '7.300'), (1, '9.010')]
[2023-09-22 13:22:11,164][06078] Saving new best policy, reward=7.300!
[2023-09-22 13:22:11,164][06278] Saving new best policy, reward=9.010!
[2023-09-22 13:22:12,019][06493] Updated weights for policy 1, policy_version 8960 (0.0017)
[2023-09-22 13:22:12,019][06567] Updated weights for policy 0, policy_version 9520 (0.0016)
[2023-09-22 13:22:16,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4755456. Throughput: 0: 781.9, 1: 782.6. Samples: 1153024. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:16,155][05066] Avg episode reward: [(0, '7.340'), (1, '8.340')]
[2023-09-22 13:22:16,156][06078] Saving new best policy, reward=7.340!
[2023-09-22 13:22:21,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 4788224. Throughput: 0: 779.4, 1: 778.8. Samples: 1157608. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:21,154][05066] Avg episode reward: [(0, '7.090'), (1, '8.020')]
[2023-09-22 13:22:24,970][06567] Updated weights for policy 0, policy_version 9680 (0.0017)
[2023-09-22 13:22:24,970][06493] Updated weights for policy 1, policy_version 9120 (0.0016)
[2023-09-22 13:22:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4812800. Throughput: 0: 782.6, 1: 781.6. Samples: 1167279. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:26,155][05066] Avg episode reward: [(0, '6.830'), (1, '7.900')]
[2023-09-22 13:22:31,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4845568. Throughput: 0: 784.6, 1: 784.1. Samples: 1176514. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:31,155][05066] Avg episode reward: [(0, '6.600'), (1, '8.160')]
[2023-09-22 13:22:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4878336. Throughput: 0: 786.9, 1: 786.1. Samples: 1181286. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:36,154][05066] Avg episode reward: [(0, '6.440'), (1, '8.350')]
[2023-09-22 13:22:38,241][06493] Updated weights for policy 1, policy_version 9280 (0.0014)
[2023-09-22 13:22:38,241][06567] Updated weights for policy 0, policy_version 9840 (0.0017)
[2023-09-22 13:22:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 4911104. Throughput: 0: 779.1, 1: 778.7. Samples: 1190088. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:41,155][05066] Avg episode reward: [(0, '6.640'), (1, '8.400')]
[2023-09-22 13:22:46,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 4935680. Throughput: 0: 782.2, 1: 782.2. Samples: 1199670. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:22:46,155][05066] Avg episode reward: [(0, '6.270'), (1, '8.680')]
[2023-09-22 13:22:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 4968448. Throughput: 0: 777.7, 1: 778.6. Samples: 1204224. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:51,154][05066] Avg episode reward: [(0, '6.440'), (1, '9.720')]
[2023-09-22 13:22:51,155][06278] Saving new best policy, reward=9.720!
[2023-09-22 13:22:51,437][06567] Updated weights for policy 0, policy_version 10000 (0.0017)
[2023-09-22 13:22:51,437][06493] Updated weights for policy 1, policy_version 9440 (0.0015)
[2023-09-22 13:22:56,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5001216. Throughput: 0: 776.7, 1: 776.1. Samples: 1213524. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:22:56,154][05066] Avg episode reward: [(0, '7.050'), (1, '9.540')]
[2023-09-22 13:23:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5033984. Throughput: 0: 773.7, 1: 773.7. Samples: 1222656. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-22 13:23:01,154][05066] Avg episode reward: [(0, '7.300'), (1, '8.980')]
[2023-09-22 13:23:04,800][06493] Updated weights for policy 1, policy_version 9600 (0.0019)
[2023-09-22 13:23:04,800][06567] Updated weights for policy 0, policy_version 10160 (0.0019)
[2023-09-22 13:23:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5066752. Throughput: 0: 771.8, 1: 771.9. Samples: 1227075. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0)
[2023-09-22 13:23:06,155][05066] Avg episode reward: [(0, '7.690'), (1, '8.450')]
[2023-09-22 13:23:06,156][06078] Saving new best policy, reward=7.690!
[2023-09-22 13:23:11,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 5091328. Throughput: 0: 774.0, 1: 774.6. Samples: 1236969. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:23:11,154][05066] Avg episode reward: [(0, '7.490'), (1, '7.740')]
[2023-09-22 13:23:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 5124096. Throughput: 0: 773.5, 1: 772.7. Samples: 1246094. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:23:16,155][05066] Avg episode reward: [(0, '7.560'), (1, '7.810')]
[2023-09-22 13:23:17,836][06493] Updated weights for policy 1, policy_version 9760 (0.0017)
[2023-09-22 13:23:17,836][06567] Updated weights for policy 0, policy_version 10320 (0.0020)
[2023-09-22 13:23:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 5156864. Throughput: 0: 774.0, 1: 774.9. Samples: 1250984. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-22 13:23:21,155][05066] Avg episode reward: [(0, '7.660'), (1, '8.220')]
[2023-09-22 13:23:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5189632. Throughput: 0: 782.4, 1: 782.4. Samples: 1260504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:23:26,154][05066] Avg episode reward: [(0, '7.640'), (1, '7.770')]
[2023-09-22 13:23:30,602][06493] Updated weights for policy 1, policy_version 9920 (0.0015)
[2023-09-22 13:23:30,602][06567] Updated weights for policy 0, policy_version 10480 (0.0018)
[2023-09-22 13:23:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5222400. Throughput: 0: 781.2, 1: 781.0. Samples: 1269967. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:23:31,154][05066] Avg episode reward: [(0, '7.870'), (1, '7.940')]
[2023-09-22 13:23:31,155][06078] Saving new best policy, reward=7.870!
[2023-09-22 13:23:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5255168. Throughput: 0: 786.5, 1: 784.9. Samples: 1274939. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 13:23:36,155][05066] Avg episode reward: [(0, '8.040'), (1, '7.950')]
[2023-09-22 13:23:36,156][06078] Saving new best policy, reward=8.040!
[2023-09-22 13:23:41,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5287936. Throughput: 0: 784.5, 1: 784.9. Samples: 1284150. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 13:23:41,155][05066] Avg episode reward: [(0, '8.090'), (1, '8.300')]
[2023-09-22 13:23:41,166][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000010048_2572288.pth...
[2023-09-22 13:23:41,167][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000010608_2715648.pth...
[2023-09-22 13:23:41,195][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000007120_1822720.pth
[2023-09-22 13:23:41,204][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000007680_1966080.pth
[2023-09-22 13:23:41,207][06078] Saving new best policy, reward=8.090!
[2023-09-22 13:23:43,580][06493] Updated weights for policy 1, policy_version 10080 (0.0015)
[2023-09-22 13:23:43,581][06567] Updated weights for policy 0, policy_version 10640 (0.0015)
[2023-09-22 13:23:46,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5312512. Throughput: 0: 790.0, 1: 790.9. Samples: 1293797. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:23:46,155][05066] Avg episode reward: [(0, '8.160'), (1, '8.540')]
[2023-09-22 13:23:46,285][06078] Saving new best policy, reward=8.160!
[2023-09-22 13:23:51,154][05066] Fps is (10 sec: 5734.1, 60 sec: 6280.4, 300 sec: 6248.1). Total num frames: 5345280. Throughput: 0: 792.6, 1: 793.1. Samples: 1298432. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:23:51,155][05066] Avg episode reward: [(0, '8.150'), (1, '8.580')]
[2023-09-22 13:23:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5378048. Throughput: 0: 785.4, 1: 784.8. Samples: 1307626. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:23:56,155][05066] Avg episode reward: [(0, '8.110'), (1, '8.500')]
[2023-09-22 13:23:56,935][06567] Updated weights for policy 0, policy_version 10800 (0.0016)
[2023-09-22 13:23:56,935][06493] Updated weights for policy 1, policy_version 10240 (0.0019)
[2023-09-22 13:24:01,154][05066] Fps is (10 sec: 6554.0, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5410816. Throughput: 0: 785.7, 1: 787.0. Samples: 1316865. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:01,155][05066] Avg episode reward: [(0, '8.330'), (1, '8.180')]
[2023-09-22 13:24:01,156][06078] Saving new best policy, reward=8.330!
[2023-09-22 13:24:06,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5443584. Throughput: 0: 783.3, 1: 783.6. Samples: 1321492. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:06,154][05066] Avg episode reward: [(0, '9.030'), (1, '8.710')]
[2023-09-22 13:24:06,155][06078] Saving new best policy, reward=9.030!
[2023-09-22 13:24:10,041][06567] Updated weights for policy 0, policy_version 10960 (0.0013)
[2023-09-22 13:24:10,042][06493] Updated weights for policy 1, policy_version 10400 (0.0016)
[2023-09-22 13:24:11,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5468160. Throughput: 0: 783.0, 1: 783.5. Samples: 1330994. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:11,154][05066] Avg episode reward: [(0, '8.200'), (1, '8.320')]
[2023-09-22 13:24:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5500928. Throughput: 0: 781.6, 1: 781.4. Samples: 1340304. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:16,154][05066] Avg episode reward: [(0, '8.440'), (1, '8.730')]
[2023-09-22 13:24:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5533696. Throughput: 0: 780.3, 1: 781.5. Samples: 1345222. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-22 13:24:21,155][05066] Avg episode reward: [(0, '8.050'), (1, '8.410')]
[2023-09-22 13:24:22,985][06493] Updated weights for policy 1, policy_version 10560 (0.0016)
[2023-09-22 13:24:22,985][06567] Updated weights for policy 0, policy_version 11120 (0.0015)
[2023-09-22 13:24:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5566464. Throughput: 0: 783.0, 1: 783.0. Samples: 1354623. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-22 13:24:26,155][05066] Avg episode reward: [(0, '8.180'), (1, '8.500')]
[2023-09-22 13:24:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5599232. Throughput: 0: 780.3, 1: 779.3. Samples: 1363980. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:31,155][05066] Avg episode reward: [(0, '7.880'), (1, '8.790')]
[2023-09-22 13:24:35,870][06493] Updated weights for policy 1, policy_version 10720 (0.0017)
[2023-09-22 13:24:35,871][06567] Updated weights for policy 0, policy_version 11280 (0.0019)
[2023-09-22 13:24:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5632000. Throughput: 0: 782.1, 1: 781.2. Samples: 1368782. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:36,154][05066] Avg episode reward: [(0, '8.050'), (1, '8.210')]
[2023-09-22 13:24:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5664768. Throughput: 0: 784.9, 1: 785.9. Samples: 1378310. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:41,155][05066] Avg episode reward: [(0, '8.080'), (1, '8.970')]
[2023-09-22 13:24:46,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5689344. Throughput: 0: 790.7, 1: 790.1. Samples: 1388002. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:46,155][05066] Avg episode reward: [(0, '8.380'), (1, '8.370')]
[2023-09-22 13:24:48,803][06493] Updated weights for policy 1, policy_version 10880 (0.0016)
[2023-09-22 13:24:48,804][06567] Updated weights for policy 0, policy_version 11440 (0.0016)
[2023-09-22 13:24:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 5722112. Throughput: 0: 790.4, 1: 790.6. Samples: 1392640. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:51,154][05066] Avg episode reward: [(0, '8.260'), (1, '8.190')]
[2023-09-22 13:24:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5754880. Throughput: 0: 790.2, 1: 789.4. Samples: 1402073. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:24:56,155][05066] Avg episode reward: [(0, '8.330'), (1, '8.160')]
[2023-09-22 13:25:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 5787648. Throughput: 0: 787.2, 1: 787.4. Samples: 1411159. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:01,154][05066] Avg episode reward: [(0, '9.030'), (1, '8.400')]
[2023-09-22 13:25:01,997][06567] Updated weights for policy 0, policy_version 11600 (0.0016)
[2023-09-22 13:25:01,997][06493] Updated weights for policy 1, policy_version 11040 (0.0018)
[2023-09-22 13:25:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5820416. Throughput: 0: 786.9, 1: 786.4. Samples: 1416019. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:25:06,154][05066] Avg episode reward: [(0, '9.720'), (1, '8.430')]
[2023-09-22 13:25:06,155][06078] Saving new best policy, reward=9.720!
[2023-09-22 13:25:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 5853184. Throughput: 0: 786.1, 1: 786.9. Samples: 1425408. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:25:11,154][05066] Avg episode reward: [(0, '9.440'), (1, '8.100')]
[2023-09-22 13:25:15,046][06567] Updated weights for policy 0, policy_version 11760 (0.0017)
[2023-09-22 13:25:15,046][06493] Updated weights for policy 1, policy_version 11200 (0.0016)
[2023-09-22 13:25:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5877760. Throughput: 0: 788.0, 1: 787.3. Samples: 1434867. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:25:16,154][05066] Avg episode reward: [(0, '9.840'), (1, '8.290')]
[2023-09-22 13:25:16,356][06078] Saving new best policy, reward=9.840!
[2023-09-22 13:25:21,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5910528. Throughput: 0: 788.0, 1: 788.9. Samples: 1439743. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:25:21,155][05066] Avg episode reward: [(0, '10.200'), (1, '8.120')]
[2023-09-22 13:25:21,156][06078] Saving new best policy, reward=10.200!
[2023-09-22 13:25:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 5943296. Throughput: 0: 785.5, 1: 785.4. Samples: 1449000. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:26,155][05066] Avg episode reward: [(0, '9.010'), (1, '8.130')]
[2023-09-22 13:25:27,988][06567] Updated weights for policy 0, policy_version 11920 (0.0021)
[2023-09-22 13:25:27,988][06493] Updated weights for policy 1, policy_version 11360 (0.0020)
[2023-09-22 13:25:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5976064. Throughput: 0: 780.9, 1: 780.8. Samples: 1458277. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:31,155][05066] Avg episode reward: [(0, '8.880'), (1, '8.590')]
[2023-09-22 13:25:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6008832. Throughput: 0: 784.9, 1: 784.4. Samples: 1463258. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:36,154][05066] Avg episode reward: [(0, '8.610'), (1, '8.940')]
[2023-09-22 13:25:40,972][06567] Updated weights for policy 0, policy_version 12080 (0.0017)
[2023-09-22 13:25:40,973][06493] Updated weights for policy 1, policy_version 11520 (0.0016)
[2023-09-22 13:25:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6041600. Throughput: 0: 782.0, 1: 783.5. Samples: 1472521. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:41,155][05066] Avg episode reward: [(0, '9.270'), (1, '9.470')]
[2023-09-22 13:25:41,166][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000012080_3092480.pth...
[2023-09-22 13:25:41,166][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000011520_2949120.pth...
[2023-09-22 13:25:41,197][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000009136_2338816.pth
[2023-09-22 13:25:41,201][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000008576_2195456.pth
[2023-09-22 13:25:46,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6066176. Throughput: 0: 788.9, 1: 789.2. Samples: 1482172. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:46,154][05066] Avg episode reward: [(0, '9.470'), (1, '9.680')]
[2023-09-22 13:25:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6098944. Throughput: 0: 786.4, 1: 787.0. Samples: 1486822. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:25:51,154][05066] Avg episode reward: [(0, '9.650'), (1, '9.540')]
[2023-09-22 13:25:54,239][06567] Updated weights for policy 0, policy_version 12240 (0.0016)
[2023-09-22 13:25:54,239][06493] Updated weights for policy 1, policy_version 11680 (0.0015)
[2023-09-22 13:25:56,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6131712.
Throughput: 0: 784.5, 1: 783.7. Samples: 1495979. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:25:56,154][05066] Avg episode reward: [(0, '9.120'), (1, '9.480')] [2023-09-22 13:26:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6164480. Throughput: 0: 783.4, 1: 783.4. Samples: 1505372. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:26:01,154][05066] Avg episode reward: [(0, '9.730'), (1, '9.380')] [2023-09-22 13:26:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6197248. Throughput: 0: 782.3, 1: 781.8. Samples: 1510129. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:26:06,155][05066] Avg episode reward: [(0, '9.440'), (1, '8.850')] [2023-09-22 13:26:07,294][06567] Updated weights for policy 0, policy_version 12400 (0.0017) [2023-09-22 13:26:07,294][06493] Updated weights for policy 1, policy_version 11840 (0.0017) [2023-09-22 13:26:11,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 6225920. Throughput: 0: 784.5, 1: 784.6. Samples: 1519613. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:26:11,154][05066] Avg episode reward: [(0, '8.840'), (1, '8.960')] [2023-09-22 13:26:16,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6254592. Throughput: 0: 784.5, 1: 784.9. Samples: 1528902. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:26:16,154][05066] Avg episode reward: [(0, '8.930'), (1, '9.130')] [2023-09-22 13:26:20,355][06493] Updated weights for policy 1, policy_version 12000 (0.0018) [2023-09-22 13:26:20,356][06567] Updated weights for policy 0, policy_version 12560 (0.0015) [2023-09-22 13:26:21,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 6287360. Throughput: 0: 783.2, 1: 782.8. Samples: 1533729. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:26:21,154][05066] Avg episode reward: [(0, '9.560'), (1, '8.380')] [2023-09-22 13:26:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 6320128. Throughput: 0: 784.4, 1: 783.7. Samples: 1543085. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:26:26,154][05066] Avg episode reward: [(0, '9.010'), (1, '8.990')] [2023-09-22 13:26:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6352896. Throughput: 0: 779.9, 1: 780.4. Samples: 1552384. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:26:31,154][05066] Avg episode reward: [(0, '8.960'), (1, '9.280')] [2023-09-22 13:26:33,380][06567] Updated weights for policy 0, policy_version 12720 (0.0016) [2023-09-22 13:26:33,381][06493] Updated weights for policy 1, policy_version 12160 (0.0014) [2023-09-22 13:26:36,153][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6385664. Throughput: 0: 780.8, 1: 780.7. Samples: 1557090. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:26:36,154][05066] Avg episode reward: [(0, '9.630'), (1, '9.330')] [2023-09-22 13:26:41,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6410240. Throughput: 0: 785.6, 1: 784.7. Samples: 1566644. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:26:41,155][05066] Avg episode reward: [(0, '8.840'), (1, '9.820')] [2023-09-22 13:26:41,317][06278] Saving new best policy, reward=9.820! [2023-09-22 13:26:46,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6443008. Throughput: 0: 782.5, 1: 782.6. Samples: 1575804. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:26:46,155][05066] Avg episode reward: [(0, '9.440'), (1, '9.450')] [2023-09-22 13:26:46,644][06493] Updated weights for policy 1, policy_version 12320 (0.0015) [2023-09-22 13:26:46,644][06567] Updated weights for policy 0, policy_version 12880 (0.0016) [2023-09-22 13:26:51,154][05066] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6475776. Throughput: 0: 780.2, 1: 781.7. Samples: 1580413. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 13:26:51,154][05066] Avg episode reward: [(0, '9.550'), (1, '8.990')] [2023-09-22 13:26:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6508544. Throughput: 0: 777.8, 1: 776.9. Samples: 1589576. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 13:26:56,155][05066] Avg episode reward: [(0, '9.650'), (1, '9.110')] [2023-09-22 13:26:59,745][06567] Updated weights for policy 0, policy_version 13040 (0.0016) [2023-09-22 13:26:59,745][06493] Updated weights for policy 1, policy_version 12480 (0.0017) [2023-09-22 13:27:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6541312. Throughput: 0: 783.3, 1: 782.5. Samples: 1599364. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:27:01,154][05066] Avg episode reward: [(0, '9.090'), (1, '9.160')] [2023-09-22 13:27:06,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6565888. Throughput: 0: 778.0, 1: 778.4. Samples: 1603768. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:27:06,155][05066] Avg episode reward: [(0, '9.460'), (1, '9.260')] [2023-09-22 13:27:11,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 6598656. Throughput: 0: 776.9, 1: 776.6. Samples: 1612994. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 13:27:11,154][05066] Avg episode reward: [(0, '9.170'), (1, '9.270')] [2023-09-22 13:27:13,034][06567] Updated weights for policy 0, policy_version 13200 (0.0015) [2023-09-22 13:27:13,036][06493] Updated weights for policy 1, policy_version 12640 (0.0014) [2023-09-22 13:27:16,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 6631424. Throughput: 0: 775.8, 1: 775.2. Samples: 1622175. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 13:27:16,155][05066] Avg episode reward: [(0, '9.440'), (1, '9.740')] [2023-09-22 13:27:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6664192. Throughput: 0: 777.6, 1: 777.2. Samples: 1627055. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:27:21,155][05066] Avg episode reward: [(0, '9.420'), (1, '9.300')] [2023-09-22 13:27:26,045][06567] Updated weights for policy 0, policy_version 13360 (0.0019) [2023-09-22 13:27:26,046][06493] Updated weights for policy 1, policy_version 12800 (0.0019) [2023-09-22 13:27:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6696960. Throughput: 0: 773.8, 1: 775.4. Samples: 1636358. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:27:26,154][05066] Avg episode reward: [(0, '10.190'), (1, '9.690')] [2023-09-22 13:27:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6721536. Throughput: 0: 778.4, 1: 778.7. Samples: 1645875. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 13:27:31,155][05066] Avg episode reward: [(0, '10.050'), (1, '9.800')] [2023-09-22 13:27:36,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6754304. Throughput: 0: 781.3, 1: 780.4. Samples: 1650688. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 13:27:36,155][05066] Avg episode reward: [(0, '9.930'), (1, '9.800')] [2023-09-22 13:27:39,054][06493] Updated weights for policy 1, policy_version 12960 (0.0016) [2023-09-22 13:27:39,054][06567] Updated weights for policy 0, policy_version 13520 (0.0018) [2023-09-22 13:27:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6787072. Throughput: 0: 783.9, 1: 783.4. Samples: 1660105. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:27:41,155][05066] Avg episode reward: [(0, '10.180'), (1, '9.830')] [2023-09-22 13:27:41,165][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000013536_3465216.pth... [2023-09-22 13:27:41,165][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000012976_3321856.pth... [2023-09-22 13:27:41,201][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000010608_2715648.pth [2023-09-22 13:27:41,201][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000010048_2572288.pth [2023-09-22 13:27:41,205][06278] Saving new best policy, reward=9.830! [2023-09-22 13:27:46,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 6819840. Throughput: 0: 776.1, 1: 776.4. Samples: 1669227. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:27:46,154][05066] Avg episode reward: [(0, '9.930'), (1, '9.810')] [2023-09-22 13:27:51,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6852608. Throughput: 0: 782.3, 1: 782.6. Samples: 1674186. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:27:51,154][05066] Avg episode reward: [(0, '9.680'), (1, '9.740')] [2023-09-22 13:27:52,090][06567] Updated weights for policy 0, policy_version 13680 (0.0014) [2023-09-22 13:27:52,091][06493] Updated weights for policy 1, policy_version 13120 (0.0014) [2023-09-22 13:27:56,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6885376. Throughput: 0: 782.6, 1: 783.5. Samples: 1683466. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:27:56,155][05066] Avg episode reward: [(0, '9.730'), (1, '9.650')] [2023-09-22 13:28:01,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 6909952. Throughput: 0: 785.8, 1: 786.8. Samples: 1692944. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:28:01,154][05066] Avg episode reward: [(0, '9.860'), (1, '9.820')] [2023-09-22 13:28:05,161][06567] Updated weights for policy 0, policy_version 13840 (0.0017) [2023-09-22 13:28:05,161][06493] Updated weights for policy 1, policy_version 13280 (0.0016) [2023-09-22 13:28:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 6942720. Throughput: 0: 785.5, 1: 786.0. Samples: 1697773. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:28:06,155][05066] Avg episode reward: [(0, '10.460'), (1, '9.040')] [2023-09-22 13:28:06,156][06078] Saving new best policy, reward=10.460! [2023-09-22 13:28:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 6975488. Throughput: 0: 785.8, 1: 787.3. Samples: 1707146. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 13:28:11,154][05066] Avg episode reward: [(0, '10.100'), (1, '9.120')] [2023-09-22 13:28:16,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7008256. Throughput: 0: 782.6, 1: 782.3. Samples: 1716296. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 13:28:16,154][05066] Avg episode reward: [(0, '10.240'), (1, '9.590')] [2023-09-22 13:28:18,322][06493] Updated weights for policy 1, policy_version 13440 (0.0016) [2023-09-22 13:28:18,322][06567] Updated weights for policy 0, policy_version 14000 (0.0014) [2023-09-22 13:28:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7041024. Throughput: 0: 781.7, 1: 781.3. Samples: 1721020. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:28:21,155][05066] Avg episode reward: [(0, '10.620'), (1, '9.410')] [2023-09-22 13:28:21,156][06078] Saving new best policy, reward=10.620! [2023-09-22 13:28:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7065600. Throughput: 0: 782.2, 1: 783.5. Samples: 1730560. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:28:26,155][05066] Avg episode reward: [(0, '10.790'), (1, '10.130')] [2023-09-22 13:28:26,192][06278] Saving new best policy, reward=10.130! [2023-09-22 13:28:26,216][06078] Saving new best policy, reward=10.790! [2023-09-22 13:28:31,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7098368. Throughput: 0: 784.4, 1: 783.7. Samples: 1739790. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:28:31,154][05066] Avg episode reward: [(0, '10.310'), (1, '10.280')] [2023-09-22 13:28:31,155][06278] Saving new best policy, reward=10.280! [2023-09-22 13:28:31,493][06567] Updated weights for policy 0, policy_version 14160 (0.0016) [2023-09-22 13:28:31,493][06493] Updated weights for policy 1, policy_version 13600 (0.0018) [2023-09-22 13:28:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7131136. Throughput: 0: 782.4, 1: 782.6. Samples: 1744610. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 13:28:36,155][05066] Avg episode reward: [(0, '10.500'), (1, '10.510')] [2023-09-22 13:28:36,157][06278] Saving new best policy, reward=10.510! [2023-09-22 13:28:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7163904. Throughput: 0: 780.0, 1: 779.1. Samples: 1753622. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 13:28:41,155][05066] Avg episode reward: [(0, '10.880'), (1, '10.560')] [2023-09-22 13:28:41,165][06278] Saving new best policy, reward=10.560! [2023-09-22 13:28:41,165][06078] Saving new best policy, reward=10.880! [2023-09-22 13:28:44,702][06493] Updated weights for policy 1, policy_version 13760 (0.0017) [2023-09-22 13:28:44,703][06567] Updated weights for policy 0, policy_version 14320 (0.0018) [2023-09-22 13:28:46,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7196672. Throughput: 0: 780.3, 1: 780.3. Samples: 1763172. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:28:46,154][05066] Avg episode reward: [(0, '10.990'), (1, '11.060')] [2023-09-22 13:28:46,155][06078] Saving new best policy, reward=10.990! [2023-09-22 13:28:46,155][06278] Saving new best policy, reward=11.060! [2023-09-22 13:28:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7221248. Throughput: 0: 775.0, 1: 775.0. Samples: 1767520. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:28:51,155][05066] Avg episode reward: [(0, '11.070'), (1, '11.410')] [2023-09-22 13:28:51,242][06278] Saving new best policy, reward=11.410! [2023-09-22 13:28:51,253][06078] Saving new best policy, reward=11.070! [2023-09-22 13:28:56,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7254016. Throughput: 0: 781.5, 1: 779.3. Samples: 1777382. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:28:56,155][05066] Avg episode reward: [(0, '10.030'), (1, '11.430')] [2023-09-22 13:28:56,167][06278] Saving new best policy, reward=11.430! [2023-09-22 13:28:57,739][06493] Updated weights for policy 1, policy_version 13920 (0.0015) [2023-09-22 13:28:57,739][06567] Updated weights for policy 0, policy_version 14480 (0.0017) [2023-09-22 13:29:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7286784. Throughput: 0: 780.3, 1: 780.1. Samples: 1786513. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:29:01,154][05066] Avg episode reward: [(0, '10.810'), (1, '11.780')] [2023-09-22 13:29:01,155][06278] Saving new best policy, reward=11.780! [2023-09-22 13:29:06,154][05066] Fps is (10 sec: 6553.9, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 7319552. Throughput: 0: 782.6, 1: 782.6. Samples: 1791456. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:29:06,154][05066] Avg episode reward: [(0, '10.390'), (1, '12.710')] [2023-09-22 13:29:06,155][06278] Saving new best policy, reward=12.710! [2023-09-22 13:29:10,653][06493] Updated weights for policy 1, policy_version 14080 (0.0016) [2023-09-22 13:29:10,653][06567] Updated weights for policy 0, policy_version 14640 (0.0016) [2023-09-22 13:29:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7352320. Throughput: 0: 781.3, 1: 780.2. Samples: 1800827. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:29:11,155][05066] Avg episode reward: [(0, '11.180'), (1, '12.490')] [2023-09-22 13:29:11,164][06078] Saving new best policy, reward=11.180! [2023-09-22 13:29:16,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7385088. Throughput: 0: 784.2, 1: 785.6. Samples: 1810432. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:29:16,155][05066] Avg episode reward: [(0, '11.010'), (1, '13.000')] [2023-09-22 13:29:16,156][06278] Saving new best policy, reward=13.000! [2023-09-22 13:29:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7409664. Throughput: 0: 781.0, 1: 780.1. Samples: 1814858. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:29:21,154][05066] Avg episode reward: [(0, '10.610'), (1, '11.920')] [2023-09-22 13:29:23,838][06493] Updated weights for policy 1, policy_version 14240 (0.0015) [2023-09-22 13:29:23,838][06567] Updated weights for policy 0, policy_version 14800 (0.0018) [2023-09-22 13:29:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7442432. Throughput: 0: 786.4, 1: 786.3. Samples: 1824393. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:29:26,155][05066] Avg episode reward: [(0, '10.400'), (1, '11.490')] [2023-09-22 13:29:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7475200. Throughput: 0: 783.3, 1: 782.6. Samples: 1833637. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:29:31,155][05066] Avg episode reward: [(0, '9.860'), (1, '10.730')] [2023-09-22 13:29:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7507968. Throughput: 0: 789.9, 1: 789.3. Samples: 1838585. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:29:36,154][05066] Avg episode reward: [(0, '10.320'), (1, '10.420')] [2023-09-22 13:29:36,831][06567] Updated weights for policy 0, policy_version 14960 (0.0013) [2023-09-22 13:29:36,831][06493] Updated weights for policy 1, policy_version 14400 (0.0017) [2023-09-22 13:29:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 7540736. Throughput: 0: 783.7, 1: 783.5. Samples: 1847904. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:29:41,154][05066] Avg episode reward: [(0, '10.350'), (1, '10.470')] [2023-09-22 13:29:41,163][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000015008_3842048.pth... [2023-09-22 13:29:41,163][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000014448_3698688.pth... [2023-09-22 13:29:41,201][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000012080_3092480.pth [2023-09-22 13:29:41,205][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000011520_2949120.pth [2023-09-22 13:29:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7573504. Throughput: 0: 788.8, 1: 788.6. Samples: 1857492. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:29:46,155][05066] Avg episode reward: [(0, '10.050'), (1, '11.230')] [2023-09-22 13:29:49,928][06567] Updated weights for policy 0, policy_version 15120 (0.0020) [2023-09-22 13:29:49,928][06493] Updated weights for policy 1, policy_version 14560 (0.0019) [2023-09-22 13:29:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7598080. Throughput: 0: 783.2, 1: 782.9. Samples: 1861933. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:29:51,154][05066] Avg episode reward: [(0, '9.710'), (1, '11.450')] [2023-09-22 13:29:56,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7630848. Throughput: 0: 785.4, 1: 786.3. Samples: 1871553. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:29:56,155][05066] Avg episode reward: [(0, '10.450'), (1, '10.480')] [2023-09-22 13:30:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7663616. Throughput: 0: 782.5, 1: 781.1. Samples: 1880797. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:30:01,154][05066] Avg episode reward: [(0, '11.040'), (1, '11.210')] [2023-09-22 13:30:02,924][06493] Updated weights for policy 1, policy_version 14720 (0.0016) [2023-09-22 13:30:02,924][06567] Updated weights for policy 0, policy_version 15280 (0.0016) [2023-09-22 13:30:06,154][05066] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7696384. Throughput: 0: 787.4, 1: 787.3. Samples: 1885719. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:30:06,154][05066] Avg episode reward: [(0, '10.610'), (1, '10.620')] [2023-09-22 13:30:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 7729152. Throughput: 0: 783.8, 1: 784.1. Samples: 1894947. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:30:11,154][05066] Avg episode reward: [(0, '11.700'), (1, '10.210')] [2023-09-22 13:30:11,164][06078] Saving new best policy, reward=11.700! [2023-09-22 13:30:16,154][05066] Fps is (10 sec: 6143.8, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 7757824. Throughput: 0: 786.1, 1: 784.9. Samples: 1904332. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:30:16,155][05066] Avg episode reward: [(0, '11.150'), (1, '9.680')] [2023-09-22 13:30:16,160][06493] Updated weights for policy 1, policy_version 14880 (0.0016) [2023-09-22 13:30:16,160][06567] Updated weights for policy 0, policy_version 15440 (0.0016) [2023-09-22 13:30:21,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7786496. Throughput: 0: 779.0, 1: 780.0. Samples: 1908737. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:30:21,155][05066] Avg episode reward: [(0, '12.100'), (1, '10.520')] [2023-09-22 13:30:21,156][06078] Saving new best policy, reward=12.100! [2023-09-22 13:30:26,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7819264. 
Throughput: 0: 779.5, 1: 780.2. Samples: 1918094. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 13:30:26,155][05066] Avg episode reward: [(0, '11.910'), (1, '10.120')] [2023-09-22 13:30:29,312][06567] Updated weights for policy 0, policy_version 15600 (0.0014) [2023-09-22 13:30:29,313][06493] Updated weights for policy 1, policy_version 15040 (0.0016) [2023-09-22 13:30:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 7852032. Throughput: 0: 774.8, 1: 775.0. Samples: 1927232. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:30:31,154][05066] Avg episode reward: [(0, '11.920'), (1, '10.080')] [2023-09-22 13:30:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7884800. Throughput: 0: 778.8, 1: 779.2. Samples: 1932043. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:30:36,154][05066] Avg episode reward: [(0, '11.900'), (1, '10.140')] [2023-09-22 13:30:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 7917568. Throughput: 0: 777.6, 1: 777.4. Samples: 1941529. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:30:41,155][05066] Avg episode reward: [(0, '11.050'), (1, '10.220')] [2023-09-22 13:30:42,274][06567] Updated weights for policy 0, policy_version 15760 (0.0014) [2023-09-22 13:30:42,274][06493] Updated weights for policy 1, policy_version 15200 (0.0016) [2023-09-22 13:30:46,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 7942144. Throughput: 0: 782.9, 1: 783.6. Samples: 1951289. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:30:46,154][05066] Avg episode reward: [(0, '10.810'), (1, '9.270')] [2023-09-22 13:30:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 7974912. Throughput: 0: 778.6, 1: 779.7. Samples: 1955841. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 13:30:51,154][05066] Avg episode reward: [(0, '11.000'), (1, '8.280')] [2023-09-22 13:30:55,290][06493] Updated weights for policy 1, policy_version 15360 (0.0015) [2023-09-22 13:30:55,291][06567] Updated weights for policy 0, policy_version 15920 (0.0017) [2023-09-22 13:30:56,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 8007680. Throughput: 0: 783.1, 1: 782.6. Samples: 1965402. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:30:56,155][05066] Avg episode reward: [(0, '10.280'), (1, '8.510')] [2023-09-22 13:31:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8040448. Throughput: 0: 780.7, 1: 782.0. Samples: 1974654. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:31:01,154][05066] Avg episode reward: [(0, '10.130'), (1, '8.680')] [2023-09-22 13:31:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6262.0). Total num frames: 8073216. Throughput: 0: 787.9, 1: 787.0. Samples: 1979605. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:31:06,155][05066] Avg episode reward: [(0, '10.430'), (1, '8.500')] [2023-09-22 13:31:08,142][06493] Updated weights for policy 1, policy_version 15520 (0.0016) [2023-09-22 13:31:08,143][06567] Updated weights for policy 0, policy_version 16080 (0.0017) [2023-09-22 13:31:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8105984. Throughput: 0: 789.7, 1: 789.4. Samples: 1989155. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:31:11,155][05066] Avg episode reward: [(0, '10.720'), (1, '9.500')] [2023-09-22 13:31:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6275.9). Total num frames: 8138752. Throughput: 0: 795.3, 1: 796.1. Samples: 1998848. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:31:16,154][05066] Avg episode reward: [(0, '9.880'), (1, '10.340')]
[2023-09-22 13:31:21,082][06567] Updated weights for policy 0, policy_version 16240 (0.0015)
[2023-09-22 13:31:21,082][06493] Updated weights for policy 1, policy_version 15680 (0.0017)
[2023-09-22 13:31:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 8171520. Throughput: 0: 793.5, 1: 793.4. Samples: 2003453. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:31:21,155][05066] Avg episode reward: [(0, '10.780'), (1, '10.640')]
[2023-09-22 13:31:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8196096. Throughput: 0: 796.0, 1: 794.9. Samples: 2013118. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:31:26,155][05066] Avg episode reward: [(0, '11.630'), (1, '11.220')]
[2023-09-22 13:31:31,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8228864. Throughput: 0: 789.5, 1: 789.2. Samples: 2022333. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:31:31,155][05066] Avg episode reward: [(0, '11.700'), (1, '12.490')]
[2023-09-22 13:31:34,215][06567] Updated weights for policy 0, policy_version 16400 (0.0017)
[2023-09-22 13:31:34,215][06493] Updated weights for policy 1, policy_version 15840 (0.0017)
[2023-09-22 13:31:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8261632. Throughput: 0: 790.7, 1: 790.2. Samples: 2026980. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:31:36,154][05066] Avg episode reward: [(0, '11.440'), (1, '12.150')]
[2023-09-22 13:31:41,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8294400. Throughput: 0: 784.7, 1: 784.8. Samples: 2036032. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:31:41,154][05066] Avg episode reward: [(0, '11.190'), (1, '10.720')]
[2023-09-22 13:31:41,163][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000015920_4075520.pth...
[2023-09-22 13:31:41,163][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000016480_4218880.pth...
[2023-09-22 13:31:41,199][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000013536_3465216.pth
[2023-09-22 13:31:41,205][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000012976_3321856.pth
[2023-09-22 13:31:46,153][05066] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 8327168. Throughput: 0: 789.0, 1: 787.8. Samples: 2045607. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:31:46,154][05066] Avg episode reward: [(0, '10.990'), (1, '11.430')]
[2023-09-22 13:31:47,436][06567] Updated weights for policy 0, policy_version 16560 (0.0017)
[2023-09-22 13:31:47,436][06493] Updated weights for policy 1, policy_version 16000 (0.0015)
[2023-09-22 13:31:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8351744. Throughput: 0: 782.7, 1: 783.3. Samples: 2050075. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:31:51,154][05066] Avg episode reward: [(0, '10.570'), (1, '10.420')]
[2023-09-22 13:31:56,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8384512. Throughput: 0: 783.5, 1: 782.6. Samples: 2059628. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:31:56,156][05066] Avg episode reward: [(0, '10.810'), (1, '10.650')]
[2023-09-22 13:32:00,436][06567] Updated weights for policy 0, policy_version 16720 (0.0017)
[2023-09-22 13:32:00,437][06493] Updated weights for policy 1, policy_version 16160 (0.0013)
[2023-09-22 13:32:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8417280. Throughput: 0: 779.9, 1: 779.3. Samples: 2069011. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:01,155][05066] Avg episode reward: [(0, '11.140'), (1, '9.560')]
[2023-09-22 13:32:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8450048. Throughput: 0: 782.6, 1: 782.4. Samples: 2073878. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:06,155][05066] Avg episode reward: [(0, '11.010'), (1, '10.690')]
[2023-09-22 13:32:11,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8482816. Throughput: 0: 779.7, 1: 780.4. Samples: 2083320. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:11,154][05066] Avg episode reward: [(0, '11.140'), (1, '10.420')]
[2023-09-22 13:32:13,427][06493] Updated weights for policy 1, policy_version 16320 (0.0013)
[2023-09-22 13:32:13,428][06567] Updated weights for policy 0, policy_version 16880 (0.0016)
[2023-09-22 13:32:16,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 8515584. Throughput: 0: 784.1, 1: 783.9. Samples: 2092896. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:16,154][05066] Avg episode reward: [(0, '10.850'), (1, '11.010')]
[2023-09-22 13:32:21,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8540160. Throughput: 0: 779.6, 1: 780.0. Samples: 2097162. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:32:21,155][05066] Avg episode reward: [(0, '9.800'), (1, '11.150')]
[2023-09-22 13:32:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 8572928. Throughput: 0: 782.9, 1: 783.4. Samples: 2106516. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:32:26,154][05066] Avg episode reward: [(0, '9.990'), (1, '11.550')]
[2023-09-22 13:32:26,930][06493] Updated weights for policy 1, policy_version 16480 (0.0014)
[2023-09-22 13:32:26,930][06567] Updated weights for policy 0, policy_version 17040 (0.0014)
[2023-09-22 13:32:31,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 8605696. Throughput: 0: 776.8, 1: 778.3. Samples: 2115584. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:31,154][05066] Avg episode reward: [(0, '11.100'), (1, '12.160')]
[2023-09-22 13:32:36,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8630272. Throughput: 0: 777.1, 1: 776.9. Samples: 2120006. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:36,155][05066] Avg episode reward: [(0, '11.340'), (1, '11.920')]
[2023-09-22 13:32:40,141][06567] Updated weights for policy 0, policy_version 17200 (0.0016)
[2023-09-22 13:32:40,141][06493] Updated weights for policy 1, policy_version 16640 (0.0016)
[2023-09-22 13:32:41,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8663040. Throughput: 0: 776.4, 1: 777.9. Samples: 2129570. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:41,154][05066] Avg episode reward: [(0, '11.440'), (1, '11.330')]
[2023-09-22 13:32:46,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8695808. Throughput: 0: 774.0, 1: 774.5. Samples: 2138690. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:32:46,154][05066] Avg episode reward: [(0, '11.730'), (1, '11.590')]
[2023-09-22 13:32:51,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 8728576. Throughput: 0: 774.7, 1: 774.4. Samples: 2143585. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:32:51,155][05066] Avg episode reward: [(0, '13.380'), (1, '11.970')]
[2023-09-22 13:32:51,157][06078] Saving new best policy, reward=13.380!
[2023-09-22 13:32:53,348][06493] Updated weights for policy 1, policy_version 16800 (0.0017)
[2023-09-22 13:32:53,348][06567] Updated weights for policy 0, policy_version 17360 (0.0017)
[2023-09-22 13:32:56,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 8761344. Throughput: 0: 769.4, 1: 769.4. Samples: 2152564. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:32:56,155][05066] Avg episode reward: [(0, '13.820'), (1, '12.140')]
[2023-09-22 13:32:56,165][06078] Saving new best policy, reward=13.820!
[2023-09-22 13:33:01,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8785920. Throughput: 0: 767.8, 1: 769.2. Samples: 2162062. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:33:01,154][05066] Avg episode reward: [(0, '14.310'), (1, '11.740')]
[2023-09-22 13:33:01,306][06078] Saving new best policy, reward=14.310!
[2023-09-22 13:33:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8818688. Throughput: 0: 772.5, 1: 773.3. Samples: 2166725. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:33:06,154][05066] Avg episode reward: [(0, '13.590'), (1, '11.970')]
[2023-09-22 13:33:06,656][06567] Updated weights for policy 0, policy_version 17520 (0.0016)
[2023-09-22 13:33:06,656][06493] Updated weights for policy 1, policy_version 16960 (0.0018)
[2023-09-22 13:33:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8851456. Throughput: 0: 770.8, 1: 770.2. Samples: 2175864. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:33:11,154][05066] Avg episode reward: [(0, '13.080'), (1, '11.960')]
[2023-09-22 13:33:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8884224. Throughput: 0: 773.8, 1: 773.7. Samples: 2185220. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:33:16,154][05066] Avg episode reward: [(0, '12.820'), (1, '11.280')]
[2023-09-22 13:33:19,828][06567] Updated weights for policy 0, policy_version 17680 (0.0016)
[2023-09-22 13:33:19,830][06493] Updated weights for policy 1, policy_version 17120 (0.0016)
[2023-09-22 13:33:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 8916992. Throughput: 0: 776.5, 1: 775.5. Samples: 2189846. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:33:21,154][05066] Avg episode reward: [(0, '13.780'), (1, '11.580')]
[2023-09-22 13:33:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8941568. Throughput: 0: 775.8, 1: 774.5. Samples: 2199335. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:33:26,155][05066] Avg episode reward: [(0, '12.920'), (1, '11.310')]
[2023-09-22 13:33:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 8974336. Throughput: 0: 776.4, 1: 776.0. Samples: 2208546. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:33:31,154][05066] Avg episode reward: [(0, '12.130'), (1, '11.620')]
[2023-09-22 13:33:32,919][06567] Updated weights for policy 0, policy_version 17840 (0.0015)
[2023-09-22 13:33:32,919][06493] Updated weights for policy 1, policy_version 17280 (0.0016)
[2023-09-22 13:33:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9007104. Throughput: 0: 776.0, 1: 775.1. Samples: 2213386. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:33:36,155][05066] Avg episode reward: [(0, '12.320'), (1, '12.100')]
[2023-09-22 13:33:41,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9039872. Throughput: 0: 778.5, 1: 778.6. Samples: 2222633. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:33:41,155][05066] Avg episode reward: [(0, '13.250'), (1, '11.470')]
[2023-09-22 13:33:41,166][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000017376_4448256.pth...
[2023-09-22 13:33:41,166][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000017936_4591616.pth...
[2023-09-22 13:33:41,201][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000014448_3698688.pth
[2023-09-22 13:33:41,201][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000015008_3842048.pth
[2023-09-22 13:33:45,976][06493] Updated weights for policy 1, policy_version 17440 (0.0016)
[2023-09-22 13:33:45,977][06567] Updated weights for policy 0, policy_version 18000 (0.0018)
[2023-09-22 13:33:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 9072640. Throughput: 0: 780.3, 1: 779.9. Samples: 2232272. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:33:46,154][05066] Avg episode reward: [(0, '12.750'), (1, '11.420')]
[2023-09-22 13:33:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 9097216. Throughput: 0: 776.2, 1: 774.9. Samples: 2236524. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-22 13:33:51,154][05066] Avg episode reward: [(0, '11.500'), (1, '11.090')]
[2023-09-22 13:33:56,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 9129984. Throughput: 0: 776.9, 1: 777.7. Samples: 2245819. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-22 13:33:56,155][05066] Avg episode reward: [(0, '11.640'), (1, '11.350')]
[2023-09-22 13:33:59,479][06493] Updated weights for policy 1, policy_version 17600 (0.0017)
[2023-09-22 13:33:59,479][06567] Updated weights for policy 0, policy_version 18160 (0.0018)
[2023-09-22 13:34:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9162752. Throughput: 0: 773.7, 1: 773.8. Samples: 2254857. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0)
[2023-09-22 13:34:01,154][05066] Avg episode reward: [(0, '11.630'), (1, '10.960')]
[2023-09-22 13:34:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9195520. Throughput: 0: 774.8, 1: 775.6. Samples: 2259610. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:34:06,155][05066] Avg episode reward: [(0, '11.290'), (1, '11.830')]
[2023-09-22 13:34:11,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9228288. Throughput: 0: 775.5, 1: 776.7. Samples: 2269185. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:34:11,155][05066] Avg episode reward: [(0, '11.420'), (1, '11.750')]
[2023-09-22 13:34:12,332][06567] Updated weights for policy 0, policy_version 18320 (0.0015)
[2023-09-22 13:34:12,332][06493] Updated weights for policy 1, policy_version 17760 (0.0017)
[2023-09-22 13:34:16,155][05066] Fps is (10 sec: 6552.8, 60 sec: 6280.4, 300 sec: 6275.9). Total num frames: 9261056. Throughput: 0: 783.7, 1: 783.5. Samples: 2279073. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-22 13:34:16,156][05066] Avg episode reward: [(0, '11.240'), (1, '11.870')]
[2023-09-22 13:34:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 9285632. Throughput: 0: 778.4, 1: 780.2. Samples: 2283525. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-22 13:34:21,155][05066] Avg episode reward: [(0, '12.120'), (1, '12.170')]
[2023-09-22 13:34:25,366][06493] Updated weights for policy 1, policy_version 17920 (0.0015)
[2023-09-22 13:34:25,368][06567] Updated weights for policy 0, policy_version 18480 (0.0018)
[2023-09-22 13:34:26,154][05066] Fps is (10 sec: 5735.1, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9318400. Throughput: 0: 780.5, 1: 780.6. Samples: 2292881. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0)
[2023-09-22 13:34:26,155][05066] Avg episode reward: [(0, '12.150'), (1, '12.330')]
[2023-09-22 13:34:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9351168. Throughput: 0: 776.5, 1: 776.0. Samples: 2302136. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:34:31,155][05066] Avg episode reward: [(0, '12.910'), (1, '12.660')]
[2023-09-22 13:34:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9383936. Throughput: 0: 782.6, 1: 782.6. Samples: 2306956. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:34:36,154][05066] Avg episode reward: [(0, '12.930'), (1, '12.260')]
[2023-09-22 13:34:38,379][06567] Updated weights for policy 0, policy_version 18640 (0.0015)
[2023-09-22 13:34:38,380][06493] Updated weights for policy 1, policy_version 18080 (0.0017)
[2023-09-22 13:34:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9416704. Throughput: 0: 784.2, 1: 783.8. Samples: 2316376. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:34:41,154][05066] Avg episode reward: [(0, '12.850'), (1, '13.360')]
[2023-09-22 13:34:41,166][06278] Saving new best policy, reward=13.360!
[2023-09-22 13:34:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 9449472. Throughput: 0: 793.7, 1: 792.5. Samples: 2326238. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:34:46,155][05066] Avg episode reward: [(0, '13.140'), (1, '13.370')]
[2023-09-22 13:34:46,156][06278] Saving new best policy, reward=13.370!
[2023-09-22 13:34:51,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9474048. Throughput: 0: 790.4, 1: 790.3. Samples: 2330741. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:34:51,155][05066] Avg episode reward: [(0, '13.560'), (1, '13.870')]
[2023-09-22 13:34:51,248][06278] Saving new best policy, reward=13.870!
[2023-09-22 13:34:51,273][06567] Updated weights for policy 0, policy_version 18800 (0.0018)
[2023-09-22 13:34:51,273][06493] Updated weights for policy 1, policy_version 18240 (0.0016)
[2023-09-22 13:34:56,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9506816. Throughput: 0: 792.4, 1: 792.0. Samples: 2340480. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:34:56,155][05066] Avg episode reward: [(0, '12.820'), (1, '13.910')]
[2023-09-22 13:34:56,167][06278] Saving new best policy, reward=13.910!
[2023-09-22 13:35:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9539584. Throughput: 0: 783.2, 1: 783.0. Samples: 2349549. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:01,154][05066] Avg episode reward: [(0, '13.160'), (1, '14.220')]
[2023-09-22 13:35:01,155][06278] Saving new best policy, reward=14.220!
[2023-09-22 13:35:04,320][06493] Updated weights for policy 1, policy_version 18400 (0.0016)
[2023-09-22 13:35:04,320][06567] Updated weights for policy 0, policy_version 18960 (0.0019)
[2023-09-22 13:35:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9572352. Throughput: 0: 788.0, 1: 787.8. Samples: 2354436. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:06,155][05066] Avg episode reward: [(0, '13.740'), (1, '14.010')]
[2023-09-22 13:35:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6262.0). Total num frames: 9605120. Throughput: 0: 786.2, 1: 786.2. Samples: 2363642. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:11,155][05066] Avg episode reward: [(0, '13.800'), (1, '14.170')]
[2023-09-22 13:35:16,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.7, 300 sec: 6275.9). Total num frames: 9637888. Throughput: 0: 792.3, 1: 792.6. Samples: 2373455. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:35:16,154][05066] Avg episode reward: [(0, '14.350'), (1, '14.650')]
[2023-09-22 13:35:16,155][06078] Saving new best policy, reward=14.350!
[2023-09-22 13:35:16,155][06278] Saving new best policy, reward=14.650!
[2023-09-22 13:35:17,340][06567] Updated weights for policy 0, policy_version 19120 (0.0019)
[2023-09-22 13:35:17,340][06493] Updated weights for policy 1, policy_version 18560 (0.0015)
[2023-09-22 13:35:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9662464. Throughput: 0: 787.2, 1: 787.0. Samples: 2377797. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:35:21,155][05066] Avg episode reward: [(0, '13.950'), (1, '14.310')]
[2023-09-22 13:35:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9695232. Throughput: 0: 791.3, 1: 790.5. Samples: 2387559. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:35:26,155][05066] Avg episode reward: [(0, '13.850'), (1, '14.040')]
[2023-09-22 13:35:30,362][06567] Updated weights for policy 0, policy_version 19280 (0.0017)
[2023-09-22 13:35:30,363][06493] Updated weights for policy 1, policy_version 18720 (0.0018)
[2023-09-22 13:35:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9728000. Throughput: 0: 783.8, 1: 784.6. Samples: 2396817. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:31,155][05066] Avg episode reward: [(0, '14.460'), (1, '14.490')]
[2023-09-22 13:35:31,156][06078] Saving new best policy, reward=14.460!
[2023-09-22 13:35:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9760768. Throughput: 0: 786.3, 1: 786.8. Samples: 2401530. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:36,154][05066] Avg episode reward: [(0, '14.190'), (1, '13.410')]
[2023-09-22 13:35:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 9793536. Throughput: 0: 779.4, 1: 778.8. Samples: 2410599. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:41,154][05066] Avg episode reward: [(0, '14.280'), (1, '12.890')]
[2023-09-22 13:35:41,162][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000019408_4968448.pth...
[2023-09-22 13:35:41,162][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000018848_4825088.pth...
[2023-09-22 13:35:41,191][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000016480_4218880.pth
[2023-09-22 13:35:41,197][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000015920_4075520.pth
[2023-09-22 13:35:43,506][06567] Updated weights for policy 0, policy_version 19440 (0.0015)
[2023-09-22 13:35:43,507][06493] Updated weights for policy 1, policy_version 18880 (0.0016)
[2023-09-22 13:35:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 9826304. Throughput: 0: 788.7, 1: 787.7. Samples: 2420484. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:46,155][05066] Avg episode reward: [(0, '14.370'), (1, '13.310')]
[2023-09-22 13:35:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 9850880. Throughput: 0: 782.1, 1: 782.3. Samples: 2424832. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:51,154][05066] Avg episode reward: [(0, '14.120'), (1, '13.380')]
[2023-09-22 13:35:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9883648. Throughput: 0: 783.4, 1: 783.0. Samples: 2434126. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:35:56,155][05066] Avg episode reward: [(0, '13.280'), (1, '13.350')]
[2023-09-22 13:35:56,764][06567] Updated weights for policy 0, policy_version 19600 (0.0018)
[2023-09-22 13:35:56,764][06493] Updated weights for policy 1, policy_version 19040 (0.0016)
[2023-09-22 13:36:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9916416. Throughput: 0: 775.9, 1: 775.9. Samples: 2443286. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:01,154][05066] Avg episode reward: [(0, '13.510'), (1, '13.500')]
[2023-09-22 13:36:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9949184. Throughput: 0: 779.9, 1: 779.9. Samples: 2447986. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:06,154][05066] Avg episode reward: [(0, '13.560'), (1, '13.840')]
[2023-09-22 13:36:09,790][06567] Updated weights for policy 0, policy_version 19760 (0.0015)
[2023-09-22 13:36:09,790][06493] Updated weights for policy 1, policy_version 19200 (0.0016)
[2023-09-22 13:36:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 9981952. Throughput: 0: 777.5, 1: 779.0. Samples: 2457600. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:11,155][05066] Avg episode reward: [(0, '13.630'), (1, '13.660')]
[2023-09-22 13:36:16,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10006528. Throughput: 0: 784.0, 1: 784.0. Samples: 2467381. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:16,155][05066] Avg episode reward: [(0, '12.610'), (1, '13.670')]
[2023-09-22 13:36:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10039296. Throughput: 0: 782.1, 1: 782.5. Samples: 2471936. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:21,155][05066] Avg episode reward: [(0, '12.120'), (1, '13.740')]
[2023-09-22 13:36:22,761][06567] Updated weights for policy 0, policy_version 19920 (0.0015)
[2023-09-22 13:36:22,761][06493] Updated weights for policy 1, policy_version 19360 (0.0017)
[2023-09-22 13:36:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 10072064. Throughput: 0: 785.6, 1: 786.0. Samples: 2481321. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:36:26,154][05066] Avg episode reward: [(0, '11.330'), (1, '13.300')]
[2023-09-22 13:36:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10104832. Throughput: 0: 776.8, 1: 778.0. Samples: 2490454. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:36:31,154][05066] Avg episode reward: [(0, '10.740'), (1, '14.080')]
[2023-09-22 13:36:35,917][06567] Updated weights for policy 0, policy_version 20080 (0.0017)
[2023-09-22 13:36:35,918][06493] Updated weights for policy 1, policy_version 19520 (0.0017)
[2023-09-22 13:36:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10137600. Throughput: 0: 783.0, 1: 782.4. Samples: 2495275. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:36:36,155][05066] Avg episode reward: [(0, '10.130'), (1, '13.850')]
[2023-09-22 13:36:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10170368. Throughput: 0: 783.8, 1: 784.6. Samples: 2504704. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:41,155][05066] Avg episode reward: [(0, '10.310'), (1, '13.390')]
[2023-09-22 13:36:46,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 10194944. Throughput: 0: 786.5, 1: 786.5. Samples: 2514071. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:46,154][05066] Avg episode reward: [(0, '10.600'), (1, '14.630')]
[2023-09-22 13:36:49,197][06567] Updated weights for policy 0, policy_version 20240 (0.0014)
[2023-09-22 13:36:49,198][06493] Updated weights for policy 1, policy_version 19680 (0.0015)
[2023-09-22 13:36:51,154][05066] Fps is (10 sec: 5734.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10227712. Throughput: 0: 784.6, 1: 785.3. Samples: 2518634. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:51,154][05066] Avg episode reward: [(0, '10.820'), (1, '14.360')]
[2023-09-22 13:36:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 10260480. Throughput: 0: 780.7, 1: 779.9. Samples: 2527829. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:36:56,154][05066] Avg episode reward: [(0, '11.400'), (1, '13.810')]
[2023-09-22 13:37:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10293248. Throughput: 0: 778.6, 1: 779.0. Samples: 2537472. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:01,154][05066] Avg episode reward: [(0, '11.850'), (1, '13.830')]
[2023-09-22 13:37:02,241][06567] Updated weights for policy 0, policy_version 20400 (0.0017)
[2023-09-22 13:37:02,242][06493] Updated weights for policy 1, policy_version 19840 (0.0014)
[2023-09-22 13:37:06,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6234.3). Total num frames: 10321920. Throughput: 0: 778.1, 1: 777.4. Samples: 2541934. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:06,154][05066] Avg episode reward: [(0, '12.850'), (1, '13.360')]
[2023-09-22 13:37:11,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6220.4). Total num frames: 10350592. Throughput: 0: 779.5, 1: 779.2. Samples: 2551463. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:37:11,154][05066] Avg episode reward: [(0, '13.460'), (1, '13.500')]
[2023-09-22 13:37:15,331][06493] Updated weights for policy 1, policy_version 20000 (0.0017)
[2023-09-22 13:37:15,331][06567] Updated weights for policy 0, policy_version 20560 (0.0017)
[2023-09-22 13:37:16,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10383360. Throughput: 0: 781.1, 1: 780.9. Samples: 2560742. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:37:16,154][05066] Avg episode reward: [(0, '13.980'), (1, '14.060')]
[2023-09-22 13:37:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 10416128. Throughput: 0: 782.5, 1: 781.4. Samples: 2565653. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-22 13:37:21,154][05066] Avg episode reward: [(0, '14.230'), (1, '13.390')]
[2023-09-22 13:37:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10448896. Throughput: 0: 780.4, 1: 779.5. Samples: 2574897. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:26,154][05066] Avg episode reward: [(0, '13.790'), (1, '13.660')]
[2023-09-22 13:37:28,337][06493] Updated weights for policy 1, policy_version 20160 (0.0019)
[2023-09-22 13:37:28,337][06567] Updated weights for policy 0, policy_version 20720 (0.0020)
[2023-09-22 13:37:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 10481664. Throughput: 0: 783.2, 1: 783.6. Samples: 2584576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:31,154][05066] Avg episode reward: [(0, '14.190'), (1, '14.270')]
[2023-09-22 13:37:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 10514432. Throughput: 0: 784.2, 1: 783.0. Samples: 2589159. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:36,154][05066] Avg episode reward: [(0, '13.400'), (1, '13.720')]
[2023-09-22 13:37:41,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 10539008. Throughput: 0: 789.4, 1: 788.5. Samples: 2598833. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:41,155][05066] Avg episode reward: [(0, '13.230'), (1, '12.800')]
[2023-09-22 13:37:41,291][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000020880_5345280.pth...
[2023-09-22 13:37:41,317][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000020320_5201920.pth...
[2023-09-22 13:37:41,320][06493] Updated weights for policy 1, policy_version 20320 (0.0017)
[2023-09-22 13:37:41,321][06567] Updated weights for policy 0, policy_version 20880 (0.0015)
[2023-09-22 13:37:41,323][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000017936_4591616.pth
[2023-09-22 13:37:41,347][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000017376_4448256.pth
[2023-09-22 13:37:46,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10571776. Throughput: 0: 779.9, 1: 779.5. Samples: 2607645. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:37:46,155][05066] Avg episode reward: [(0, '13.050'), (1, '13.300')]
[2023-09-22 13:37:51,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10604544. Throughput: 0: 784.1, 1: 784.2. Samples: 2612505. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:37:51,155][05066] Avg episode reward: [(0, '13.540'), (1, '13.710')]
[2023-09-22 13:37:54,535][06567] Updated weights for policy 0, policy_version 21040 (0.0017)
[2023-09-22 13:37:54,535][06493] Updated weights for policy 1, policy_version 20480 (0.0016)
[2023-09-22 13:37:56,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 10637312. Throughput: 0: 780.8, 1: 781.4. Samples: 2621761. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:37:56,154][05066] Avg episode reward: [(0, '13.770'), (1, '13.320')]
[2023-09-22 13:38:01,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 10665984. Throughput: 0: 783.9, 1: 783.9. Samples: 2631292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:38:01,155][05066] Avg episode reward: [(0, '13.630'), (1, '14.190')]
[2023-09-22 13:38:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 10694656. Throughput: 0: 778.4, 1: 780.0. Samples: 2635780. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:06,154][05066] Avg episode reward: [(0, '14.170'), (1, '13.610')]
[2023-09-22 13:38:07,722][06493] Updated weights for policy 1, policy_version 20640 (0.0018)
[2023-09-22 13:38:07,722][06567] Updated weights for policy 0, policy_version 21200 (0.0016)
[2023-09-22 13:38:11,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10727424. Throughput: 0: 782.9, 1: 783.0. Samples: 2645361. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:11,154][05066] Avg episode reward: [(0, '14.670'), (1, '13.800')]
[2023-09-22 13:38:11,162][06078] Saving new best policy, reward=14.670!
[2023-09-22 13:38:16,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 10760192. Throughput: 0: 777.6, 1: 776.7. Samples: 2654519. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:16,155][05066] Avg episode reward: [(0, '14.380'), (1, '13.990')]
[2023-09-22 13:38:20,876][06567] Updated weights for policy 0, policy_version 21360 (0.0016)
[2023-09-22 13:38:20,877][06493] Updated weights for policy 1, policy_version 20800 (0.0016)
[2023-09-22 13:38:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 10792960. Throughput: 0: 777.0, 1: 777.7. Samples: 2659121. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:21,155][05066] Avg episode reward: [(0, '14.200'), (1, '14.460')]
[2023-09-22 13:38:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 10817536. Throughput: 0: 773.7, 1: 775.4. Samples: 2668544. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:26,155][05066] Avg episode reward: [(0, '14.790'), (1, '13.400')]
[2023-09-22 13:38:26,172][06078] Saving new best policy, reward=14.790!
[2023-09-22 13:38:31,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 10850304. Throughput: 0: 781.7, 1: 781.6. Samples: 2677993. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:31,154][05066] Avg episode reward: [(0, '15.100'), (1, '12.700')]
[2023-09-22 13:38:31,155][06078] Saving new best policy, reward=15.100!
[2023-09-22 13:38:34,029][06493] Updated weights for policy 1, policy_version 20960 (0.0014)
[2023-09-22 13:38:34,030][06567] Updated weights for policy 0, policy_version 21520 (0.0018)
[2023-09-22 13:38:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 10883072. Throughput: 0: 780.5, 1: 780.9. Samples: 2682766. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:36,154][05066] Avg episode reward: [(0, '14.610'), (1, '13.680')]
[2023-09-22 13:38:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 10915840. Throughput: 0: 781.4, 1: 781.1. Samples: 2692074. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:41,155][05066] Avg episode reward: [(0, '14.360'), (1, '13.000')]
[2023-09-22 13:38:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 10948608. Throughput: 0: 778.3, 1: 778.7. Samples: 2701357. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:46,154][05066] Avg episode reward: [(0, '13.590'), (1, '12.670')]
[2023-09-22 13:38:47,033][06493] Updated weights for policy 1, policy_version 21120 (0.0015)
[2023-09-22 13:38:47,033][06567] Updated weights for policy 0, policy_version 21680 (0.0014)
[2023-09-22 13:38:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 10981376. Throughput: 0: 781.9, 1: 781.1. Samples: 2706113. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:51,155][05066] Avg episode reward: [(0, '14.360'), (1, '12.310')]
[2023-09-22 13:38:56,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11014144. Throughput: 0: 780.6, 1: 781.4. Samples: 2715649. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:38:56,155][05066] Avg episode reward: [(0, '14.650'), (1, '13.240')]
[2023-09-22 13:38:59,974][06493] Updated weights for policy 1, policy_version 21280 (0.0015)
[2023-09-22 13:38:59,974][06567] Updated weights for policy 0, policy_version 21840 (0.0014)
[2023-09-22 13:39:01,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 11038720. Throughput: 0: 786.5, 1: 786.7. Samples: 2725315. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:39:01,154][05066] Avg episode reward: [(0, '14.530'), (1, '12.850')]
[2023-09-22 13:39:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 11071488. Throughput: 0: 787.1, 1: 787.6. Samples: 2729984. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:39:06,155][05066] Avg episode reward: [(0, '14.180'), (1, '12.130')]
[2023-09-22 13:39:11,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.2). Total num frames: 11104256. Throughput: 0: 786.0, 1: 785.5. Samples: 2739262. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:39:11,154][05066] Avg episode reward: [(0, '14.140'), (1, '12.060')]
[2023-09-22 13:39:13,040][06567] Updated weights for policy 0, policy_version 22000 (0.0016)
[2023-09-22 13:39:13,040][06493] Updated weights for policy 1, policy_version 21440 (0.0014)
[2023-09-22 13:39:16,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 11137024. Throughput: 0: 785.2, 1: 785.0. Samples: 2748655. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:39:16,154][05066] Avg episode reward: [(0, '13.730'), (1, '11.900')]
[2023-09-22 13:39:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11169792. Throughput: 0: 786.6, 1: 785.9. Samples: 2753531. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:39:21,155][05066] Avg episode reward: [(0, '14.290'), (1, '11.460')]
[2023-09-22 13:39:26,020][06493] Updated weights for policy 1, policy_version 21600 (0.0016)
[2023-09-22 13:39:26,021][06567] Updated weights for policy 0, policy_version 22160 (0.0014)
[2023-09-22 13:39:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 11202560. Throughput: 0: 785.1, 1: 785.6. Samples: 2762757. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:39:26,155][05066] Avg episode reward: [(0, '13.220'), (1, '10.890')]
[2023-09-22 13:39:31,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6348.8, 300 sec: 6262.0). Total num frames: 11231232. Throughput: 0: 792.5, 1: 791.3. Samples: 2772626. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:39:31,154][05066] Avg episode reward: [(0, '13.100'), (1, '11.180')]
[2023-09-22 13:39:36,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 11259904. Throughput: 0: 788.3, 1: 789.1. Samples: 2777094. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:39:36,155][05066] Avg episode reward: [(0, '12.690'), (1, '12.210')]
[2023-09-22 13:39:38,867][06493] Updated weights for policy 1, policy_version 21760 (0.0015)
[2023-09-22 13:39:38,867][06567] Updated weights for policy 0, policy_version 22320 (0.0013)
[2023-09-22 13:39:41,154][05066] Fps is (10 sec: 6143.8, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 11292672. Throughput: 0: 792.3, 1: 791.2. Samples: 2786905. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-22 13:39:41,155][05066] Avg episode reward: [(0, '13.590'), (1, '11.870')]
[2023-09-22 13:39:41,166][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000021776_5574656.pth...
[2023-09-22 13:39:41,166][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000022336_5718016.pth...
[2023-09-22 13:39:41,201][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000019408_4968448.pth
[2023-09-22 13:39:41,207][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000018848_4825088.pth
[2023-09-22 13:39:46,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11325440. Throughput: 0: 788.5, 1: 789.0. Samples: 2796302. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-22 13:39:46,154][05066] Avg episode reward: [(0, '13.770'), (1, '12.050')]
[2023-09-22 13:39:51,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11358208. Throughput: 0: 789.5, 1: 789.7. Samples: 2801048.
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:39:51,154][05066] Avg episode reward: [(0, '13.400'), (1, '12.900')] [2023-09-22 13:39:51,853][06493] Updated weights for policy 1, policy_version 21920 (0.0015) [2023-09-22 13:39:51,854][06567] Updated weights for policy 0, policy_version 22480 (0.0019) [2023-09-22 13:39:56,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11390976. Throughput: 0: 790.8, 1: 790.6. Samples: 2810423. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:39:56,154][05066] Avg episode reward: [(0, '13.550'), (1, '13.580')] [2023-09-22 13:40:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 11423744. Throughput: 0: 793.4, 1: 794.1. Samples: 2820096. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:40:01,154][05066] Avg episode reward: [(0, '13.920'), (1, '13.040')] [2023-09-22 13:40:04,674][06493] Updated weights for policy 1, policy_version 22080 (0.0019) [2023-09-22 13:40:04,674][06567] Updated weights for policy 0, policy_version 22640 (0.0018) [2023-09-22 13:40:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 11456512. Throughput: 0: 792.1, 1: 792.7. Samples: 2824846. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:40:06,155][05066] Avg episode reward: [(0, '13.980'), (1, '13.600')] [2023-09-22 13:40:11,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 11481088. Throughput: 0: 795.6, 1: 795.1. Samples: 2834342. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:40:11,155][05066] Avg episode reward: [(0, '14.870'), (1, '14.030')] [2023-09-22 13:40:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11513856. Throughput: 0: 788.3, 1: 789.1. Samples: 2843608. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:40:16,155][05066] Avg episode reward: [(0, '14.660'), (1, '13.590')] [2023-09-22 13:40:17,745][06567] Updated weights for policy 0, policy_version 22800 (0.0017) [2023-09-22 13:40:17,746][06493] Updated weights for policy 1, policy_version 22240 (0.0018) [2023-09-22 13:40:21,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11546624. Throughput: 0: 794.7, 1: 793.7. Samples: 2848570. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:40:21,154][05066] Avg episode reward: [(0, '15.310'), (1, '14.200')] [2023-09-22 13:40:21,155][06078] Saving new best policy, reward=15.310! [2023-09-22 13:40:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11579392. Throughput: 0: 788.4, 1: 788.8. Samples: 2857883. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 13:40:26,155][05066] Avg episode reward: [(0, '15.860'), (1, '13.770')] [2023-09-22 13:40:26,163][06078] Saving new best policy, reward=15.860! [2023-09-22 13:40:30,921][06493] Updated weights for policy 1, policy_version 22400 (0.0015) [2023-09-22 13:40:30,922][06567] Updated weights for policy 0, policy_version 22960 (0.0015) [2023-09-22 13:40:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6348.8, 300 sec: 6275.9). Total num frames: 11612160. Throughput: 0: 787.6, 1: 787.7. Samples: 2867192. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 13:40:31,155][05066] Avg episode reward: [(0, '15.820'), (1, '13.520')] [2023-09-22 13:40:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 11644928. Throughput: 0: 784.8, 1: 784.1. Samples: 2871652. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 13:40:36,154][05066] Avg episode reward: [(0, '15.260'), (1, '12.960')] [2023-09-22 13:40:41,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 11669504. 
Throughput: 0: 789.8, 1: 790.4. Samples: 2881534. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 13:40:41,154][05066] Avg episode reward: [(0, '13.970'), (1, '13.390')] [2023-09-22 13:40:43,733][06493] Updated weights for policy 1, policy_version 22560 (0.0016) [2023-09-22 13:40:43,733][06567] Updated weights for policy 0, policy_version 23120 (0.0013) [2023-09-22 13:40:46,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11702272. Throughput: 0: 788.3, 1: 788.1. Samples: 2891033. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:40:46,155][05066] Avg episode reward: [(0, '15.110'), (1, '12.910')] [2023-09-22 13:40:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11735040. Throughput: 0: 789.0, 1: 789.0. Samples: 2895856. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:40:51,154][05066] Avg episode reward: [(0, '13.670'), (1, '13.210')] [2023-09-22 13:40:56,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11767808. Throughput: 0: 786.2, 1: 786.2. Samples: 2905100. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:40:56,154][05066] Avg episode reward: [(0, '13.210'), (1, '14.240')] [2023-09-22 13:40:56,818][06567] Updated weights for policy 0, policy_version 23280 (0.0019) [2023-09-22 13:40:56,818][06493] Updated weights for policy 1, policy_version 22720 (0.0016) [2023-09-22 13:41:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11800576. Throughput: 0: 785.1, 1: 786.0. Samples: 2914309. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:41:01,154][05066] Avg episode reward: [(0, '13.520'), (1, '15.000')] [2023-09-22 13:41:01,155][06278] Saving new best policy, reward=15.000! [2023-09-22 13:41:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11833344. Throughput: 0: 783.1, 1: 783.3. 
Samples: 2919056. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:41:06,154][05066] Avg episode reward: [(0, '13.190'), (1, '14.180')] [2023-09-22 13:41:09,923][06493] Updated weights for policy 1, policy_version 22880 (0.0013) [2023-09-22 13:41:09,923][06567] Updated weights for policy 0, policy_version 23440 (0.0017) [2023-09-22 13:41:11,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 11857920. Throughput: 0: 785.8, 1: 786.5. Samples: 2928638. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:41:11,154][05066] Avg episode reward: [(0, '13.480'), (1, '14.210')] [2023-09-22 13:41:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11890688. Throughput: 0: 786.1, 1: 785.7. Samples: 2937922. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:41:16,154][05066] Avg episode reward: [(0, '13.530'), (1, '13.820')] [2023-09-22 13:41:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11923456. Throughput: 0: 791.6, 1: 791.5. Samples: 2942891. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:41:21,154][05066] Avg episode reward: [(0, '12.790'), (1, '13.960')] [2023-09-22 13:41:23,036][06493] Updated weights for policy 1, policy_version 23040 (0.0016) [2023-09-22 13:41:23,036][06567] Updated weights for policy 0, policy_version 23600 (0.0016) [2023-09-22 13:41:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11956224. Throughput: 0: 781.9, 1: 781.5. Samples: 2951887. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:41:26,155][05066] Avg episode reward: [(0, '13.310'), (1, '14.570')] [2023-09-22 13:41:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 11988992. Throughput: 0: 781.8, 1: 782.1. Samples: 2961408. 
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:41:31,154][05066] Avg episode reward: [(0, '13.560'), (1, '14.220')] [2023-09-22 13:41:36,118][06567] Updated weights for policy 0, policy_version 23760 (0.0017) [2023-09-22 13:41:36,118][06493] Updated weights for policy 1, policy_version 23200 (0.0015) [2023-09-22 13:41:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12021760. Throughput: 0: 778.4, 1: 778.3. Samples: 2965909. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:41:36,154][05066] Avg episode reward: [(0, '12.360'), (1, '13.180')] [2023-09-22 13:41:41,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12046336. Throughput: 0: 784.5, 1: 784.7. Samples: 2975713. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:41:41,155][05066] Avg episode reward: [(0, '13.550'), (1, '14.350')] [2023-09-22 13:41:41,268][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000023264_5955584.pth... [2023-09-22 13:41:41,299][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000023824_6098944.pth... [2023-09-22 13:41:41,301][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000020320_5201920.pth [2023-09-22 13:41:41,332][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000020880_5345280.pth [2023-09-22 13:41:46,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 12079104. Throughput: 0: 785.6, 1: 785.3. Samples: 2985002. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:41:46,154][05066] Avg episode reward: [(0, '12.930'), (1, '14.430')] [2023-09-22 13:41:48,971][06567] Updated weights for policy 0, policy_version 23920 (0.0017) [2023-09-22 13:41:48,972][06493] Updated weights for policy 1, policy_version 23360 (0.0014) [2023-09-22 13:41:51,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12111872. 
Throughput: 0: 788.5, 1: 789.0. Samples: 2990042. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:41:51,154][05066] Avg episode reward: [(0, '13.200'), (1, '14.710')] [2023-09-22 13:41:56,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12144640. Throughput: 0: 786.2, 1: 785.6. Samples: 2999371. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:41:56,155][05066] Avg episode reward: [(0, '13.060'), (1, '15.480')] [2023-09-22 13:41:56,164][06278] Saving new best policy, reward=15.480! [2023-09-22 13:42:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6289.8). Total num frames: 12177408. Throughput: 0: 784.4, 1: 784.7. Samples: 3008534. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:01,154][05066] Avg episode reward: [(0, '13.420'), (1, '15.270')] [2023-09-22 13:42:02,010][06493] Updated weights for policy 1, policy_version 23520 (0.0017) [2023-09-22 13:42:02,011][06567] Updated weights for policy 0, policy_version 24080 (0.0016) [2023-09-22 13:42:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12210176. Throughput: 0: 783.3, 1: 783.3. Samples: 3013390. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:42:06,155][05066] Avg episode reward: [(0, '13.430'), (1, '15.690')] [2023-09-22 13:42:06,156][06278] Saving new best policy, reward=15.690! [2023-09-22 13:42:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12242944. Throughput: 0: 788.2, 1: 788.7. Samples: 3022850. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:42:11,155][05066] Avg episode reward: [(0, '12.400'), (1, '15.260')] [2023-09-22 13:42:14,856][06567] Updated weights for policy 0, policy_version 24240 (0.0013) [2023-09-22 13:42:14,857][06493] Updated weights for policy 1, policy_version 23680 (0.0016) [2023-09-22 13:42:16,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6289.8). 
Total num frames: 12271616. Throughput: 0: 792.0, 1: 792.0. Samples: 3032690. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:42:16,155][05066] Avg episode reward: [(0, '12.830'), (1, '15.480')] [2023-09-22 13:42:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12300288. Throughput: 0: 792.0, 1: 792.3. Samples: 3037203. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:42:21,155][05066] Avg episode reward: [(0, '13.700'), (1, '15.840')] [2023-09-22 13:42:21,282][06278] Saving new best policy, reward=15.840! [2023-09-22 13:42:26,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 12333056. Throughput: 0: 793.3, 1: 792.3. Samples: 3047061. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:26,154][05066] Avg episode reward: [(0, '13.920'), (1, '15.810')] [2023-09-22 13:42:27,708][06567] Updated weights for policy 0, policy_version 24400 (0.0017) [2023-09-22 13:42:27,709][06493] Updated weights for policy 1, policy_version 23840 (0.0018) [2023-09-22 13:42:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12365824. Throughput: 0: 794.6, 1: 794.7. Samples: 3056517. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:31,155][05066] Avg episode reward: [(0, '13.880'), (1, '15.030')] [2023-09-22 13:42:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12398592. Throughput: 0: 792.5, 1: 791.0. Samples: 3061300. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:36,154][05066] Avg episode reward: [(0, '14.070'), (1, '14.580')] [2023-09-22 13:42:40,730][06493] Updated weights for policy 1, policy_version 24000 (0.0016) [2023-09-22 13:42:40,730][06567] Updated weights for policy 0, policy_version 24560 (0.0017) [2023-09-22 13:42:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12431360. 
Throughput: 0: 790.2, 1: 790.0. Samples: 3070482. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:41,155][05066] Avg episode reward: [(0, '14.050'), (1, '14.440')] [2023-09-22 13:42:46,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6303.7). Total num frames: 12464128. Throughput: 0: 793.8, 1: 792.8. Samples: 3079933. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:46,155][05066] Avg episode reward: [(0, '15.630'), (1, '14.420')] [2023-09-22 13:42:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12488704. Throughput: 0: 789.0, 1: 789.1. Samples: 3084406. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:51,154][05066] Avg episode reward: [(0, '15.190'), (1, '13.700')] [2023-09-22 13:42:53,862][06567] Updated weights for policy 0, policy_version 24720 (0.0016) [2023-09-22 13:42:53,863][06493] Updated weights for policy 1, policy_version 24160 (0.0016) [2023-09-22 13:42:56,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6289.8). Total num frames: 12521472. Throughput: 0: 792.8, 1: 792.0. Samples: 3094165. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:42:56,154][05066] Avg episode reward: [(0, '15.370'), (1, '13.460')] [2023-09-22 13:43:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12554240. Throughput: 0: 783.8, 1: 783.0. Samples: 3103199. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:43:01,155][05066] Avg episode reward: [(0, '15.880'), (1, '13.380')] [2023-09-22 13:43:01,155][06078] Saving new best policy, reward=15.880! [2023-09-22 13:43:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12587008. Throughput: 0: 780.8, 1: 780.0. Samples: 3107438. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:43:06,154][05066] Avg episode reward: [(0, '15.500'), (1, '12.820')] [2023-09-22 13:43:07,379][06567] Updated weights for policy 0, policy_version 24880 (0.0017) [2023-09-22 13:43:07,379][06493] Updated weights for policy 1, policy_version 24320 (0.0015) [2023-09-22 13:43:11,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 12611584. Throughput: 0: 775.7, 1: 776.1. Samples: 3116891. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:43:11,154][05066] Avg episode reward: [(0, '16.000'), (1, '12.810')] [2023-09-22 13:43:11,326][06078] Saving new best policy, reward=16.000! [2023-09-22 13:43:16,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6212.3, 300 sec: 6275.9). Total num frames: 12644352. Throughput: 0: 772.7, 1: 771.7. Samples: 3126015. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:43:16,155][05066] Avg episode reward: [(0, '15.790'), (1, '12.880')] [2023-09-22 13:43:20,513][06493] Updated weights for policy 1, policy_version 24480 (0.0015) [2023-09-22 13:43:20,514][06567] Updated weights for policy 0, policy_version 25040 (0.0016) [2023-09-22 13:43:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12677120. Throughput: 0: 773.7, 1: 774.9. Samples: 3130986. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:43:21,155][05066] Avg episode reward: [(0, '15.050'), (1, '13.240')] [2023-09-22 13:43:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12709888. Throughput: 0: 772.1, 1: 772.2. Samples: 3139976. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 13:43:26,154][05066] Avg episode reward: [(0, '15.640'), (1, '14.400')] [2023-09-22 13:43:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12742656. Throughput: 0: 775.9, 1: 777.2. Samples: 3149820. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:43:31,154][05066] Avg episode reward: [(0, '14.990'), (1, '14.170')] [2023-09-22 13:43:33,499][06493] Updated weights for policy 1, policy_version 24640 (0.0016) [2023-09-22 13:43:33,499][06567] Updated weights for policy 0, policy_version 25200 (0.0016) [2023-09-22 13:43:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12775424. Throughput: 0: 776.9, 1: 776.9. Samples: 3154329. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:43:36,155][05066] Avg episode reward: [(0, '14.400'), (1, '14.480')] [2023-09-22 13:43:41,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 12800000. Throughput: 0: 777.3, 1: 776.8. Samples: 3164102. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:43:41,154][05066] Avg episode reward: [(0, '14.990'), (1, '14.580')] [2023-09-22 13:43:41,267][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000025296_6475776.pth... [2023-09-22 13:43:41,293][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000022336_5718016.pth [2023-09-22 13:43:41,341][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000024736_6332416.pth... [2023-09-22 13:43:41,369][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000021776_5574656.pth [2023-09-22 13:43:46,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 12832768. Throughput: 0: 777.6, 1: 777.9. Samples: 3173194. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 13:43:46,154][05066] Avg episode reward: [(0, '14.600'), (1, '14.710')] [2023-09-22 13:43:46,626][06567] Updated weights for policy 0, policy_version 25360 (0.0015) [2023-09-22 13:43:46,627][06493] Updated weights for policy 1, policy_version 24800 (0.0015) [2023-09-22 13:43:51,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12865536. Throughput: 0: 783.4, 1: 783.1. 
Samples: 3177929. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:43:51,155][05066] Avg episode reward: [(0, '14.490'), (1, '15.490')] [2023-09-22 13:43:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12898304. Throughput: 0: 776.7, 1: 777.0. Samples: 3186808. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:43:56,154][05066] Avg episode reward: [(0, '15.240'), (1, '15.100')] [2023-09-22 13:43:59,879][06493] Updated weights for policy 1, policy_version 24960 (0.0016) [2023-09-22 13:43:59,879][06567] Updated weights for policy 0, policy_version 25520 (0.0017) [2023-09-22 13:44:01,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6289.8). Total num frames: 12926976. Throughput: 0: 783.4, 1: 783.3. Samples: 3196518. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 13:44:01,155][05066] Avg episode reward: [(0, '14.750'), (1, '15.300')] [2023-09-22 13:44:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 12955648. Throughput: 0: 777.9, 1: 778.5. Samples: 3201024. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:06,154][05066] Avg episode reward: [(0, '14.290'), (1, '15.050')] [2023-09-22 13:44:11,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 12988416. Throughput: 0: 784.7, 1: 783.0. Samples: 3210523. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:11,155][05066] Avg episode reward: [(0, '15.820'), (1, '15.360')] [2023-09-22 13:44:12,962][06567] Updated weights for policy 0, policy_version 25680 (0.0016) [2023-09-22 13:44:12,963][06493] Updated weights for policy 1, policy_version 25120 (0.0016) [2023-09-22 13:44:16,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13021184. Throughput: 0: 777.9, 1: 777.1. Samples: 3219795. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:16,155][05066] Avg episode reward: [(0, '15.150'), (1, '16.650')] [2023-09-22 13:44:16,156][06278] Saving new best policy, reward=16.650! [2023-09-22 13:44:21,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 13053952. Throughput: 0: 781.6, 1: 781.4. Samples: 3224667. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:21,154][05066] Avg episode reward: [(0, '14.770'), (1, '15.780')] [2023-09-22 13:44:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6262.0). Total num frames: 13078528. Throughput: 0: 773.7, 1: 775.0. Samples: 3233792. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:26,155][05066] Avg episode reward: [(0, '14.750'), (1, '14.340')] [2023-09-22 13:44:26,217][06567] Updated weights for policy 0, policy_version 25840 (0.0017) [2023-09-22 13:44:26,217][06493] Updated weights for policy 1, policy_version 25280 (0.0016) [2023-09-22 13:44:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 13111296. Throughput: 0: 778.7, 1: 778.9. Samples: 3243285. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:31,154][05066] Avg episode reward: [(0, '15.050'), (1, '15.050')] [2023-09-22 13:44:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 13144064. Throughput: 0: 778.6, 1: 779.5. Samples: 3248043. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:36,155][05066] Avg episode reward: [(0, '15.950'), (1, '14.340')] [2023-09-22 13:44:39,104][06493] Updated weights for policy 1, policy_version 25440 (0.0017) [2023-09-22 13:44:39,104][06567] Updated weights for policy 0, policy_version 26000 (0.0016) [2023-09-22 13:44:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13176832. Throughput: 0: 785.2, 1: 785.1. Samples: 3257468. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:44:41,154][05066] Avg episode reward: [(0, '14.560'), (1, '14.720')] [2023-09-22 13:44:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13209600. Throughput: 0: 781.2, 1: 781.9. Samples: 3266856. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:44:46,154][05066] Avg episode reward: [(0, '14.210'), (1, '14.150')] [2023-09-22 13:44:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 13242368. Throughput: 0: 787.6, 1: 786.7. Samples: 3271868. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:44:51,154][05066] Avg episode reward: [(0, '14.340'), (1, '14.260')] [2023-09-22 13:44:52,046][06493] Updated weights for policy 1, policy_version 25600 (0.0014) [2023-09-22 13:44:52,047][06567] Updated weights for policy 0, policy_version 26160 (0.0014) [2023-09-22 13:44:56,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13275136. Throughput: 0: 780.8, 1: 783.2. Samples: 3280903. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 13:44:56,155][05066] Avg episode reward: [(0, '14.560'), (1, '14.620')] [2023-09-22 13:45:01,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 13299712. Throughput: 0: 786.5, 1: 786.8. Samples: 3290596. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:45:01,155][05066] Avg episode reward: [(0, '14.090'), (1, '15.440')] [2023-09-22 13:45:05,127][06567] Updated weights for policy 0, policy_version 26320 (0.0016) [2023-09-22 13:45:05,127][06493] Updated weights for policy 1, policy_version 25760 (0.0019) [2023-09-22 13:45:06,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13332480. Throughput: 0: 783.6, 1: 784.5. Samples: 3295232. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:06,154][05066] Avg episode reward: [(0, '14.700'), (1, '16.100')]
[2023-09-22 13:45:11,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13365248. Throughput: 0: 788.8, 1: 788.0. Samples: 3304747. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:11,155][05066] Avg episode reward: [(0, '14.280'), (1, '15.600')]
[2023-09-22 13:45:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13398016. Throughput: 0: 785.4, 1: 784.9. Samples: 3313950. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:16,154][05066] Avg episode reward: [(0, '14.310'), (1, '15.240')]
[2023-09-22 13:45:18,254][06493] Updated weights for policy 1, policy_version 25920 (0.0015)
[2023-09-22 13:45:18,255][06567] Updated weights for policy 0, policy_version 26480 (0.0017)
[2023-09-22 13:45:21,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13430784. Throughput: 0: 784.5, 1: 784.2. Samples: 3318633. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:21,154][05066] Avg episode reward: [(0, '15.000'), (1, '16.140')]
[2023-09-22 13:45:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 13463552. Throughput: 0: 783.4, 1: 784.1. Samples: 3328003. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:26,155][05066] Avg episode reward: [(0, '13.450'), (1, '15.140')]
[2023-09-22 13:45:31,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 13488128. Throughput: 0: 786.1, 1: 786.9. Samples: 3337644. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:31,155][05066] Avg episode reward: [(0, '13.250'), (1, '15.470')]
[2023-09-22 13:45:31,299][06567] Updated weights for policy 0, policy_version 26640 (0.0018)
[2023-09-22 13:45:31,300][06493] Updated weights for policy 1, policy_version 26080 (0.0015)
[2023-09-22 13:45:36,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 13520896. Throughput: 0: 782.2, 1: 783.5. Samples: 3342322. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:36,154][05066] Avg episode reward: [(0, '12.750'), (1, '15.500')]
[2023-09-22 13:45:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13553664. Throughput: 0: 784.1, 1: 783.1. Samples: 3351427. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:41,154][05066] Avg episode reward: [(0, '12.930'), (1, '15.460')]
[2023-09-22 13:45:41,163][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000026192_6705152.pth...
[2023-09-22 13:45:41,163][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000026752_6848512.pth...
[2023-09-22 13:45:41,198][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000023264_5955584.pth
[2023-09-22 13:45:41,204][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000023824_6098944.pth
[2023-09-22 13:45:44,518][06567] Updated weights for policy 0, policy_version 26800 (0.0018)
[2023-09-22 13:45:44,520][06493] Updated weights for policy 1, policy_version 26240 (0.0018)
[2023-09-22 13:45:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13586432. Throughput: 0: 779.4, 1: 780.0. Samples: 3360768. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:46,155][05066] Avg episode reward: [(0, '12.960'), (1, '16.040')]
[2023-09-22 13:45:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13619200. Throughput: 0: 779.6, 1: 778.8. Samples: 3365359. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:45:51,154][05066] Avg episode reward: [(0, '11.770'), (1, '14.970')]
[2023-09-22 13:45:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 13643776. Throughput: 0: 779.8, 1: 779.8. Samples: 3374930. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:45:56,155][05066] Avg episode reward: [(0, '13.190'), (1, '14.320')]
[2023-09-22 13:45:57,654][06493] Updated weights for policy 1, policy_version 26400 (0.0016)
[2023-09-22 13:45:57,654][06567] Updated weights for policy 0, policy_version 26960 (0.0013)
[2023-09-22 13:46:01,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 13676544. Throughput: 0: 779.0, 1: 779.5. Samples: 3384086. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:46:01,154][05066] Avg episode reward: [(0, '13.580'), (1, '14.770')]
[2023-09-22 13:46:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13709312. Throughput: 0: 780.2, 1: 780.4. Samples: 3388859. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:46:06,154][05066] Avg episode reward: [(0, '13.270'), (1, '15.140')]
[2023-09-22 13:46:10,779][06567] Updated weights for policy 0, policy_version 27120 (0.0018)
[2023-09-22 13:46:10,779][06493] Updated weights for policy 1, policy_version 26560 (0.0016)
[2023-09-22 13:46:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13742080. Throughput: 0: 777.9, 1: 777.2. Samples: 3397983. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 13:46:11,155][05066] Avg episode reward: [(0, '13.680'), (1, '14.850')]
[2023-09-22 13:46:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13774848. Throughput: 0: 780.4, 1: 780.1. Samples: 3407865. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:16,154][05066] Avg episode reward: [(0, '14.150'), (1, '14.560')]
[2023-09-22 13:46:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 13799424. Throughput: 0: 776.8, 1: 776.3. Samples: 3412209. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:21,155][05066] Avg episode reward: [(0, '14.700'), (1, '15.130')]
[2023-09-22 13:46:23,924][06567] Updated weights for policy 0, policy_version 27280 (0.0015)
[2023-09-22 13:46:23,924][06493] Updated weights for policy 1, policy_version 26720 (0.0015)
[2023-09-22 13:46:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 13832192. Throughput: 0: 780.6, 1: 781.2. Samples: 3421710. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:26,154][05066] Avg episode reward: [(0, '13.940'), (1, '16.050')]
[2023-09-22 13:46:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 13864960. Throughput: 0: 780.7, 1: 779.9. Samples: 3430995. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:31,155][05066] Avg episode reward: [(0, '13.590'), (1, '15.410')]
[2023-09-22 13:46:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13897728. Throughput: 0: 783.3, 1: 784.2. Samples: 3435899. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:36,154][05066] Avg episode reward: [(0, '14.210'), (1, '15.960')]
[2023-09-22 13:46:36,919][06493] Updated weights for policy 1, policy_version 26880 (0.0014)
[2023-09-22 13:46:36,920][06567] Updated weights for policy 0, policy_version 27440 (0.0016)
[2023-09-22 13:46:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 13930496. Throughput: 0: 780.8, 1: 780.2. Samples: 3445175. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:41,155][05066] Avg episode reward: [(0, '14.650'), (1, '16.440')]
[2023-09-22 13:46:46,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 13959168. Throughput: 0: 784.8, 1: 783.2. Samples: 3454648. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:46,154][05066] Avg episode reward: [(0, '14.930'), (1, '16.570')]
[2023-09-22 13:46:50,099][06567] Updated weights for policy 0, policy_version 27600 (0.0016)
[2023-09-22 13:46:50,099][06493] Updated weights for policy 1, policy_version 27040 (0.0015)
[2023-09-22 13:46:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 13987840. Throughput: 0: 779.9, 1: 780.4. Samples: 3459072. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:51,155][05066] Avg episode reward: [(0, '15.360'), (1, '15.790')]
[2023-09-22 13:46:56,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14020608. Throughput: 0: 786.9, 1: 787.2. Samples: 3468817. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:46:56,155][05066] Avg episode reward: [(0, '15.260'), (1, '15.000')]
[2023-09-22 13:47:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14053376. Throughput: 0: 782.2, 1: 781.5. Samples: 3478232. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:01,155][05066] Avg episode reward: [(0, '14.970'), (1, '14.810')]
[2023-09-22 13:47:02,975][06567] Updated weights for policy 0, policy_version 27760 (0.0020)
[2023-09-22 13:47:02,975][06493] Updated weights for policy 1, policy_version 27200 (0.0018)
[2023-09-22 13:47:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14086144. Throughput: 0: 787.3, 1: 786.3. Samples: 3483021. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:06,155][05066] Avg episode reward: [(0, '15.810'), (1, '15.130')]
[2023-09-22 13:47:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6262.0). Total num frames: 14118912. Throughput: 0: 781.0, 1: 780.7. Samples: 3491986. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:11,155][05066] Avg episode reward: [(0, '15.790'), (1, '13.860')]
[2023-09-22 13:47:16,142][06567] Updated weights for policy 0, policy_version 27920 (0.0017)
[2023-09-22 13:47:16,142][06493] Updated weights for policy 1, policy_version 27360 (0.0018)
[2023-09-22 13:47:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14151680. Throughput: 0: 784.9, 1: 785.3. Samples: 3501656. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:16,155][05066] Avg episode reward: [(0, '15.540'), (1, '12.360')]
[2023-09-22 13:47:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 14176256. Throughput: 0: 781.0, 1: 780.9. Samples: 3506185. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:21,154][05066] Avg episode reward: [(0, '15.220'), (1, '12.780')]
[2023-09-22 13:47:26,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14209024. Throughput: 0: 782.4, 1: 782.9. Samples: 3515615. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:26,154][05066] Avg episode reward: [(0, '15.050'), (1, '12.650')]
[2023-09-22 13:47:29,178][06567] Updated weights for policy 0, policy_version 28080 (0.0017)
[2023-09-22 13:47:29,178][06493] Updated weights for policy 1, policy_version 27520 (0.0015)
[2023-09-22 13:47:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14241792. Throughput: 0: 781.9, 1: 783.1. Samples: 3525074. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:31,155][05066] Avg episode reward: [(0, '14.680'), (1, '12.010')]
[2023-09-22 13:47:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14274560. Throughput: 0: 788.9, 1: 787.8. Samples: 3530025. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:36,155][05066] Avg episode reward: [(0, '13.970'), (1, '12.620')]
[2023-09-22 13:47:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 14307328. Throughput: 0: 782.9, 1: 782.5. Samples: 3539260. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:41,154][05066] Avg episode reward: [(0, '13.940'), (1, '13.280')]
[2023-09-22 13:47:41,165][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000028224_7225344.pth...
[2023-09-22 13:47:41,165][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000027664_7081984.pth...
[2023-09-22 13:47:41,197][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000025296_6475776.pth
[2023-09-22 13:47:41,198][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000024736_6332416.pth
[2023-09-22 13:47:42,139][06493] Updated weights for policy 1, policy_version 27680 (0.0016)
[2023-09-22 13:47:42,140][06567] Updated weights for policy 0, policy_version 28240 (0.0016)
[2023-09-22 13:47:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6275.9). Total num frames: 14340096. Throughput: 0: 787.4, 1: 787.0. Samples: 3549081. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:46,155][05066] Avg episode reward: [(0, '13.410'), (1, '13.800')]
[2023-09-22 13:47:51,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14364672. Throughput: 0: 781.4, 1: 781.8. Samples: 3553367. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:51,155][05066] Avg episode reward: [(0, '12.860'), (1, '14.710')]
[2023-09-22 13:47:55,190][06567] Updated weights for policy 0, policy_version 28400 (0.0012)
[2023-09-22 13:47:55,190][06493] Updated weights for policy 1, policy_version 27840 (0.0015)
[2023-09-22 13:47:56,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14397440. Throughput: 0: 789.1, 1: 789.1. Samples: 3563007. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:47:56,154][05066] Avg episode reward: [(0, '12.500'), (1, '14.290')]
[2023-09-22 13:48:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14430208. Throughput: 0: 785.5, 1: 785.0. Samples: 3572327. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:48:01,155][05066] Avg episode reward: [(0, '12.730'), (1, '15.140')]
[2023-09-22 13:48:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 14462976. Throughput: 0: 788.9, 1: 788.0. Samples: 3577147. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:48:06,154][05066] Avg episode reward: [(0, '13.610'), (1, '14.880')]
[2023-09-22 13:48:08,240][06493] Updated weights for policy 1, policy_version 28000 (0.0017)
[2023-09-22 13:48:08,240][06567] Updated weights for policy 0, policy_version 28560 (0.0017)
[2023-09-22 13:48:11,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 14495744. Throughput: 0: 785.4, 1: 785.2. Samples: 3586295. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:48:11,154][05066] Avg episode reward: [(0, '14.060'), (1, '15.610')]
[2023-09-22 13:48:16,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 14524416. Throughput: 0: 786.2, 1: 785.9. Samples: 3595821. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:48:16,155][05066] Avg episode reward: [(0, '14.160'), (1, '14.320')]
[2023-09-22 13:48:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14553088. Throughput: 0: 782.0, 1: 782.6. Samples: 3600435. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:48:21,154][05066] Avg episode reward: [(0, '14.280'), (1, '14.720')]
[2023-09-22 13:48:21,379][06493] Updated weights for policy 1, policy_version 28160 (0.0012)
[2023-09-22 13:48:21,379][06567] Updated weights for policy 0, policy_version 28720 (0.0016)
[2023-09-22 13:48:26,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14585856. Throughput: 0: 783.4, 1: 784.1. Samples: 3609800. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:48:26,154][05066] Avg episode reward: [(0, '15.670'), (1, '14.290')]
[2023-09-22 13:48:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14618624. Throughput: 0: 776.8, 1: 777.4. Samples: 3619019. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:48:31,155][05066] Avg episode reward: [(0, '14.970'), (1, '15.650')]
[2023-09-22 13:48:34,505][06567] Updated weights for policy 0, policy_version 28880 (0.0017)
[2023-09-22 13:48:34,505][06493] Updated weights for policy 1, policy_version 28320 (0.0016)
[2023-09-22 13:48:36,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14651392. Throughput: 0: 783.0, 1: 783.2. Samples: 3623845. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:48:36,155][05066] Avg episode reward: [(0, '15.540'), (1, '15.250')]
[2023-09-22 13:48:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14684160. Throughput: 0: 779.1, 1: 779.7. Samples: 3633152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:48:41,155][05066] Avg episode reward: [(0, '15.280'), (1, '15.080')]
[2023-09-22 13:48:46,154][05066] Fps is (10 sec: 5734.6, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 14708736. Throughput: 0: 784.4, 1: 784.3. Samples: 3642919. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:48:46,154][05066] Avg episode reward: [(0, '14.710'), (1, '15.280')]
[2023-09-22 13:48:47,432][06567] Updated weights for policy 0, policy_version 29040 (0.0017)
[2023-09-22 13:48:47,434][06493] Updated weights for policy 1, policy_version 28480 (0.0017)
[2023-09-22 13:48:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 14741504. Throughput: 0: 781.3, 1: 782.1. Samples: 3647500. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:48:51,154][05066] Avg episode reward: [(0, '15.050'), (1, '15.180')]
[2023-09-22 13:48:56,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6262.0). Total num frames: 14774272. Throughput: 0: 787.3, 1: 786.9. Samples: 3657131. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:48:56,155][05066] Avg episode reward: [(0, '15.000'), (1, '15.450')]
[2023-09-22 13:49:00,445][06493] Updated weights for policy 1, policy_version 28640 (0.0014)
[2023-09-22 13:49:00,445][06567] Updated weights for policy 0, policy_version 29200 (0.0018)
[2023-09-22 13:49:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 14807040. Throughput: 0: 784.3, 1: 784.9. Samples: 3666438. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:49:01,154][05066] Avg episode reward: [(0, '15.060'), (1, '14.160')]
[2023-09-22 13:49:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14839808. Throughput: 0: 789.4, 1: 789.6. Samples: 3671488. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0)
[2023-09-22 13:49:06,155][05066] Avg episode reward: [(0, '15.810'), (1, '14.520')]
[2023-09-22 13:49:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14872576. Throughput: 0: 788.9, 1: 788.5. Samples: 3680782. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:49:11,154][05066] Avg episode reward: [(0, '14.920'), (1, '14.180')]
[2023-09-22 13:49:13,294][06567] Updated weights for policy 0, policy_version 29360 (0.0016)
[2023-09-22 13:49:13,294][06493] Updated weights for policy 1, policy_version 28800 (0.0017)
[2023-09-22 13:49:16,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6348.8, 300 sec: 6275.9). Total num frames: 14905344. Throughput: 0: 793.8, 1: 794.6. Samples: 3690496. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:49:16,154][05066] Avg episode reward: [(0, '15.370'), (1, '14.450')]
[2023-09-22 13:49:21,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 14938112. Throughput: 0: 791.9, 1: 791.8. Samples: 3695111. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:49:21,154][05066] Avg episode reward: [(0, '15.420'), (1, '15.230')]
[2023-09-22 13:49:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14962688. Throughput: 0: 795.0, 1: 794.3. Samples: 3704668. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:49:26,154][05066] Avg episode reward: [(0, '15.360'), (1, '15.010')]
[2023-09-22 13:49:26,336][06567] Updated weights for policy 0, policy_version 29520 (0.0018)
[2023-09-22 13:49:26,336][06493] Updated weights for policy 1, policy_version 28960 (0.0016)
[2023-09-22 13:49:31,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 14995456. Throughput: 0: 791.0, 1: 791.4. Samples: 3714129. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:49:31,155][05066] Avg episode reward: [(0, '15.830'), (1, '15.740')]
[2023-09-22 13:49:36,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 15028224. Throughput: 0: 794.4, 1: 794.1. Samples: 3718983. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:49:36,155][05066] Avg episode reward: [(0, '15.970'), (1, '15.560')]
[2023-09-22 13:49:39,302][06567] Updated weights for policy 0, policy_version 29680 (0.0013)
[2023-09-22 13:49:39,302][06493] Updated weights for policy 1, policy_version 29120 (0.0016)
[2023-09-22 13:49:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15060992. Throughput: 0: 788.5, 1: 789.3. Samples: 3728132. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:49:41,154][05066] Avg episode reward: [(0, '16.490'), (1, '15.920')]
[2023-09-22 13:49:41,165][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000029136_7458816.pth...
[2023-09-22 13:49:41,165][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000029696_7602176.pth...
[2023-09-22 13:49:41,202][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000026192_6705152.pth
[2023-09-22 13:49:41,204][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000026752_6848512.pth
[2023-09-22 13:49:41,208][06078] Saving new best policy, reward=16.490!
[2023-09-22 13:49:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6275.9). Total num frames: 15093760. Throughput: 0: 790.4, 1: 791.0. Samples: 3737600. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:49:46,155][05066] Avg episode reward: [(0, '16.690'), (1, '15.940')]
[2023-09-22 13:49:46,156][06078] Saving new best policy, reward=16.690!
[2023-09-22 13:49:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 15126528. Throughput: 0: 787.0, 1: 787.2. Samples: 3742326. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:49:51,155][05066] Avg episode reward: [(0, '16.660'), (1, '15.850')]
[2023-09-22 13:49:52,337][06493] Updated weights for policy 1, policy_version 29280 (0.0016)
[2023-09-22 13:49:52,337][06567] Updated weights for policy 0, policy_version 29840 (0.0018)
[2023-09-22 13:49:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15151104. Throughput: 0: 790.3, 1: 790.7. Samples: 3751926. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:49:56,155][05066] Avg episode reward: [(0, '16.700'), (1, '15.660')]
[2023-09-22 13:49:56,211][06078] Saving new best policy, reward=16.700!
[2023-09-22 13:50:01,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15183872. Throughput: 0: 788.5, 1: 787.6. Samples: 3761422. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:50:01,154][05066] Avg episode reward: [(0, '16.600'), (1, '16.110')]
[2023-09-22 13:50:05,155][06493] Updated weights for policy 1, policy_version 29440 (0.0015)
[2023-09-22 13:50:05,155][06567] Updated weights for policy 0, policy_version 30000 (0.0015)
[2023-09-22 13:50:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 15216640. Throughput: 0: 790.3, 1: 791.0. Samples: 3766272. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:50:06,154][05066] Avg episode reward: [(0, '16.750'), (1, '14.780')]
[2023-09-22 13:50:06,155][06078] Saving new best policy, reward=16.750!
[2023-09-22 13:50:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15249408. Throughput: 0: 789.7, 1: 789.9. Samples: 3775752. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:50:11,155][05066] Avg episode reward: [(0, '15.820'), (1, '15.260')]
[2023-09-22 13:50:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15282176. Throughput: 0: 789.0, 1: 788.8. Samples: 3785131. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:50:16,154][05066] Avg episode reward: [(0, '15.240'), (1, '15.070')]
[2023-09-22 13:50:18,094][06493] Updated weights for policy 1, policy_version 29600 (0.0018)
[2023-09-22 13:50:18,095][06567] Updated weights for policy 0, policy_version 30160 (0.0018)
[2023-09-22 13:50:21,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15314944. Throughput: 0: 789.1, 1: 789.0. Samples: 3789995. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:50:21,154][05066] Avg episode reward: [(0, '15.180'), (1, '15.070')]
[2023-09-22 13:50:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6303.7). Total num frames: 15347712. Throughput: 0: 787.7, 1: 788.2. Samples: 3799048. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0)
[2023-09-22 13:50:26,155][05066] Avg episode reward: [(0, '15.870'), (1, '14.240')]
[2023-09-22 13:50:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15372288. Throughput: 0: 791.0, 1: 790.1. Samples: 3808750. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:50:31,154][05066] Avg episode reward: [(0, '15.700'), (1, '14.020')]
[2023-09-22 13:50:31,233][06567] Updated weights for policy 0, policy_version 30320 (0.0016)
[2023-09-22 13:50:31,234][06493] Updated weights for policy 1, policy_version 29760 (0.0016)
[2023-09-22 13:50:36,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15405056. Throughput: 0: 789.3, 1: 789.6. Samples: 3813376. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:50:36,154][05066] Avg episode reward: [(0, '15.280'), (1, '14.410')]
[2023-09-22 13:50:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 15437824. Throughput: 0: 788.1, 1: 787.4. Samples: 3822825. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:50:41,154][05066] Avg episode reward: [(0, '14.940'), (1, '14.810')]
[2023-09-22 13:50:44,239][06493] Updated weights for policy 1, policy_version 29920 (0.0017)
[2023-09-22 13:50:44,239][06567] Updated weights for policy 0, policy_version 30480 (0.0017)
[2023-09-22 13:50:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 15470592. Throughput: 0: 786.0, 1: 786.2. Samples: 3832172. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-22 13:50:46,154][05066] Avg episode reward: [(0, '14.650'), (1, '15.400')]
[2023-09-22 13:50:51,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15503360. Throughput: 0: 787.1, 1: 786.4. Samples: 3837080. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:50:51,155][05066] Avg episode reward: [(0, '14.030'), (1, '14.870')]
[2023-09-22 13:50:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 15536128. Throughput: 0: 786.6, 1: 786.4. Samples: 3846535. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:50:56,154][05066] Avg episode reward: [(0, '13.960'), (1, '14.590')]
[2023-09-22 13:50:57,127][06567] Updated weights for policy 0, policy_version 30640 (0.0015)
[2023-09-22 13:50:57,128][06493] Updated weights for policy 1, policy_version 30080 (0.0016)
[2023-09-22 13:51:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 15568896. Throughput: 0: 791.2, 1: 791.7. Samples: 3856364. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:51:01,154][05066] Avg episode reward: [(0, '14.210'), (1, '14.170')]
[2023-09-22 13:51:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15593472. Throughput: 0: 785.3, 1: 785.3. Samples: 3860673. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-22 13:51:06,155][05066] Avg episode reward: [(0, '14.620'), (1, '14.390')]
[2023-09-22 13:51:10,307][06493] Updated weights for policy 1, policy_version 30240 (0.0017)
[2023-09-22 13:51:10,307][06567] Updated weights for policy 0, policy_version 30800 (0.0017)
[2023-09-22 13:51:11,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 15626240. Throughput: 0: 788.9, 1: 788.5. Samples: 3870034. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:51:11,154][05066] Avg episode reward: [(0, '15.010'), (1, '13.460')]
[2023-09-22 13:51:16,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15659008. Throughput: 0: 783.8, 1: 783.9. Samples: 3879299. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:51:16,154][05066] Avg episode reward: [(0, '15.790'), (1, '14.800')]
[2023-09-22 13:51:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15691776. Throughput: 0: 787.9, 1: 787.4. Samples: 3884263. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:51:21,154][05066] Avg episode reward: [(0, '14.920'), (1, '15.120')]
[2023-09-22 13:51:23,196][06567] Updated weights for policy 0, policy_version 30960 (0.0015)
[2023-09-22 13:51:23,196][06493] Updated weights for policy 1, policy_version 30400 (0.0015)
[2023-09-22 13:51:26,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15724544. Throughput: 0: 786.3, 1: 786.4. Samples: 3893599. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:51:26,155][05066] Avg episode reward: [(0, '15.150'), (1, '14.170')]
[2023-09-22 13:51:31,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15749120. Throughput: 0: 785.1, 1: 785.6. Samples: 3902852. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:51:31,155][05066] Avg episode reward: [(0, '15.420'), (1, '14.320')]
[2023-09-22 13:51:36,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15781888. Throughput: 0: 783.0, 1: 783.7. Samples: 3907584. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:51:36,155][05066] Avg episode reward: [(0, '14.340'), (1, '14.150')]
[2023-09-22 13:51:36,507][06493] Updated weights for policy 1, policy_version 30560 (0.0014)
[2023-09-22 13:51:36,507][06567] Updated weights for policy 0, policy_version 31120 (0.0014)
[2023-09-22 13:51:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6289.8). Total num frames: 15814656. Throughput: 0: 781.8, 1: 782.0. Samples: 3916906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:51:41,154][05066] Avg episode reward: [(0, '15.170'), (1, '14.950')]
[2023-09-22 13:51:41,165][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000031168_7979008.pth...
[2023-09-22 13:51:41,165][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000030608_7835648.pth...
[2023-09-22 13:51:41,196][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000027664_7081984.pth
[2023-09-22 13:51:41,199][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000028224_7225344.pth
[2023-09-22 13:51:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15847424. Throughput: 0: 779.9, 1: 780.0. Samples: 3926558. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:51:46,154][05066] Avg episode reward: [(0, '15.010'), (1, '14.790')]
[2023-09-22 13:51:49,366][06493] Updated weights for policy 1, policy_version 30720 (0.0014)
[2023-09-22 13:51:49,367][06567] Updated weights for policy 0, policy_version 31280 (0.0015)
[2023-09-22 13:51:51,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15880192. Throughput: 0: 786.4, 1: 787.0. Samples: 3931477. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:51:51,155][05066] Avg episode reward: [(0, '14.740'), (1, '15.150')]
[2023-09-22 13:51:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15912960. Throughput: 0: 785.1, 1: 784.8. Samples: 3940682. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:51:56,154][05066] Avg episode reward: [(0, '14.890'), (1, '14.450')]
[2023-09-22 13:52:01,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 15945728. Throughput: 0: 788.8, 1: 790.8. Samples: 3950382. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:52:01,154][05066] Avg episode reward: [(0, '16.290'), (1, '15.630')]
[2023-09-22 13:52:02,406][06493] Updated weights for policy 1, policy_version 30880 (0.0017)
[2023-09-22 13:52:02,406][06567] Updated weights for policy 0, policy_version 31440 (0.0016)
[2023-09-22 13:52:06,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 15970304. Throughput: 0: 782.7, 1: 782.8. Samples: 3954712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:52:06,155][05066] Avg episode reward: [(0, '15.460'), (1, '15.640')]
[2023-09-22 13:52:11,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16003072. Throughput: 0: 787.9, 1: 787.0. Samples: 3964470. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-22 13:52:11,155][05066] Avg episode reward: [(0, '15.650'), (1, '16.060')]
[2023-09-22 13:52:15,265][06493] Updated weights for policy 1, policy_version 31040 (0.0018)
[2023-09-22 13:52:15,265][06567] Updated weights for policy 0, policy_version 31600 (0.0019)
[2023-09-22 13:52:16,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16035840. Throughput: 0: 790.5, 1: 789.7. Samples: 3973961. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:16,155][05066] Avg episode reward: [(0, '15.910'), (1, '15.770')]
[2023-09-22 13:52:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16068608. Throughput: 0: 793.2, 1: 792.0. Samples: 3978920. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:21,155][05066] Avg episode reward: [(0, '15.950'), (1, '16.040')]
[2023-09-22 13:52:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16101376. Throughput: 0: 791.3, 1: 790.6. Samples: 3988092. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:26,155][05066] Avg episode reward: [(0, '16.450'), (1, '16.500')]
[2023-09-22 13:52:28,228][06493] Updated weights for policy 1, policy_version 31200 (0.0018)
[2023-09-22 13:52:28,229][06567] Updated weights for policy 0, policy_version 31760 (0.0020)
[2023-09-22 13:52:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 16134144. Throughput: 0: 790.2, 1: 790.6. Samples: 3997696. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:31,154][05066] Avg episode reward: [(0, '15.100'), (1, '16.180')]
[2023-09-22 13:52:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 16166912. Throughput: 0: 788.9, 1: 788.1. Samples: 4002443. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:36,154][05066] Avg episode reward: [(0, '14.990'), (1, '17.230')]
[2023-09-22 13:52:36,155][06278] Saving new best policy, reward=17.230!
[2023-09-22 13:52:41,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16191488. Throughput: 0: 790.9, 1: 790.6. Samples: 4011853. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:41,154][05066] Avg episode reward: [(0, '15.000'), (1, '15.650')]
[2023-09-22 13:52:41,334][06493] Updated weights for policy 1, policy_version 31360 (0.0015)
[2023-09-22 13:52:41,334][06567] Updated weights for policy 0, policy_version 31920 (0.0015)
[2023-09-22 13:52:46,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16224256. Throughput: 0: 787.0, 1: 785.2. Samples: 4021129. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:46,154][05066] Avg episode reward: [(0, '14.710'), (1, '15.060')]
[2023-09-22 13:52:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 16257024. Throughput: 0: 792.5, 1: 791.8. Samples: 4026006. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:52:51,154][05066] Avg episode reward: [(0, '14.540'), (1, '15.320')]
[2023-09-22 13:52:54,299][06567] Updated weights for policy 0, policy_version 32080 (0.0016)
[2023-09-22 13:52:54,299][06493] Updated weights for policy 1, policy_version 31520 (0.0017)
[2023-09-22 13:52:56,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16289792. Throughput: 0: 786.8, 1: 788.0. Samples: 4035336. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:52:56,154][05066] Avg episode reward: [(0, '14.270'), (1, '14.660')]
[2023-09-22 13:53:01,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16322560. Throughput: 0: 786.6, 1: 787.6. Samples: 4044800. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:53:01,155][05066] Avg episode reward: [(0, '13.680'), (1, '14.200')]
[2023-09-22 13:53:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 16355328. Throughput: 0: 781.5, 1: 783.0. Samples: 4049324. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:53:06,155][05066] Avg episode reward: [(0, '14.210'), (1, '14.340')]
[2023-09-22 13:53:07,423][06567] Updated weights for policy 0, policy_version 32240 (0.0014)
[2023-09-22 13:53:07,424][06493] Updated weights for policy 1, policy_version 31680 (0.0015)
[2023-09-22 13:53:11,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6289.8). Total num frames: 16379904. Throughput: 0: 788.0, 1: 787.0. Samples: 4058966. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0)
[2023-09-22 13:53:11,154][05066] Avg episode reward: [(0, '14.710'), (1, '14.680')]
[2023-09-22 13:53:16,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 16412672. Throughput: 0: 780.9, 1: 780.6. Samples: 4067963. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:53:16,154][05066] Avg episode reward: [(0, '15.100'), (1, '14.830')]
[2023-09-22 13:53:20,504][06567] Updated weights for policy 0, policy_version 32400 (0.0019)
[2023-09-22 13:53:20,504][06493] Updated weights for policy 1, policy_version 31840 (0.0020)
[2023-09-22 13:53:21,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16445440. Throughput: 0: 783.6, 1: 783.5. Samples: 4072962. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:53:21,155][05066] Avg episode reward: [(0, '15.100'), (1, '14.520')]
[2023-09-22 13:53:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16478208. Throughput: 0: 779.6, 1: 780.1. Samples: 4082039. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:53:26,155][05066] Avg episode reward: [(0, '15.380'), (1, '15.290')]
[2023-09-22 13:53:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16502784. Throughput: 0: 780.8, 1: 780.6. Samples: 4091390.
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:53:31,155][05066] Avg episode reward: [(0, '15.650'), (1, '15.300')] [2023-09-22 13:53:33,897][06567] Updated weights for policy 0, policy_version 32560 (0.0016) [2023-09-22 13:53:33,897][06493] Updated weights for policy 1, policy_version 32000 (0.0017) [2023-09-22 13:53:36,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16535552. Throughput: 0: 777.2, 1: 778.2. Samples: 4096000. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:53:36,155][05066] Avg episode reward: [(0, '15.840'), (1, '15.180')] [2023-09-22 13:53:41,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16568320. Throughput: 0: 771.0, 1: 770.8. Samples: 4104720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:53:41,154][05066] Avg episode reward: [(0, '14.800'), (1, '15.020')] [2023-09-22 13:53:41,163][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000032640_8355840.pth... [2023-09-22 13:53:41,163][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000032080_8212480.pth... [2023-09-22 13:53:41,199][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000029136_7458816.pth [2023-09-22 13:53:41,203][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000029696_7602176.pth [2023-09-22 13:53:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16601088. Throughput: 0: 770.2, 1: 770.0. Samples: 4114111. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:53:46,155][05066] Avg episode reward: [(0, '14.600'), (1, '14.650')] [2023-09-22 13:53:47,450][06493] Updated weights for policy 1, policy_version 32160 (0.0014) [2023-09-22 13:53:47,450][06567] Updated weights for policy 0, policy_version 32720 (0.0016) [2023-09-22 13:53:51,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16625664. Throughput: 0: 769.2, 1: 768.9. 
Samples: 4118537. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:53:51,155][05066] Avg episode reward: [(0, '14.150'), (1, '14.770')] [2023-09-22 13:53:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16658432. Throughput: 0: 766.3, 1: 768.1. Samples: 4128014. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 13:53:56,155][05066] Avg episode reward: [(0, '14.700'), (1, '14.740')] [2023-09-22 13:54:00,667][06493] Updated weights for policy 1, policy_version 32320 (0.0018) [2023-09-22 13:54:00,667][06567] Updated weights for policy 0, policy_version 32880 (0.0018) [2023-09-22 13:54:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16691200. Throughput: 0: 767.7, 1: 767.2. Samples: 4137033. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:01,154][05066] Avg episode reward: [(0, '14.530'), (1, '14.090')] [2023-09-22 13:54:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16723968. Throughput: 0: 766.0, 1: 766.2. Samples: 4141912. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:06,155][05066] Avg episode reward: [(0, '14.450'), (1, '13.300')] [2023-09-22 13:54:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16756736. Throughput: 0: 769.3, 1: 769.8. Samples: 4151296. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:11,155][05066] Avg episode reward: [(0, '14.480'), (1, '12.620')] [2023-09-22 13:54:13,689][06493] Updated weights for policy 1, policy_version 32480 (0.0016) [2023-09-22 13:54:13,689][06567] Updated weights for policy 0, policy_version 33040 (0.0017) [2023-09-22 13:54:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 16781312. Throughput: 0: 773.3, 1: 773.0. Samples: 4160975. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:16,155][05066] Avg episode reward: [(0, '15.200'), (1, '12.480')] [2023-09-22 13:54:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16814080. Throughput: 0: 773.7, 1: 773.7. Samples: 4165633. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:21,155][05066] Avg episode reward: [(0, '15.370'), (1, '12.440')] [2023-09-22 13:54:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 16846848. Throughput: 0: 784.5, 1: 784.5. Samples: 4175326. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:26,155][05066] Avg episode reward: [(0, '15.470'), (1, '12.160')] [2023-09-22 13:54:26,518][06493] Updated weights for policy 1, policy_version 32640 (0.0015) [2023-09-22 13:54:26,518][06567] Updated weights for policy 0, policy_version 33200 (0.0016) [2023-09-22 13:54:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16879616. Throughput: 0: 782.9, 1: 782.1. Samples: 4184538. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:31,155][05066] Avg episode reward: [(0, '13.860'), (1, '12.180')] [2023-09-22 13:54:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16912384. Throughput: 0: 790.2, 1: 789.0. Samples: 4189599. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:54:36,155][05066] Avg episode reward: [(0, '15.000'), (1, '12.610')] [2023-09-22 13:54:39,531][06493] Updated weights for policy 1, policy_version 32800 (0.0015) [2023-09-22 13:54:39,531][06567] Updated weights for policy 0, policy_version 33360 (0.0016) [2023-09-22 13:54:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16945152. Throughput: 0: 786.1, 1: 785.7. Samples: 4198745. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:54:41,155][05066] Avg episode reward: [(0, '15.080'), (1, '12.930')] [2023-09-22 13:54:46,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 16977920. Throughput: 0: 791.7, 1: 793.0. Samples: 4208344. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:54:46,155][05066] Avg episode reward: [(0, '14.610'), (1, '13.370')] [2023-09-22 13:54:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17002496. Throughput: 0: 786.6, 1: 787.3. Samples: 4212737. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:54:51,155][05066] Avg episode reward: [(0, '15.560'), (1, '14.240')] [2023-09-22 13:54:52,859][06567] Updated weights for policy 0, policy_version 33520 (0.0015) [2023-09-22 13:54:52,859][06493] Updated weights for policy 1, policy_version 32960 (0.0016) [2023-09-22 13:54:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17035264. Throughput: 0: 786.0, 1: 785.2. Samples: 4222000. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:54:56,154][05066] Avg episode reward: [(0, '15.310'), (1, '13.130')] [2023-09-22 13:55:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17068032. Throughput: 0: 782.0, 1: 782.4. Samples: 4231372. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:55:01,154][05066] Avg episode reward: [(0, '15.200'), (1, '13.210')] [2023-09-22 13:55:05,715][06567] Updated weights for policy 0, policy_version 33680 (0.0015) [2023-09-22 13:55:05,716][06493] Updated weights for policy 1, policy_version 33120 (0.0016) [2023-09-22 13:55:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17100800. Throughput: 0: 784.9, 1: 784.4. Samples: 4236248. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:55:06,154][05066] Avg episode reward: [(0, '16.090'), (1, '13.970')] [2023-09-22 13:55:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 17133568. Throughput: 0: 780.6, 1: 780.4. Samples: 4245573. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:55:11,154][05066] Avg episode reward: [(0, '15.940'), (1, '14.860')] [2023-09-22 13:55:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 17166336. Throughput: 0: 787.6, 1: 786.9. Samples: 4255387. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:55:16,154][05066] Avg episode reward: [(0, '15.630'), (1, '14.700')] [2023-09-22 13:55:18,685][06567] Updated weights for policy 0, policy_version 33840 (0.0015) [2023-09-22 13:55:18,686][06493] Updated weights for policy 1, policy_version 33280 (0.0017) [2023-09-22 13:55:21,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17190912. Throughput: 0: 780.6, 1: 781.2. Samples: 4259878. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:55:21,154][05066] Avg episode reward: [(0, '16.540'), (1, '14.950')] [2023-09-22 13:55:26,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17223680. Throughput: 0: 786.1, 1: 785.2. Samples: 4269455. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:55:26,155][05066] Avg episode reward: [(0, '16.370'), (1, '14.090')] [2023-09-22 13:55:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17256448. Throughput: 0: 781.1, 1: 780.3. Samples: 4278604. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:55:31,155][05066] Avg episode reward: [(0, '15.740'), (1, '15.110')] [2023-09-22 13:55:31,883][06493] Updated weights for policy 1, policy_version 33440 (0.0016) [2023-09-22 13:55:31,884][06567] Updated weights for policy 0, policy_version 34000 (0.0017) [2023-09-22 13:55:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17289216. Throughput: 0: 785.6, 1: 784.9. Samples: 4283408. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:55:36,155][05066] Avg episode reward: [(0, '15.460'), (1, '15.980')] [2023-09-22 13:55:41,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17321984. Throughput: 0: 784.9, 1: 785.1. Samples: 4292650. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:55:41,155][05066] Avg episode reward: [(0, '15.310'), (1, '15.170')] [2023-09-22 13:55:41,167][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000033552_8589312.pth... [2023-09-22 13:55:41,167][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000034112_8732672.pth... [2023-09-22 13:55:41,202][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000030608_7835648.pth [2023-09-22 13:55:41,204][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000031168_7979008.pth [2023-09-22 13:55:44,804][06493] Updated weights for policy 1, policy_version 33600 (0.0015) [2023-09-22 13:55:44,804][06567] Updated weights for policy 0, policy_version 34160 (0.0014) [2023-09-22 13:55:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17354752. Throughput: 0: 790.6, 1: 791.1. Samples: 4302552. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 13:55:46,155][05066] Avg episode reward: [(0, '15.750'), (1, '14.940')] [2023-09-22 13:55:51,154][05066] Fps is (10 sec: 5734.7, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 17379328. Throughput: 0: 786.8, 1: 786.7. 
Samples: 4307054. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:55:51,154][05066] Avg episode reward: [(0, '15.900'), (1, '15.360')] [2023-09-22 13:55:56,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17412096. Throughput: 0: 786.4, 1: 787.0. Samples: 4316378. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:55:56,154][05066] Avg episode reward: [(0, '15.170'), (1, '15.920')] [2023-09-22 13:55:58,102][06493] Updated weights for policy 1, policy_version 33760 (0.0014) [2023-09-22 13:55:58,103][06567] Updated weights for policy 0, policy_version 34320 (0.0015) [2023-09-22 13:56:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17444864. Throughput: 0: 776.9, 1: 778.5. Samples: 4325382. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:01,154][05066] Avg episode reward: [(0, '14.670'), (1, '15.150')] [2023-09-22 13:56:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17477632. Throughput: 0: 782.2, 1: 782.4. Samples: 4330282. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:06,155][05066] Avg episode reward: [(0, '15.480'), (1, '15.490')] [2023-09-22 13:56:11,146][06567] Updated weights for policy 0, policy_version 34480 (0.0017) [2023-09-22 13:56:11,146][06493] Updated weights for policy 1, policy_version 33920 (0.0016) [2023-09-22 13:56:11,154][05066] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17510400. Throughput: 0: 779.8, 1: 781.4. Samples: 4339713. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:11,155][05066] Avg episode reward: [(0, '15.190'), (1, '16.230')] [2023-09-22 13:56:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 17534976. Throughput: 0: 786.2, 1: 786.3. Samples: 4349366. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:16,155][05066] Avg episode reward: [(0, '15.400'), (1, '16.340')] [2023-09-22 13:56:21,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17567744. Throughput: 0: 784.5, 1: 785.3. Samples: 4354048. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:21,155][05066] Avg episode reward: [(0, '15.140'), (1, '16.940')] [2023-09-22 13:56:24,063][06567] Updated weights for policy 0, policy_version 34640 (0.0015) [2023-09-22 13:56:24,064][06493] Updated weights for policy 1, policy_version 34080 (0.0014) [2023-09-22 13:56:26,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17600512. Throughput: 0: 787.8, 1: 787.3. Samples: 4363531. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:26,155][05066] Avg episode reward: [(0, '14.780'), (1, '16.520')] [2023-09-22 13:56:31,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17633280. Throughput: 0: 781.3, 1: 780.6. Samples: 4372837. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:56:31,155][05066] Avg episode reward: [(0, '15.560'), (1, '16.670')] [2023-09-22 13:56:36,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17666048. Throughput: 0: 784.1, 1: 783.9. Samples: 4377614. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:56:36,155][05066] Avg episode reward: [(0, '15.160'), (1, '16.600')] [2023-09-22 13:56:37,054][06567] Updated weights for policy 0, policy_version 34800 (0.0018) [2023-09-22 13:56:37,054][06493] Updated weights for policy 1, policy_version 34240 (0.0017) [2023-09-22 13:56:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 17698816. Throughput: 0: 783.2, 1: 782.8. Samples: 4386850. 
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:56:41,154][05066] Avg episode reward: [(0, '15.330'), (1, '17.860')] [2023-09-22 13:56:41,163][06278] Saving new best policy, reward=17.860! [2023-09-22 13:56:46,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 17727488. Throughput: 0: 791.3, 1: 792.4. Samples: 4396649. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:56:46,155][05066] Avg episode reward: [(0, '15.600'), (1, '17.250')] [2023-09-22 13:56:50,052][06493] Updated weights for policy 1, policy_version 34400 (0.0018) [2023-09-22 13:56:50,053][06567] Updated weights for policy 0, policy_version 34960 (0.0015) [2023-09-22 13:56:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17756160. Throughput: 0: 787.4, 1: 787.7. Samples: 4401161. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:56:51,155][05066] Avg episode reward: [(0, '15.570'), (1, '16.830')] [2023-09-22 13:56:56,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17788928. Throughput: 0: 791.8, 1: 790.6. Samples: 4410921. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 13:56:56,154][05066] Avg episode reward: [(0, '15.710'), (1, '16.000')] [2023-09-22 13:57:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17821696. Throughput: 0: 787.8, 1: 786.8. Samples: 4420224. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:01,155][05066] Avg episode reward: [(0, '16.540'), (1, '16.410')] [2023-09-22 13:57:03,069][06567] Updated weights for policy 0, policy_version 35120 (0.0016) [2023-09-22 13:57:03,070][06493] Updated weights for policy 1, policy_version 34560 (0.0016) [2023-09-22 13:57:06,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17854464. Throughput: 0: 787.3, 1: 786.8. Samples: 4424882. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:06,155][05066] Avg episode reward: [(0, '15.890'), (1, '16.370')] [2023-09-22 13:57:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 17887232. Throughput: 0: 783.4, 1: 784.3. Samples: 4434078. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:11,155][05066] Avg episode reward: [(0, '15.510'), (1, '16.290')] [2023-09-22 13:57:16,084][06493] Updated weights for policy 1, policy_version 34720 (0.0016) [2023-09-22 13:57:16,084][06567] Updated weights for policy 0, policy_version 35280 (0.0015) [2023-09-22 13:57:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6275.9). Total num frames: 17920000. Throughput: 0: 789.2, 1: 789.8. Samples: 4443889. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:16,155][05066] Avg episode reward: [(0, '16.080'), (1, '16.270')] [2023-09-22 13:57:21,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17944576. Throughput: 0: 786.9, 1: 787.0. Samples: 4448440. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:21,155][05066] Avg episode reward: [(0, '14.760'), (1, '16.440')] [2023-09-22 13:57:26,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 17977344. Throughput: 0: 791.9, 1: 792.0. Samples: 4458125. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:26,156][05066] Avg episode reward: [(0, '15.550'), (1, '16.930')] [2023-09-22 13:57:29,219][06493] Updated weights for policy 1, policy_version 34880 (0.0016) [2023-09-22 13:57:29,220][06567] Updated weights for policy 0, policy_version 35440 (0.0015) [2023-09-22 13:57:31,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 18010112. Throughput: 0: 783.5, 1: 781.6. Samples: 4467082. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:31,154][05066] Avg episode reward: [(0, '15.130'), (1, '16.780')] [2023-09-22 13:57:36,154][05066] Fps is (10 sec: 6553.9, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 18042880. Throughput: 0: 787.9, 1: 786.5. Samples: 4472010. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:36,154][05066] Avg episode reward: [(0, '14.790'), (1, '16.620')] [2023-09-22 13:57:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18075648. Throughput: 0: 780.6, 1: 781.0. Samples: 4481193. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:57:41,154][05066] Avg episode reward: [(0, '14.810'), (1, '16.550')] [2023-09-22 13:57:41,164][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000035024_8966144.pth... [2023-09-22 13:57:41,165][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000035584_9109504.pth... [2023-09-22 13:57:41,198][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000032080_8212480.pth [2023-09-22 13:57:41,201][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000032640_8355840.pth [2023-09-22 13:57:42,194][06493] Updated weights for policy 1, policy_version 35040 (0.0016) [2023-09-22 13:57:42,195][06567] Updated weights for policy 0, policy_version 35600 (0.0015) [2023-09-22 13:57:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6348.8, 300 sec: 6275.9). Total num frames: 18108416. Throughput: 0: 786.8, 1: 786.6. Samples: 4491026. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:57:46,155][05066] Avg episode reward: [(0, '15.160'), (1, '16.140')] [2023-09-22 13:57:51,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18132992. Throughput: 0: 783.5, 1: 783.5. Samples: 4495397. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:57:51,155][05066] Avg episode reward: [(0, '15.460'), (1, '16.580')] [2023-09-22 13:57:55,210][06493] Updated weights for policy 1, policy_version 35200 (0.0017) [2023-09-22 13:57:55,210][06567] Updated weights for policy 0, policy_version 35760 (0.0018) [2023-09-22 13:57:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18165760. Throughput: 0: 788.8, 1: 788.5. Samples: 4505055. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:57:56,155][05066] Avg episode reward: [(0, '15.550'), (1, '16.460')] [2023-09-22 13:58:01,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 18198528. Throughput: 0: 781.2, 1: 780.7. Samples: 4514175. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:58:01,154][05066] Avg episode reward: [(0, '15.570'), (1, '17.060')] [2023-09-22 13:58:06,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 18231296. Throughput: 0: 783.6, 1: 783.4. Samples: 4518957. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:58:06,154][05066] Avg episode reward: [(0, '15.730'), (1, '16.620')] [2023-09-22 13:58:08,227][06567] Updated weights for policy 0, policy_version 35920 (0.0018) [2023-09-22 13:58:08,233][06493] Updated weights for policy 1, policy_version 35360 (0.0015) [2023-09-22 13:58:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18264064. Throughput: 0: 780.5, 1: 780.2. Samples: 4528355. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:11,155][05066] Avg episode reward: [(0, '15.700'), (1, '16.770')] [2023-09-22 13:58:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18296832. Throughput: 0: 788.7, 1: 788.4. Samples: 4538052. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:16,154][05066] Avg episode reward: [(0, '16.360'), (1, '16.910')] [2023-09-22 13:58:21,154][05066] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6248.1). Total num frames: 18321408. Throughput: 0: 782.3, 1: 783.6. Samples: 4542477. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:21,154][05066] Avg episode reward: [(0, '15.840'), (1, '17.780')] [2023-09-22 13:58:21,287][06493] Updated weights for policy 1, policy_version 35520 (0.0015) [2023-09-22 13:58:21,289][06567] Updated weights for policy 0, policy_version 36080 (0.0018) [2023-09-22 13:58:26,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 18354176. Throughput: 0: 786.4, 1: 788.1. Samples: 4552043. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:26,154][05066] Avg episode reward: [(0, '15.750'), (1, '17.640')] [2023-09-22 13:58:31,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18386944. Throughput: 0: 779.3, 1: 780.0. Samples: 4561194. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:31,154][05066] Avg episode reward: [(0, '15.590'), (1, '17.230')] [2023-09-22 13:58:34,408][06493] Updated weights for policy 1, policy_version 35680 (0.0017) [2023-09-22 13:58:34,408][06567] Updated weights for policy 0, policy_version 36240 (0.0015) [2023-09-22 13:58:36,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18419712. Throughput: 0: 786.0, 1: 786.0. Samples: 4566139. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:36,154][05066] Avg episode reward: [(0, '15.810'), (1, '17.700')] [2023-09-22 13:58:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18452480. Throughput: 0: 781.7, 1: 781.8. Samples: 4575411. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:41,155][05066] Avg episode reward: [(0, '15.200'), (1, '17.060')] [2023-09-22 13:58:46,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 18485248. Throughput: 0: 790.6, 1: 789.8. Samples: 4585294. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:46,155][05066] Avg episode reward: [(0, '14.660'), (1, '17.630')] [2023-09-22 13:58:47,395][06493] Updated weights for policy 1, policy_version 35840 (0.0013) [2023-09-22 13:58:47,396][06567] Updated weights for policy 0, policy_version 36400 (0.0014) [2023-09-22 13:58:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18509824. Throughput: 0: 784.1, 1: 785.1. Samples: 4589569. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 13:58:51,154][05066] Avg episode reward: [(0, '15.010'), (1, '17.530')] [2023-09-22 13:58:56,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18542592. Throughput: 0: 780.5, 1: 780.3. Samples: 4598589. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:58:56,155][05066] Avg episode reward: [(0, '15.420'), (1, '17.930')] [2023-09-22 13:58:56,165][06278] Saving new best policy, reward=17.930! [2023-09-22 13:59:00,809][06493] Updated weights for policy 1, policy_version 36000 (0.0016) [2023-09-22 13:59:00,809][06567] Updated weights for policy 0, policy_version 36560 (0.0017) [2023-09-22 13:59:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18575360. Throughput: 0: 776.6, 1: 777.8. Samples: 4608000. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:59:01,154][05066] Avg episode reward: [(0, '16.400'), (1, '17.570')] [2023-09-22 13:59:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18608128. Throughput: 0: 780.8, 1: 780.0. Samples: 4612714. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:59:06,155][05066] Avg episode reward: [(0, '16.110'), (1, '16.980')] [2023-09-22 13:59:11,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 18632704. Throughput: 0: 781.5, 1: 780.6. Samples: 4622336. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:59:11,155][05066] Avg episode reward: [(0, '15.620'), (1, '17.160')] [2023-09-22 13:59:13,755][06493] Updated weights for policy 1, policy_version 36160 (0.0013) [2023-09-22 13:59:13,755][06567] Updated weights for policy 0, policy_version 36720 (0.0015) [2023-09-22 13:59:16,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 18665472. Throughput: 0: 782.6, 1: 782.5. Samples: 4631622. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 13:59:16,155][05066] Avg episode reward: [(0, '16.570'), (1, '16.680')] [2023-09-22 13:59:21,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18698240. Throughput: 0: 781.4, 1: 780.8. Samples: 4636438. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:59:21,155][05066] Avg episode reward: [(0, '16.440'), (1, '17.480')] [2023-09-22 13:59:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18731008. Throughput: 0: 783.4, 1: 783.3. Samples: 4645914. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 13:59:26,155][05066] Avg episode reward: [(0, '15.780'), (1, '16.440')] [2023-09-22 13:59:26,825][06493] Updated weights for policy 1, policy_version 36320 (0.0017) [2023-09-22 13:59:26,825][06567] Updated weights for policy 0, policy_version 36880 (0.0017) [2023-09-22 13:59:31,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18763776. Throughput: 0: 774.9, 1: 776.5. Samples: 4655104. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:59:31,154][05066] Avg episode reward: [(0, '15.600'), (1, '16.320')]
[2023-09-22 13:59:36,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18796544. Throughput: 0: 779.5, 1: 778.9. Samples: 4659697. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:59:36,154][05066] Avg episode reward: [(0, '15.210'), (1, '16.440')]
[2023-09-22 13:59:39,844][06493] Updated weights for policy 1, policy_version 36480 (0.0018)
[2023-09-22 13:59:39,844][06567] Updated weights for policy 0, policy_version 37040 (0.0016)
[2023-09-22 13:59:41,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18829312. Throughput: 0: 786.8, 1: 787.7. Samples: 4669440. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-22 13:59:41,154][05066] Avg episode reward: [(0, '15.900'), (1, '15.420')]
[2023-09-22 13:59:41,163][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000036496_9342976.pth...
[2023-09-22 13:59:41,163][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000037056_9486336.pth...
[2023-09-22 13:59:41,199][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000033552_8589312.pth
[2023-09-22 13:59:41,200][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000034112_8732672.pth
[2023-09-22 13:59:46,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 18853888. Throughput: 0: 788.8, 1: 788.7. Samples: 4678984. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:59:46,155][05066] Avg episode reward: [(0, '16.150'), (1, '14.950')]
[2023-09-22 13:59:51,154][05066] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18886656. Throughput: 0: 789.1, 1: 790.0. Samples: 4683776.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:59:51,154][05066] Avg episode reward: [(0, '16.740'), (1, '15.810')]
[2023-09-22 13:59:52,803][06493] Updated weights for policy 1, policy_version 36640 (0.0015)
[2023-09-22 13:59:52,803][06567] Updated weights for policy 0, policy_version 37200 (0.0016)
[2023-09-22 13:59:56,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 18919424. Throughput: 0: 786.7, 1: 785.6. Samples: 4693090. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 13:59:56,154][05066] Avg episode reward: [(0, '17.010'), (1, '15.390')]
[2023-09-22 13:59:56,163][06078] Saving new best policy, reward=17.010!
[2023-09-22 14:00:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 18952192. Throughput: 0: 785.6, 1: 785.5. Samples: 4702319. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:01,154][05066] Avg episode reward: [(0, '17.200'), (1, '15.480')]
[2023-09-22 14:00:01,155][06078] Saving new best policy, reward=17.200!
[2023-09-22 14:00:05,942][06493] Updated weights for policy 1, policy_version 36800 (0.0017)
[2023-09-22 14:00:05,942][06567] Updated weights for policy 0, policy_version 37360 (0.0015)
[2023-09-22 14:00:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 18984960. Throughput: 0: 786.3, 1: 786.0. Samples: 4707189. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:06,154][05066] Avg episode reward: [(0, '16.960'), (1, '15.320')]
[2023-09-22 14:00:11,154][05066] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19009536. Throughput: 0: 784.4, 1: 784.5. Samples: 4716514. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:11,155][05066] Avg episode reward: [(0, '17.210'), (1, '15.020')]
[2023-09-22 14:00:11,213][06078] Saving new best policy, reward=17.210!
[2023-09-22 14:00:16,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19042304. Throughput: 0: 784.1, 1: 783.6. Samples: 4725648. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:16,155][05066] Avg episode reward: [(0, '16.530'), (1, '17.030')]
[2023-09-22 14:00:19,242][06493] Updated weights for policy 1, policy_version 36960 (0.0017)
[2023-09-22 14:00:19,242][06567] Updated weights for policy 0, policy_version 37520 (0.0017)
[2023-09-22 14:00:21,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19075072. Throughput: 0: 785.1, 1: 785.2. Samples: 4730357. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:21,154][05066] Avg episode reward: [(0, '17.190'), (1, '16.150')]
[2023-09-22 14:00:26,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 19107840. Throughput: 0: 774.3, 1: 773.8. Samples: 4739106. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:26,154][05066] Avg episode reward: [(0, '16.980'), (1, '14.090')]
[2023-09-22 14:00:31,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6212.2, 300 sec: 6262.0). Total num frames: 19136512. Throughput: 0: 777.0, 1: 776.4. Samples: 4748887. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:31,155][05066] Avg episode reward: [(0, '16.470'), (1, '13.190')]
[2023-09-22 14:00:32,527][06567] Updated weights for policy 0, policy_version 37680 (0.0017)
[2023-09-22 14:00:32,527][06493] Updated weights for policy 1, policy_version 37120 (0.0016)
[2023-09-22 14:00:36,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19165184. Throughput: 0: 773.7, 1: 773.7. Samples: 4753409. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:36,155][05066] Avg episode reward: [(0, '15.970'), (1, '13.390')]
[2023-09-22 14:00:41,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6248.1).
Total num frames: 19197952. Throughput: 0: 774.3, 1: 774.8. Samples: 4762803. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:41,155][05066] Avg episode reward: [(0, '15.820'), (1, '13.160')]
[2023-09-22 14:00:45,520][06567] Updated weights for policy 0, policy_version 37840 (0.0018)
[2023-09-22 14:00:45,524][06493] Updated weights for policy 1, policy_version 37280 (0.0016)
[2023-09-22 14:00:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19230720. Throughput: 0: 776.2, 1: 776.2. Samples: 4772179. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:46,155][05066] Avg episode reward: [(0, '15.930'), (1, '12.230')]
[2023-09-22 14:00:51,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19263488. Throughput: 0: 777.0, 1: 777.2. Samples: 4777132. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:51,154][05066] Avg episode reward: [(0, '15.300'), (1, '13.310')]
[2023-09-22 14:00:56,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19296256. Throughput: 0: 774.0, 1: 774.4. Samples: 4786193. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:00:56,154][05066] Avg episode reward: [(0, '15.170'), (1, '14.200')]
[2023-09-22 14:00:58,581][06493] Updated weights for policy 1, policy_version 37440 (0.0015)
[2023-09-22 14:00:58,581][06567] Updated weights for policy 0, policy_version 38000 (0.0016)
[2023-09-22 14:01:01,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19329024. Throughput: 0: 782.5, 1: 782.0. Samples: 4796054. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 14:01:01,155][05066] Avg episode reward: [(0, '15.290'), (1, '15.590')]
[2023-09-22 14:01:06,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19353600. Throughput: 0: 779.2, 1: 779.8. Samples: 4800512.
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 14:01:06,155][05066] Avg episode reward: [(0, '15.500'), (1, '16.090')]
[2023-09-22 14:01:11,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 19386368. Throughput: 0: 788.0, 1: 787.6. Samples: 4810006. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 14:01:11,154][05066] Avg episode reward: [(0, '15.560'), (1, '16.250')]
[2023-09-22 14:01:11,648][06567] Updated weights for policy 0, policy_version 38160 (0.0012)
[2023-09-22 14:01:11,648][06493] Updated weights for policy 1, policy_version 37600 (0.0013)
[2023-09-22 14:01:16,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19419136. Throughput: 0: 780.6, 1: 780.5. Samples: 4819140. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 14:01:16,155][05066] Avg episode reward: [(0, '15.190'), (1, '16.370')]
[2023-09-22 14:01:21,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19451904. Throughput: 0: 786.2, 1: 785.5. Samples: 4824134. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-22 14:01:21,154][05066] Avg episode reward: [(0, '15.510'), (1, '16.430')]
[2023-09-22 14:01:24,764][06493] Updated weights for policy 1, policy_version 37760 (0.0015)
[2023-09-22 14:01:24,764][06567] Updated weights for policy 0, policy_version 38320 (0.0014)
[2023-09-22 14:01:26,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19484672. Throughput: 0: 782.8, 1: 783.3. Samples: 4833281. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:01:26,155][05066] Avg episode reward: [(0, '16.220'), (1, '15.720')]
[2023-09-22 14:01:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6248.1). Total num frames: 19509248. Throughput: 0: 784.3, 1: 783.8. Samples: 4842745.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:01:31,155][05066] Avg episode reward: [(0, '16.180'), (1, '16.160')]
[2023-09-22 14:01:36,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19542016. Throughput: 0: 782.0, 1: 783.1. Samples: 4847562. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:01:36,155][05066] Avg episode reward: [(0, '16.610'), (1, '15.730')]
[2023-09-22 14:01:37,866][06567] Updated weights for policy 0, policy_version 38480 (0.0015)
[2023-09-22 14:01:37,867][06493] Updated weights for policy 1, policy_version 37920 (0.0015)
[2023-09-22 14:01:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6262.0). Total num frames: 19574784. Throughput: 0: 784.2, 1: 784.0. Samples: 4856762. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:01:41,154][05066] Avg episode reward: [(0, '15.930'), (1, '15.650')]
[2023-09-22 14:01:41,162][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000038512_9859072.pth...
[2023-09-22 14:01:41,162][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000037952_9715712.pth...
[2023-09-22 14:01:41,194][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000035024_8966144.pth
[2023-09-22 14:01:41,197][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000035584_9109504.pth
[2023-09-22 14:01:46,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6275.9). Total num frames: 19607552. Throughput: 0: 777.2, 1: 778.2. Samples: 4866048. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:01:46,154][05066] Avg episode reward: [(0, '16.100'), (1, '16.110')]
[2023-09-22 14:01:51,120][06567] Updated weights for policy 0, policy_version 38640 (0.0016)
[2023-09-22 14:01:51,121][06493] Updated weights for policy 1, policy_version 38080 (0.0017)
[2023-09-22 14:01:51,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19640320.
Throughput: 0: 778.7, 1: 777.6. Samples: 4870547. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 14:01:51,154][05066] Avg episode reward: [(0, '15.590'), (1, '16.410')]
[2023-09-22 14:01:56,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19664896. Throughput: 0: 779.5, 1: 779.7. Samples: 4880171. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 14:01:56,155][05066] Avg episode reward: [(0, '14.460'), (1, '16.600')]
[2023-09-22 14:02:01,154][05066] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6248.1). Total num frames: 19697664. Throughput: 0: 781.3, 1: 782.0. Samples: 4889488. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 14:02:01,155][05066] Avg episode reward: [(0, '14.110'), (1, '15.730')]
[2023-09-22 14:02:04,094][06567] Updated weights for policy 0, policy_version 38800 (0.0016)
[2023-09-22 14:02:04,095][06493] Updated weights for policy 1, policy_version 38240 (0.0015)
[2023-09-22 14:02:06,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19730432. Throughput: 0: 780.5, 1: 780.7. Samples: 4894387. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 14:02:06,155][05066] Avg episode reward: [(0, '14.050'), (1, '15.670')]
[2023-09-22 14:02:11,154][05066] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19763200. Throughput: 0: 782.7, 1: 782.6. Samples: 4903720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-22 14:02:11,155][05066] Avg episode reward: [(0, '14.340'), (1, '15.250')]
[2023-09-22 14:02:16,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19795968. Throughput: 0: 781.7, 1: 782.9. Samples: 4913152.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:16,155][05066] Avg episode reward: [(0, '13.520'), (1, '14.610')]
[2023-09-22 14:02:17,141][06493] Updated weights for policy 1, policy_version 38400 (0.0017)
[2023-09-22 14:02:17,142][06567] Updated weights for policy 0, policy_version 38960 (0.0015)
[2023-09-22 14:02:21,154][05066] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19828736. Throughput: 0: 781.1, 1: 780.7. Samples: 4917843. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:21,154][05066] Avg episode reward: [(0, '14.330'), (1, '14.670')]
[2023-09-22 14:02:26,154][05066] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6262.0). Total num frames: 19857408. Throughput: 0: 785.6, 1: 786.1. Samples: 4927488. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:26,154][05066] Avg episode reward: [(0, '14.080'), (1, '14.830')]
[2023-09-22 14:02:30,194][06493] Updated weights for policy 1, policy_version 38560 (0.0017)
[2023-09-22 14:02:30,194][06567] Updated weights for policy 0, policy_version 39120 (0.0019)
[2023-09-22 14:02:31,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19886080. Throughput: 0: 785.2, 1: 784.6. Samples: 4936690. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:31,154][05066] Avg episode reward: [(0, '15.160'), (1, '15.690')]
[2023-09-22 14:02:36,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19918848. Throughput: 0: 788.5, 1: 788.3. Samples: 4941502. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:36,154][05066] Avg episode reward: [(0, '15.270'), (1, '15.630')]
[2023-09-22 14:02:41,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 19951616. Throughput: 0: 784.8, 1: 785.0. Samples: 4950815.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:41,154][05066] Avg episode reward: [(0, '14.350'), (1, '16.240')]
[2023-09-22 14:02:43,213][06567] Updated weights for policy 0, policy_version 39280 (0.0016)
[2023-09-22 14:02:43,213][06493] Updated weights for policy 1, policy_version 38720 (0.0016)
[2023-09-22 14:02:46,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 19984384. Throughput: 0: 786.3, 1: 786.4. Samples: 4960261. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:46,154][05066] Avg episode reward: [(0, '15.060'), (1, '15.900')]
[2023-09-22 14:02:51,154][05066] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 20017152. Throughput: 0: 785.9, 1: 785.6. Samples: 4965104. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:51,154][05066] Avg episode reward: [(0, '16.070'), (1, '16.930')]
[2023-09-22 14:02:56,154][05066] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6262.0). Total num frames: 20045824. Throughput: 0: 787.4, 1: 787.5. Samples: 4974592. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:02:56,155][05066] Avg episode reward: [(0, '15.940'), (1, '17.870')]
[2023-09-22 14:02:56,209][06493] Updated weights for policy 1, policy_version 38880 (0.0016)
[2023-09-22 14:02:56,210][06567] Updated weights for policy 0, policy_version 39440 (0.0016)
[2023-09-22 14:03:01,154][05066] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 20074496. Throughput: 0: 785.9, 1: 785.4. Samples: 4983862. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:03:01,154][05066] Avg episode reward: [(0, '15.730'), (1, '17.820')]
[2023-09-22 14:03:06,154][05066] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 20107264. Throughput: 0: 788.3, 1: 787.1. Samples: 4988737.
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:03:06,155][05066] Avg episode reward: [(0, '14.040'), (1, '16.220')]
[2023-09-22 14:03:09,215][06493] Updated weights for policy 1, policy_version 39040 (0.0017)
[2023-09-22 14:03:09,215][06567] Updated weights for policy 0, policy_version 39600 (0.0016)
[2023-09-22 14:03:11,154][05066] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6248.1). Total num frames: 20140032. Throughput: 0: 783.7, 1: 782.6. Samples: 4997975. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0)
[2023-09-22 14:03:11,154][05066] Avg episode reward: [(0, '14.460'), (1, '15.750')]
[2023-09-22 14:03:13,189][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000039088_10006528.pth...
[2023-09-22 14:03:13,189][06607] Stopping RolloutWorker_w5...
[2023-09-22 14:03:13,189][06601] Stopping RolloutWorker_w4...
[2023-09-22 14:03:13,189][06600] Stopping RolloutWorker_w3...
[2023-09-22 14:03:13,189][06609] Stopping RolloutWorker_w6...
[2023-09-22 14:03:13,190][06611] Stopping RolloutWorker_w7...
[2023-09-22 14:03:13,190][06599] Stopping RolloutWorker_w2...
[2023-09-22 14:03:13,190][06590] Stopping RolloutWorker_w1...
[2023-09-22 14:03:13,190][06571] Stopping RolloutWorker_w0...
[2023-09-22 14:03:13,190][06607] Loop rollout_proc5_evt_loop terminating...
[2023-09-22 14:03:13,190][06600] Loop rollout_proc3_evt_loop terminating...
[2023-09-22 14:03:13,190][06601] Loop rollout_proc4_evt_loop terminating...
[2023-09-22 14:03:13,190][06609] Loop rollout_proc6_evt_loop terminating...
[2023-09-22 14:03:13,190][06599] Loop rollout_proc2_evt_loop terminating...
[2023-09-22 14:03:13,190][06611] Loop rollout_proc7_evt_loop terminating...
[2023-09-22 14:03:13,190][05066] Component RolloutWorker_w5 stopped!
[2023-09-22 14:03:13,190][06571] Loop rollout_proc0_evt_loop terminating...
[2023-09-22 14:03:13,191][06590] Loop rollout_proc1_evt_loop terminating...
[2023-09-22 14:03:13,191][05066] Component RolloutWorker_w4 stopped!
[2023-09-22 14:03:13,192][05066] Component RolloutWorker_w6 stopped!
[2023-09-22 14:03:13,192][06078] Stopping Batcher_0...
[2023-09-22 14:03:13,193][05066] Component RolloutWorker_w3 stopped!
[2023-09-22 14:03:13,193][06078] Loop batcher_evt_loop terminating...
[2023-09-22 14:03:13,194][05066] Component RolloutWorker_w7 stopped!
[2023-09-22 14:03:13,194][05066] Component RolloutWorker_w2 stopped!
[2023-09-22 14:03:13,195][05066] Component RolloutWorker_w1 stopped!
[2023-09-22 14:03:13,195][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000039648_10149888.pth...
[2023-09-22 14:03:13,195][05066] Component RolloutWorker_w0 stopped!
[2023-09-22 14:03:13,196][05066] Component Batcher_0 stopped!
[2023-09-22 14:03:13,200][05066] Component Batcher_1 stopped!
[2023-09-22 14:03:13,209][06278] Stopping Batcher_1...
[2023-09-22 14:03:13,219][06278] Loop batcher_evt_loop terminating...
[2023-09-22 14:03:13,219][06278] Removing ./train_atari/Asterix/checkpoint_p1/checkpoint_000036496_9342976.pth
[2023-09-22 14:03:13,223][06278] Saving ./train_atari/Asterix/checkpoint_p1/checkpoint_000039088_10006528.pth...
[2023-09-22 14:03:13,226][06078] Removing ./train_atari/Asterix/checkpoint_p0/checkpoint_000037056_9486336.pth
[2023-09-22 14:03:13,230][06078] Saving ./train_atari/Asterix/checkpoint_p0/checkpoint_000039648_10149888.pth...
[2023-09-22 14:03:13,238][06567] Weights refcount: 2 0
[2023-09-22 14:03:13,240][06567] Stopping InferenceWorker_p0-w0...
[2023-09-22 14:03:13,240][06567] Loop inference_proc0-0_evt_loop terminating...
[2023-09-22 14:03:13,240][05066] Component InferenceWorker_p0-w0 stopped!
[2023-09-22 14:03:13,259][06493] Weights refcount: 2 0
[2023-09-22 14:03:13,260][06278] Stopping LearnerWorker_p1...
[2023-09-22 14:03:13,260][06278] Loop learner_proc1_evt_loop terminating...
[2023-09-22 14:03:13,260][05066] Component LearnerWorker_p1 stopped!
[2023-09-22 14:03:13,260][06493] Stopping InferenceWorker_p1-w0...
[2023-09-22 14:03:13,261][06493] Loop inference_proc1-0_evt_loop terminating...
[2023-09-22 14:03:13,262][05066] Component InferenceWorker_p1-w0 stopped!
[2023-09-22 14:03:13,265][06078] Stopping LearnerWorker_p0...
[2023-09-22 14:03:13,265][06078] Loop learner_proc0_evt_loop terminating...
[2023-09-22 14:03:13,265][05066] Component LearnerWorker_p0 stopped!
[2023-09-22 14:03:13,266][05066] Waiting for process learner_proc0 to stop...
[2023-09-22 14:03:13,951][05066] Waiting for process learner_proc1 to stop...
[2023-09-22 14:03:13,952][05066] Waiting for process inference_proc0-0 to join...
[2023-09-22 14:03:13,953][05066] Waiting for process inference_proc1-0 to join...
[2023-09-22 14:03:13,954][05066] Waiting for process rollout_proc0 to join...
[2023-09-22 14:03:13,955][05066] Waiting for process rollout_proc1 to join...
[2023-09-22 14:03:13,955][05066] Waiting for process rollout_proc2 to join...
[2023-09-22 14:03:13,956][05066] Waiting for process rollout_proc3 to join...
[2023-09-22 14:03:13,957][05066] Waiting for process rollout_proc4 to join...
[2023-09-22 14:03:13,957][05066] Waiting for process rollout_proc5 to join...
[2023-09-22 14:03:13,958][05066] Waiting for process rollout_proc6 to join...
[2023-09-22 14:03:13,959][05066] Waiting for process rollout_proc7 to join...
[2023-09-22 14:03:13,959][05066] Batcher 0 profile tree view:
batching: 20.8275, releasing_batches: 1.8919
[2023-09-22 14:03:13,960][05066] Batcher 1 profile tree view:
batching: 20.7133, releasing_batches: 1.8507
[2023-09-22 14:03:13,960][05066] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 661.4699
update_model: 37.0772
  weight_update: 0.0017
one_step: 0.0012
  handle_policy_step: 2296.2188
    deserialize: 67.8833, stack: 16.9315, obs_to_device_normalize: 557.2605, forward: 1104.9107, send_messages: 97.9036
    prepare_outputs: 305.2011
      to_cpu: 154.2424
[2023-09-22 14:03:13,961][05066] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 656.2659
update_model: 37.7987
  weight_update: 0.0017
one_step: 0.0013
  handle_policy_step: 2302.1539
    deserialize: 67.9945, stack: 15.9592, obs_to_device_normalize: 559.5312, forward: 1111.9173, send_messages: 96.2618
    prepare_outputs: 305.6226
      to_cpu: 154.9113
[2023-09-22 14:03:13,961][05066] Learner 0 profile tree view:
misc: 0.0150, prepare_batch: 31.6576
train: 463.3443
  epoch_init: 0.1138, minibatch_init: 3.3997, losses_postprocess: 59.1333, kl_divergence: 5.7712, after_optimizer: 10.5265
  calculate_losses: 48.0961
    losses_init: 0.1076, forward_head: 14.5733, bptt_initial: 0.4673, bptt: 0.5032, tail: 11.3336, advantages_returns: 3.3139, losses: 13.9285
  update: 331.9582
    clip: 165.1061
[2023-09-22 14:03:13,961][05066] Learner 1 profile tree view:
misc: 0.0163, prepare_batch: 31.7911
train: 456.0008
  epoch_init: 0.1109, minibatch_init: 3.4085, losses_postprocess: 58.7613, kl_divergence: 5.8293, after_optimizer: 20.4145
  calculate_losses: 49.3560
    losses_init: 0.1145, forward_head: 15.6679, bptt_initial: 0.4682, bptt: 0.4868, tail: 11.2489, advantages_returns: 3.3472, losses: 14.1018
  update: 313.7142
    clip: 164.9680
[2023-09-22 14:03:13,962][05066] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3829, enqueue_policy_requests: 45.5691, env_step: 1089.6953, overhead: 31.3048,
complete_rollouts: 1.0807
save_policy_outputs: 59.1313
  split_output_tensors: 20.3981
[2023-09-22 14:03:13,962][05066] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3930, enqueue_policy_requests: 44.9695, env_step: 1085.6652, overhead: 31.6007, complete_rollouts: 1.1226
save_policy_outputs: 57.3295
  split_output_tensors: 19.4067
[2023-09-22 14:03:13,962][05066] Loop Runner_EvtLoop terminating...
[2023-09-22 14:03:13,963][05066] Runner profile tree view:
main_loop: 3208.4263
[2023-09-22 14:03:13,963][05066] Collected {0: 10149888, 1: 10006528}, FPS: 6237.7