[2023-09-22 08:43:55,148][16195] Saving configuration to ./train_atari/Alien/config.json... [2023-09-22 08:43:55,413][16195] Rollout worker 0 uses device cpu [2023-09-22 08:43:55,414][16195] Rollout worker 1 uses device cpu [2023-09-22 08:43:55,414][16195] Rollout worker 2 uses device cpu [2023-09-22 08:43:55,414][16195] Rollout worker 3 uses device cpu [2023-09-22 08:43:55,414][16195] Rollout worker 4 uses device cpu [2023-09-22 08:43:55,415][16195] Rollout worker 5 uses device cpu [2023-09-22 08:43:55,415][16195] Rollout worker 6 uses device cpu [2023-09-22 08:43:55,415][16195] Rollout worker 7 uses device cpu [2023-09-22 08:43:55,415][16195] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2023-09-22 08:43:55,462][16195] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:43:55,463][16195] InferenceWorker_p0-w0: min num requests: 2 [2023-09-22 08:43:55,487][16195] Starting all processes... [2023-09-22 08:43:55,487][16195] Starting process learner_proc0 [2023-09-22 08:43:57,143][16195] Starting all processes... [2023-09-22 08:43:57,147][16649] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:43:57,147][16649] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-09-22 08:43:57,150][16195] Starting process inference_proc0-0 [2023-09-22 08:43:57,150][16195] Starting process rollout_proc0 [2023-09-22 08:43:57,150][16195] Starting process rollout_proc1 [2023-09-22 08:43:57,151][16195] Starting process rollout_proc2 [2023-09-22 08:43:57,159][16195] Starting process rollout_proc3 [2023-09-22 08:43:57,160][16195] Starting process rollout_proc4 [2023-09-22 08:43:57,167][16195] Starting process rollout_proc5 [2023-09-22 08:43:57,168][16195] Starting process rollout_proc6 [2023-09-22 08:43:57,168][16195] Starting process rollout_proc7 [2023-09-22 08:43:57,186][16649] Num visible devices: 1 [2023-09-22 08:43:57,276][16649] Starting seed is not provided [2023-09-22 08:43:57,276][16649] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:43:57,276][16649] Initializing actor-critic model on device cuda:0 [2023-09-22 08:43:57,277][16649] RunningMeanStd input shape: (4, 84, 84) [2023-09-22 08:43:57,278][16649] RunningMeanStd input shape: (1,) [2023-09-22 08:43:57,358][16649] ConvEncoder: input_channels=4 [2023-09-22 08:43:57,636][16649] Conv encoder output size: 512 [2023-09-22 08:43:57,646][16649] Created Actor Critic model with architecture: [2023-09-22 08:43:57,646][16649] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=18, bias=True) ) ) [2023-09-22 08:43:58,220][16649] Using optimizer [2023-09-22 08:43:58,220][16649] No checkpoints found [2023-09-22 08:43:58,221][16649] Did not load from checkpoint, starting from scratch! [2023-09-22 08:43:58,221][16649] Initialized policy 0 weights for model version 0 [2023-09-22 08:43:58,222][16649] LearnerWorker_p0 finished initialization! [2023-09-22 08:43:58,223][16649] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:43:59,059][16839] Worker 5 uses CPU cores [20, 21, 22, 23] [2023-09-22 08:43:59,087][16836] Worker 2 uses CPU cores [8, 9, 10, 11] [2023-09-22 08:43:59,089][16844] Worker 7 uses CPU cores [28, 29, 30, 31] [2023-09-22 08:43:59,094][16841] Worker 6 uses CPU cores [24, 25, 26, 27] [2023-09-22 08:43:59,110][16831] Worker 0 uses CPU cores [0, 1, 2, 3] [2023-09-22 08:43:59,112][16830] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:43:59,113][16830] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-09-22 08:43:59,131][16830] Num visible devices: 1 [2023-09-22 08:43:59,138][16832] Worker 1 uses CPU cores [4, 5, 6, 7] [2023-09-22 08:43:59,168][16840] Worker 4 uses CPU cores [16, 17, 18, 19] [2023-09-22 08:43:59,208][16835] Worker 3 uses CPU cores [12, 13, 14, 15] [2023-09-22 08:43:59,776][16830] RunningMeanStd input shape: (4, 84, 84) [2023-09-22 08:43:59,776][16830] RunningMeanStd input shape: (1,) [2023-09-22 08:43:59,787][16830] ConvEncoder: input_channels=4 [2023-09-22 08:43:59,885][16830] Conv encoder output size: 512 [2023-09-22 08:43:59,890][16195] Inference worker 0-0 is ready! [2023-09-22 08:43:59,890][16195] All inference workers are ready! Signal rollout workers to start! [2023-09-22 08:44:00,377][16840] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,377][16844] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,380][16831] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,384][16835] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,385][16841] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,400][16836] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,453][16839] Decorrelating experience for 0 frames... [2023-09-22 08:44:00,453][16832] Decorrelating experience for 0 frames... [2023-09-22 08:44:01,241][16195] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-09-22 08:44:06,241][16195] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4096. Throughput: 0: 409.6. Samples: 2048. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:44:06,242][16195] Avg episode reward: [(0, '7.833')] [2023-09-22 08:44:10,877][16649] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000002 [2023-09-22 08:44:11,241][16195] Fps is (10 sec: 2867.3, 60 sec: 2867.3, 300 sec: 2867.3). Total num frames: 28672. Throughput: 0: 506.4. Samples: 5064. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:44:11,241][16195] Avg episode reward: [(0, '6.029')] [2023-09-22 08:44:13,665][16830] Updated weights for policy 0, policy_version 152 (0.0018) [2023-09-22 08:44:15,456][16195] Heartbeat connected on Batcher_0 [2023-09-22 08:44:15,466][16195] Heartbeat connected on RolloutWorker_w0 [2023-09-22 08:44:15,469][16195] Heartbeat connected on RolloutWorker_w1 [2023-09-22 08:44:15,472][16195] Heartbeat connected on RolloutWorker_w2 [2023-09-22 08:44:15,475][16195] Heartbeat connected on RolloutWorker_w3 [2023-09-22 08:44:15,477][16195] Heartbeat connected on RolloutWorker_w4 [2023-09-22 08:44:15,480][16195] Heartbeat connected on RolloutWorker_w5 [2023-09-22 08:44:15,483][16195] Heartbeat connected on RolloutWorker_w6 [2023-09-22 08:44:15,486][16195] Heartbeat connected on RolloutWorker_w7 [2023-09-22 08:44:15,504][16195] Heartbeat connected on InferenceWorker_p0-w0 [2023-09-22 08:44:15,522][16195] Heartbeat connected on LearnerWorker_p0 [2023-09-22 08:44:16,241][16195] Fps is (10 sec: 4505.7, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 49152. Throughput: 0: 785.7. Samples: 11785. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:44:16,242][16195] Avg episode reward: [(0, '6.500')] [2023-09-22 08:44:21,241][16195] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3481.6). Total num frames: 69632. Throughput: 0: 898.1. Samples: 17963. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:44:21,242][16195] Avg episode reward: [(0, '6.698')] [2023-09-22 08:44:23,320][16830] Updated weights for policy 0, policy_version 312 (0.0019) [2023-09-22 08:44:24,837][16195] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 16195], exiting... [2023-09-22 08:44:24,838][16195] Runner profile tree view: main_loop: 29.3513 [2023-09-22 08:44:24,838][16841] Stopping RolloutWorker_w6... [2023-09-22 08:44:24,838][16835] Stopping RolloutWorker_w3... [2023-09-22 08:44:24,839][16844] Stopping RolloutWorker_w7... [2023-09-22 08:44:24,839][16831] Stopping RolloutWorker_w0... [2023-09-22 08:44:24,839][16840] Stopping RolloutWorker_w4... [2023-09-22 08:44:24,839][16195] Collected {0: 86016}, FPS: 2930.6 [2023-09-22 08:44:24,839][16839] Stopping RolloutWorker_w5... [2023-09-22 08:44:24,839][16836] Stopping RolloutWorker_w2... [2023-09-22 08:44:24,839][16832] Stopping RolloutWorker_w1... [2023-09-22 08:44:24,839][16835] Loop rollout_proc3_evt_loop terminating... [2023-09-22 08:44:24,839][16841] Loop rollout_proc6_evt_loop terminating... [2023-09-22 08:44:24,839][16831] Loop rollout_proc0_evt_loop terminating... [2023-09-22 08:44:24,839][16844] Loop rollout_proc7_evt_loop terminating... [2023-09-22 08:44:24,839][16649] Stopping Batcher_0... [2023-09-22 08:44:24,839][16840] Loop rollout_proc4_evt_loop terminating... [2023-09-22 08:44:24,839][16836] Loop rollout_proc2_evt_loop terminating... [2023-09-22 08:44:24,839][16839] Loop rollout_proc5_evt_loop terminating... [2023-09-22 08:44:24,840][16832] Loop rollout_proc1_evt_loop terminating... [2023-09-22 08:44:24,840][16649] Loop batcher_evt_loop terminating... [2023-09-22 08:44:24,906][16830] Weights refcount: 2 0 [2023-09-22 08:44:24,907][16830] Stopping InferenceWorker_p0-w0... [2023-09-22 08:44:24,907][16830] Loop inference_proc0-0_evt_loop terminating... [2023-09-22 08:44:24,998][16649] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000000344_90112.pth... [2023-09-22 08:44:25,028][16649] Stopping LearnerWorker_p0... [2023-09-22 08:44:25,028][16649] Loop learner_proc0_evt_loop terminating... [2023-09-22 08:45:17,444][23569] Saving configuration to ./train_atari/Alien/config.json... [2023-09-22 08:45:17,663][23569] Rollout worker 0 uses device cpu [2023-09-22 08:45:17,664][23569] Rollout worker 1 uses device cpu [2023-09-22 08:45:17,664][23569] Rollout worker 2 uses device cpu [2023-09-22 08:45:17,664][23569] Rollout worker 3 uses device cpu [2023-09-22 08:45:17,665][23569] Rollout worker 4 uses device cpu [2023-09-22 08:45:17,665][23569] Rollout worker 5 uses device cpu [2023-09-22 08:45:17,665][23569] Rollout worker 6 uses device cpu [2023-09-22 08:45:17,665][23569] Rollout worker 7 uses device cpu [2023-09-22 08:45:17,666][23569] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2023-09-22 08:45:17,732][23569] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:45:17,733][23569] InferenceWorker_p0-w0: min num requests: 1 [2023-09-22 08:45:17,736][23569] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-09-22 08:45:17,736][23569] InferenceWorker_p1-w0: min num requests: 1 [2023-09-22 08:45:17,760][23569] Starting all processes... [2023-09-22 08:45:17,761][23569] Starting process learner_proc0 [2023-09-22 08:45:19,466][23569] Starting process learner_proc1 [2023-09-22 08:45:19,471][24306] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:45:19,471][24306] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-09-22 08:45:19,490][24306] Num visible devices: 1 [2023-09-22 08:45:19,528][24306] Starting seed is not provided [2023-09-22 08:45:19,528][24306] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:45:19,528][24306] Initializing actor-critic model on device cuda:0 [2023-09-22 08:45:19,528][24306] RunningMeanStd input shape: (4, 84, 84) [2023-09-22 08:45:19,529][24306] RunningMeanStd input shape: (1,) [2023-09-22 08:45:19,547][24306] ConvEncoder: input_channels=4 [2023-09-22 08:45:19,716][24306] Conv encoder output size: 512 [2023-09-22 08:45:19,718][24306] Created Actor Critic model with architecture: [2023-09-22 08:45:19,718][24306] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=18, bias=True) ) ) [2023-09-22 08:45:20,283][24306] Using optimizer [2023-09-22 08:45:20,284][24306] Loading state from checkpoint ./train_atari/Alien/checkpoint_p0/checkpoint_000000344_90112.pth... [2023-09-22 08:45:20,302][24306] Loading model from checkpoint [2023-09-22 08:45:20,305][24306] Loaded experiment state at self.train_step=344, self.env_steps=90112 [2023-09-22 08:45:20,305][24306] Initialized policy 0 weights for model version 344 [2023-09-22 08:45:20,306][24306] LearnerWorker_p0 finished initialization! [2023-09-22 08:45:20,307][24306] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:45:21,159][23569] Starting all processes... [2023-09-22 08:45:21,163][24495] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-09-22 08:45:21,163][24495] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1 [2023-09-22 08:45:21,166][23569] Starting process inference_proc0-0 [2023-09-22 08:45:21,166][23569] Starting process inference_proc1-0 [2023-09-22 08:45:21,167][23569] Starting process rollout_proc0 [2023-09-22 08:45:21,167][23569] Starting process rollout_proc1 [2023-09-22 08:45:21,167][23569] Starting process rollout_proc2 [2023-09-22 08:45:21,168][23569] Starting process rollout_proc3 [2023-09-22 08:45:21,171][23569] Starting process rollout_proc4 [2023-09-22 08:45:21,172][23569] Starting process rollout_proc5 [2023-09-22 08:45:21,176][23569] Starting process rollout_proc6 [2023-09-22 08:45:21,201][24495] Num visible devices: 1 [2023-09-22 08:45:21,177][23569] Starting process rollout_proc7 [2023-09-22 08:45:21,298][24495] Starting seed is not provided [2023-09-22 08:45:21,298][24495] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-09-22 08:45:21,299][24495] Initializing actor-critic model on device cuda:0 [2023-09-22 08:45:21,299][24495] RunningMeanStd input shape: (4, 84, 84) [2023-09-22 08:45:21,300][24495] RunningMeanStd input shape: (1,) [2023-09-22 08:45:21,336][24495] ConvEncoder: input_channels=4 [2023-09-22 08:45:21,742][24495] Conv encoder output size: 512 [2023-09-22 08:45:21,744][24495] Created Actor Critic model with architecture: [2023-09-22 08:45:21,745][24495] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=18, bias=True) ) ) [2023-09-22 08:45:22,395][24495] Using optimizer [2023-09-22 08:45:22,396][24495] No checkpoints found [2023-09-22 08:45:22,396][24495] Did not load from checkpoint, starting from scratch! [2023-09-22 08:45:22,397][24495] Initialized policy 1 weights for model version 0 [2023-09-22 08:45:22,398][24495] LearnerWorker_p1 finished initialization! [2023-09-22 08:45:22,398][24495] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-09-22 08:45:23,161][24648] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-09-22 08:45:23,161][24648] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1 [2023-09-22 08:45:23,177][24654] Worker 5 uses CPU cores [20, 21, 22, 23] [2023-09-22 08:45:23,179][24648] Num visible devices: 1 [2023-09-22 08:45:23,190][24647] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-22 08:45:23,190][24647] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-09-22 08:45:23,191][24650] Worker 1 uses CPU cores [4, 5, 6, 7] [2023-09-22 08:45:23,227][24656] Worker 7 uses CPU cores [28, 29, 30, 31] [2023-09-22 08:45:23,248][24647] Num visible devices: 1 [2023-09-22 08:45:23,307][24655] Worker 6 uses CPU cores [24, 25, 26, 27] [2023-09-22 08:45:23,308][24651] Worker 2 uses CPU cores [8, 9, 10, 11] [2023-09-22 08:45:23,330][24649] Worker 0 uses CPU cores [0, 1, 2, 3] [2023-09-22 08:45:23,374][24652] Worker 3 uses CPU cores [12, 13, 14, 15] [2023-09-22 08:45:23,398][24653] Worker 4 uses CPU cores [16, 17, 18, 19] [2023-09-22 08:45:23,542][23569] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 90112. Throughput: 0: nan, 1: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-09-22 08:45:23,791][24648] RunningMeanStd input shape: (4, 84, 84) [2023-09-22 08:45:23,791][24648] RunningMeanStd input shape: (1,) [2023-09-22 08:45:23,803][24648] ConvEncoder: input_channels=4 [2023-09-22 08:45:23,828][24647] RunningMeanStd input shape: (4, 84, 84) [2023-09-22 08:45:23,829][24647] RunningMeanStd input shape: (1,) [2023-09-22 08:45:23,840][24647] ConvEncoder: input_channels=4 [2023-09-22 08:45:23,905][24648] Conv encoder output size: 512 [2023-09-22 08:45:23,911][23569] Inference worker 1-0 is ready! [2023-09-22 08:45:23,940][24647] Conv encoder output size: 512 [2023-09-22 08:45:23,946][23569] Inference worker 0-0 is ready! [2023-09-22 08:45:23,947][23569] All inference workers are ready! Signal rollout workers to start! [2023-09-22 08:45:24,439][24653] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,440][24650] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,443][24655] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,443][24649] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,443][24651] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,444][24654] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,445][24656] Decorrelating experience for 0 frames... [2023-09-22 08:45:24,448][24652] Decorrelating experience for 0 frames... [2023-09-22 08:45:28,542][23569] Fps is (10 sec: 1638.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 98304. Throughput: 0: 204.8, 1: 204.8. Samples: 2048. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:45:28,543][23569] Avg episode reward: [(0, '13.000'), (1, '8.000')] [2023-09-22 08:45:33,542][23569] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 114688. Throughput: 0: 345.0, 1: 340.5. Samples: 6855. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:45:33,543][23569] Avg episode reward: [(0, '10.333'), (1, '7.467')] [2023-09-22 08:45:33,543][24306] Saving new best policy, reward=10.333! [2023-09-22 08:45:37,720][23569] Heartbeat connected on Batcher_0 [2023-09-22 08:45:37,723][23569] Heartbeat connected on LearnerWorker_p0 [2023-09-22 08:45:37,726][23569] Heartbeat connected on Batcher_1 [2023-09-22 08:45:37,729][23569] Heartbeat connected on LearnerWorker_p1 [2023-09-22 08:45:37,734][23569] Heartbeat connected on InferenceWorker_p0-w0 [2023-09-22 08:45:37,738][23569] Heartbeat connected on InferenceWorker_p1-w0 [2023-09-22 08:45:37,741][23569] Heartbeat connected on RolloutWorker_w0 [2023-09-22 08:45:37,742][23569] Heartbeat connected on RolloutWorker_w1 [2023-09-22 08:45:37,746][23569] Heartbeat connected on RolloutWorker_w2 [2023-09-22 08:45:37,749][23569] Heartbeat connected on RolloutWorker_w3 [2023-09-22 08:45:37,752][23569] Heartbeat connected on RolloutWorker_w4 [2023-09-22 08:45:37,753][23569] Heartbeat connected on RolloutWorker_w5 [2023-09-22 08:45:37,756][23569] Heartbeat connected on RolloutWorker_w6 [2023-09-22 08:45:37,759][23569] Heartbeat connected on RolloutWorker_w7 [2023-09-22 08:45:38,542][23569] Fps is (10 sec: 4915.3, 60 sec: 3823.0, 300 sec: 3823.0). Total num frames: 147456. Throughput: 0: 383.3, 1: 378.3. Samples: 11425. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:45:38,542][23569] Avg episode reward: [(0, '11.032'), (1, '7.438')] [2023-09-22 08:45:38,543][24306] Saving new best policy, reward=11.032! [2023-09-22 08:45:42,435][24648] Updated weights for policy 1, policy_version 160 (0.0017) [2023-09-22 08:45:42,436][24647] Updated weights for policy 0, policy_version 504 (0.0017) [2023-09-22 08:45:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 172032. Throughput: 0: 502.4, 1: 501.5. Samples: 20077. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:45:43,543][23569] Avg episode reward: [(0, '10.265'), (1, '6.760')] [2023-09-22 08:45:48,542][23569] Fps is (10 sec: 5734.2, 60 sec: 4587.5, 300 sec: 4587.5). Total num frames: 204800. Throughput: 0: 573.4, 1: 573.4. Samples: 28672. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:45:48,543][23569] Avg episode reward: [(0, '10.046'), (1, '7.094')] [2023-09-22 08:45:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 4642.1, 300 sec: 4642.1). Total num frames: 229376. Throughput: 0: 547.0, 1: 546.3. Samples: 32798. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:45:53,543][23569] Avg episode reward: [(0, '10.241'), (1, '7.638')] [2023-09-22 08:45:56,679][24647] Updated weights for policy 0, policy_version 664 (0.0015) [2023-09-22 08:45:56,679][24648] Updated weights for policy 1, policy_version 320 (0.0017) [2023-09-22 08:45:58,542][23569] Fps is (10 sec: 5734.6, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 262144. Throughput: 0: 591.7, 1: 591.1. Samples: 41399. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:45:58,542][23569] Avg episode reward: [(0, '10.138'), (1, '7.862')] [2023-09-22 08:46:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 286720. Throughput: 0: 623.7, 1: 622.9. Samples: 49864. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:03,543][23569] Avg episode reward: [(0, '10.010'), (1, '7.740')] [2023-09-22 08:46:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5097.3, 300 sec: 5097.3). Total num frames: 319488. Throughput: 0: 604.6, 1: 603.6. Samples: 54371. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:08,543][23569] Avg episode reward: [(0, '10.180'), (1, '7.740')] [2023-09-22 08:46:08,543][24495] Saving new best policy, reward=7.740! [2023-09-22 08:46:10,843][24647] Updated weights for policy 0, policy_version 824 (0.0016) [2023-09-22 08:46:10,843][24648] Updated weights for policy 1, policy_version 480 (0.0017) [2023-09-22 08:46:13,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5079.1, 300 sec: 5079.1). Total num frames: 344064. Throughput: 0: 680.7, 1: 681.4. Samples: 63345. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:46:13,543][23569] Avg episode reward: [(0, '10.140'), (1, '8.210')] [2023-09-22 08:46:13,619][24495] Saving new best policy, reward=8.210! [2023-09-22 08:46:18,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5213.1, 300 sec: 5213.1). Total num frames: 376832. Throughput: 0: 722.7, 1: 722.9. Samples: 71907. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 08:46:18,544][23569] Avg episode reward: [(0, '10.190'), (1, '7.960')] [2023-09-22 08:46:23,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5324.8, 300 sec: 5324.8). Total num frames: 409600. Throughput: 0: 721.1, 1: 722.1. Samples: 76370. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:23,543][23569] Avg episode reward: [(0, '9.890'), (1, '8.210')] [2023-09-22 08:46:24,761][24647] Updated weights for policy 0, policy_version 984 (0.0018) [2023-09-22 08:46:24,761][24648] Updated weights for policy 1, policy_version 640 (0.0019) [2023-09-22 08:46:28,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5597.9, 300 sec: 5293.3). Total num frames: 434176. Throughput: 0: 725.9, 1: 725.1. Samples: 85371. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 08:46:28,543][23569] Avg episode reward: [(0, '10.180'), (1, '8.400')] [2023-09-22 08:46:28,548][24495] Saving new best policy, reward=8.400! [2023-09-22 08:46:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5383.3). Total num frames: 466944. Throughput: 0: 728.2, 1: 728.2. Samples: 94208. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:33,543][23569] Avg episode reward: [(0, '10.030'), (1, '8.890')] [2023-09-22 08:46:33,544][24495] Saving new best policy, reward=8.890! [2023-09-22 08:46:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5352.1). Total num frames: 491520. Throughput: 0: 727.9, 1: 728.2. Samples: 98319. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:38,542][23569] Avg episode reward: [(0, '9.910'), (1, '8.990')] [2023-09-22 08:46:38,737][24495] Saving new best policy, reward=8.990! [2023-09-22 08:46:38,739][24647] Updated weights for policy 0, policy_version 1144 (0.0013) [2023-09-22 08:46:38,739][24648] Updated weights for policy 1, policy_version 800 (0.0018) [2023-09-22 08:46:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5427.2). Total num frames: 524288. Throughput: 0: 730.1, 1: 729.3. Samples: 107071. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:46:43,542][23569] Avg episode reward: [(0, '9.990'), (1, '9.460')] [2023-09-22 08:46:43,547][24495] Saving new best policy, reward=9.460! [2023-09-22 08:46:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5397.1). Total num frames: 548864. Throughput: 0: 729.3, 1: 729.7. Samples: 115518. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:48,543][23569] Avg episode reward: [(0, '9.920'), (1, '10.090')] [2023-09-22 08:46:48,545][24495] Saving new best policy, reward=10.090! [2023-09-22 08:46:52,968][24648] Updated weights for policy 1, policy_version 960 (0.0016) [2023-09-22 08:46:52,968][24647] Updated weights for policy 0, policy_version 1304 (0.0016) [2023-09-22 08:46:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5461.3). Total num frames: 581632. Throughput: 0: 729.3, 1: 729.7. Samples: 120026. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:46:53,543][23569] Avg episode reward: [(0, '9.860'), (1, '10.610')] [2023-09-22 08:46:53,544][24495] Saving new best policy, reward=10.610! [2023-09-22 08:46:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5432.6). Total num frames: 606208. Throughput: 0: 730.1, 1: 728.9. Samples: 128999. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:46:58,544][23569] Avg episode reward: [(0, '9.960'), (1, '11.330')] [2023-09-22 08:46:58,676][24495] Saving new best policy, reward=11.330! [2023-09-22 08:47:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5488.6). Total num frames: 638976. Throughput: 0: 725.3, 1: 726.0. Samples: 137216. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:47:03,543][23569] Avg episode reward: [(0, '10.200'), (1, '11.460')] [2023-09-22 08:47:03,543][24495] Saving new best policy, reward=11.460! [2023-09-22 08:47:07,137][24647] Updated weights for policy 0, policy_version 1464 (0.0017) [2023-09-22 08:47:07,137][24648] Updated weights for policy 1, policy_version 1120 (0.0017) [2023-09-22 08:47:08,542][23569] Fps is (10 sec: 6553.8, 60 sec: 5870.9, 300 sec: 5539.4). Total num frames: 671744. Throughput: 0: 724.6, 1: 724.7. Samples: 141585. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 08:47:08,543][23569] Avg episode reward: [(0, '10.060'), (1, '11.570')] [2023-09-22 08:47:08,545][24495] Saving new best policy, reward=11.570! [2023-09-22 08:47:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5511.0). Total num frames: 696320. Throughput: 0: 720.7, 1: 721.4. Samples: 150266. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:47:13,543][23569] Avg episode reward: [(0, '10.100'), (1, '11.410')] [2023-09-22 08:47:13,553][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000001184_303104.pth... [2023-09-22 08:47:13,553][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000001528_393216.pth... [2023-09-22 08:47:18,542][23569] Fps is (10 sec: 4915.1, 60 sec: 5734.4, 300 sec: 5485.1). Total num frames: 720896. Throughput: 0: 713.4, 1: 713.4. Samples: 158418. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:47:18,544][23569] Avg episode reward: [(0, '9.900'), (1, '11.120')] [2023-09-22 08:47:21,764][24647] Updated weights for policy 0, policy_version 1624 (0.0017) [2023-09-22 08:47:21,764][24648] Updated weights for policy 1, policy_version 1280 (0.0020) [2023-09-22 08:47:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5529.6). Total num frames: 753664. Throughput: 0: 716.7, 1: 716.8. Samples: 162824. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 08:47:23,543][23569] Avg episode reward: [(0, '9.870'), (1, '11.140')] [2023-09-22 08:47:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5505.0). Total num frames: 778240. Throughput: 0: 720.1, 1: 720.8. Samples: 171911. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:47:28,543][23569] Avg episode reward: [(0, '9.750'), (1, '10.700')] [2023-09-22 08:47:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5545.4). Total num frames: 811008. Throughput: 0: 723.0, 1: 722.7. Samples: 180576. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 08:47:33,542][23569] Avg episode reward: [(0, '9.830'), (1, '10.770')] [2023-09-22 08:47:35,447][24647] Updated weights for policy 0, policy_version 1784 (0.0014) [2023-09-22 08:47:35,448][24648] Updated weights for policy 1, policy_version 1440 (0.0016) [2023-09-22 08:47:38,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5582.7). Total num frames: 843776. Throughput: 0: 724.2, 1: 724.0. Samples: 185194. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 08:47:38,543][23569] Avg episode reward: [(0, '10.340'), (1, '10.900')] [2023-09-22 08:47:43,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5734.4, 300 sec: 5558.9). Total num frames: 868352. Throughput: 0: 723.6, 1: 724.3. Samples: 194158. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 08:47:43,543][23569] Avg episode reward: [(0, '10.600'), (1, '11.130')] [2023-09-22 08:47:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5593.2). Total num frames: 901120. Throughput: 0: 728.2, 1: 728.2. Samples: 202752. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:47:48,543][23569] Avg episode reward: [(0, '11.460'), (1, '11.370')] [2023-09-22 08:47:48,544][24306] Saving new best policy, reward=11.460! [2023-09-22 08:47:49,615][24647] Updated weights for policy 0, policy_version 1944 (0.0016) [2023-09-22 08:47:49,616][24648] Updated weights for policy 1, policy_version 1600 (0.0017) [2023-09-22 08:47:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5570.6). Total num frames: 925696. Throughput: 0: 724.8, 1: 725.4. Samples: 206848. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:47:53,543][23569] Avg episode reward: [(0, '11.830'), (1, '11.390')] [2023-09-22 08:47:53,543][24306] Saving new best policy, reward=11.830! [2023-09-22 08:47:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5602.3). Total num frames: 958464. Throughput: 0: 723.6, 1: 723.6. Samples: 215387. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 08:47:58,543][23569] Avg episode reward: [(0, '12.170'), (1, '11.670')] [2023-09-22 08:47:58,551][24306] Saving new best policy, reward=12.170! [2023-09-22 08:47:58,551][24495] Saving new best policy, reward=11.670! [2023-09-22 08:48:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5580.8). Total num frames: 983040. Throughput: 0: 730.8, 1: 730.7. Samples: 224184. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:48:03,543][23569] Avg episode reward: [(0, '11.880'), (1, '11.930')] [2023-09-22 08:48:03,715][24495] Saving new best policy, reward=11.930! [2023-09-22 08:48:03,752][24648] Updated weights for policy 1, policy_version 1760 (0.0014) [2023-09-22 08:48:03,754][24647] Updated weights for policy 0, policy_version 2104 (0.0018) [2023-09-22 08:48:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5610.3). Total num frames: 1015808. Throughput: 0: 733.5, 1: 733.6. Samples: 228842. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:48:08,543][23569] Avg episode reward: [(0, '12.040'), (1, '11.980')] [2023-09-22 08:48:08,544][24495] Saving new best policy, reward=11.980! [2023-09-22 08:48:13,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5638.0). Total num frames: 1048576. Throughput: 0: 729.3, 1: 729.8. Samples: 237568. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:48:13,543][23569] Avg episode reward: [(0, '11.860'), (1, '12.180')] [2023-09-22 08:48:13,552][24495] Saving new best policy, reward=12.180! [2023-09-22 08:48:17,545][24648] Updated weights for policy 1, policy_version 1920 (0.0016) [2023-09-22 08:48:17,546][24647] Updated weights for policy 0, policy_version 2264 (0.0016) [2023-09-22 08:48:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5617.4). Total num frames: 1073152. Throughput: 0: 731.6, 1: 731.9. Samples: 246437. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 08:48:18,542][23569] Avg episode reward: [(0, '11.710'), (1, '11.690')] [2023-09-22 08:48:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5643.4). Total num frames: 1105920. Throughput: 0: 729.5, 1: 729.3. Samples: 250843. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:48:23,543][23569] Avg episode reward: [(0, '11.800'), (1, '12.300')] [2023-09-22 08:48:23,545][24495] Saving new best policy, reward=12.300! [2023-09-22 08:48:28,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5623.7). Total num frames: 1130496. Throughput: 0: 730.8, 1: 730.3. Samples: 259909. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:48:28,543][23569] Avg episode reward: [(0, '11.660'), (1, '11.960')] [2023-09-22 08:48:31,565][24647] Updated weights for policy 0, policy_version 2424 (0.0019) [2023-09-22 08:48:31,566][24648] Updated weights for policy 1, policy_version 2080 (0.0016) [2023-09-22 08:48:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5648.2). Total num frames: 1163264. Throughput: 0: 728.2, 1: 728.2. Samples: 268288. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 08:48:33,543][23569] Avg episode reward: [(0, '11.840'), (1, '12.300')] [2023-09-22 08:48:38,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5802.7, 300 sec: 5650.4). Total num frames: 1191936. Throughput: 0: 731.2, 1: 730.3. Samples: 272615. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:48:38,543][23569] Avg episode reward: [(0, '11.890'), (1, '12.390')] [2023-09-22 08:48:38,544][24495] Saving new best policy, reward=12.390! [2023-09-22 08:48:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5652.5). Total num frames: 1220608. Throughput: 0: 735.4, 1: 735.3. Samples: 281570. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:48:43,543][23569] Avg episode reward: [(0, '11.820'), (1, '12.670')] [2023-09-22 08:48:43,552][24495] Saving new best policy, reward=12.670! [2023-09-22 08:48:45,462][24647] Updated weights for policy 0, policy_version 2584 (0.0014) [2023-09-22 08:48:45,463][24648] Updated weights for policy 1, policy_version 2240 (0.0015) [2023-09-22 08:48:48,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5674.5). Total num frames: 1253376. Throughput: 0: 737.5, 1: 738.3. Samples: 290594. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:48:48,542][23569] Avg episode reward: [(0, '12.290'), (1, '12.940')] [2023-09-22 08:48:48,543][24495] Saving new best policy, reward=12.940! [2023-09-22 08:48:48,543][24306] Saving new best policy, reward=12.290! [2023-09-22 08:48:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5656.4). Total num frames: 1277952. Throughput: 0: 734.2, 1: 734.1. Samples: 294912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:48:53,542][23569] Avg episode reward: [(0, '11.950'), (1, '12.830')] [2023-09-22 08:48:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5677.3). Total num frames: 1310720. Throughput: 0: 730.4, 1: 729.6. Samples: 303268. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:48:58,542][23569] Avg episode reward: [(0, '12.370'), (1, '12.890')] [2023-09-22 08:48:58,549][24306] Saving new best policy, reward=12.370! [2023-09-22 08:48:59,593][24647] Updated weights for policy 0, policy_version 2744 (0.0015) [2023-09-22 08:48:59,594][24648] Updated weights for policy 1, policy_version 2400 (0.0019) [2023-09-22 08:49:03,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5659.9). Total num frames: 1335296. Throughput: 0: 730.1, 1: 729.5. Samples: 312117. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 08:49:03,543][23569] Avg episode reward: [(0, '12.370'), (1, '13.330')] [2023-09-22 08:49:03,545][24495] Saving new best policy, reward=13.330! [2023-09-22 08:49:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5679.8). Total num frames: 1368064. Throughput: 0: 732.4, 1: 732.2. Samples: 316751. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:49:08,543][23569] Avg episode reward: [(0, '12.060'), (1, '13.190')] [2023-09-22 08:49:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5663.2). Total num frames: 1392640. Throughput: 0: 726.2, 1: 727.1. Samples: 325305. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:49:13,543][23569] Avg episode reward: [(0, '12.330'), (1, '12.950')] [2023-09-22 08:49:13,553][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000002544_651264.pth... [2023-09-22 08:49:13,553][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000002888_741376.pth... [2023-09-22 08:49:13,588][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000000344_90112.pth [2023-09-22 08:49:13,800][24648] Updated weights for policy 1, policy_version 2560 (0.0016) [2023-09-22 08:49:13,800][24647] Updated weights for policy 0, policy_version 2904 (0.0018) [2023-09-22 08:49:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5682.1). Total num frames: 1425408. Throughput: 0: 728.2, 1: 728.2. Samples: 333824. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:49:18,543][23569] Avg episode reward: [(0, '12.540'), (1, '13.070')] [2023-09-22 08:49:18,544][24306] Saving new best policy, reward=12.540! [2023-09-22 08:49:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5666.1). Total num frames: 1449984. Throughput: 0: 728.5, 1: 728.8. Samples: 338192. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:49:23,543][23569] Avg episode reward: [(0, '12.600'), (1, '13.140')] [2023-09-22 08:49:23,654][24306] Saving new best policy, reward=12.600! [2023-09-22 08:49:27,892][24648] Updated weights for policy 1, policy_version 2720 (0.0018) [2023-09-22 08:49:27,892][24647] Updated weights for policy 0, policy_version 3064 (0.0017) [2023-09-22 08:49:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5684.2). Total num frames: 1482752. Throughput: 0: 725.6, 1: 725.6. Samples: 346871. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 08:49:28,543][23569] Avg episode reward: [(0, '12.810'), (1, '13.440')] [2023-09-22 08:49:28,552][24306] Saving new best policy, reward=12.810! [2023-09-22 08:49:28,552][24495] Saving new best policy, reward=13.440! [2023-09-22 08:49:33,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5802.7, 300 sec: 5685.3). Total num frames: 1511424. Throughput: 0: 723.2, 1: 722.3. Samples: 355639. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:49:33,543][23569] Avg episode reward: [(0, '13.190'), (1, '13.690')] [2023-09-22 08:49:33,544][24495] Saving new best policy, reward=13.690! [2023-09-22 08:49:33,596][24306] Saving new best policy, reward=13.190! [2023-09-22 08:49:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5802.7, 300 sec: 5686.2). Total num frames: 1540096. Throughput: 0: 724.7, 1: 723.9. Samples: 360097. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 08:49:38,543][23569] Avg episode reward: [(0, '12.960'), (1, '13.960')] [2023-09-22 08:49:38,544][24495] Saving new best policy, reward=13.960! [2023-09-22 08:49:42,034][24648] Updated weights for policy 1, policy_version 2880 (0.0016) [2023-09-22 08:49:42,034][24647] Updated weights for policy 0, policy_version 3224 (0.0017) [2023-09-22 08:49:43,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5702.9). Total num frames: 1572864. Throughput: 0: 725.9, 1: 726.8. Samples: 368640. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:49:43,543][23569] Avg episode reward: [(0, '13.220'), (1, '14.300')] [2023-09-22 08:49:43,551][24306] Saving new best policy, reward=13.220! [2023-09-22 08:49:43,551][24495] Saving new best policy, reward=14.300! [2023-09-22 08:49:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5688.0). Total num frames: 1597440. Throughput: 0: 724.6, 1: 726.2. Samples: 377405. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:49:48,543][23569] Avg episode reward: [(0, '13.430'), (1, '14.500')] [2023-09-22 08:49:48,543][24306] Saving new best policy, reward=13.430! [2023-09-22 08:49:48,543][24495] Saving new best policy, reward=14.500! [2023-09-22 08:49:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5704.1). Total num frames: 1630208. Throughput: 0: 722.2, 1: 723.1. Samples: 381789. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:49:53,543][23569] Avg episode reward: [(0, '13.040'), (1, '14.080')] [2023-09-22 08:49:56,097][24648] Updated weights for policy 1, policy_version 3040 (0.0017) [2023-09-22 08:49:56,097][24647] Updated weights for policy 0, policy_version 3384 (0.0016) [2023-09-22 08:49:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5689.7). Total num frames: 1654784. Throughput: 0: 727.3, 1: 725.9. Samples: 390700. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:49:58,543][23569] Avg episode reward: [(0, '12.710'), (1, '13.770')] [2023-09-22 08:50:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5705.1). Total num frames: 1687552. Throughput: 0: 729.4, 1: 728.7. Samples: 399438. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:03,543][23569] Avg episode reward: [(0, '13.030'), (1, '14.140')] [2023-09-22 08:50:08,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5802.7, 300 sec: 5705.7). Total num frames: 1716224. Throughput: 0: 728.7, 1: 728.5. Samples: 403767. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:08,543][23569] Avg episode reward: [(0, '13.230'), (1, '14.080')] [2023-09-22 08:50:09,988][24647] Updated weights for policy 0, policy_version 3544 (0.0016) [2023-09-22 08:50:09,989][24648] Updated weights for policy 1, policy_version 3200 (0.0017) [2023-09-22 08:50:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5706.2). Total num frames: 1744896. Throughput: 0: 731.8, 1: 731.5. Samples: 412722. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:13,543][23569] Avg episode reward: [(0, '12.990'), (1, '13.990')] [2023-09-22 08:50:18,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5870.9, 300 sec: 5720.5). Total num frames: 1777664. Throughput: 0: 736.0, 1: 736.2. Samples: 421888. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:18,543][23569] Avg episode reward: [(0, '13.010'), (1, '13.450')] [2023-09-22 08:50:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5776.1). Total num frames: 1802240. Throughput: 0: 731.8, 1: 732.6. Samples: 425993. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:23,543][23569] Avg episode reward: [(0, '13.460'), (1, '13.990')] [2023-09-22 08:50:23,647][24306] Saving new best policy, reward=13.460! [2023-09-22 08:50:23,668][24647] Updated weights for policy 0, policy_version 3704 (0.0017) [2023-09-22 08:50:23,668][24648] Updated weights for policy 1, policy_version 3360 (0.0016) [2023-09-22 08:50:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 1835008. Throughput: 0: 735.2, 1: 734.1. Samples: 434762. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:28,543][23569] Avg episode reward: [(0, '13.390'), (1, '13.770')] [2023-09-22 08:50:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5802.7, 300 sec: 5803.8). Total num frames: 1859584. Throughput: 0: 736.1, 1: 735.8. Samples: 443642. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:50:33,543][23569] Avg episode reward: [(0, '13.680'), (1, '13.560')] [2023-09-22 08:50:33,625][24306] Saving new best policy, reward=13.680! [2023-09-22 08:50:37,781][24648] Updated weights for policy 1, policy_version 3520 (0.0018) [2023-09-22 08:50:37,781][24647] Updated weights for policy 0, policy_version 3864 (0.0017) [2023-09-22 08:50:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 1892352. Throughput: 0: 737.0, 1: 736.9. Samples: 448117. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:38,543][23569] Avg episode reward: [(0, '14.160'), (1, '13.560')] [2023-09-22 08:50:38,543][24306] Saving new best policy, reward=14.160! [2023-09-22 08:50:43,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 1925120. Throughput: 0: 734.0, 1: 734.1. Samples: 456763. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:43,543][23569] Avg episode reward: [(0, '14.430'), (1, '13.600')] [2023-09-22 08:50:43,552][24306] Saving new best policy, reward=14.430! [2023-09-22 08:50:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 1949696. Throughput: 0: 733.2, 1: 733.4. Samples: 465435. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:50:48,543][23569] Avg episode reward: [(0, '14.330'), (1, '13.800')] [2023-09-22 08:50:51,955][24648] Updated weights for policy 1, policy_version 3680 (0.0016) [2023-09-22 08:50:51,955][24647] Updated weights for policy 0, policy_version 4024 (0.0017) [2023-09-22 08:50:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 1982464. Throughput: 0: 732.4, 1: 733.7. Samples: 469743. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 08:50:53,543][23569] Avg episode reward: [(0, '14.900'), (1, '14.320')] [2023-09-22 08:50:53,544][24306] Saving new best policy, reward=14.900! [2023-09-22 08:50:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2007040. Throughput: 0: 734.4, 1: 733.6. Samples: 478783. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:50:58,543][23569] Avg episode reward: [(0, '15.280'), (1, '14.560')] [2023-09-22 08:50:58,554][24306] Saving new best policy, reward=15.280! [2023-09-22 08:50:58,554][24495] Saving new best policy, reward=14.560! [2023-09-22 08:51:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2039808. Throughput: 0: 728.4, 1: 728.3. Samples: 487439. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 08:51:03,543][23569] Avg episode reward: [(0, '15.490'), (1, '14.520')] [2023-09-22 08:51:03,545][24306] Saving new best policy, reward=15.490! [2023-09-22 08:51:05,754][24647] Updated weights for policy 0, policy_version 4184 (0.0017) [2023-09-22 08:51:05,754][24648] Updated weights for policy 1, policy_version 3840 (0.0017) [2023-09-22 08:51:08,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5870.9, 300 sec: 5845.5). Total num frames: 2068480. Throughput: 0: 732.0, 1: 731.3. Samples: 491838. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:51:08,543][23569] Avg episode reward: [(0, '15.820'), (1, '14.300')] [2023-09-22 08:51:08,543][24306] Saving new best policy, reward=15.820! [2023-09-22 08:51:13,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2097152. Throughput: 0: 729.5, 1: 730.0. Samples: 500438. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:51:13,543][23569] Avg episode reward: [(0, '15.850'), (1, '14.300')] [2023-09-22 08:51:13,551][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000003920_1003520.pth... [2023-09-22 08:51:13,552][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000004264_1093632.pth... [2023-09-22 08:51:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000001528_393216.pth [2023-09-22 08:51:13,591][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000001184_303104.pth [2023-09-22 08:51:13,594][24306] Saving new best policy, reward=15.850! [2023-09-22 08:51:18,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2129920. Throughput: 0: 730.1, 1: 729.4. Samples: 509320. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:51:18,543][23569] Avg episode reward: [(0, '15.610'), (1, '14.620')] [2023-09-22 08:51:18,545][24495] Saving new best policy, reward=14.620! [2023-09-22 08:51:19,909][24648] Updated weights for policy 1, policy_version 4000 (0.0015) [2023-09-22 08:51:19,910][24647] Updated weights for policy 0, policy_version 4344 (0.0014) [2023-09-22 08:51:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2154496. Throughput: 0: 730.8, 1: 730.2. Samples: 513863. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:51:23,543][23569] Avg episode reward: [(0, '15.940'), (1, '14.910')] [2023-09-22 08:51:23,545][24495] Saving new best policy, reward=14.910! [2023-09-22 08:51:23,545][24306] Saving new best policy, reward=15.940! [2023-09-22 08:51:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2187264. Throughput: 0: 727.2, 1: 727.8. Samples: 522241. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:51:28,542][23569] Avg episode reward: [(0, '15.770'), (1, '14.400')] [2023-09-22 08:51:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2211840. Throughput: 0: 726.4, 1: 726.4. Samples: 530810. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 08:51:33,543][23569] Avg episode reward: [(0, '15.640'), (1, '14.350')] [2023-09-22 08:51:34,087][24647] Updated weights for policy 0, policy_version 4504 (0.0017) [2023-09-22 08:51:34,088][24648] Updated weights for policy 1, policy_version 4160 (0.0019) [2023-09-22 08:51:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2244608. Throughput: 0: 728.2, 1: 727.1. Samples: 535233. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:51:38,543][23569] Avg episode reward: [(0, '15.580'), (1, '14.880')] [2023-09-22 08:51:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2269184. Throughput: 0: 723.8, 1: 725.1. Samples: 543982. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:51:43,543][23569] Avg episode reward: [(0, '15.860'), (1, '14.500')] [2023-09-22 08:51:48,299][24648] Updated weights for policy 1, policy_version 4320 (0.0018) [2023-09-22 08:51:48,299][24647] Updated weights for policy 0, policy_version 4664 (0.0016) [2023-09-22 08:51:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2301952. Throughput: 0: 725.8, 1: 727.3. Samples: 552830. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:51:48,543][23569] Avg episode reward: [(0, '16.590'), (1, '14.430')] [2023-09-22 08:51:48,544][24306] Saving new best policy, reward=16.590! [2023-09-22 08:51:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2326528. Throughput: 0: 724.3, 1: 725.0. Samples: 557056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:51:53,543][23569] Avg episode reward: [(0, '16.710'), (1, '14.130')] [2023-09-22 08:51:53,543][24306] Saving new best policy, reward=16.710! [2023-09-22 08:51:58,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 2359296. Throughput: 0: 727.0, 1: 725.9. Samples: 565820. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 08:51:58,542][23569] Avg episode reward: [(0, '17.090'), (1, '14.310')] [2023-09-22 08:51:58,549][24306] Saving new best policy, reward=17.090! [2023-09-22 08:52:02,075][24648] Updated weights for policy 1, policy_version 4480 (0.0019) [2023-09-22 08:52:02,075][24647] Updated weights for policy 0, policy_version 4824 (0.0016) [2023-09-22 08:52:03,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 2392064. Throughput: 0: 729.2, 1: 729.6. Samples: 574967. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:52:03,542][23569] Avg episode reward: [(0, '17.090'), (1, '14.100')] [2023-09-22 08:52:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5802.7, 300 sec: 5831.6). Total num frames: 2416640. Throughput: 0: 729.3, 1: 728.9. Samples: 579483. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:52:08,543][23569] Avg episode reward: [(0, '17.270'), (1, '14.530')] [2023-09-22 08:52:08,545][24306] Saving new best policy, reward=17.270! [2023-09-22 08:52:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 2449408. Throughput: 0: 728.4, 1: 728.3. Samples: 587791. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:52:13,543][23569] Avg episode reward: [(0, '17.430'), (1, '14.600')] [2023-09-22 08:52:13,553][24306] Saving new best policy, reward=17.430! [2023-09-22 08:52:16,254][24647] Updated weights for policy 0, policy_version 4984 (0.0016) [2023-09-22 08:52:16,255][24648] Updated weights for policy 1, policy_version 4640 (0.0016) [2023-09-22 08:52:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2473984. Throughput: 0: 727.0, 1: 726.6. Samples: 596221. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:52:18,542][23569] Avg episode reward: [(0, '17.460'), (1, '14.930')] [2023-09-22 08:52:18,543][24495] Saving new best policy, reward=14.930! [2023-09-22 08:52:18,543][24306] Saving new best policy, reward=17.460! [2023-09-22 08:52:23,542][23569] Fps is (10 sec: 5324.8, 60 sec: 5802.7, 300 sec: 5845.5). Total num frames: 2502656. Throughput: 0: 723.0, 1: 723.1. Samples: 600309. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:52:23,543][23569] Avg episode reward: [(0, '17.460'), (1, '14.710')] [2023-09-22 08:52:28,542][23569] Fps is (10 sec: 5734.1, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2531328. Throughput: 0: 723.2, 1: 723.6. Samples: 609084. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:52:28,544][23569] Avg episode reward: [(0, '17.850'), (1, '15.270')] [2023-09-22 08:52:28,552][24306] Saving new best policy, reward=17.850! [2023-09-22 08:52:28,553][24495] Saving new best policy, reward=15.270! [2023-09-22 08:52:31,064][24648] Updated weights for policy 1, policy_version 4800 (0.0015) [2023-09-22 08:52:31,065][24647] Updated weights for policy 0, policy_version 5144 (0.0013) [2023-09-22 08:52:33,542][23569] Fps is (10 sec: 5324.8, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 2555904. Throughput: 0: 716.4, 1: 715.0. Samples: 617241. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:52:33,543][23569] Avg episode reward: [(0, '18.260'), (1, '15.290')] [2023-09-22 08:52:33,544][24306] Saving new best policy, reward=18.260! [2023-09-22 08:52:33,545][24495] Saving new best policy, reward=15.290! [2023-09-22 08:52:38,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2588672. Throughput: 0: 718.6, 1: 718.3. Samples: 621719. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:52:38,543][23569] Avg episode reward: [(0, '18.610'), (1, '15.540')] [2023-09-22 08:52:38,543][24495] Saving new best policy, reward=15.540! [2023-09-22 08:52:38,544][24306] Saving new best policy, reward=18.610! [2023-09-22 08:52:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 2613248. Throughput: 0: 720.3, 1: 720.6. Samples: 630658. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 08:52:43,543][23569] Avg episode reward: [(0, '18.790'), (1, '15.780')] [2023-09-22 08:52:43,553][24306] Saving new best policy, reward=18.790! [2023-09-22 08:52:43,567][24495] Saving new best policy, reward=15.780! [2023-09-22 08:52:44,982][24648] Updated weights for policy 1, policy_version 4960 (0.0017) [2023-09-22 08:52:44,983][24647] Updated weights for policy 0, policy_version 5304 (0.0015) [2023-09-22 08:52:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2646016. Throughput: 0: 714.6, 1: 713.4. Samples: 639229. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:52:48,543][23569] Avg episode reward: [(0, '18.370'), (1, '15.800')] [2023-09-22 08:52:48,544][24495] Saving new best policy, reward=15.800! [2023-09-22 08:52:53,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2678784. Throughput: 0: 713.5, 1: 714.1. Samples: 643725. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:52:53,543][23569] Avg episode reward: [(0, '18.140'), (1, '15.470')] [2023-09-22 08:52:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2703360. Throughput: 0: 716.1, 1: 714.8. Samples: 652180. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:52:58,543][23569] Avg episode reward: [(0, '18.340'), (1, '15.250')] [2023-09-22 08:52:59,159][24647] Updated weights for policy 0, policy_version 5464 (0.0018) [2023-09-22 08:52:59,159][24648] Updated weights for policy 1, policy_version 5120 (0.0017) [2023-09-22 08:53:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2736128. Throughput: 0: 723.3, 1: 723.6. Samples: 661329. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:53:03,543][23569] Avg episode reward: [(0, '18.440'), (1, '15.270')] [2023-09-22 08:53:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 2760704. Throughput: 0: 725.2, 1: 725.7. Samples: 665600. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:53:08,543][23569] Avg episode reward: [(0, '18.700'), (1, '15.360')] [2023-09-22 08:53:13,155][24648] Updated weights for policy 1, policy_version 5280 (0.0016) [2023-09-22 08:53:13,156][24647] Updated weights for policy 0, policy_version 5624 (0.0016) [2023-09-22 08:53:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 2793472. Throughput: 0: 722.0, 1: 721.2. Samples: 674024. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:53:13,542][23569] Avg episode reward: [(0, '19.520'), (1, '15.070')] [2023-09-22 08:53:13,551][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000005280_1351680.pth... [2023-09-22 08:53:13,551][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000005624_1441792.pth... [2023-09-22 08:53:13,581][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000002888_741376.pth [2023-09-22 08:53:13,584][24306] Saving new best policy, reward=19.520! [2023-09-22 08:53:13,590][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000002544_651264.pth [2023-09-22 08:53:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 2818048. Throughput: 0: 731.6, 1: 730.9. Samples: 683056. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:53:18,543][23569] Avg episode reward: [(0, '19.950'), (1, '15.320')] [2023-09-22 08:53:18,647][24306] Saving new best policy, reward=19.950! [2023-09-22 08:53:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5802.7, 300 sec: 5831.6). Total num frames: 2850816. Throughput: 0: 729.8, 1: 730.0. Samples: 687410. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:53:23,543][23569] Avg episode reward: [(0, '19.780'), (1, '15.270')] [2023-09-22 08:53:27,411][24647] Updated weights for policy 0, policy_version 5784 (0.0017) [2023-09-22 08:53:27,411][24648] Updated weights for policy 1, policy_version 5440 (0.0018) [2023-09-22 08:53:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 2875392. Throughput: 0: 725.1, 1: 725.6. Samples: 695939. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 08:53:28,543][23569] Avg episode reward: [(0, '19.730'), (1, '15.390')] [2023-09-22 08:53:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5817.7). Total num frames: 2908160. Throughput: 0: 724.8, 1: 725.9. Samples: 704512. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:53:33,543][23569] Avg episode reward: [(0, '19.640'), (1, '14.820')] [2023-09-22 08:53:38,542][23569] Fps is (10 sec: 6553.8, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2940928. Throughput: 0: 726.1, 1: 726.6. Samples: 709097. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 08:53:38,542][23569] Avg episode reward: [(0, '19.890'), (1, '15.520')] [2023-09-22 08:53:41,286][24648] Updated weights for policy 1, policy_version 5600 (0.0016) [2023-09-22 08:53:41,286][24647] Updated weights for policy 0, policy_version 5944 (0.0015) [2023-09-22 08:53:43,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5803.8). Total num frames: 2965504. Throughput: 0: 729.7, 1: 731.1. Samples: 717915. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 08:53:43,543][23569] Avg episode reward: [(0, '19.590'), (1, '15.570')] [2023-09-22 08:53:48,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 2998272. Throughput: 0: 729.8, 1: 730.4. Samples: 727040. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:53:48,544][23569] Avg episode reward: [(0, '19.690'), (1, '15.870')] [2023-09-22 08:53:48,545][24495] Saving new best policy, reward=15.870! [2023-09-22 08:53:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 3022848. Throughput: 0: 728.3, 1: 728.2. Samples: 731142. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:53:53,543][23569] Avg episode reward: [(0, '19.890'), (1, '15.880')] [2023-09-22 08:53:53,545][24495] Saving new best policy, reward=15.880! [2023-09-22 08:53:55,225][24648] Updated weights for policy 1, policy_version 5760 (0.0017) [2023-09-22 08:53:55,226][24647] Updated weights for policy 0, policy_version 6104 (0.0018) [2023-09-22 08:53:58,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 3055616. Throughput: 0: 730.6, 1: 731.2. Samples: 739805. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:53:58,542][23569] Avg episode reward: [(0, '20.190'), (1, '15.940')] [2023-09-22 08:53:58,550][24306] Saving new best policy, reward=20.190! [2023-09-22 08:53:58,550][24495] Saving new best policy, reward=15.940! [2023-09-22 08:54:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 3080192. Throughput: 0: 726.2, 1: 726.6. Samples: 748430. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 08:54:03,543][23569] Avg episode reward: [(0, '20.070'), (1, '16.450')] [2023-09-22 08:54:03,723][24495] Saving new best policy, reward=16.450! [2023-09-22 08:54:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 3112960. Throughput: 0: 726.7, 1: 727.6. Samples: 752852. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 08:54:08,542][23569] Avg episode reward: [(0, '19.650'), (1, '16.380')] [2023-09-22 08:54:09,506][24647] Updated weights for policy 0, policy_version 6264 (0.0016) [2023-09-22 08:54:09,506][24648] Updated weights for policy 1, policy_version 5920 (0.0017) [2023-09-22 08:54:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 3137536. Throughput: 0: 731.9, 1: 732.3. Samples: 761831. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:13,543][23569] Avg episode reward: [(0, '19.110'), (1, '15.970')] [2023-09-22 08:54:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3170304. Throughput: 0: 733.7, 1: 733.3. Samples: 770526. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:18,543][23569] Avg episode reward: [(0, '19.450'), (1, '16.180')] [2023-09-22 08:54:23,323][24647] Updated weights for policy 0, policy_version 6424 (0.0018) [2023-09-22 08:54:23,325][24648] Updated weights for policy 1, policy_version 6080 (0.0018) [2023-09-22 08:54:23,542][23569] Fps is (10 sec: 6553.8, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 3203072. Throughput: 0: 732.9, 1: 730.8. Samples: 774963. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:23,542][23569] Avg episode reward: [(0, '19.480'), (1, '16.580')] [2023-09-22 08:54:23,543][24495] Saving new best policy, reward=16.580! [2023-09-22 08:54:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5817.7). Total num frames: 3227648. Throughput: 0: 733.9, 1: 731.1. Samples: 783841. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:28,542][23569] Avg episode reward: [(0, '19.180'), (1, '16.540')] [2023-09-22 08:54:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 3260416. Throughput: 0: 728.2, 1: 727.9. Samples: 792564. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:33,542][23569] Avg episode reward: [(0, '19.260'), (1, '16.730')] [2023-09-22 08:54:33,543][24495] Saving new best policy, reward=16.730! [2023-09-22 08:54:37,317][24647] Updated weights for policy 0, policy_version 6584 (0.0015) [2023-09-22 08:54:37,317][24648] Updated weights for policy 1, policy_version 6240 (0.0017) [2023-09-22 08:54:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 3284992. Throughput: 0: 728.2, 1: 728.2. Samples: 796677. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 08:54:38,543][23569] Avg episode reward: [(0, '19.840'), (1, '16.960')] [2023-09-22 08:54:38,640][24495] Saving new best policy, reward=16.960! [2023-09-22 08:54:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3317760. Throughput: 0: 730.7, 1: 730.9. Samples: 805576. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 08:54:43,542][23569] Avg episode reward: [(0, '19.750'), (1, '17.520')] [2023-09-22 08:54:43,550][24495] Saving new best policy, reward=17.520! [2023-09-22 08:54:48,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3350528. Throughput: 0: 737.2, 1: 737.0. Samples: 814771. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:54:48,544][23569] Avg episode reward: [(0, '19.720'), (1, '17.690')] [2023-09-22 08:54:48,546][24495] Saving new best policy, reward=17.690! [2023-09-22 08:54:51,140][24648] Updated weights for policy 1, policy_version 6400 (0.0017) [2023-09-22 08:54:51,140][24647] Updated weights for policy 0, policy_version 6744 (0.0017) [2023-09-22 08:54:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3375104. Throughput: 0: 737.6, 1: 736.7. Samples: 819197. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:53,543][23569] Avg episode reward: [(0, '20.250'), (1, '17.230')] [2023-09-22 08:54:53,544][24306] Saving new best policy, reward=20.250! [2023-09-22 08:54:58,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3407872. Throughput: 0: 732.0, 1: 731.9. Samples: 827706. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:54:58,543][23569] Avg episode reward: [(0, '20.270'), (1, '17.090')] [2023-09-22 08:54:58,551][24306] Saving new best policy, reward=20.270! [2023-09-22 08:55:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5817.7). Total num frames: 3432448. Throughput: 0: 735.8, 1: 735.0. Samples: 836716. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:55:03,543][23569] Avg episode reward: [(0, '20.690'), (1, '16.840')] [2023-09-22 08:55:03,685][24306] Saving new best policy, reward=20.690! [2023-09-22 08:55:05,054][24648] Updated weights for policy 1, policy_version 6560 (0.0019) [2023-09-22 08:55:05,054][24647] Updated weights for policy 0, policy_version 6904 (0.0019) [2023-09-22 08:55:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3465216. Throughput: 0: 736.0, 1: 737.1. Samples: 841256. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 08:55:08,543][23569] Avg episode reward: [(0, '20.560'), (1, '17.070')] [2023-09-22 08:55:13,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5831.6). Total num frames: 3497984. Throughput: 0: 732.8, 1: 735.6. Samples: 849920. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:55:13,543][23569] Avg episode reward: [(0, '20.980'), (1, '16.820')] [2023-09-22 08:55:13,555][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000007000_1794048.pth... [2023-09-22 08:55:13,555][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000006656_1703936.pth... [2023-09-22 08:55:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000004264_1093632.pth [2023-09-22 08:55:13,591][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000003920_1003520.pth [2023-09-22 08:55:13,593][24306] Saving new best policy, reward=20.980! [2023-09-22 08:55:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3522560. Throughput: 0: 735.8, 1: 735.7. Samples: 858781. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:55:18,543][23569] Avg episode reward: [(0, '20.970'), (1, '16.640')] [2023-09-22 08:55:18,999][24648] Updated weights for policy 1, policy_version 6720 (0.0017) [2023-09-22 08:55:19,000][24647] Updated weights for policy 0, policy_version 7064 (0.0017) [2023-09-22 08:55:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3555328. Throughput: 0: 738.4, 1: 738.5. Samples: 863138. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 08:55:23,543][23569] Avg episode reward: [(0, '20.950'), (1, '16.430')] [2023-09-22 08:55:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3579904. Throughput: 0: 736.8, 1: 736.4. Samples: 871868. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:55:28,543][23569] Avg episode reward: [(0, '20.930'), (1, '16.770')] [2023-09-22 08:55:33,132][24647] Updated weights for policy 0, policy_version 7224 (0.0015) [2023-09-22 08:55:33,133][24648] Updated weights for policy 1, policy_version 6880 (0.0018) [2023-09-22 08:55:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3612672. Throughput: 0: 731.6, 1: 732.1. Samples: 880640. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:55:33,543][23569] Avg episode reward: [(0, '20.660'), (1, '16.520')] [2023-09-22 08:55:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5803.8). Total num frames: 3637248. Throughput: 0: 728.2, 1: 728.2. Samples: 884737. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:55:38,543][23569] Avg episode reward: [(0, '20.740'), (1, '16.600')] [2023-09-22 08:55:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3670016. Throughput: 0: 729.6, 1: 729.0. Samples: 893347. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:55:43,543][23569] Avg episode reward: [(0, '20.790'), (1, '16.710')] [2023-09-22 08:55:47,355][24647] Updated weights for policy 0, policy_version 7384 (0.0015) [2023-09-22 08:55:47,355][24648] Updated weights for policy 1, policy_version 7040 (0.0016) [2023-09-22 08:55:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 3694592. Throughput: 0: 727.5, 1: 728.1. Samples: 902218. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:55:48,543][23569] Avg episode reward: [(0, '20.530'), (1, '17.260')] [2023-09-22 08:55:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 3727360. Throughput: 0: 727.0, 1: 726.8. Samples: 906676. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:55:53,542][23569] Avg episode reward: [(0, '20.610'), (1, '16.980')] [2023-09-22 08:55:58,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3760128. Throughput: 0: 728.2, 1: 728.2. Samples: 915456. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:55:58,543][23569] Avg episode reward: [(0, '20.970'), (1, '16.440')] [2023-09-22 08:56:01,112][24647] Updated weights for policy 0, policy_version 7544 (0.0017) [2023-09-22 08:56:01,112][24648] Updated weights for policy 1, policy_version 7200 (0.0017) [2023-09-22 08:56:03,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5817.7). Total num frames: 3784704. Throughput: 0: 728.1, 1: 728.7. Samples: 924336. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:56:03,543][23569] Avg episode reward: [(0, '21.180'), (1, '16.920')] [2023-09-22 08:56:03,545][24306] Saving new best policy, reward=21.180! [2023-09-22 08:56:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3817472. Throughput: 0: 729.7, 1: 728.7. Samples: 928763. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:08,543][23569] Avg episode reward: [(0, '21.370'), (1, '16.930')] [2023-09-22 08:56:08,544][24306] Saving new best policy, reward=21.370! [2023-09-22 08:56:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5803.8). Total num frames: 3842048. Throughput: 0: 731.2, 1: 731.2. Samples: 937673. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:13,543][23569] Avg episode reward: [(0, '21.060'), (1, '17.260')] [2023-09-22 08:56:15,071][24647] Updated weights for policy 0, policy_version 7704 (0.0018) [2023-09-22 08:56:15,072][24648] Updated weights for policy 1, policy_version 7360 (0.0018) [2023-09-22 08:56:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3874816. Throughput: 0: 728.2, 1: 728.2. Samples: 946176. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 08:56:18,543][23569] Avg episode reward: [(0, '21.410'), (1, '17.080')] [2023-09-22 08:56:18,543][24306] Saving new best policy, reward=21.410! [2023-09-22 08:56:23,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5802.7, 300 sec: 5817.7). Total num frames: 3903488. Throughput: 0: 730.9, 1: 730.4. Samples: 950497. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:23,543][23569] Avg episode reward: [(0, '21.460'), (1, '17.880')] [2023-09-22 08:56:23,545][24495] Saving new best policy, reward=17.880! [2023-09-22 08:56:23,610][24306] Saving new best policy, reward=21.460! [2023-09-22 08:56:28,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3932160. Throughput: 0: 734.9, 1: 735.5. Samples: 959514. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:28,543][23569] Avg episode reward: [(0, '21.170'), (1, '18.390')] [2023-09-22 08:56:28,553][24495] Saving new best policy, reward=18.390! [2023-09-22 08:56:29,112][24648] Updated weights for policy 1, policy_version 7520 (0.0017) [2023-09-22 08:56:29,113][24647] Updated weights for policy 0, policy_version 7864 (0.0017) [2023-09-22 08:56:33,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3964928. Throughput: 0: 738.5, 1: 737.0. Samples: 968612. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:33,542][23569] Avg episode reward: [(0, '21.930'), (1, '17.530')] [2023-09-22 08:56:33,543][24306] Saving new best policy, reward=21.930! [2023-09-22 08:56:38,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 3989504. Throughput: 0: 734.1, 1: 735.4. Samples: 972806. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:38,543][23569] Avg episode reward: [(0, '21.780'), (1, '17.500')] [2023-09-22 08:56:43,055][24648] Updated weights for policy 1, policy_version 7680 (0.0019) [2023-09-22 08:56:43,055][24647] Updated weights for policy 0, policy_version 8024 (0.0018) [2023-09-22 08:56:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4022272. Throughput: 0: 735.1, 1: 734.3. Samples: 981581. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:43,543][23569] Avg episode reward: [(0, '21.940'), (1, '17.710')] [2023-09-22 08:56:43,552][24306] Saving new best policy, reward=21.940! [2023-09-22 08:56:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4046848. Throughput: 0: 734.2, 1: 732.8. Samples: 990353. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:48,543][23569] Avg episode reward: [(0, '21.680'), (1, '17.850')] [2023-09-22 08:56:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4079616. Throughput: 0: 734.6, 1: 734.2. Samples: 994862. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:53,544][23569] Avg episode reward: [(0, '21.710'), (1, '17.940')] [2023-09-22 08:56:56,919][24648] Updated weights for policy 1, policy_version 7840 (0.0017) [2023-09-22 08:56:56,920][24647] Updated weights for policy 0, policy_version 8184 (0.0016) [2023-09-22 08:56:58,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4112384. Throughput: 0: 731.8, 1: 732.0. Samples: 1003545. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:56:58,543][23569] Avg episode reward: [(0, '21.630'), (1, '18.350')] [2023-09-22 08:57:03,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4136960. Throughput: 0: 735.0, 1: 733.3. Samples: 1012249. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:57:03,543][23569] Avg episode reward: [(0, '21.520'), (1, '18.490')] [2023-09-22 08:57:03,544][24495] Saving new best policy, reward=18.490! [2023-09-22 08:57:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4169728. Throughput: 0: 736.8, 1: 735.3. Samples: 1016742. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 08:57:08,543][23569] Avg episode reward: [(0, '21.870'), (1, '18.210')] [2023-09-22 08:57:11,016][24647] Updated weights for policy 0, policy_version 8344 (0.0016) [2023-09-22 08:57:11,017][24648] Updated weights for policy 1, policy_version 8000 (0.0018) [2023-09-22 08:57:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4194304. Throughput: 0: 734.1, 1: 734.4. Samples: 1025598. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:57:13,543][23569] Avg episode reward: [(0, '21.940'), (1, '17.780')] [2023-09-22 08:57:13,554][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000008016_2052096.pth... [2023-09-22 08:57:13,583][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000005280_1351680.pth [2023-09-22 08:57:13,737][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000008376_2146304.pth... [2023-09-22 08:57:13,764][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000005624_1441792.pth [2023-09-22 08:57:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5845.5). Total num frames: 4227072. Throughput: 0: 728.2, 1: 730.2. Samples: 1034240. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 08:57:18,543][23569] Avg episode reward: [(0, '21.820'), (1, '17.490')] [2023-09-22 08:57:23,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 5859.4). Total num frames: 4259840. Throughput: 0: 732.0, 1: 731.6. Samples: 1038668. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 08:57:23,543][23569] Avg episode reward: [(0, '22.300'), (1, '17.430')] [2023-09-22 08:57:23,545][24306] Saving new best policy, reward=22.300! [2023-09-22 08:57:24,950][24648] Updated weights for policy 1, policy_version 8160 (0.0019) [2023-09-22 08:57:24,950][24647] Updated weights for policy 0, policy_version 8504 (0.0018) [2023-09-22 08:57:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 4284416. Throughput: 0: 735.2, 1: 734.8. Samples: 1047734. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 08:57:28,542][23569] Avg episode reward: [(0, '22.420'), (1, '17.480')] [2023-09-22 08:57:28,550][24306] Saving new best policy, reward=22.420! [2023-09-22 08:57:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4317184. Throughput: 0: 737.2, 1: 737.2. Samples: 1056699. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:57:33,543][23569] Avg episode reward: [(0, '22.060'), (1, '18.050')] [2023-09-22 08:57:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4341760. Throughput: 0: 732.7, 1: 734.0. Samples: 1060864. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:57:38,543][23569] Avg episode reward: [(0, '21.400'), (1, '17.990')] [2023-09-22 08:57:38,961][24647] Updated weights for policy 0, policy_version 8664 (0.0015) [2023-09-22 08:57:38,961][24648] Updated weights for policy 1, policy_version 8320 (0.0015) [2023-09-22 08:57:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4374528. Throughput: 0: 727.7, 1: 728.1. Samples: 1069056. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:57:43,543][23569] Avg episode reward: [(0, '21.560'), (1, '18.080')] [2023-09-22 08:57:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 4399104. Throughput: 0: 728.0, 1: 729.6. Samples: 1077840. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 08:57:48,543][23569] Avg episode reward: [(0, '21.540'), (1, '18.000')] [2023-09-22 08:57:53,337][24648] Updated weights for policy 1, policy_version 8480 (0.0017) [2023-09-22 08:57:53,337][24647] Updated weights for policy 0, policy_version 8824 (0.0013) [2023-09-22 08:57:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4431872. Throughput: 0: 726.0, 1: 726.2. Samples: 1082092. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:57:53,543][23569] Avg episode reward: [(0, '21.030'), (1, '17.840')] [2023-09-22 08:57:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 4456448. Throughput: 0: 723.8, 1: 724.6. Samples: 1090776. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 08:57:58,543][23569] Avg episode reward: [(0, '20.810'), (1, '17.630')] [2023-09-22 08:58:03,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4489216. Throughput: 0: 728.2, 1: 728.2. Samples: 1099775. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 08:58:03,543][23569] Avg episode reward: [(0, '21.150'), (1, '18.070')] [2023-09-22 08:58:07,322][24647] Updated weights for policy 0, policy_version 8984 (0.0016) [2023-09-22 08:58:07,322][24648] Updated weights for policy 1, policy_version 8640 (0.0018) [2023-09-22 08:58:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 4513792. Throughput: 0: 725.4, 1: 725.2. Samples: 1103942. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 08:58:08,543][23569] Avg episode reward: [(0, '21.520'), (1, '17.840')] [2023-09-22 08:58:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 4546560. Throughput: 0: 722.1, 1: 722.5. Samples: 1112740. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:13,542][23569] Avg episode reward: [(0, '21.190'), (1, '18.070')] [2023-09-22 08:58:18,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4579328. Throughput: 0: 724.7, 1: 724.5. Samples: 1121915. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:18,543][23569] Avg episode reward: [(0, '21.280'), (1, '18.550')] [2023-09-22 08:58:18,545][24495] Saving new best policy, reward=18.550! [2023-09-22 08:58:21,314][24647] Updated weights for policy 0, policy_version 9144 (0.0018) [2023-09-22 08:58:21,314][24648] Updated weights for policy 1, policy_version 8800 (0.0018) [2023-09-22 08:58:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 4603904. Throughput: 0: 726.3, 1: 725.9. Samples: 1126212. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:23,543][23569] Avg episode reward: [(0, '21.750'), (1, '17.890')] [2023-09-22 08:58:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4636672. Throughput: 0: 728.3, 1: 728.2. Samples: 1134597. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 08:58:28,543][23569] Avg episode reward: [(0, '21.900'), (1, '17.880')] [2023-09-22 08:58:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 4661248. Throughput: 0: 726.2, 1: 726.6. Samples: 1143216. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 08:58:33,543][23569] Avg episode reward: [(0, '22.200'), (1, '18.030')] [2023-09-22 08:58:35,397][24647] Updated weights for policy 0, policy_version 9304 (0.0014) [2023-09-22 08:58:35,399][24648] Updated weights for policy 1, policy_version 8960 (0.0019) [2023-09-22 08:58:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 4694016. Throughput: 0: 730.6, 1: 731.3. Samples: 1147879. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:38,542][23569] Avg episode reward: [(0, '22.520'), (1, '18.260')] [2023-09-22 08:58:38,543][24306] Saving new best policy, reward=22.520! [2023-09-22 08:58:43,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5802.7, 300 sec: 5845.5). Total num frames: 4722688. Throughput: 0: 737.2, 1: 736.7. Samples: 1157101. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:43,543][23569] Avg episode reward: [(0, '22.440'), (1, '17.760')] [2023-09-22 08:58:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4751360. Throughput: 0: 735.2, 1: 734.4. Samples: 1165906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:48,543][23569] Avg episode reward: [(0, '22.040'), (1, '18.040')] [2023-09-22 08:58:48,949][24647] Updated weights for policy 0, policy_version 9464 (0.0016) [2023-09-22 08:58:48,949][24648] Updated weights for policy 1, policy_version 9120 (0.0017) [2023-09-22 08:58:53,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4784128. Throughput: 0: 739.2, 1: 738.7. Samples: 1170451. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:53,543][23569] Avg episode reward: [(0, '22.410'), (1, '18.010')] [2023-09-22 08:58:58,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 4808704. Throughput: 0: 733.8, 1: 734.7. Samples: 1178824. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:58:58,542][23569] Avg episode reward: [(0, '22.730'), (1, '18.030')] [2023-09-22 08:58:58,551][24306] Saving new best policy, reward=22.730! [2023-09-22 08:59:03,268][24647] Updated weights for policy 0, policy_version 9624 (0.0018) [2023-09-22 08:59:03,269][24648] Updated weights for policy 1, policy_version 9280 (0.0018) [2023-09-22 08:59:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4841472. Throughput: 0: 731.7, 1: 730.5. Samples: 1187715. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:03,543][23569] Avg episode reward: [(0, '22.240'), (1, '18.060')] [2023-09-22 08:59:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4866048. Throughput: 0: 730.0, 1: 730.5. Samples: 1191936. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:08,542][23569] Avg episode reward: [(0, '22.160'), (1, '18.660')] [2023-09-22 08:59:08,543][24495] Saving new best policy, reward=18.660! [2023-09-22 08:59:13,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4898816. Throughput: 0: 735.8, 1: 735.5. Samples: 1200805. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 08:59:13,542][23569] Avg episode reward: [(0, '22.490'), (1, '18.180')] [2023-09-22 08:59:13,552][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000009392_2404352.pth... [2023-09-22 08:59:13,552][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000009736_2494464.pth... [2023-09-22 08:59:13,591][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000006656_1703936.pth [2023-09-22 08:59:13,595][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000007000_1794048.pth [2023-09-22 08:59:17,113][24648] Updated weights for policy 1, policy_version 9440 (0.0016) [2023-09-22 08:59:17,113][24647] Updated weights for policy 0, policy_version 9784 (0.0016) [2023-09-22 08:59:18,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 4931584. Throughput: 0: 741.2, 1: 739.6. Samples: 1209849. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:18,542][23569] Avg episode reward: [(0, '22.710'), (1, '18.500')] [2023-09-22 08:59:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4956160. Throughput: 0: 738.2, 1: 737.2. Samples: 1214275. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:23,543][23569] Avg episode reward: [(0, '22.240'), (1, '18.240')] [2023-09-22 08:59:28,542][23569] Fps is (10 sec: 5734.1, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 4988928. Throughput: 0: 732.9, 1: 731.5. Samples: 1223002. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:28,544][23569] Avg episode reward: [(0, '22.340'), (1, '18.500')] [2023-09-22 08:59:30,881][24648] Updated weights for policy 1, policy_version 9600 (0.0016) [2023-09-22 08:59:30,882][24647] Updated weights for policy 0, policy_version 9944 (0.0017) [2023-09-22 08:59:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 5013504. Throughput: 0: 735.0, 1: 735.4. Samples: 1232074. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:33,542][23569] Avg episode reward: [(0, '22.940'), (1, '18.020')] [2023-09-22 08:59:33,628][24306] Saving new best policy, reward=22.940! [2023-09-22 08:59:38,542][23569] Fps is (10 sec: 5734.7, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5046272. Throughput: 0: 734.4, 1: 736.1. Samples: 1236624. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:38,542][23569] Avg episode reward: [(0, '22.770'), (1, '17.780')] [2023-09-22 08:59:43,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5939.2, 300 sec: 5859.4). Total num frames: 5079040. Throughput: 0: 737.4, 1: 737.3. Samples: 1245184. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:43,543][23569] Avg episode reward: [(0, '22.340'), (1, '17.890')] [2023-09-22 08:59:44,869][24647] Updated weights for policy 0, policy_version 10104 (0.0018) [2023-09-22 08:59:44,869][24648] Updated weights for policy 1, policy_version 9760 (0.0016) [2023-09-22 08:59:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5103616. Throughput: 0: 736.2, 1: 737.2. Samples: 1254014. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:48,542][23569] Avg episode reward: [(0, '22.380'), (1, '18.610')] [2023-09-22 08:59:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5136384. Throughput: 0: 741.1, 1: 740.3. Samples: 1258597. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 08:59:53,543][23569] Avg episode reward: [(0, '22.200'), (1, '18.490')] [2023-09-22 08:59:58,516][24647] Updated weights for policy 0, policy_version 10264 (0.0016) [2023-09-22 08:59:58,517][24648] Updated weights for policy 1, policy_version 9920 (0.0015) [2023-09-22 08:59:58,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.4, 300 sec: 5887.1). Total num frames: 5169152. Throughput: 0: 743.1, 1: 742.8. Samples: 1267671. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 08:59:58,543][23569] Avg episode reward: [(0, '22.140'), (1, '18.320')] [2023-09-22 09:00:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5193728. Throughput: 0: 733.6, 1: 734.7. Samples: 1275926. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:00:03,543][23569] Avg episode reward: [(0, '21.800'), (1, '18.720')] [2023-09-22 09:00:03,544][24495] Saving new best policy, reward=18.720! [2023-09-22 09:00:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.4, 300 sec: 5859.4). Total num frames: 5226496. Throughput: 0: 733.4, 1: 735.1. Samples: 1280359. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:08,544][23569] Avg episode reward: [(0, '22.450'), (1, '19.130')] [2023-09-22 09:00:08,546][24495] Saving new best policy, reward=19.130! [2023-09-22 09:00:12,796][24648] Updated weights for policy 1, policy_version 10080 (0.0016) [2023-09-22 09:00:12,797][24647] Updated weights for policy 0, policy_version 10424 (0.0015) [2023-09-22 09:00:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5251072. Throughput: 0: 734.2, 1: 736.1. Samples: 1289165. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:13,543][23569] Avg episode reward: [(0, '22.110'), (1, '18.650')] [2023-09-22 09:00:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5283840. Throughput: 0: 736.7, 1: 736.2. Samples: 1298351. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:18,543][23569] Avg episode reward: [(0, '23.460'), (1, '18.810')] [2023-09-22 09:00:18,544][24306] Saving new best policy, reward=23.460! [2023-09-22 09:00:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5308416. Throughput: 0: 733.8, 1: 732.7. Samples: 1302617. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:23,543][23569] Avg episode reward: [(0, '23.170'), (1, '19.100')] [2023-09-22 09:00:26,334][24647] Updated weights for policy 0, policy_version 10584 (0.0018) [2023-09-22 09:00:26,335][24648] Updated weights for policy 1, policy_version 10240 (0.0016) [2023-09-22 09:00:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 5341184. Throughput: 0: 741.6, 1: 739.2. Samples: 1311821. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:28,543][23569] Avg episode reward: [(0, '23.140'), (1, '18.520')] [2023-09-22 09:00:33,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.4, 300 sec: 5887.1). Total num frames: 5373952. Throughput: 0: 743.1, 1: 744.6. Samples: 1320960. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:33,543][23569] Avg episode reward: [(0, '23.430'), (1, '18.700')] [2023-09-22 09:00:38,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5398528. Throughput: 0: 738.2, 1: 738.9. Samples: 1325065. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:38,544][23569] Avg episode reward: [(0, '23.240'), (1, '19.090')] [2023-09-22 09:00:40,139][24647] Updated weights for policy 0, policy_version 10744 (0.0017) [2023-09-22 09:00:40,139][24648] Updated weights for policy 1, policy_version 10400 (0.0017) [2023-09-22 09:00:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 5431296. Throughput: 0: 737.1, 1: 737.0. Samples: 1334002. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:43,543][23569] Avg episode reward: [(0, '23.050'), (1, '19.120')] [2023-09-22 09:00:48,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 5464064. Throughput: 0: 748.2, 1: 748.9. Samples: 1343296. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:00:48,543][23569] Avg episode reward: [(0, '23.630'), (1, '18.610')] [2023-09-22 09:00:48,545][24306] Saving new best policy, reward=23.630! [2023-09-22 09:00:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5488640. Throughput: 0: 746.8, 1: 747.1. Samples: 1347585. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:00:53,543][23569] Avg episode reward: [(0, '23.270'), (1, '18.410')] [2023-09-22 09:00:53,823][24647] Updated weights for policy 0, policy_version 10904 (0.0016) [2023-09-22 09:00:53,824][24648] Updated weights for policy 1, policy_version 10560 (0.0016) [2023-09-22 09:00:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 5521408. Throughput: 0: 745.8, 1: 744.9. Samples: 1356244. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:00:58,543][23569] Avg episode reward: [(0, '23.380'), (1, '18.750')] [2023-09-22 09:01:03,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5545984. Throughput: 0: 738.5, 1: 739.1. Samples: 1364844. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 09:01:03,543][23569] Avg episode reward: [(0, '23.580'), (1, '18.930')] [2023-09-22 09:01:08,419][24648] Updated weights for policy 1, policy_version 10720 (0.0017) [2023-09-22 09:01:08,419][24647] Updated weights for policy 0, policy_version 11064 (0.0015) [2023-09-22 09:01:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 5578752. Throughput: 0: 737.8, 1: 737.7. Samples: 1369014. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 09:01:08,543][23569] Avg episode reward: [(0, '23.660'), (1, '18.350')] [2023-09-22 09:01:08,544][24306] Saving new best policy, reward=23.660! [2023-09-22 09:01:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5603328. Throughput: 0: 724.5, 1: 726.2. Samples: 1377102. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:01:13,543][23569] Avg episode reward: [(0, '23.860'), (1, '18.550')] [2023-09-22 09:01:13,555][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000010768_2756608.pth... [2023-09-22 09:01:13,555][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000011112_2846720.pth... [2023-09-22 09:01:13,591][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000008376_2146304.pth [2023-09-22 09:01:13,591][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000008016_2052096.pth [2023-09-22 09:01:13,595][24306] Saving new best policy, reward=23.860! [2023-09-22 09:01:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5873.3). Total num frames: 5636096. Throughput: 0: 727.9, 1: 727.2. Samples: 1386438. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:01:18,543][23569] Avg episode reward: [(0, '23.540'), (1, '18.610')] [2023-09-22 09:01:22,425][24647] Updated weights for policy 0, policy_version 11224 (0.0016) [2023-09-22 09:01:22,425][24648] Updated weights for policy 1, policy_version 10880 (0.0018) [2023-09-22 09:01:23,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 5660672. Throughput: 0: 728.1, 1: 728.1. Samples: 1390593. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:01:23,542][23569] Avg episode reward: [(0, '23.400'), (1, '18.640')] [2023-09-22 09:01:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5693440. Throughput: 0: 725.0, 1: 724.3. Samples: 1399217. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:01:28,542][23569] Avg episode reward: [(0, '23.420'), (1, '18.640')] [2023-09-22 09:01:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 5718016. Throughput: 0: 720.6, 1: 718.9. Samples: 1408073. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:01:33,543][23569] Avg episode reward: [(0, '23.480'), (1, '18.330')] [2023-09-22 09:01:36,615][24648] Updated weights for policy 1, policy_version 11040 (0.0017) [2023-09-22 09:01:36,616][24647] Updated weights for policy 0, policy_version 11384 (0.0016) [2023-09-22 09:01:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 5750784. Throughput: 0: 720.7, 1: 723.7. Samples: 1412585. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:01:38,543][23569] Avg episode reward: [(0, '23.230'), (1, '18.290')] [2023-09-22 09:01:43,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 5783552. Throughput: 0: 722.9, 1: 723.1. Samples: 1421312. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:01:43,543][23569] Avg episode reward: [(0, '22.280'), (1, '18.920')] [2023-09-22 09:01:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 5808128. Throughput: 0: 722.4, 1: 721.7. Samples: 1429828. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:01:48,543][23569] Avg episode reward: [(0, '22.230'), (1, '18.850')] [2023-09-22 09:01:50,716][24647] Updated weights for policy 0, policy_version 11544 (0.0014) [2023-09-22 09:01:50,716][24648] Updated weights for policy 1, policy_version 11200 (0.0017) [2023-09-22 09:01:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5840896. Throughput: 0: 721.8, 1: 721.4. Samples: 1433956. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:01:53,543][23569] Avg episode reward: [(0, '22.170'), (1, '19.400')] [2023-09-22 09:01:53,545][24495] Saving new best policy, reward=19.400! [2023-09-22 09:01:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 5865472. Throughput: 0: 728.9, 1: 729.0. Samples: 1442705. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:01:58,542][23569] Avg episode reward: [(0, '22.250'), (1, '19.500')] [2023-09-22 09:01:58,550][24495] Saving new best policy, reward=19.500! [2023-09-22 09:02:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5898240. Throughput: 0: 723.6, 1: 725.6. Samples: 1451649. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:03,543][23569] Avg episode reward: [(0, '22.100'), (1, '20.050')] [2023-09-22 09:02:03,544][24495] Saving new best policy, reward=20.050! [2023-09-22 09:02:04,821][24648] Updated weights for policy 1, policy_version 11360 (0.0018) [2023-09-22 09:02:04,821][24647] Updated weights for policy 0, policy_version 11704 (0.0018) [2023-09-22 09:02:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 5922816. Throughput: 0: 726.5, 1: 727.0. Samples: 1455998. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:08,543][23569] Avg episode reward: [(0, '22.350'), (1, '20.110')] [2023-09-22 09:02:08,544][24495] Saving new best policy, reward=20.110! [2023-09-22 09:02:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 5955584. Throughput: 0: 726.2, 1: 727.1. Samples: 1464615. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:13,543][23569] Avg episode reward: [(0, '22.480'), (1, '20.510')] [2023-09-22 09:02:13,555][24495] Saving new best policy, reward=20.510! [2023-09-22 09:02:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 5980160. Throughput: 0: 725.3, 1: 726.0. Samples: 1473382. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:18,542][23569] Avg episode reward: [(0, '22.710'), (1, '20.950')] [2023-09-22 09:02:18,543][24495] Saving new best policy, reward=20.950! [2023-09-22 09:02:18,803][24647] Updated weights for policy 0, policy_version 11864 (0.0015) [2023-09-22 09:02:18,804][24648] Updated weights for policy 1, policy_version 11520 (0.0016) [2023-09-22 09:02:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6012928. Throughput: 0: 726.8, 1: 723.5. Samples: 1477847. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:23,543][23569] Avg episode reward: [(0, '22.750'), (1, '21.120')] [2023-09-22 09:02:23,545][24495] Saving new best policy, reward=21.120! [2023-09-22 09:02:28,542][23569] Fps is (10 sec: 6143.9, 60 sec: 5802.6, 300 sec: 5845.5). Total num frames: 6041600. Throughput: 0: 727.0, 1: 726.0. Samples: 1486696. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:28,543][23569] Avg episode reward: [(0, '22.710'), (1, '21.130')] [2023-09-22 09:02:28,556][24495] Saving new best policy, reward=21.130! [2023-09-22 09:02:32,877][24647] Updated weights for policy 0, policy_version 12024 (0.0018) [2023-09-22 09:02:32,877][24648] Updated weights for policy 1, policy_version 11680 (0.0019) [2023-09-22 09:02:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 6070272. Throughput: 0: 725.3, 1: 725.7. Samples: 1495125. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:02:33,542][23569] Avg episode reward: [(0, '22.520'), (1, '20.970')] [2023-09-22 09:02:38,542][23569] Fps is (10 sec: 6144.2, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6103040. Throughput: 0: 728.1, 1: 729.8. Samples: 1499562. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:02:38,542][23569] Avg episode reward: [(0, '22.560'), (1, '21.940')] [2023-09-22 09:02:38,543][24495] Saving new best policy, reward=21.940! [2023-09-22 09:02:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 6127616. Throughput: 0: 730.2, 1: 729.9. Samples: 1508410. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:43,542][23569] Avg episode reward: [(0, '22.860'), (1, '21.480')] [2023-09-22 09:02:46,891][24648] Updated weights for policy 1, policy_version 11840 (0.0016) [2023-09-22 09:02:46,891][24647] Updated weights for policy 0, policy_version 12184 (0.0017) [2023-09-22 09:02:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6160384. Throughput: 0: 732.7, 1: 729.2. Samples: 1517431. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:48,542][23569] Avg episode reward: [(0, '22.430'), (1, '21.980')] [2023-09-22 09:02:48,543][24495] Saving new best policy, reward=21.980! [2023-09-22 09:02:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 6184960. Throughput: 0: 729.4, 1: 728.4. Samples: 1521601. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:53,543][23569] Avg episode reward: [(0, '22.170'), (1, '22.030')] [2023-09-22 09:02:53,543][24495] Saving new best policy, reward=22.030! [2023-09-22 09:02:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6217728. Throughput: 0: 724.9, 1: 725.2. Samples: 1529870. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:02:58,543][23569] Avg episode reward: [(0, '22.400'), (1, '21.690')] [2023-09-22 09:03:01,156][24648] Updated weights for policy 1, policy_version 12000 (0.0018) [2023-09-22 09:03:01,156][24647] Updated weights for policy 0, policy_version 12344 (0.0017) [2023-09-22 09:03:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 6242304. Throughput: 0: 725.9, 1: 726.4. Samples: 1538736. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:03:03,543][23569] Avg episode reward: [(0, '22.580'), (1, '21.960')] [2023-09-22 09:03:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6275072. Throughput: 0: 728.5, 1: 727.3. Samples: 1543357. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:03:08,543][23569] Avg episode reward: [(0, '22.440'), (1, '21.400')] [2023-09-22 09:03:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 6299648. Throughput: 0: 727.3, 1: 726.0. Samples: 1552091. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:03:13,543][23569] Avg episode reward: [(0, '22.310'), (1, '21.880')] [2023-09-22 09:03:13,697][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000012144_3108864.pth... [2023-09-22 09:03:13,713][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000012488_3198976.pth... [2023-09-22 09:03:13,727][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000009392_2404352.pth [2023-09-22 09:03:13,742][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000009736_2494464.pth [2023-09-22 09:03:15,030][24648] Updated weights for policy 1, policy_version 12160 (0.0019) [2023-09-22 09:03:15,030][24647] Updated weights for policy 0, policy_version 12504 (0.0015) [2023-09-22 09:03:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6332416. Throughput: 0: 730.8, 1: 731.0. Samples: 1560905. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:03:18,542][23569] Avg episode reward: [(0, '23.070'), (1, '21.120')] [2023-09-22 09:03:23,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6365184. Throughput: 0: 730.2, 1: 728.8. Samples: 1565219. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:03:23,543][23569] Avg episode reward: [(0, '22.830'), (1, '21.070')] [2023-09-22 09:03:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5802.7, 300 sec: 5859.4). Total num frames: 6389760. Throughput: 0: 728.7, 1: 727.8. Samples: 1573949. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:03:28,543][23569] Avg episode reward: [(0, '23.120'), (1, '20.660')] [2023-09-22 09:03:29,072][24648] Updated weights for policy 1, policy_version 12320 (0.0018) [2023-09-22 09:03:29,072][24647] Updated weights for policy 0, policy_version 12664 (0.0017) [2023-09-22 09:03:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6422528. Throughput: 0: 728.2, 1: 730.3. Samples: 1583063. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:03:33,543][23569] Avg episode reward: [(0, '23.460'), (1, '20.390')] [2023-09-22 09:03:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5845.5). Total num frames: 6447104. Throughput: 0: 728.6, 1: 729.2. Samples: 1587200. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:03:38,542][23569] Avg episode reward: [(0, '23.420'), (1, '20.320')] [2023-09-22 09:03:43,215][24648] Updated weights for policy 1, policy_version 12480 (0.0015) [2023-09-22 09:03:43,216][24647] Updated weights for policy 0, policy_version 12824 (0.0018) [2023-09-22 09:03:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6479872. Throughput: 0: 730.6, 1: 729.7. Samples: 1595584. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:03:43,543][23569] Avg episode reward: [(0, '23.410'), (1, '20.160')] [2023-09-22 09:03:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 6504448. Throughput: 0: 730.9, 1: 729.7. Samples: 1604463. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:03:48,543][23569] Avg episode reward: [(0, '23.190'), (1, '20.420')] [2023-09-22 09:03:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6537216. Throughput: 0: 730.6, 1: 731.3. Samples: 1609141. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:03:53,543][23569] Avg episode reward: [(0, '23.620'), (1, '20.460')] [2023-09-22 09:03:56,974][24647] Updated weights for policy 0, policy_version 12984 (0.0016) [2023-09-22 09:03:56,975][24648] Updated weights for policy 1, policy_version 12640 (0.0017) [2023-09-22 09:03:58,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6569984. Throughput: 0: 730.4, 1: 732.6. Samples: 1617924. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:03:58,543][23569] Avg episode reward: [(0, '23.210'), (1, '21.010')] [2023-09-22 09:04:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6594560. Throughput: 0: 734.9, 1: 735.4. Samples: 1627065. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:04:03,543][23569] Avg episode reward: [(0, '23.380'), (1, '20.970')] [2023-09-22 09:04:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6627328. Throughput: 0: 736.6, 1: 738.6. Samples: 1631603. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:04:08,543][23569] Avg episode reward: [(0, '22.810'), (1, '21.210')] [2023-09-22 09:04:10,789][24647] Updated weights for policy 0, policy_version 13144 (0.0019) [2023-09-22 09:04:10,790][24648] Updated weights for policy 1, policy_version 12800 (0.0019) [2023-09-22 09:04:13,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5939.2, 300 sec: 5845.5). Total num frames: 6656000. Throughput: 0: 737.6, 1: 739.4. Samples: 1640415. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:13,543][23569] Avg episode reward: [(0, '23.590'), (1, '21.240')] [2023-09-22 09:04:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6684672. Throughput: 0: 733.8, 1: 733.3. Samples: 1649085. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:18,543][23569] Avg episode reward: [(0, '23.010'), (1, '21.030')] [2023-09-22 09:04:23,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6717440. Throughput: 0: 738.0, 1: 738.2. Samples: 1653631. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:04:23,543][23569] Avg episode reward: [(0, '23.070'), (1, '20.910')] [2023-09-22 09:04:24,603][24647] Updated weights for policy 0, policy_version 13304 (0.0018) [2023-09-22 09:04:24,604][24648] Updated weights for policy 1, policy_version 12960 (0.0018) [2023-09-22 09:04:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6742016. Throughput: 0: 742.0, 1: 743.7. Samples: 1662438. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:04:28,543][23569] Avg episode reward: [(0, '23.440'), (1, '20.470')] [2023-09-22 09:04:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 6774784. Throughput: 0: 740.5, 1: 741.8. Samples: 1671168. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:33,543][23569] Avg episode reward: [(0, '22.950'), (1, '20.430')] [2023-09-22 09:04:38,495][24648] Updated weights for policy 1, policy_version 13120 (0.0016) [2023-09-22 09:04:38,495][24647] Updated weights for policy 0, policy_version 13464 (0.0017) [2023-09-22 09:04:38,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.4, 300 sec: 5859.4). Total num frames: 6807552. Throughput: 0: 739.2, 1: 739.3. Samples: 1675671. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:38,544][23569] Avg episode reward: [(0, '22.680'), (1, '20.740')] [2023-09-22 09:04:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6832128. Throughput: 0: 741.0, 1: 740.3. Samples: 1684583. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:43,543][23569] Avg episode reward: [(0, '22.580'), (1, '20.300')] [2023-09-22 09:04:48,542][23569] Fps is (10 sec: 5734.6, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 6864896. Throughput: 0: 740.3, 1: 740.4. Samples: 1693696. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:48,543][23569] Avg episode reward: [(0, '22.560'), (1, '20.290')] [2023-09-22 09:04:52,276][24648] Updated weights for policy 1, policy_version 13280 (0.0015) [2023-09-22 09:04:52,276][24647] Updated weights for policy 0, policy_version 13624 (0.0015) [2023-09-22 09:04:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 6889472. Throughput: 0: 736.5, 1: 735.0. Samples: 1697824. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:53,543][23569] Avg episode reward: [(0, '22.670'), (1, '20.520')] [2023-09-22 09:04:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 6922240. Throughput: 0: 738.6, 1: 737.6. Samples: 1706843. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:04:58,543][23569] Avg episode reward: [(0, '22.460'), (1, '20.430')] [2023-09-22 09:05:03,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 6955008. Throughput: 0: 745.6, 1: 746.3. Samples: 1716224. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:03,543][23569] Avg episode reward: [(0, '22.550'), (1, '20.470')] [2023-09-22 09:05:05,769][24647] Updated weights for policy 0, policy_version 13784 (0.0015) [2023-09-22 09:05:05,769][24648] Updated weights for policy 1, policy_version 13440 (0.0018) [2023-09-22 09:05:08,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 6987776. Throughput: 0: 744.0, 1: 743.3. Samples: 1720559. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:08,543][23569] Avg episode reward: [(0, '22.830'), (1, '21.280')] [2023-09-22 09:05:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5939.2, 300 sec: 5859.4). Total num frames: 7012352. Throughput: 0: 748.0, 1: 746.5. Samples: 1729694. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:13,543][23569] Avg episode reward: [(0, '22.930'), (1, '21.290')] [2023-09-22 09:05:13,554][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000013520_3461120.pth... [2023-09-22 09:05:13,554][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000013864_3551232.pth... [2023-09-22 09:05:13,590][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000010768_2756608.pth [2023-09-22 09:05:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000011112_2846720.pth [2023-09-22 09:05:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 7045120. Throughput: 0: 750.9, 1: 750.9. Samples: 1738752. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:18,542][23569] Avg episode reward: [(0, '23.570'), (1, '20.520')] [2023-09-22 09:05:19,481][24647] Updated weights for policy 0, policy_version 13944 (0.0015) [2023-09-22 09:05:19,481][24648] Updated weights for policy 1, policy_version 13600 (0.0018) [2023-09-22 09:05:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 7069696. Throughput: 0: 747.7, 1: 747.8. Samples: 1742969. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:23,543][23569] Avg episode reward: [(0, '22.670'), (1, '21.050')] [2023-09-22 09:05:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 7102464. Throughput: 0: 748.8, 1: 749.0. Samples: 1751984. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:28,543][23569] Avg episode reward: [(0, '22.610'), (1, '21.210')] [2023-09-22 09:05:33,339][24648] Updated weights for policy 1, policy_version 13760 (0.0016) [2023-09-22 09:05:33,339][24647] Updated weights for policy 0, policy_version 14104 (0.0016) [2023-09-22 09:05:33,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 7135232. Throughput: 0: 747.9, 1: 748.8. Samples: 1761045. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:05:33,542][23569] Avg episode reward: [(0, '22.790'), (1, '21.360')] [2023-09-22 09:05:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 7159808. Throughput: 0: 750.3, 1: 750.8. Samples: 1765374. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:05:38,542][23569] Avg episode reward: [(0, '22.420'), (1, '20.670')] [2023-09-22 09:05:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 7192576. Throughput: 0: 746.0, 1: 746.4. Samples: 1774002. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:05:43,543][23569] Avg episode reward: [(0, '22.370'), (1, '20.100')] [2023-09-22 09:05:47,257][24647] Updated weights for policy 0, policy_version 14264 (0.0016) [2023-09-22 09:05:47,257][24648] Updated weights for policy 1, policy_version 13920 (0.0016) [2023-09-22 09:05:48,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5939.2, 300 sec: 5873.3). Total num frames: 7221248. Throughput: 0: 741.9, 1: 742.4. Samples: 1783015. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:05:48,543][23569] Avg episode reward: [(0, '22.140'), (1, '20.570')] [2023-09-22 09:05:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 7249920. Throughput: 0: 743.4, 1: 742.0. Samples: 1787402. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:05:53,542][23569] Avg episode reward: [(0, '22.490'), (1, '21.500')] [2023-09-22 09:05:58,542][23569] Fps is (10 sec: 6144.0, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 7282688. Throughput: 0: 737.4, 1: 738.3. Samples: 1796102. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:05:58,543][23569] Avg episode reward: [(0, '22.450'), (1, '20.920')] [2023-09-22 09:06:01,034][24648] Updated weights for policy 1, policy_version 14080 (0.0017) [2023-09-22 09:06:01,034][24647] Updated weights for policy 0, policy_version 14424 (0.0017) [2023-09-22 09:06:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 7307264. Throughput: 0: 739.8, 1: 739.2. Samples: 1805305. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:06:03,543][23569] Avg episode reward: [(0, '22.390'), (1, '21.080')] [2023-09-22 09:06:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 7340032. Throughput: 0: 744.1, 1: 743.6. Samples: 1809919. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:06:08,543][23569] Avg episode reward: [(0, '22.720'), (1, '20.780')] [2023-09-22 09:06:13,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 7372800. Throughput: 0: 740.2, 1: 740.7. Samples: 1818624. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:06:13,542][23569] Avg episode reward: [(0, '23.290'), (1, '21.740')] [2023-09-22 09:06:14,833][24648] Updated weights for policy 1, policy_version 14240 (0.0020) [2023-09-22 09:06:14,833][24647] Updated weights for policy 0, policy_version 14584 (0.0018) [2023-09-22 09:06:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 7397376. Throughput: 0: 740.8, 1: 738.2. Samples: 1827601. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:06:18,543][23569] Avg episode reward: [(0, '23.370'), (1, '21.190')] [2023-09-22 09:06:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 7430144. Throughput: 0: 742.2, 1: 741.1. Samples: 1832123. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:06:23,543][23569] Avg episode reward: [(0, '23.490'), (1, '21.010')] [2023-09-22 09:06:28,443][24647] Updated weights for policy 0, policy_version 14744 (0.0016) [2023-09-22 09:06:28,444][24648] Updated weights for policy 1, policy_version 14400 (0.0016) [2023-09-22 09:06:28,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 7462912. Throughput: 0: 745.8, 1: 746.4. Samples: 1841151. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:06:28,543][23569] Avg episode reward: [(0, '23.610'), (1, '21.710')] [2023-09-22 09:06:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 7487488. Throughput: 0: 740.2, 1: 739.0. Samples: 1849576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:06:33,542][23569] Avg episode reward: [(0, '23.820'), (1, '21.610')] [2023-09-22 09:06:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 7520256. Throughput: 0: 743.9, 1: 744.2. Samples: 1854368. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:06:38,543][23569] Avg episode reward: [(0, '23.500'), (1, '21.870')] [2023-09-22 09:06:42,574][24647] Updated weights for policy 0, policy_version 14904 (0.0016) [2023-09-22 09:06:42,575][24648] Updated weights for policy 1, policy_version 14560 (0.0016) [2023-09-22 09:06:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 7544832. Throughput: 0: 746.4, 1: 743.2. Samples: 1863131. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:06:43,543][23569] Avg episode reward: [(0, '23.690'), (1, '21.960')] [2023-09-22 09:06:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5939.2, 300 sec: 5887.1). Total num frames: 7577600. Throughput: 0: 739.3, 1: 740.0. Samples: 1871873. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:06:48,542][23569] Avg episode reward: [(0, '23.770'), (1, '22.630')] [2023-09-22 09:06:48,543][24495] Saving new best policy, reward=22.630! [2023-09-22 09:06:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 7602176. Throughput: 0: 735.1, 1: 735.7. Samples: 1876104. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:06:53,543][23569] Avg episode reward: [(0, '23.470'), (1, '23.050')] [2023-09-22 09:06:53,662][24495] Saving new best policy, reward=23.050! [2023-09-22 09:06:56,409][24648] Updated weights for policy 1, policy_version 14720 (0.0016) [2023-09-22 09:06:56,410][24647] Updated weights for policy 0, policy_version 15064 (0.0019) [2023-09-22 09:06:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 7634944. Throughput: 0: 739.7, 1: 738.5. Samples: 1885142. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:06:58,543][23569] Avg episode reward: [(0, '23.340'), (1, '23.160')] [2023-09-22 09:06:58,552][24495] Saving new best policy, reward=23.160! [2023-09-22 09:07:03,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 7667712. Throughput: 0: 741.4, 1: 743.0. Samples: 1894400. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:03,543][23569] Avg episode reward: [(0, '23.420'), (1, '23.200')] [2023-09-22 09:07:03,545][24495] Saving new best policy, reward=23.200! [2023-09-22 09:07:08,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 7700480. Throughput: 0: 738.3, 1: 739.0. Samples: 1898602. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:08,543][23569] Avg episode reward: [(0, '23.790'), (1, '22.670')] [2023-09-22 09:07:09,947][24647] Updated weights for policy 0, policy_version 15224 (0.0015) [2023-09-22 09:07:09,948][24648] Updated weights for policy 1, policy_version 14880 (0.0020) [2023-09-22 09:07:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 7725056. Throughput: 0: 737.6, 1: 737.9. Samples: 1907549. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:13,543][23569] Avg episode reward: [(0, '23.550'), (1, '23.230')] [2023-09-22 09:07:13,554][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000014912_3817472.pth... [2023-09-22 09:07:13,554][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000015256_3907584.pth... [2023-09-22 09:07:13,589][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000012488_3198976.pth [2023-09-22 09:07:13,592][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000012144_3108864.pth [2023-09-22 09:07:13,597][24495] Saving new best policy, reward=23.230! [2023-09-22 09:07:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 7757824. Throughput: 0: 743.7, 1: 743.1. Samples: 1916486. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:18,544][23569] Avg episode reward: [(0, '24.090'), (1, '23.760')] [2023-09-22 09:07:18,545][24306] Saving new best policy, reward=24.090! [2023-09-22 09:07:18,545][24495] Saving new best policy, reward=23.760! [2023-09-22 09:07:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5901.0). Total num frames: 7782400. Throughput: 0: 739.8, 1: 741.5. Samples: 1921024. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:23,543][23569] Avg episode reward: [(0, '23.670'), (1, '23.440')] [2023-09-22 09:07:23,818][24647] Updated weights for policy 0, policy_version 15384 (0.0016) [2023-09-22 09:07:23,818][24648] Updated weights for policy 1, policy_version 15040 (0.0017) [2023-09-22 09:07:28,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 7815168. Throughput: 0: 741.8, 1: 743.5. Samples: 1929973. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:28,543][23569] Avg episode reward: [(0, '24.230'), (1, '23.350')] [2023-09-22 09:07:28,550][24306] Saving new best policy, reward=24.230! [2023-09-22 09:07:33,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.4, 300 sec: 5914.9). Total num frames: 7847936. Throughput: 0: 744.5, 1: 745.5. Samples: 1938924. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:07:33,543][23569] Avg episode reward: [(0, '23.720'), (1, '23.750')] [2023-09-22 09:07:37,598][24647] Updated weights for policy 0, policy_version 15544 (0.0016) [2023-09-22 09:07:37,598][24648] Updated weights for policy 1, policy_version 15200 (0.0019) [2023-09-22 09:07:38,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 7872512. Throughput: 0: 749.1, 1: 749.6. Samples: 1943547. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:07:38,543][23569] Avg episode reward: [(0, '24.040'), (1, '24.110')] [2023-09-22 09:07:38,545][24495] Saving new best policy, reward=24.110! [2023-09-22 09:07:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 7905280. Throughput: 0: 744.7, 1: 746.1. Samples: 1952226. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:43,543][23569] Avg episode reward: [(0, '24.150'), (1, '23.930')] [2023-09-22 09:07:48,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.4, 300 sec: 5942.7). Total num frames: 7938048. Throughput: 0: 743.4, 1: 743.2. Samples: 1961297. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:48,543][23569] Avg episode reward: [(0, '24.140'), (1, '24.420')] [2023-09-22 09:07:48,545][24495] Saving new best policy, reward=24.420! [2023-09-22 09:07:51,449][24648] Updated weights for policy 1, policy_version 15360 (0.0014) [2023-09-22 09:07:51,450][24647] Updated weights for policy 0, policy_version 15704 (0.0015) [2023-09-22 09:07:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 7962624. Throughput: 0: 745.3, 1: 744.2. Samples: 1965627. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:07:53,543][23569] Avg episode reward: [(0, '24.510'), (1, '24.020')] [2023-09-22 09:07:53,544][24306] Saving new best policy, reward=24.510! [2023-09-22 09:07:58,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5942.7). Total num frames: 7995392. Throughput: 0: 741.7, 1: 741.4. Samples: 1974287. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:07:58,543][23569] Avg episode reward: [(0, '24.470'), (1, '23.870')] [2023-09-22 09:08:03,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8019968. Throughput: 0: 734.8, 1: 735.4. Samples: 1982644. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:08:03,543][23569] Avg episode reward: [(0, '23.590'), (1, '24.070')] [2023-09-22 09:08:05,810][24648] Updated weights for policy 1, policy_version 15520 (0.0016) [2023-09-22 09:08:05,811][24647] Updated weights for policy 0, policy_version 15864 (0.0017) [2023-09-22 09:08:08,542][23569] Fps is (10 sec: 4915.0, 60 sec: 5734.4, 300 sec: 5914.9). Total num frames: 8044544. Throughput: 0: 730.9, 1: 730.3. Samples: 1986779. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:08,544][23569] Avg episode reward: [(0, '23.700'), (1, '23.500')] [2023-09-22 09:08:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8077312. Throughput: 0: 728.4, 1: 728.6. Samples: 1995537. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:13,543][23569] Avg episode reward: [(0, '23.710'), (1, '23.810')] [2023-09-22 09:08:18,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5734.4, 300 sec: 5887.1). Total num frames: 8101888. Throughput: 0: 727.1, 1: 726.2. Samples: 2004322. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:18,543][23569] Avg episode reward: [(0, '23.330'), (1, '22.770')] [2023-09-22 09:08:19,978][24648] Updated weights for policy 1, policy_version 15680 (0.0017) [2023-09-22 09:08:19,979][24647] Updated weights for policy 0, policy_version 16024 (0.0016) [2023-09-22 09:08:23,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 8134656. Throughput: 0: 725.9, 1: 725.7. Samples: 2008870. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:08:23,542][23569] Avg episode reward: [(0, '23.160'), (1, '23.320')] [2023-09-22 09:08:28,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8167424. Throughput: 0: 724.6, 1: 723.8. Samples: 2017405. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:08:28,543][23569] Avg episode reward: [(0, '24.310'), (1, '23.800')] [2023-09-22 09:08:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5914.9). Total num frames: 8192000. Throughput: 0: 726.2, 1: 725.5. Samples: 2026624. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:33,543][23569] Avg episode reward: [(0, '24.530'), (1, '23.340')] [2023-09-22 09:08:33,615][24306] Saving new best policy, reward=24.530! [2023-09-22 09:08:33,655][24647] Updated weights for policy 0, policy_version 16184 (0.0017) [2023-09-22 09:08:33,655][24648] Updated weights for policy 1, policy_version 15840 (0.0018) [2023-09-22 09:08:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 8224768. Throughput: 0: 728.8, 1: 730.4. Samples: 2031290. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:38,542][23569] Avg episode reward: [(0, '24.970'), (1, '22.990')] [2023-09-22 09:08:38,543][24306] Saving new best policy, reward=24.970! [2023-09-22 09:08:43,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5942.7). Total num frames: 8257536. Throughput: 0: 728.0, 1: 728.1. Samples: 2039808. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:43,543][23569] Avg episode reward: [(0, '24.990'), (1, '22.300')] [2023-09-22 09:08:43,551][24306] Saving new best policy, reward=24.990! [2023-09-22 09:08:47,614][24647] Updated weights for policy 0, policy_version 16344 (0.0015) [2023-09-22 09:08:47,614][24648] Updated weights for policy 1, policy_version 16000 (0.0020) [2023-09-22 09:08:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5914.9). Total num frames: 8282112. Throughput: 0: 732.1, 1: 732.3. Samples: 2048540. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:48,542][23569] Avg episode reward: [(0, '24.260'), (1, '22.920')] [2023-09-22 09:08:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8314880. Throughput: 0: 734.2, 1: 734.0. Samples: 2052850. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:53,543][23569] Avg episode reward: [(0, '24.770'), (1, '22.220')] [2023-09-22 09:08:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5914.9). Total num frames: 8339456. Throughput: 0: 738.6, 1: 739.3. Samples: 2062045. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:08:58,543][23569] Avg episode reward: [(0, '24.710'), (1, '21.930')] [2023-09-22 09:09:01,502][24647] Updated weights for policy 0, policy_version 16504 (0.0016) [2023-09-22 09:09:01,502][24648] Updated weights for policy 1, policy_version 16160 (0.0016) [2023-09-22 09:09:03,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8372224. Throughput: 0: 736.3, 1: 735.8. Samples: 2070567. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:03,543][23569] Avg episode reward: [(0, '24.350'), (1, '22.480')] [2023-09-22 09:09:08,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5928.8). Total num frames: 8404992. Throughput: 0: 737.2, 1: 736.8. Samples: 2075201. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:08,543][23569] Avg episode reward: [(0, '23.880'), (1, '22.650')] [2023-09-22 09:09:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8429568. Throughput: 0: 735.4, 1: 736.1. Samples: 2083621. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:13,543][23569] Avg episode reward: [(0, '24.130'), (1, '22.380')] [2023-09-22 09:09:13,555][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000016632_4259840.pth... [2023-09-22 09:09:13,555][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000016288_4169728.pth... [2023-09-22 09:09:13,586][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000013864_3551232.pth [2023-09-22 09:09:13,597][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000013520_3461120.pth [2023-09-22 09:09:15,773][24647] Updated weights for policy 0, policy_version 16664 (0.0017) [2023-09-22 09:09:15,773][24648] Updated weights for policy 1, policy_version 16320 (0.0017) [2023-09-22 09:09:18,542][23569] Fps is (10 sec: 4915.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8454144. Throughput: 0: 729.2, 1: 729.6. Samples: 2092269. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:18,543][23569] Avg episode reward: [(0, '23.630'), (1, '22.990')] [2023-09-22 09:09:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8486912. Throughput: 0: 726.8, 1: 726.6. Samples: 2096690. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:09:23,543][23569] Avg episode reward: [(0, '23.960'), (1, '23.020')] [2023-09-22 09:09:28,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8519680. Throughput: 0: 728.9, 1: 728.3. Samples: 2105380. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:09:28,542][23569] Avg episode reward: [(0, '24.080'), (1, '23.190')] [2023-09-22 09:09:29,624][24648] Updated weights for policy 1, policy_version 16480 (0.0018) [2023-09-22 09:09:29,624][24647] Updated weights for policy 0, policy_version 16824 (0.0016) [2023-09-22 09:09:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8544256. Throughput: 0: 732.7, 1: 732.6. Samples: 2114482. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:09:33,543][23569] Avg episode reward: [(0, '24.220'), (1, '23.060')] [2023-09-22 09:09:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8577024. Throughput: 0: 733.7, 1: 732.7. Samples: 2118836. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:38,542][23569] Avg episode reward: [(0, '24.130'), (1, '22.750')] [2023-09-22 09:09:43,541][24647] Updated weights for policy 0, policy_version 16984 (0.0018) [2023-09-22 09:09:43,541][24648] Updated weights for policy 1, policy_version 16640 (0.0019) [2023-09-22 09:09:43,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8609792. Throughput: 0: 731.1, 1: 731.7. Samples: 2127869. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:43,543][23569] Avg episode reward: [(0, '24.320'), (1, '23.530')] [2023-09-22 09:09:48,543][23569] Fps is (10 sec: 5734.0, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8634368. Throughput: 0: 733.3, 1: 732.7. Samples: 2136536. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:48,544][23569] Avg episode reward: [(0, '24.490'), (1, '24.020')] [2023-09-22 09:09:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 8667136. Throughput: 0: 732.7, 1: 732.1. Samples: 2141117. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:53,543][23569] Avg episode reward: [(0, '24.020'), (1, '23.970')] [2023-09-22 09:09:57,344][24647] Updated weights for policy 0, policy_version 17144 (0.0014) [2023-09-22 09:09:57,345][24648] Updated weights for policy 1, policy_version 16800 (0.0016) [2023-09-22 09:09:58,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8691712. Throughput: 0: 738.6, 1: 738.3. Samples: 2150083. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:09:58,543][23569] Avg episode reward: [(0, '24.190'), (1, '24.070')] [2023-09-22 09:10:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8724480. Throughput: 0: 736.7, 1: 737.2. Samples: 2158592. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:03,543][23569] Avg episode reward: [(0, '24.090'), (1, '24.820')] [2023-09-22 09:10:03,544][24495] Saving new best policy, reward=24.820! [2023-09-22 09:10:08,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5802.7, 300 sec: 5901.0). Total num frames: 8753152. Throughput: 0: 736.4, 1: 735.7. Samples: 2162934. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:08,543][23569] Avg episode reward: [(0, '22.660'), (1, '25.230')] [2023-09-22 09:10:08,559][24495] Saving new best policy, reward=25.230! [2023-09-22 09:10:11,356][24647] Updated weights for policy 0, policy_version 17304 (0.0017) [2023-09-22 09:10:11,356][24648] Updated weights for policy 1, policy_version 16960 (0.0017) [2023-09-22 09:10:13,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 8781824. Throughput: 0: 738.6, 1: 739.1. Samples: 2171875. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:13,543][23569] Avg episode reward: [(0, '21.970'), (1, '25.750')] [2023-09-22 09:10:13,550][24495] Saving new best policy, reward=25.750! [2023-09-22 09:10:18,542][23569] Fps is (10 sec: 6144.1, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 8814592. Throughput: 0: 738.8, 1: 739.7. Samples: 2181015. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:18,543][23569] Avg episode reward: [(0, '20.380'), (1, '25.110')] [2023-09-22 09:10:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8839168. Throughput: 0: 736.8, 1: 738.6. Samples: 2185225. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:23,543][23569] Avg episode reward: [(0, '20.230'), (1, '24.820')] [2023-09-22 09:10:25,061][24647] Updated weights for policy 0, policy_version 17464 (0.0016) [2023-09-22 09:10:25,061][24648] Updated weights for policy 1, policy_version 17120 (0.0018) [2023-09-22 09:10:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8871936. Throughput: 0: 737.6, 1: 737.6. Samples: 2194255. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:10:28,543][23569] Avg episode reward: [(0, '20.580'), (1, '25.560')] [2023-09-22 09:10:33,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 8904704. Throughput: 0: 740.3, 1: 741.5. Samples: 2203214. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:10:33,543][23569] Avg episode reward: [(0, '21.210'), (1, '24.770')] [2023-09-22 09:10:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 8929280. Throughput: 0: 739.7, 1: 740.9. Samples: 2207744. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:38,543][23569] Avg episode reward: [(0, '21.870'), (1, '24.890')] [2023-09-22 09:10:38,903][24647] Updated weights for policy 0, policy_version 17624 (0.0017) [2023-09-22 09:10:38,903][24648] Updated weights for policy 1, policy_version 17280 (0.0015) [2023-09-22 09:10:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5901.0). Total num frames: 8962048. Throughput: 0: 734.6, 1: 734.5. Samples: 2216191. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:43,543][23569] Avg episode reward: [(0, '22.960'), (1, '25.370')] [2023-09-22 09:10:48,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 8986624. Throughput: 0: 740.7, 1: 739.2. Samples: 2225188. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:10:48,543][23569] Avg episode reward: [(0, '23.510'), (1, '25.360')] [2023-09-22 09:10:53,102][24648] Updated weights for policy 1, policy_version 17440 (0.0018) [2023-09-22 09:10:53,102][24647] Updated weights for policy 0, policy_version 17784 (0.0019) [2023-09-22 09:10:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9019392. Throughput: 0: 738.1, 1: 737.7. Samples: 2229346. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:10:53,543][23569] Avg episode reward: [(0, '23.140'), (1, '24.910')] [2023-09-22 09:10:58,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9043968. Throughput: 0: 737.2, 1: 739.9. Samples: 2238345. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:10:58,543][23569] Avg episode reward: [(0, '24.080'), (1, '25.270')] [2023-09-22 09:11:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9076736. Throughput: 0: 729.5, 1: 729.2. Samples: 2246656. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:11:03,543][23569] Avg episode reward: [(0, '23.410'), (1, '25.310')] [2023-09-22 09:11:07,325][24647] Updated weights for policy 0, policy_version 17944 (0.0016) [2023-09-22 09:11:07,325][24648] Updated weights for policy 1, policy_version 17600 (0.0016) [2023-09-22 09:11:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5802.7, 300 sec: 5859.4). Total num frames: 9101312. Throughput: 0: 728.2, 1: 728.1. Samples: 2250757. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:11:08,543][23569] Avg episode reward: [(0, '23.790'), (1, '25.040')] [2023-09-22 09:11:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9134080. Throughput: 0: 726.3, 1: 725.5. Samples: 2259587. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:11:13,543][23569] Avg episode reward: [(0, '23.800'), (1, '24.870')] [2023-09-22 09:11:13,554][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000017664_4521984.pth... [2023-09-22 09:11:13,554][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000018008_4612096.pth... [2023-09-22 09:11:13,585][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000014912_3817472.pth [2023-09-22 09:11:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000015256_3907584.pth [2023-09-22 09:11:18,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9166848. Throughput: 0: 731.1, 1: 727.9. Samples: 2268867. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:11:18,543][23569] Avg episode reward: [(0, '23.750'), (1, '24.970')] [2023-09-22 09:11:21,263][24647] Updated weights for policy 0, policy_version 18104 (0.0016) [2023-09-22 09:11:21,263][24648] Updated weights for policy 1, policy_version 17760 (0.0018) [2023-09-22 09:11:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9191424. Throughput: 0: 726.8, 1: 726.7. Samples: 2273152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:11:23,543][23569] Avg episode reward: [(0, '23.890'), (1, '24.730')] [2023-09-22 09:11:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9224192. Throughput: 0: 728.0, 1: 727.4. Samples: 2281687. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:11:28,544][23569] Avg episode reward: [(0, '23.760'), (1, '23.380')] [2023-09-22 09:11:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 9248768. Throughput: 0: 729.1, 1: 730.3. Samples: 2290863. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:11:33,543][23569] Avg episode reward: [(0, '23.590'), (1, '23.690')] [2023-09-22 09:11:34,966][24647] Updated weights for policy 0, policy_version 18264 (0.0017) [2023-09-22 09:11:34,967][24648] Updated weights for policy 1, policy_version 17920 (0.0016) [2023-09-22 09:11:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9281536. Throughput: 0: 735.2, 1: 735.6. Samples: 2295532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:11:38,544][23569] Avg episode reward: [(0, '23.590'), (1, '23.670')] [2023-09-22 09:11:43,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9314304. Throughput: 0: 731.4, 1: 728.4. Samples: 2304033. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:11:43,543][23569] Avg episode reward: [(0, '23.510'), (1, '23.470')] [2023-09-22 09:11:48,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 9338880. Throughput: 0: 736.9, 1: 736.3. Samples: 2312947. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:11:48,543][23569] Avg episode reward: [(0, '23.240'), (1, '23.360')] [2023-09-22 09:11:48,997][24647] Updated weights for policy 0, policy_version 18424 (0.0017) [2023-09-22 09:11:48,997][24648] Updated weights for policy 1, policy_version 18080 (0.0017) [2023-09-22 09:11:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 9371648. Throughput: 0: 739.2, 1: 737.2. Samples: 2317194. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:11:53,543][23569] Avg episode reward: [(0, '23.650'), (1, '23.550')] [2023-09-22 09:11:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9396224. Throughput: 0: 741.6, 1: 741.8. Samples: 2326341. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:11:58,543][23569] Avg episode reward: [(0, '23.620'), (1, '23.510')] [2023-09-22 09:12:03,004][24647] Updated weights for policy 0, policy_version 18584 (0.0017) [2023-09-22 09:12:03,004][24648] Updated weights for policy 1, policy_version 18240 (0.0016) [2023-09-22 09:12:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 9428992. Throughput: 0: 730.2, 1: 733.2. Samples: 2334721. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:03,542][23569] Avg episode reward: [(0, '23.680'), (1, '24.720')] [2023-09-22 09:12:08,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 9461760. Throughput: 0: 733.5, 1: 732.4. Samples: 2339118. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:08,542][23569] Avg episode reward: [(0, '24.110'), (1, '24.580')] [2023-09-22 09:12:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9486336. Throughput: 0: 734.0, 1: 734.8. Samples: 2347781. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:12:13,543][23569] Avg episode reward: [(0, '24.270'), (1, '24.190')] [2023-09-22 09:12:17,078][24648] Updated weights for policy 1, policy_version 18400 (0.0018) [2023-09-22 09:12:17,078][24647] Updated weights for policy 0, policy_version 18744 (0.0016) [2023-09-22 09:12:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9519104. Throughput: 0: 731.9, 1: 731.9. Samples: 2356735. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:12:18,543][23569] Avg episode reward: [(0, '24.310'), (1, '25.070')] [2023-09-22 09:12:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9543680. Throughput: 0: 730.3, 1: 730.3. Samples: 2361260. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:12:23,543][23569] Avg episode reward: [(0, '24.270'), (1, '24.500')] [2023-09-22 09:12:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 9576448. Throughput: 0: 732.3, 1: 732.2. Samples: 2369933. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:12:28,543][23569] Avg episode reward: [(0, '24.430'), (1, '24.880')] [2023-09-22 09:12:30,901][24648] Updated weights for policy 1, policy_version 18560 (0.0018) [2023-09-22 09:12:30,902][24647] Updated weights for policy 0, policy_version 18904 (0.0018) [2023-09-22 09:12:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9601024. Throughput: 0: 730.6, 1: 730.8. Samples: 2378711. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:12:33,543][23569] Avg episode reward: [(0, '24.240'), (1, '25.000')] [2023-09-22 09:12:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 9633792. Throughput: 0: 734.6, 1: 735.3. Samples: 2383339. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:38,543][23569] Avg episode reward: [(0, '24.170'), (1, '24.310')] [2023-09-22 09:12:43,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9666560. Throughput: 0: 730.8, 1: 730.8. Samples: 2392110. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:43,543][23569] Avg episode reward: [(0, '23.620'), (1, '24.370')] [2023-09-22 09:12:44,706][24647] Updated weights for policy 0, policy_version 19064 (0.0016) [2023-09-22 09:12:44,706][24648] Updated weights for policy 1, policy_version 18720 (0.0015) [2023-09-22 09:12:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9691136. Throughput: 0: 731.0, 1: 730.3. Samples: 2400481. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:48,543][23569] Avg episode reward: [(0, '24.050'), (1, '24.150')] [2023-09-22 09:12:53,542][23569] Fps is (10 sec: 4915.2, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 9715712. Throughput: 0: 727.5, 1: 727.8. Samples: 2404606. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:53,543][23569] Avg episode reward: [(0, '24.080'), (1, '24.690')] [2023-09-22 09:12:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9748480. Throughput: 0: 729.5, 1: 728.5. Samples: 2413392. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:12:58,543][23569] Avg episode reward: [(0, '24.300'), (1, '24.600')] [2023-09-22 09:12:59,242][24648] Updated weights for policy 1, policy_version 18880 (0.0019) [2023-09-22 09:12:59,242][24647] Updated weights for policy 0, policy_version 19224 (0.0018) [2023-09-22 09:13:03,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9781248. Throughput: 0: 728.5, 1: 727.2. Samples: 2422242. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:03,543][23569] Avg episode reward: [(0, '24.850'), (1, '24.620')] [2023-09-22 09:13:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 9805824. Throughput: 0: 728.6, 1: 729.6. Samples: 2426877. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:13:08,543][23569] Avg episode reward: [(0, '24.170'), (1, '24.190')] [2023-09-22 09:13:12,966][24647] Updated weights for policy 0, policy_version 19384 (0.0016) [2023-09-22 09:13:12,967][24648] Updated weights for policy 1, policy_version 19040 (0.0018) [2023-09-22 09:13:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 9838592. Throughput: 0: 730.7, 1: 729.9. Samples: 2435660. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:13:13,543][23569] Avg episode reward: [(0, '24.140'), (1, '24.580')] [2023-09-22 09:13:13,551][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000019040_4874240.pth... [2023-09-22 09:13:13,551][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000019384_4964352.pth... [2023-09-22 09:13:13,582][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000016632_4259840.pth [2023-09-22 09:13:13,589][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000016288_4169728.pth [2023-09-22 09:13:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 9863168. Throughput: 0: 723.4, 1: 722.7. Samples: 2443786. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:13:18,542][23569] Avg episode reward: [(0, '24.240'), (1, '24.510')] [2023-09-22 09:13:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9895936. Throughput: 0: 720.8, 1: 721.9. Samples: 2448259. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:23,542][23569] Avg episode reward: [(0, '23.780'), (1, '24.430')] [2023-09-22 09:13:27,369][24647] Updated weights for policy 0, policy_version 19544 (0.0015) [2023-09-22 09:13:27,370][24648] Updated weights for policy 1, policy_version 19200 (0.0016) [2023-09-22 09:13:28,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 9920512. Throughput: 0: 724.0, 1: 725.7. Samples: 2457348. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:28,544][23569] Avg episode reward: [(0, '24.230'), (1, '23.730')] [2023-09-22 09:13:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9953280. Throughput: 0: 725.4, 1: 726.1. Samples: 2465797. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:33,543][23569] Avg episode reward: [(0, '23.500'), (1, '23.790')] [2023-09-22 09:13:38,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 9986048. Throughput: 0: 728.7, 1: 729.0. Samples: 2470203. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:13:38,543][23569] Avg episode reward: [(0, '23.940'), (1, '23.820')] [2023-09-22 09:13:41,263][24648] Updated weights for policy 1, policy_version 19360 (0.0017) [2023-09-22 09:13:41,263][24647] Updated weights for policy 0, policy_version 19704 (0.0018) [2023-09-22 09:13:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 10010624. Throughput: 0: 730.8, 1: 730.8. Samples: 2479164. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:13:43,543][23569] Avg episode reward: [(0, '24.030'), (1, '23.870')] [2023-09-22 09:13:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10043392. Throughput: 0: 728.4, 1: 730.2. Samples: 2487882. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:48,543][23569] Avg episode reward: [(0, '24.220'), (1, '24.130')] [2023-09-22 09:13:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 10067968. Throughput: 0: 725.8, 1: 727.3. Samples: 2492268. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:53,543][23569] Avg episode reward: [(0, '23.600'), (1, '24.530')] [2023-09-22 09:13:55,465][24648] Updated weights for policy 1, policy_version 19520 (0.0019) [2023-09-22 09:13:55,465][24647] Updated weights for policy 0, policy_version 19864 (0.0018) [2023-09-22 09:13:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10100736. Throughput: 0: 723.2, 1: 723.6. Samples: 2500770. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:13:58,543][23569] Avg episode reward: [(0, '23.040'), (1, '24.070')] [2023-09-22 09:14:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 10125312. Throughput: 0: 734.0, 1: 733.6. Samples: 2509829. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:14:03,543][23569] Avg episode reward: [(0, '23.660'), (1, '24.910')] [2023-09-22 09:14:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10158080. Throughput: 0: 733.0, 1: 734.2. Samples: 2514283. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:14:08,543][23569] Avg episode reward: [(0, '23.090'), (1, '24.740')] [2023-09-22 09:14:09,279][24647] Updated weights for policy 0, policy_version 20024 (0.0013) [2023-09-22 09:14:09,280][24648] Updated weights for policy 1, policy_version 19680 (0.0016) [2023-09-22 09:14:13,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 10190848. Throughput: 0: 731.5, 1: 730.3. Samples: 2523129. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:14:13,543][23569] Avg episode reward: [(0, '23.310'), (1, '23.640')] [2023-09-22 09:14:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10215424. Throughput: 0: 733.2, 1: 732.7. Samples: 2531765. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:14:18,543][23569] Avg episode reward: [(0, '23.640'), (1, '24.170')] [2023-09-22 09:14:23,298][24647] Updated weights for policy 0, policy_version 20184 (0.0014) [2023-09-22 09:14:23,298][24648] Updated weights for policy 1, policy_version 19840 (0.0015) [2023-09-22 09:14:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10248192. Throughput: 0: 734.4, 1: 734.8. Samples: 2536314. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:14:23,543][23569] Avg episode reward: [(0, '23.490'), (1, '24.480')] [2023-09-22 09:14:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10272768. Throughput: 0: 729.6, 1: 730.2. Samples: 2544854. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:14:28,543][23569] Avg episode reward: [(0, '23.730'), (1, '24.100')] [2023-09-22 09:14:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10305536. Throughput: 0: 733.2, 1: 733.0. Samples: 2553859. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:14:33,543][23569] Avg episode reward: [(0, '23.470'), (1, '23.510')] [2023-09-22 09:14:37,004][24648] Updated weights for policy 1, policy_version 20000 (0.0020) [2023-09-22 09:14:37,004][24647] Updated weights for policy 0, policy_version 20344 (0.0020) [2023-09-22 09:14:38,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10338304. Throughput: 0: 736.9, 1: 733.7. Samples: 2558448. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:14:38,543][23569] Avg episode reward: [(0, '23.930'), (1, '24.320')] [2023-09-22 09:14:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 10362880. Throughput: 0: 739.4, 1: 738.9. Samples: 2567292. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:14:43,542][23569] Avg episode reward: [(0, '24.080'), (1, '24.390')] [2023-09-22 09:14:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10395648. Throughput: 0: 736.0, 1: 737.2. Samples: 2576124. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:14:48,543][23569] Avg episode reward: [(0, '24.370'), (1, '24.500')] [2023-09-22 09:14:50,997][24648] Updated weights for policy 1, policy_version 20160 (0.0019) [2023-09-22 09:14:50,998][24647] Updated weights for policy 0, policy_version 20504 (0.0018) [2023-09-22 09:14:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10420224. Throughput: 0: 736.0, 1: 735.1. Samples: 2580483. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:14:53,543][23569] Avg episode reward: [(0, '23.730'), (1, '24.820')] [2023-09-22 09:14:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10452992. Throughput: 0: 731.6, 1: 731.2. Samples: 2588955. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:14:58,543][23569] Avg episode reward: [(0, '24.110'), (1, '24.860')] [2023-09-22 09:15:03,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5873.3). Total num frames: 10485760. Throughput: 0: 739.6, 1: 741.2. Samples: 2598402. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:15:03,542][23569] Avg episode reward: [(0, '24.140'), (1, '24.900')] [2023-09-22 09:15:04,849][24648] Updated weights for policy 1, policy_version 20320 (0.0015) [2023-09-22 09:15:04,850][24647] Updated weights for policy 0, policy_version 20664 (0.0017) [2023-09-22 09:15:08,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 10510336. Throughput: 0: 739.5, 1: 737.6. Samples: 2602782. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:15:08,542][23569] Avg episode reward: [(0, '24.420'), (1, '25.420')] [2023-09-22 09:15:13,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10543104. Throughput: 0: 738.0, 1: 737.8. Samples: 2611261. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-22 09:15:13,543][23569] Avg episode reward: [(0, '24.550'), (1, '24.890')] [2023-09-22 09:15:13,556][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000020416_5226496.pth... [2023-09-22 09:15:13,556][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000020760_5316608.pth... [2023-09-22 09:15:13,592][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000017664_4521984.pth [2023-09-22 09:15:13,594][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000018008_4612096.pth [2023-09-22 09:15:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10567680. Throughput: 0: 739.0, 1: 739.8. Samples: 2620403. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:15:18,543][23569] Avg episode reward: [(0, '23.920'), (1, '25.410')] [2023-09-22 09:15:18,757][24648] Updated weights for policy 1, policy_version 20480 (0.0018) [2023-09-22 09:15:18,757][24647] Updated weights for policy 0, policy_version 20824 (0.0016) [2023-09-22 09:15:23,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10600448. Throughput: 0: 737.0, 1: 738.2. Samples: 2624832. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:15:23,543][23569] Avg episode reward: [(0, '24.090'), (1, '24.510')] [2023-09-22 09:15:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 10625024. Throughput: 0: 737.1, 1: 737.4. Samples: 2633645. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:15:28,544][23569] Avg episode reward: [(0, '24.000'), (1, '24.740')] [2023-09-22 09:15:32,900][24648] Updated weights for policy 1, policy_version 20640 (0.0018) [2023-09-22 09:15:32,900][24647] Updated weights for policy 0, policy_version 20984 (0.0018) [2023-09-22 09:15:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10657792. Throughput: 0: 731.7, 1: 731.4. Samples: 2641964. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:15:33,543][23569] Avg episode reward: [(0, '24.220'), (1, '25.160')] [2023-09-22 09:15:38,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10690560. Throughput: 0: 736.7, 1: 735.6. Samples: 2646737. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:15:38,544][23569] Avg episode reward: [(0, '23.710'), (1, '25.770')] [2023-09-22 09:15:38,545][24495] Saving new best policy, reward=25.770! [2023-09-22 09:15:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10715136. Throughput: 0: 741.6, 1: 741.1. Samples: 2655674. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:15:43,543][23569] Avg episode reward: [(0, '23.850'), (1, '25.710')] [2023-09-22 09:15:46,671][24647] Updated weights for policy 0, policy_version 21144 (0.0016) [2023-09-22 09:15:46,671][24648] Updated weights for policy 1, policy_version 20800 (0.0017) [2023-09-22 09:15:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10747904. Throughput: 0: 734.4, 1: 733.4. Samples: 2664450. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:15:48,543][23569] Avg episode reward: [(0, '24.480'), (1, '26.280')] [2023-09-22 09:15:48,544][24495] Saving new best policy, reward=26.280! [2023-09-22 09:15:53,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5939.2, 300 sec: 5873.2). Total num frames: 10776576. Throughput: 0: 732.2, 1: 733.7. Samples: 2668748. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:15:53,543][23569] Avg episode reward: [(0, '23.960'), (1, '26.600')] [2023-09-22 09:15:53,544][24495] Saving new best policy, reward=26.600! [2023-09-22 09:15:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10805248. Throughput: 0: 738.0, 1: 738.6. Samples: 2677708. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:15:58,543][23569] Avg episode reward: [(0, '23.990'), (1, '26.520')] [2023-09-22 09:16:00,591][24647] Updated weights for policy 0, policy_version 21304 (0.0015) [2023-09-22 09:16:00,592][24648] Updated weights for policy 1, policy_version 20960 (0.0016) [2023-09-22 09:16:03,542][23569] Fps is (10 sec: 6144.0, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 10838016. Throughput: 0: 734.2, 1: 732.7. Samples: 2686416. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:16:03,543][23569] Avg episode reward: [(0, '23.870'), (1, '27.140')] [2023-09-22 09:16:03,544][24495] Saving new best policy, reward=27.140! [2023-09-22 09:16:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10862592. Throughput: 0: 732.0, 1: 733.0. Samples: 2690759. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:16:08,543][23569] Avg episode reward: [(0, '24.420'), (1, '26.610')] [2023-09-22 09:16:13,542][23569] Fps is (10 sec: 5324.7, 60 sec: 5802.7, 300 sec: 5845.5). Total num frames: 10891264. Throughput: 0: 728.2, 1: 728.7. Samples: 2699202. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:16:13,543][23569] Avg episode reward: [(0, '23.700'), (1, '27.150')] [2023-09-22 09:16:13,556][24495] Saving new best policy, reward=27.150! [2023-09-22 09:16:14,938][24648] Updated weights for policy 1, policy_version 21120 (0.0016) [2023-09-22 09:16:14,938][24647] Updated weights for policy 0, policy_version 21464 (0.0017) [2023-09-22 09:16:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 10919936. Throughput: 0: 733.4, 1: 733.4. Samples: 2707973. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:16:18,543][23569] Avg episode reward: [(0, '23.370'), (1, '27.050')] [2023-09-22 09:16:23,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10952704. Throughput: 0: 732.9, 1: 733.8. Samples: 2712735. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:16:23,543][23569] Avg episode reward: [(0, '23.360'), (1, '27.270')] [2023-09-22 09:16:23,545][24495] Saving new best policy, reward=27.270! [2023-09-22 09:16:28,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 10977280. Throughput: 0: 732.2, 1: 734.0. Samples: 2721652. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:16:28,544][23569] Avg episode reward: [(0, '22.890'), (1, '26.780')] [2023-09-22 09:16:28,731][24647] Updated weights for policy 0, policy_version 21624 (0.0016) [2023-09-22 09:16:28,731][24648] Updated weights for policy 1, policy_version 21280 (0.0016) [2023-09-22 09:16:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11010048. Throughput: 0: 728.2, 1: 728.2. Samples: 2729984. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:16:33,543][23569] Avg episode reward: [(0, '22.470'), (1, '26.780')] [2023-09-22 09:16:38,542][23569] Fps is (10 sec: 6553.8, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 11042816. Throughput: 0: 728.6, 1: 728.5. Samples: 2734318. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:16:38,542][23569] Avg episode reward: [(0, '22.740'), (1, '26.800')] [2023-09-22 09:16:42,714][24648] Updated weights for policy 1, policy_version 21440 (0.0017) [2023-09-22 09:16:42,714][24647] Updated weights for policy 0, policy_version 21784 (0.0014) [2023-09-22 09:16:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11067392. Throughput: 0: 728.1, 1: 728.0. Samples: 2743235. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:16:43,543][23569] Avg episode reward: [(0, '22.830'), (1, '26.900')] [2023-09-22 09:16:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5859.4). Total num frames: 11100160. Throughput: 0: 731.0, 1: 733.1. Samples: 2752302. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:16:48,542][23569] Avg episode reward: [(0, '22.280'), (1, '27.070')] [2023-09-22 09:16:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5802.7, 300 sec: 5859.4). Total num frames: 11124736. Throughput: 0: 730.9, 1: 730.6. Samples: 2756529. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:16:53,543][23569] Avg episode reward: [(0, '22.960'), (1, '27.640')] [2023-09-22 09:16:53,545][24495] Saving new best policy, reward=27.640! [2023-09-22 09:16:56,816][24648] Updated weights for policy 1, policy_version 21600 (0.0018) [2023-09-22 09:16:56,816][24647] Updated weights for policy 0, policy_version 21944 (0.0014) [2023-09-22 09:16:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11157504. Throughput: 0: 731.3, 1: 731.1. Samples: 2765011. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:16:58,544][23569] Avg episode reward: [(0, '23.320'), (1, '27.420')] [2023-09-22 09:17:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 11182080. Throughput: 0: 731.5, 1: 731.8. Samples: 2773822. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:17:03,543][23569] Avg episode reward: [(0, '23.230'), (1, '27.170')] [2023-09-22 09:17:08,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11214848. Throughput: 0: 727.0, 1: 727.1. Samples: 2778170. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:17:08,543][23569] Avg episode reward: [(0, '23.750'), (1, '27.230')] [2023-09-22 09:17:10,664][24648] Updated weights for policy 1, policy_version 21760 (0.0018) [2023-09-22 09:17:10,664][24647] Updated weights for policy 0, policy_version 22104 (0.0017) [2023-09-22 09:17:13,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5939.2, 300 sec: 5859.4). Total num frames: 11247616. Throughput: 0: 730.2, 1: 729.3. Samples: 2787328. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:17:13,543][23569] Avg episode reward: [(0, '23.490'), (1, '27.660')] [2023-09-22 09:17:13,552][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000021792_5578752.pth... [2023-09-22 09:17:13,552][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000022136_5668864.pth... [2023-09-22 09:17:13,581][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000019040_4874240.pth [2023-09-22 09:17:13,584][24495] Saving new best policy, reward=27.660! [2023-09-22 09:17:13,586][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000019384_4964352.pth [2023-09-22 09:17:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11272192. Throughput: 0: 737.4, 1: 736.4. Samples: 2796302. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:17:18,543][23569] Avg episode reward: [(0, '24.270'), (1, '28.310')] [2023-09-22 09:17:18,543][24495] Saving new best policy, reward=28.310! [2023-09-22 09:17:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11304960. Throughput: 0: 737.9, 1: 738.6. Samples: 2800759. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:17:23,543][23569] Avg episode reward: [(0, '24.400'), (1, '28.520')] [2023-09-22 09:17:23,544][24495] Saving new best policy, reward=28.520! [2023-09-22 09:17:24,394][24647] Updated weights for policy 0, policy_version 22264 (0.0015) [2023-09-22 09:17:24,395][24648] Updated weights for policy 1, policy_version 21920 (0.0018) [2023-09-22 09:17:28,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11329536. Throughput: 0: 737.3, 1: 739.7. Samples: 2809698. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:17:28,544][23569] Avg episode reward: [(0, '23.990'), (1, '28.690')] [2023-09-22 09:17:28,647][24495] Saving new best policy, reward=28.690! [2023-09-22 09:17:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11362304. Throughput: 0: 732.4, 1: 730.8. Samples: 2818147. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:17:33,543][23569] Avg episode reward: [(0, '23.730'), (1, '28.500')] [2023-09-22 09:17:38,223][24647] Updated weights for policy 0, policy_version 22424 (0.0018) [2023-09-22 09:17:38,223][24648] Updated weights for policy 1, policy_version 22080 (0.0017) [2023-09-22 09:17:38,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11395072. Throughput: 0: 738.1, 1: 737.5. Samples: 2822931. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:17:38,544][23569] Avg episode reward: [(0, '23.580'), (1, '28.230')] [2023-09-22 09:17:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11419648. Throughput: 0: 743.4, 1: 743.1. Samples: 2831902. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:17:43,543][23569] Avg episode reward: [(0, '24.190'), (1, '28.040')] [2023-09-22 09:17:48,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11452416. Throughput: 0: 741.6, 1: 741.9. Samples: 2840576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:17:48,542][23569] Avg episode reward: [(0, '24.630'), (1, '27.480')] [2023-09-22 09:17:52,052][24647] Updated weights for policy 0, policy_version 22584 (0.0014) [2023-09-22 09:17:52,052][24648] Updated weights for policy 1, policy_version 22240 (0.0016) [2023-09-22 09:17:53,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11485184. Throughput: 0: 743.3, 1: 742.7. Samples: 2845039. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:17:53,543][23569] Avg episode reward: [(0, '24.450'), (1, '27.950')] [2023-09-22 09:17:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11509760. Throughput: 0: 743.9, 1: 743.6. Samples: 2854264. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:17:58,543][23569] Avg episode reward: [(0, '24.960'), (1, '27.120')] [2023-09-22 09:18:03,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11542528. Throughput: 0: 741.5, 1: 741.6. Samples: 2863043. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:18:03,543][23569] Avg episode reward: [(0, '25.010'), (1, '27.220')] [2023-09-22 09:18:03,543][24306] Saving new best policy, reward=25.010! [2023-09-22 09:18:05,895][24648] Updated weights for policy 1, policy_version 22400 (0.0016) [2023-09-22 09:18:05,895][24647] Updated weights for policy 0, policy_version 22744 (0.0017) [2023-09-22 09:18:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 11567104. Throughput: 0: 739.1, 1: 738.5. Samples: 2867249. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-22 09:18:08,543][23569] Avg episode reward: [(0, '25.120'), (1, '27.560')] [2023-09-22 09:18:08,605][24306] Saving new best policy, reward=25.120! [2023-09-22 09:18:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11599872. Throughput: 0: 742.3, 1: 739.4. Samples: 2876376. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:13,543][23569] Avg episode reward: [(0, '25.110'), (1, '27.370')] [2023-09-22 09:18:18,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11632640. Throughput: 0: 746.5, 1: 747.9. Samples: 2885393. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:18,542][23569] Avg episode reward: [(0, '24.310'), (1, '28.730')] [2023-09-22 09:18:18,543][24495] Saving new best policy, reward=28.730! [2023-09-22 09:18:19,740][24647] Updated weights for policy 0, policy_version 22904 (0.0019) [2023-09-22 09:18:19,740][24648] Updated weights for policy 1, policy_version 22560 (0.0018) [2023-09-22 09:18:23,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11657216. Throughput: 0: 742.1, 1: 741.4. Samples: 2889686. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:23,542][23569] Avg episode reward: [(0, '24.870'), (1, '29.510')] [2023-09-22 09:18:23,543][24495] Saving new best policy, reward=29.510! [2023-09-22 09:18:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11689984. Throughput: 0: 737.7, 1: 738.0. Samples: 2898310. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:18:28,543][23569] Avg episode reward: [(0, '24.610'), (1, '28.980')] [2023-09-22 09:18:33,484][24648] Updated weights for policy 1, policy_version 22720 (0.0016) [2023-09-22 09:18:33,485][24647] Updated weights for policy 0, policy_version 23064 (0.0014) [2023-09-22 09:18:33,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11722752. Throughput: 0: 745.7, 1: 744.2. Samples: 2907621. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:18:33,543][23569] Avg episode reward: [(0, '24.560'), (1, '28.990')] [2023-09-22 09:18:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 11747328. Throughput: 0: 742.3, 1: 743.1. Samples: 2911883. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:18:38,543][23569] Avg episode reward: [(0, '24.620'), (1, '29.200')] [2023-09-22 09:18:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11780096. Throughput: 0: 735.2, 1: 735.5. Samples: 2920448. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:43,543][23569] Avg episode reward: [(0, '25.170'), (1, '29.380')] [2023-09-22 09:18:43,555][24306] Saving new best policy, reward=25.170! [2023-09-22 09:18:47,704][24648] Updated weights for policy 1, policy_version 22880 (0.0017) [2023-09-22 09:18:47,704][24647] Updated weights for policy 0, policy_version 23224 (0.0017) [2023-09-22 09:18:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11804672. Throughput: 0: 732.7, 1: 733.4. Samples: 2929021. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:48,543][23569] Avg episode reward: [(0, '24.890'), (1, '29.660')] [2023-09-22 09:18:48,544][24495] Saving new best policy, reward=29.660! [2023-09-22 09:18:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11837440. Throughput: 0: 733.1, 1: 732.8. Samples: 2933215. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:53,543][23569] Avg episode reward: [(0, '24.730'), (1, '29.310')] [2023-09-22 09:18:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11862016. Throughput: 0: 731.6, 1: 733.2. Samples: 2942291. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:18:58,543][23569] Avg episode reward: [(0, '24.740'), (1, '29.040')] [2023-09-22 09:19:01,730][24648] Updated weights for policy 1, policy_version 23040 (0.0015) [2023-09-22 09:19:01,731][24647] Updated weights for policy 0, policy_version 23384 (0.0016) [2023-09-22 09:19:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11894784. Throughput: 0: 731.3, 1: 730.3. Samples: 2951168. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:03,543][23569] Avg episode reward: [(0, '25.100'), (1, '28.970')] [2023-09-22 09:19:08,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 11927552. Throughput: 0: 731.5, 1: 731.3. Samples: 2955512. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:08,543][23569] Avg episode reward: [(0, '25.070'), (1, '29.100')] [2023-09-22 09:19:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11952128. Throughput: 0: 735.4, 1: 736.2. Samples: 2964528. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:13,543][23569] Avg episode reward: [(0, '24.300'), (1, '28.980')] [2023-09-22 09:19:13,556][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000023168_5931008.pth... [2023-09-22 09:19:13,556][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000023512_6021120.pth... [2023-09-22 09:19:13,591][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000020760_5316608.pth [2023-09-22 09:19:13,595][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000020416_5226496.pth [2023-09-22 09:19:15,427][24647] Updated weights for policy 0, policy_version 23544 (0.0016) [2023-09-22 09:19:15,427][24648] Updated weights for policy 1, policy_version 23200 (0.0017) [2023-09-22 09:19:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 11984896. Throughput: 0: 731.8, 1: 735.0. Samples: 2973623. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:19:18,542][23569] Avg episode reward: [(0, '24.460'), (1, '28.430')] [2023-09-22 09:19:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12009472. Throughput: 0: 732.3, 1: 732.3. Samples: 2977792. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:19:23,543][23569] Avg episode reward: [(0, '24.870'), (1, '28.870')] [2023-09-22 09:19:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12042240. Throughput: 0: 732.9, 1: 732.6. Samples: 2986397. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:19:28,543][23569] Avg episode reward: [(0, '24.820'), (1, '28.750')] [2023-09-22 09:19:29,373][24647] Updated weights for policy 0, policy_version 23704 (0.0016) [2023-09-22 09:19:29,373][24648] Updated weights for policy 1, policy_version 23360 (0.0018) [2023-09-22 09:19:33,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12075008. Throughput: 0: 740.1, 1: 740.3. Samples: 2995637. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:33,543][23569] Avg episode reward: [(0, '24.750'), (1, '28.560')] [2023-09-22 09:19:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12099584. Throughput: 0: 743.9, 1: 743.3. Samples: 3000138. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:38,543][23569] Avg episode reward: [(0, '25.140'), (1, '28.210')] [2023-09-22 09:19:43,326][24648] Updated weights for policy 1, policy_version 23520 (0.0017) [2023-09-22 09:19:43,327][24647] Updated weights for policy 0, policy_version 23864 (0.0017) [2023-09-22 09:19:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 12132352. Throughput: 0: 736.4, 1: 735.3. Samples: 3008521. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:43,542][23569] Avg episode reward: [(0, '25.030'), (1, '28.250')] [2023-09-22 09:19:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12156928. Throughput: 0: 737.7, 1: 737.4. Samples: 3017548. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:19:48,543][23569] Avg episode reward: [(0, '25.050'), (1, '28.110')] [2023-09-22 09:19:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12189696. Throughput: 0: 737.0, 1: 738.1. Samples: 3021891. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:19:53,543][23569] Avg episode reward: [(0, '25.110'), (1, '28.380')] [2023-09-22 09:19:57,371][24647] Updated weights for policy 0, policy_version 24024 (0.0017) [2023-09-22 09:19:57,372][24648] Updated weights for policy 1, policy_version 23680 (0.0018) [2023-09-22 09:19:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 12214272. Throughput: 0: 735.9, 1: 734.5. Samples: 3030695. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:19:58,543][23569] Avg episode reward: [(0, '25.070'), (1, '28.340')] [2023-09-22 09:20:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12247040. Throughput: 0: 730.5, 1: 728.4. Samples: 3039272. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:20:03,543][23569] Avg episode reward: [(0, '24.390'), (1, '28.330')] [2023-09-22 09:20:08,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12279808. Throughput: 0: 735.0, 1: 734.8. Samples: 3043933. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:20:08,543][23569] Avg episode reward: [(0, '24.830'), (1, '28.600')] [2023-09-22 09:20:11,067][24648] Updated weights for policy 1, policy_version 23840 (0.0015) [2023-09-22 09:20:11,067][24647] Updated weights for policy 0, policy_version 24184 (0.0017) [2023-09-22 09:20:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12304384. Throughput: 0: 740.8, 1: 740.8. Samples: 3053065. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:20:13,543][23569] Avg episode reward: [(0, '24.710'), (1, '29.100')] [2023-09-22 09:20:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12337152. Throughput: 0: 734.7, 1: 734.7. Samples: 3061760. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:20:18,543][23569] Avg episode reward: [(0, '24.350'), (1, '28.680')] [2023-09-22 09:20:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12361728. Throughput: 0: 730.0, 1: 731.0. Samples: 3065885. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:23,543][23569] Avg episode reward: [(0, '24.580'), (1, '29.250')] [2023-09-22 09:20:25,103][24647] Updated weights for policy 0, policy_version 24344 (0.0018) [2023-09-22 09:20:25,103][24648] Updated weights for policy 1, policy_version 24000 (0.0018) [2023-09-22 09:20:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12394496. Throughput: 0: 738.8, 1: 737.5. Samples: 3074955. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:28,543][23569] Avg episode reward: [(0, '25.000'), (1, '29.610')] [2023-09-22 09:20:33,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 12427264. Throughput: 0: 737.9, 1: 739.3. Samples: 3084020. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:33,543][23569] Avg episode reward: [(0, '25.280'), (1, '29.750')] [2023-09-22 09:20:33,543][24495] Saving new best policy, reward=29.750! [2023-09-22 09:20:33,543][24306] Saving new best policy, reward=25.280! [2023-09-22 09:20:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12451840. Throughput: 0: 738.5, 1: 738.4. Samples: 3088350. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:38,543][23569] Avg episode reward: [(0, '25.850'), (1, '29.820')] [2023-09-22 09:20:38,544][24495] Saving new best policy, reward=29.820! [2023-09-22 09:20:38,544][24306] Saving new best policy, reward=25.850! [2023-09-22 09:20:39,088][24648] Updated weights for policy 1, policy_version 24160 (0.0017) [2023-09-22 09:20:39,089][24647] Updated weights for policy 0, policy_version 24504 (0.0018) [2023-09-22 09:20:43,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12484608. Throughput: 0: 734.6, 1: 735.4. Samples: 3096843. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:43,543][23569] Avg episode reward: [(0, '25.890'), (1, '28.880')] [2023-09-22 09:20:43,557][24306] Saving new best policy, reward=25.890! [2023-09-22 09:20:48,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5901.0). Total num frames: 12517376. Throughput: 0: 743.8, 1: 743.6. Samples: 3106205. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:48,543][23569] Avg episode reward: [(0, '25.860'), (1, '28.470')] [2023-09-22 09:20:52,814][24647] Updated weights for policy 0, policy_version 24664 (0.0016) [2023-09-22 09:20:52,814][24648] Updated weights for policy 1, policy_version 24320 (0.0017) [2023-09-22 09:20:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12541952. Throughput: 0: 740.8, 1: 741.0. Samples: 3110614. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:20:53,543][23569] Avg episode reward: [(0, '25.960'), (1, '29.750')] [2023-09-22 09:20:53,544][24306] Saving new best policy, reward=25.960! [2023-09-22 09:20:58,542][23569] Fps is (10 sec: 5324.8, 60 sec: 5939.2, 300 sec: 5873.2). Total num frames: 12570624. Throughput: 0: 733.6, 1: 733.6. Samples: 3119089. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:20:58,543][23569] Avg episode reward: [(0, '26.310'), (1, '28.590')] [2023-09-22 09:20:58,636][24306] Saving new best policy, reward=26.310! [2023-09-22 09:21:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 12599296. Throughput: 0: 728.3, 1: 728.3. Samples: 3127305. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:21:03,543][23569] Avg episode reward: [(0, '25.970'), (1, '28.010')] [2023-09-22 09:21:07,280][24648] Updated weights for policy 1, policy_version 24480 (0.0016) [2023-09-22 09:21:07,280][24647] Updated weights for policy 0, policy_version 24824 (0.0015) [2023-09-22 09:21:08,542][23569] Fps is (10 sec: 5324.8, 60 sec: 5734.4, 300 sec: 5873.3). Total num frames: 12623872. Throughput: 0: 727.9, 1: 728.2. Samples: 3131408. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:21:08,543][23569] Avg episode reward: [(0, '26.070'), (1, '27.430')] [2023-09-22 09:21:13,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12656640. Throughput: 0: 727.0, 1: 728.0. Samples: 3140432. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:21:13,543][23569] Avg episode reward: [(0, '25.860'), (1, '26.310')] [2023-09-22 09:21:13,555][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000024544_6283264.pth... [2023-09-22 09:21:13,555][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000024888_6373376.pth... [2023-09-22 09:21:13,588][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000021792_5578752.pth [2023-09-22 09:21:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000022136_5668864.pth [2023-09-22 09:21:18,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12689408. Throughput: 0: 729.1, 1: 728.1. Samples: 3149594. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:21:18,544][23569] Avg episode reward: [(0, '26.460'), (1, '26.850')] [2023-09-22 09:21:18,545][24306] Saving new best policy, reward=26.460! [2023-09-22 09:21:21,063][24648] Updated weights for policy 1, policy_version 24640 (0.0018) [2023-09-22 09:21:21,063][24647] Updated weights for policy 0, policy_version 24984 (0.0017) [2023-09-22 09:21:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12713984. Throughput: 0: 728.5, 1: 728.6. Samples: 3153920. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:21:23,543][23569] Avg episode reward: [(0, '26.400'), (1, '26.670')] [2023-09-22 09:21:28,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12746752. Throughput: 0: 730.3, 1: 731.1. Samples: 3162608. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:21:28,543][23569] Avg episode reward: [(0, '26.460'), (1, '26.050')] [2023-09-22 09:21:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 12771328. Throughput: 0: 722.5, 1: 722.2. Samples: 3171215. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:21:33,543][23569] Avg episode reward: [(0, '26.290'), (1, '27.150')] [2023-09-22 09:21:35,099][24648] Updated weights for policy 1, policy_version 24800 (0.0017) [2023-09-22 09:21:35,099][24647] Updated weights for policy 0, policy_version 25144 (0.0017) [2023-09-22 09:21:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12804096. Throughput: 0: 725.4, 1: 724.6. Samples: 3175866. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:21:38,542][23569] Avg episode reward: [(0, '27.190'), (1, '26.930')] [2023-09-22 09:21:38,543][24306] Saving new best policy, reward=27.190! [2023-09-22 09:21:43,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12836864. Throughput: 0: 729.0, 1: 728.7. Samples: 3184686. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:21:43,543][23569] Avg episode reward: [(0, '26.650'), (1, '27.000')] [2023-09-22 09:21:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5887.1). Total num frames: 12861440. Throughput: 0: 740.2, 1: 739.7. Samples: 3193901. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:21:48,543][23569] Avg episode reward: [(0, '26.930'), (1, '27.920')] [2023-09-22 09:21:48,704][24647] Updated weights for policy 0, policy_version 25304 (0.0017) [2023-09-22 09:21:48,704][24648] Updated weights for policy 1, policy_version 24960 (0.0018) [2023-09-22 09:21:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12894208. Throughput: 0: 744.2, 1: 744.2. Samples: 3198384. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:21:53,543][23569] Avg episode reward: [(0, '26.190'), (1, '28.640')] [2023-09-22 09:21:58,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5939.2, 300 sec: 5914.9). Total num frames: 12926976. Throughput: 0: 741.4, 1: 741.7. Samples: 3207172. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:21:58,544][23569] Avg episode reward: [(0, '26.010'), (1, '28.690')] [2023-09-22 09:22:02,499][24648] Updated weights for policy 1, policy_version 25120 (0.0014) [2023-09-22 09:22:02,500][24647] Updated weights for policy 0, policy_version 25464 (0.0018) [2023-09-22 09:22:03,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 12951552. Throughput: 0: 739.4, 1: 739.1. Samples: 3216128. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:03,542][23569] Avg episode reward: [(0, '25.030'), (1, '27.840')] [2023-09-22 09:22:08,542][23569] Fps is (10 sec: 5734.6, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 12984320. Throughput: 0: 740.0, 1: 739.3. Samples: 3220485. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:08,542][23569] Avg episode reward: [(0, '24.070'), (1, '28.330')] [2023-09-22 09:22:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 13008896. Throughput: 0: 738.0, 1: 738.1. Samples: 3229030. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:13,543][23569] Avg episode reward: [(0, '22.790'), (1, '28.030')] [2023-09-22 09:22:16,738][24648] Updated weights for policy 1, policy_version 25280 (0.0016) [2023-09-22 09:22:16,738][24647] Updated weights for policy 0, policy_version 25624 (0.0015) [2023-09-22 09:22:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 13041664. Throughput: 0: 740.3, 1: 740.8. Samples: 3237865. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:18,543][23569] Avg episode reward: [(0, '21.620'), (1, '27.840')] [2023-09-22 09:22:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13066240. Throughput: 0: 734.2, 1: 735.1. Samples: 3241985. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:23,543][23569] Avg episode reward: [(0, '20.430'), (1, '27.900')] [2023-09-22 09:22:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13099008. Throughput: 0: 737.6, 1: 737.9. Samples: 3251084. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:28,542][23569] Avg episode reward: [(0, '19.700'), (1, '26.620')] [2023-09-22 09:22:30,460][24647] Updated weights for policy 0, policy_version 25784 (0.0016) [2023-09-22 09:22:30,460][24648] Updated weights for policy 1, policy_version 25440 (0.0018) [2023-09-22 09:22:33,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 13131776. Throughput: 0: 737.9, 1: 737.4. Samples: 3260292. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:33,543][23569] Avg episode reward: [(0, '20.070'), (1, '26.770')] [2023-09-22 09:22:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13156352. Throughput: 0: 734.7, 1: 734.8. Samples: 3264514. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:22:38,543][23569] Avg episode reward: [(0, '19.910'), (1, '26.370')] [2023-09-22 09:22:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 13189120. Throughput: 0: 737.6, 1: 736.1. Samples: 3273487. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:22:43,542][23569] Avg episode reward: [(0, '20.440'), (1, '26.350')] [2023-09-22 09:22:44,242][24647] Updated weights for policy 0, policy_version 25944 (0.0018) [2023-09-22 09:22:44,242][24648] Updated weights for policy 1, policy_version 25600 (0.0018) [2023-09-22 09:22:48,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 13221888. Throughput: 0: 740.0, 1: 739.4. Samples: 3282701. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:22:48,543][23569] Avg episode reward: [(0, '21.530'), (1, '25.750')] [2023-09-22 09:22:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13246464. Throughput: 0: 739.2, 1: 739.8. Samples: 3287040. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:22:53,543][23569] Avg episode reward: [(0, '21.250'), (1, '25.790')] [2023-09-22 09:22:58,006][24648] Updated weights for policy 1, policy_version 25760 (0.0019) [2023-09-22 09:22:58,007][24647] Updated weights for policy 0, policy_version 26104 (0.0017) [2023-09-22 09:22:58,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 13279232. Throughput: 0: 741.6, 1: 740.5. Samples: 3295726. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:22:58,542][23569] Avg episode reward: [(0, '21.300'), (1, '26.060')] [2023-09-22 09:23:03,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 13312000. Throughput: 0: 744.8, 1: 744.3. Samples: 3304874. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:23:03,543][23569] Avg episode reward: [(0, '20.170'), (1, '26.320')] [2023-09-22 09:23:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13336576. Throughput: 0: 748.7, 1: 748.8. Samples: 3309371. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:23:08,542][23569] Avg episode reward: [(0, '20.330'), (1, '26.650')] [2023-09-22 09:23:12,006][24648] Updated weights for policy 1, policy_version 25920 (0.0018) [2023-09-22 09:23:12,007][24647] Updated weights for policy 0, policy_version 26264 (0.0016) [2023-09-22 09:23:13,542][23569] Fps is (10 sec: 5734.2, 60 sec: 6007.4, 300 sec: 5887.1). Total num frames: 13369344. Throughput: 0: 740.7, 1: 741.0. Samples: 3317761. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:23:13,544][23569] Avg episode reward: [(0, '21.060'), (1, '26.870')] [2023-09-22 09:23:13,555][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000025936_6639616.pth... [2023-09-22 09:23:13,555][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000026280_6729728.pth... [2023-09-22 09:23:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000023512_6021120.pth [2023-09-22 09:23:13,591][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000023168_5931008.pth [2023-09-22 09:23:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13393920. Throughput: 0: 734.1, 1: 734.5. Samples: 3326378. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:18,543][23569] Avg episode reward: [(0, '20.900'), (1, '26.420')] [2023-09-22 09:23:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 13426688. Throughput: 0: 738.8, 1: 738.7. Samples: 3331004. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:23,543][23569] Avg episode reward: [(0, '21.790'), (1, '26.840')] [2023-09-22 09:23:25,828][24647] Updated weights for policy 0, policy_version 26424 (0.0018) [2023-09-22 09:23:25,828][24648] Updated weights for policy 1, policy_version 26080 (0.0018) [2023-09-22 09:23:28,542][23569] Fps is (10 sec: 6143.9, 60 sec: 5939.2, 300 sec: 5873.2). Total num frames: 13455360. Throughput: 0: 740.1, 1: 741.8. Samples: 3340176. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:28,543][23569] Avg episode reward: [(0, '21.350'), (1, '27.330')] [2023-09-22 09:23:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 13484032. Throughput: 0: 731.4, 1: 731.4. Samples: 3348527. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:33,542][23569] Avg episode reward: [(0, '22.400'), (1, '27.730')] [2023-09-22 09:23:38,542][23569] Fps is (10 sec: 6144.0, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 13516800. Throughput: 0: 734.8, 1: 734.8. Samples: 3353176. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:38,543][23569] Avg episode reward: [(0, '22.550'), (1, '26.880')] [2023-09-22 09:23:39,812][24647] Updated weights for policy 0, policy_version 26584 (0.0017) [2023-09-22 09:23:39,813][24648] Updated weights for policy 1, policy_version 26240 (0.0017) [2023-09-22 09:23:43,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13541376. Throughput: 0: 739.7, 1: 738.5. Samples: 3362244. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:43,543][23569] Avg episode reward: [(0, '23.230'), (1, '26.960')] [2023-09-22 09:23:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13574144. Throughput: 0: 734.4, 1: 735.4. Samples: 3371017. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:48,543][23569] Avg episode reward: [(0, '23.180'), (1, '26.720')] [2023-09-22 09:23:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13598720. Throughput: 0: 731.7, 1: 730.9. Samples: 3375188. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:23:53,543][23569] Avg episode reward: [(0, '23.180'), (1, '27.060')] [2023-09-22 09:23:53,673][24648] Updated weights for policy 1, policy_version 26400 (0.0017) [2023-09-22 09:23:53,673][24647] Updated weights for policy 0, policy_version 26744 (0.0016) [2023-09-22 09:23:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13631488. Throughput: 0: 738.7, 1: 738.0. Samples: 3384213. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:23:58,543][23569] Avg episode reward: [(0, '23.250'), (1, '27.390')] [2023-09-22 09:24:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 13656064. Throughput: 0: 737.6, 1: 737.4. Samples: 3392754. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:24:03,543][23569] Avg episode reward: [(0, '23.830'), (1, '26.570')] [2023-09-22 09:24:07,798][24648] Updated weights for policy 1, policy_version 26560 (0.0016) [2023-09-22 09:24:07,798][24647] Updated weights for policy 0, policy_version 26904 (0.0017) [2023-09-22 09:24:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13688832. Throughput: 0: 738.0, 1: 736.2. Samples: 3397345. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:24:08,542][23569] Avg episode reward: [(0, '24.330'), (1, '26.250')] [2023-09-22 09:24:13,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 13721600. Throughput: 0: 729.5, 1: 729.4. Samples: 3405824. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:24:13,542][23569] Avg episode reward: [(0, '24.420'), (1, '26.860')] [2023-09-22 09:24:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13746176. Throughput: 0: 734.8, 1: 736.3. Samples: 3414728. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:24:18,543][23569] Avg episode reward: [(0, '24.850'), (1, '26.060')] [2023-09-22 09:24:21,683][24647] Updated weights for policy 0, policy_version 27064 (0.0015) [2023-09-22 09:24:21,683][24648] Updated weights for policy 1, policy_version 26720 (0.0015) [2023-09-22 09:24:23,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13778944. Throughput: 0: 734.7, 1: 733.4. Samples: 3419241. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:24:23,543][23569] Avg episode reward: [(0, '24.770'), (1, '26.560')] [2023-09-22 09:24:28,542][23569] Fps is (10 sec: 6553.5, 60 sec: 5939.2, 300 sec: 5887.1). Total num frames: 13811712. Throughput: 0: 733.6, 1: 735.6. Samples: 3428357. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-22 09:24:28,543][23569] Avg episode reward: [(0, '25.620'), (1, '26.290')] [2023-09-22 09:24:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13836288. Throughput: 0: 730.9, 1: 730.1. Samples: 3436761. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:24:33,544][23569] Avg episode reward: [(0, '25.360'), (1, '26.520')] [2023-09-22 09:24:35,450][24648] Updated weights for policy 1, policy_version 26880 (0.0019) [2023-09-22 09:24:35,451][24647] Updated weights for policy 0, policy_version 27224 (0.0018) [2023-09-22 09:24:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13869056. Throughput: 0: 735.9, 1: 736.0. Samples: 3441422. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:24:38,543][23569] Avg episode reward: [(0, '25.190'), (1, '26.840')] [2023-09-22 09:24:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13893632. Throughput: 0: 736.5, 1: 734.7. Samples: 3450415. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:24:43,543][23569] Avg episode reward: [(0, '24.140'), (1, '27.320')] [2023-09-22 09:24:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13926400. Throughput: 0: 736.4, 1: 737.3. Samples: 3459072. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:24:48,543][23569] Avg episode reward: [(0, '24.500'), (1, '27.280')] [2023-09-22 09:24:49,593][24647] Updated weights for policy 0, policy_version 27384 (0.0017) [2023-09-22 09:24:49,593][24648] Updated weights for policy 1, policy_version 27040 (0.0016) [2023-09-22 09:24:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13950976. Throughput: 0: 730.4, 1: 732.4. Samples: 3463171. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:24:53,543][23569] Avg episode reward: [(0, '24.510'), (1, '26.920')] [2023-09-22 09:24:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 13983744. Throughput: 0: 738.8, 1: 737.5. Samples: 3472258. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:24:58,542][23569] Avg episode reward: [(0, '24.260'), (1, '27.500')] [2023-09-22 09:25:03,320][24648] Updated weights for policy 1, policy_version 27200 (0.0019) [2023-09-22 09:25:03,320][24647] Updated weights for policy 0, policy_version 27544 (0.0018) [2023-09-22 09:25:03,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 14016512. Throughput: 0: 741.0, 1: 741.4. Samples: 3481439. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:25:03,543][23569] Avg episode reward: [(0, '23.710'), (1, '26.890')] [2023-09-22 09:25:08,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14041088. Throughput: 0: 737.7, 1: 737.6. Samples: 3485630. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:25:08,543][23569] Avg episode reward: [(0, '23.840'), (1, '26.520')] [2023-09-22 09:25:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14073856. Throughput: 0: 729.5, 1: 728.9. Samples: 3493987. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:25:13,543][23569] Avg episode reward: [(0, '24.220'), (1, '26.130')] [2023-09-22 09:25:13,552][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000027312_6991872.pth... [2023-09-22 09:25:13,553][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000027656_7081984.pth... [2023-09-22 09:25:13,582][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000024888_6373376.pth [2023-09-22 09:25:13,594][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000024544_6283264.pth [2023-09-22 09:25:17,378][24647] Updated weights for policy 0, policy_version 27704 (0.0017) [2023-09-22 09:25:17,378][24648] Updated weights for policy 1, policy_version 27360 (0.0016) [2023-09-22 09:25:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14098432. Throughput: 0: 737.1, 1: 737.7. Samples: 3503127. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:25:18,543][23569] Avg episode reward: [(0, '24.620'), (1, '26.050')] [2023-09-22 09:25:23,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 14131200. Throughput: 0: 735.1, 1: 735.5. Samples: 3507597. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:25:23,543][23569] Avg episode reward: [(0, '25.540'), (1, '25.220')] [2023-09-22 09:25:28,542][23569] Fps is (10 sec: 6553.8, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14163968. Throughput: 0: 732.1, 1: 734.6. Samples: 3516416. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:25:28,543][23569] Avg episode reward: [(0, '25.090'), (1, '25.870')] [2023-09-22 09:25:31,254][24647] Updated weights for policy 0, policy_version 27864 (0.0016) [2023-09-22 09:25:31,254][24648] Updated weights for policy 1, policy_version 27520 (0.0017) [2023-09-22 09:25:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 14188544. Throughput: 0: 734.4, 1: 733.4. Samples: 3525122. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:25:33,543][23569] Avg episode reward: [(0, '25.900'), (1, '26.280')] [2023-09-22 09:25:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14221312. Throughput: 0: 742.3, 1: 741.5. Samples: 3529941. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:25:38,544][23569] Avg episode reward: [(0, '25.510'), (1, '26.950')] [2023-09-22 09:25:43,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 14254080. Throughput: 0: 740.3, 1: 741.6. Samples: 3538944. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-22 09:25:43,543][23569] Avg episode reward: [(0, '26.140'), (1, '27.280')] [2023-09-22 09:25:44,816][24648] Updated weights for policy 1, policy_version 27680 (0.0018) [2023-09-22 09:25:44,816][24647] Updated weights for policy 0, policy_version 28024 (0.0017) [2023-09-22 09:25:48,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14278656. Throughput: 0: 740.5, 1: 738.4. Samples: 3547990. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:25:48,543][23569] Avg episode reward: [(0, '26.160'), (1, '27.580')] [2023-09-22 09:25:53,542][23569] Fps is (10 sec: 5734.6, 60 sec: 6007.5, 300 sec: 5901.0). Total num frames: 14311424. Throughput: 0: 743.0, 1: 743.9. Samples: 3552540. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:25:53,542][23569] Avg episode reward: [(0, '26.420'), (1, '27.810')] [2023-09-22 09:25:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14336000. Throughput: 0: 749.4, 1: 750.2. Samples: 3561467. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:25:58,543][23569] Avg episode reward: [(0, '26.490'), (1, '28.830')] [2023-09-22 09:25:58,579][24647] Updated weights for policy 0, policy_version 28184 (0.0016) [2023-09-22 09:25:58,580][24648] Updated weights for policy 1, policy_version 27840 (0.0018) [2023-09-22 09:26:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 14368768. Throughput: 0: 739.9, 1: 739.6. Samples: 3569701. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:03,543][23569] Avg episode reward: [(0, '26.910'), (1, '28.830')] [2023-09-22 09:26:08,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14401536. Throughput: 0: 742.1, 1: 742.0. Samples: 3574379. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:08,542][23569] Avg episode reward: [(0, '26.860'), (1, '28.110')] [2023-09-22 09:26:12,812][24648] Updated weights for policy 1, policy_version 28000 (0.0016) [2023-09-22 09:26:12,812][24647] Updated weights for policy 0, policy_version 28344 (0.0019) [2023-09-22 09:26:13,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14426112. Throughput: 0: 739.8, 1: 738.6. Samples: 3582947. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:13,543][23569] Avg episode reward: [(0, '27.460'), (1, '28.380')] [2023-09-22 09:26:13,557][24306] Saving new best policy, reward=27.460! [2023-09-22 09:26:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14458880. Throughput: 0: 743.3, 1: 744.8. Samples: 3592089. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:18,543][23569] Avg episode reward: [(0, '27.850'), (1, '28.150')] [2023-09-22 09:26:18,543][24306] Saving new best policy, reward=27.850! [2023-09-22 09:26:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14483456. Throughput: 0: 736.8, 1: 737.6. Samples: 3596288. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:23,543][23569] Avg episode reward: [(0, '28.040'), (1, '28.970')] [2023-09-22 09:26:23,544][24306] Saving new best policy, reward=28.040! [2023-09-22 09:26:26,545][24647] Updated weights for policy 0, policy_version 28504 (0.0018) [2023-09-22 09:26:26,545][24648] Updated weights for policy 1, policy_version 28160 (0.0016) [2023-09-22 09:26:28,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 14516224. Throughput: 0: 736.9, 1: 736.1. Samples: 3605229. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:28,544][23569] Avg episode reward: [(0, '28.160'), (1, '28.640')] [2023-09-22 09:26:28,553][24306] Saving new best policy, reward=28.160! [2023-09-22 09:26:33,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14548992. Throughput: 0: 736.3, 1: 735.1. Samples: 3614204. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:33,542][23569] Avg episode reward: [(0, '27.900'), (1, '28.900')] [2023-09-22 09:26:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14573568. Throughput: 0: 734.4, 1: 734.6. Samples: 3618646. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:38,543][23569] Avg episode reward: [(0, '28.410'), (1, '29.340')] [2023-09-22 09:26:38,545][24306] Saving new best policy, reward=28.410! [2023-09-22 09:26:40,567][24648] Updated weights for policy 1, policy_version 28320 (0.0017) [2023-09-22 09:26:40,568][24647] Updated weights for policy 0, policy_version 28664 (0.0014) [2023-09-22 09:26:43,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 14606336. Throughput: 0: 728.3, 1: 728.2. Samples: 3627009. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:26:43,543][23569] Avg episode reward: [(0, '28.310'), (1, '29.170')] [2023-09-22 09:26:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14630912. Throughput: 0: 732.8, 1: 732.7. Samples: 3635645. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:26:48,543][23569] Avg episode reward: [(0, '28.300'), (1, '29.790')] [2023-09-22 09:26:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14663680. Throughput: 0: 732.7, 1: 732.2. Samples: 3640301. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:26:53,543][23569] Avg episode reward: [(0, '28.210'), (1, '29.610')] [2023-09-22 09:26:54,478][24647] Updated weights for policy 0, policy_version 28824 (0.0018) [2023-09-22 09:26:54,478][24648] Updated weights for policy 1, policy_version 28480 (0.0017) [2023-09-22 09:26:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14688256. Throughput: 0: 739.1, 1: 738.4. Samples: 3649434. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:26:58,543][23569] Avg episode reward: [(0, '28.080'), (1, '28.630')] [2023-09-22 09:27:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14721024. Throughput: 0: 733.8, 1: 733.1. Samples: 3658101. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:27:03,543][23569] Avg episode reward: [(0, '28.390'), (1, '28.870')] [2023-09-22 09:27:08,122][24648] Updated weights for policy 1, policy_version 28640 (0.0016) [2023-09-22 09:27:08,122][24647] Updated weights for policy 0, policy_version 28984 (0.0018) [2023-09-22 09:27:08,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 14753792. Throughput: 0: 739.8, 1: 739.5. Samples: 3662855. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:08,543][23569] Avg episode reward: [(0, '28.280'), (1, '29.130')] [2023-09-22 09:27:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14778368. Throughput: 0: 738.6, 1: 739.4. Samples: 3671739. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:13,543][23569] Avg episode reward: [(0, '28.220'), (1, '29.340')] [2023-09-22 09:27:13,641][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000028704_7348224.pth... [2023-09-22 09:27:13,644][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000029048_7438336.pth... [2023-09-22 09:27:13,669][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000025936_6639616.pth [2023-09-22 09:27:13,672][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000026280_6729728.pth [2023-09-22 09:27:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 14811136. Throughput: 0: 738.1, 1: 739.0. Samples: 3680676. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:18,543][23569] Avg episode reward: [(0, '28.620'), (1, '28.620')] [2023-09-22 09:27:18,544][24306] Saving new best policy, reward=28.620! [2023-09-22 09:27:21,752][24647] Updated weights for policy 0, policy_version 29144 (0.0018) [2023-09-22 09:27:21,753][24648] Updated weights for policy 1, policy_version 28800 (0.0019) [2023-09-22 09:27:23,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14843904. Throughput: 0: 741.2, 1: 741.1. Samples: 3685349. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:27:23,543][23569] Avg episode reward: [(0, '28.210'), (1, '28.000')] [2023-09-22 09:27:28,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14876672. Throughput: 0: 750.6, 1: 749.9. Samples: 3694533. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:27:28,544][23569] Avg episode reward: [(0, '29.140'), (1, '28.770')] [2023-09-22 09:27:28,552][24306] Saving new best policy, reward=29.140! [2023-09-22 09:27:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 14901248. Throughput: 0: 749.9, 1: 750.2. Samples: 3703147. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:27:33,543][23569] Avg episode reward: [(0, '28.950'), (1, '28.190')] [2023-09-22 09:27:35,456][24647] Updated weights for policy 0, policy_version 29304 (0.0015) [2023-09-22 09:27:35,456][24648] Updated weights for policy 1, policy_version 28960 (0.0016) [2023-09-22 09:27:38,542][23569] Fps is (10 sec: 5734.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14934016. Throughput: 0: 748.6, 1: 749.1. Samples: 3707696. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:27:38,542][23569] Avg episode reward: [(0, '28.830'), (1, '28.330')] [2023-09-22 09:27:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 14958592. Throughput: 0: 742.0, 1: 744.3. Samples: 3716319. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:43,543][23569] Avg episode reward: [(0, '28.670'), (1, '29.160')] [2023-09-22 09:27:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 14991360. Throughput: 0: 746.7, 1: 746.0. Samples: 3725272. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:48,543][23569] Avg episode reward: [(0, '28.690'), (1, '28.270')] [2023-09-22 09:27:49,560][24647] Updated weights for policy 0, policy_version 29464 (0.0019) [2023-09-22 09:27:49,560][24648] Updated weights for policy 1, policy_version 29120 (0.0018) [2023-09-22 09:27:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 15015936. Throughput: 0: 740.9, 1: 740.6. Samples: 3729522. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:53,544][23569] Avg episode reward: [(0, '28.920'), (1, '29.070')] [2023-09-22 09:27:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 15048704. Throughput: 0: 745.4, 1: 743.6. Samples: 3738746. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:27:58,543][23569] Avg episode reward: [(0, '28.860'), (1, '29.420')] [2023-09-22 09:28:03,047][24648] Updated weights for policy 1, policy_version 29280 (0.0015) [2023-09-22 09:28:03,048][24647] Updated weights for policy 0, policy_version 29624 (0.0017) [2023-09-22 09:28:03,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15081472. Throughput: 0: 745.7, 1: 746.9. Samples: 3747840. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:28:03,543][23569] Avg episode reward: [(0, '28.850'), (1, '29.520')] [2023-09-22 09:28:08,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15114240. Throughput: 0: 743.0, 1: 742.6. Samples: 3752203. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:28:08,542][23569] Avg episode reward: [(0, '28.990'), (1, '29.070')] [2023-09-22 09:28:13,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15138816. Throughput: 0: 745.9, 1: 745.7. Samples: 3761657. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:28:13,543][23569] Avg episode reward: [(0, '29.230'), (1, '29.840')] [2023-09-22 09:28:13,731][24495] Saving new best policy, reward=29.840! [2023-09-22 09:28:13,743][24306] Saving new best policy, reward=29.230! [2023-09-22 09:28:16,504][24647] Updated weights for policy 0, policy_version 29784 (0.0018) [2023-09-22 09:28:16,504][24648] Updated weights for policy 1, policy_version 29440 (0.0019) [2023-09-22 09:28:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15171584. Throughput: 0: 747.0, 1: 747.2. Samples: 3770386. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-22 09:28:18,543][23569] Avg episode reward: [(0, '29.140'), (1, '29.160')] [2023-09-22 09:28:23,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5928.8). Total num frames: 15204352. Throughput: 0: 747.8, 1: 747.5. Samples: 3774981. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:28:23,543][23569] Avg episode reward: [(0, '29.310'), (1, '29.580')] [2023-09-22 09:28:23,544][24306] Saving new best policy, reward=29.310! [2023-09-22 09:28:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 15228928. Throughput: 0: 754.9, 1: 752.6. Samples: 3784158. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:28:28,543][23569] Avg episode reward: [(0, '28.590'), (1, '29.420')] [2023-09-22 09:28:30,078][24648] Updated weights for policy 1, policy_version 29600 (0.0017) [2023-09-22 09:28:30,078][24647] Updated weights for policy 0, policy_version 29944 (0.0017) [2023-09-22 09:28:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15261696. Throughput: 0: 752.1, 1: 752.3. Samples: 3792970. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:28:33,543][23569] Avg episode reward: [(0, '28.470'), (1, '29.470')] [2023-09-22 09:28:38,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5942.7). Total num frames: 15294464. Throughput: 0: 754.6, 1: 755.0. Samples: 3797451. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:28:38,542][23569] Avg episode reward: [(0, '29.030'), (1, '29.050')] [2023-09-22 09:28:43,542][23569] Fps is (10 sec: 5734.2, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15319040. Throughput: 0: 752.8, 1: 753.8. Samples: 3806542. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:28:43,543][23569] Avg episode reward: [(0, '28.730'), (1, '29.730')] [2023-09-22 09:28:43,964][24648] Updated weights for policy 1, policy_version 29760 (0.0018) [2023-09-22 09:28:43,964][24647] Updated weights for policy 0, policy_version 30104 (0.0017) [2023-09-22 09:28:48,542][23569] Fps is (10 sec: 5734.2, 60 sec: 6007.5, 300 sec: 5942.7). Total num frames: 15351808. Throughput: 0: 750.8, 1: 750.6. Samples: 3815403. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:28:48,544][23569] Avg episode reward: [(0, '28.740'), (1, '29.210')] [2023-09-22 09:28:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15376384. Throughput: 0: 747.5, 1: 748.4. Samples: 3819520. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:28:53,543][23569] Avg episode reward: [(0, '28.490'), (1, '29.570')] [2023-09-22 09:28:57,961][24648] Updated weights for policy 1, policy_version 29920 (0.0017) [2023-09-22 09:28:57,961][24647] Updated weights for policy 0, policy_version 30264 (0.0018) [2023-09-22 09:28:58,542][23569] Fps is (10 sec: 5734.6, 60 sec: 6007.5, 300 sec: 5942.7). Total num frames: 15409152. Throughput: 0: 739.3, 1: 739.7. Samples: 3828212. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:28:58,542][23569] Avg episode reward: [(0, '29.140'), (1, '28.830')] [2023-09-22 09:29:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15433728. Throughput: 0: 741.3, 1: 740.4. Samples: 3837063. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:29:03,543][23569] Avg episode reward: [(0, '29.050'), (1, '28.920')] [2023-09-22 09:29:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15466496. Throughput: 0: 742.7, 1: 742.6. Samples: 3841819. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:08,543][23569] Avg episode reward: [(0, '29.210'), (1, '28.160')] [2023-09-22 09:29:11,642][24648] Updated weights for policy 1, policy_version 30080 (0.0017) [2023-09-22 09:29:11,643][24647] Updated weights for policy 0, policy_version 30424 (0.0018) [2023-09-22 09:29:13,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.4, 300 sec: 5942.7). Total num frames: 15499264. Throughput: 0: 739.3, 1: 740.6. Samples: 3850753. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:13,543][23569] Avg episode reward: [(0, '29.920'), (1, '28.400')] [2023-09-22 09:29:13,554][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000030096_7704576.pth... [2023-09-22 09:29:13,554][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000030440_7794688.pth... [2023-09-22 09:29:13,589][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000027312_6991872.pth [2023-09-22 09:29:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000027656_7081984.pth [2023-09-22 09:29:13,594][24306] Saving new best policy, reward=29.920! [2023-09-22 09:29:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15523840. Throughput: 0: 736.0, 1: 735.2. Samples: 3859172. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:18,543][23569] Avg episode reward: [(0, '29.480'), (1, '28.550')] [2023-09-22 09:29:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15556608. Throughput: 0: 733.9, 1: 734.1. Samples: 3863513. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:23,543][23569] Avg episode reward: [(0, '29.830'), (1, '29.120')] [2023-09-22 09:29:25,971][24647] Updated weights for policy 0, policy_version 30584 (0.0018) [2023-09-22 09:29:25,972][24648] Updated weights for policy 1, policy_version 30240 (0.0018) [2023-09-22 09:29:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15581184. Throughput: 0: 732.9, 1: 733.1. Samples: 3872511. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:28,543][23569] Avg episode reward: [(0, '29.320'), (1, '29.340')] [2023-09-22 09:29:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15613952. Throughput: 0: 728.4, 1: 728.6. Samples: 3880965. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:33,543][23569] Avg episode reward: [(0, '29.780'), (1, '28.680')] [2023-09-22 09:29:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5914.9). Total num frames: 15638528. Throughput: 0: 728.5, 1: 728.3. Samples: 3885075. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:38,543][23569] Avg episode reward: [(0, '29.690'), (1, '29.130')] [2023-09-22 09:29:40,087][24648] Updated weights for policy 1, policy_version 30400 (0.0018) [2023-09-22 09:29:40,087][24647] Updated weights for policy 0, policy_version 30744 (0.0017) [2023-09-22 09:29:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 15671296. Throughput: 0: 732.6, 1: 731.8. Samples: 3894106. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:29:43,542][23569] Avg episode reward: [(0, '30.000'), (1, '29.250')] [2023-09-22 09:29:43,552][24306] Saving new best policy, reward=30.000! [2023-09-22 09:29:48,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5871.0, 300 sec: 5942.7). Total num frames: 15704064. Throughput: 0: 734.5, 1: 735.6. Samples: 3903218. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:29:48,543][23569] Avg episode reward: [(0, '30.540'), (1, '28.900')] [2023-09-22 09:29:48,543][24306] Saving new best policy, reward=30.540! [2023-09-22 09:29:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15728640. Throughput: 0: 730.3, 1: 730.8. Samples: 3907567. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:29:53,543][23569] Avg episode reward: [(0, '29.830'), (1, '29.420')] [2023-09-22 09:29:54,016][24647] Updated weights for policy 0, policy_version 30904 (0.0018) [2023-09-22 09:29:54,016][24648] Updated weights for policy 1, policy_version 30560 (0.0018) [2023-09-22 09:29:58,543][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15761408. Throughput: 0: 726.3, 1: 727.4. Samples: 3916172. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:29:58,544][23569] Avg episode reward: [(0, '29.700'), (1, '29.040')] [2023-09-22 09:30:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15785984. Throughput: 0: 733.4, 1: 735.1. Samples: 3925255. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:30:03,543][23569] Avg episode reward: [(0, '29.540'), (1, '28.900')] [2023-09-22 09:30:07,753][24648] Updated weights for policy 1, policy_version 30720 (0.0016) [2023-09-22 09:30:07,753][24647] Updated weights for policy 0, policy_version 31064 (0.0017) [2023-09-22 09:30:08,542][23569] Fps is (10 sec: 5734.7, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15818752. Throughput: 0: 736.3, 1: 735.4. Samples: 3929740. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:08,542][23569] Avg episode reward: [(0, '29.460'), (1, '28.920')] [2023-09-22 09:30:13,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5871.0, 300 sec: 5942.7). Total num frames: 15851520. Throughput: 0: 732.5, 1: 732.0. Samples: 3938415. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:13,543][23569] Avg episode reward: [(0, '29.540'), (1, '29.130')] [2023-09-22 09:30:18,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15876096. Throughput: 0: 732.9, 1: 733.4. Samples: 3946949. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:18,544][23569] Avg episode reward: [(0, '29.380'), (1, '29.980')] [2023-09-22 09:30:18,545][24495] Saving new best policy, reward=29.980! [2023-09-22 09:30:21,793][24647] Updated weights for policy 0, policy_version 31224 (0.0015) [2023-09-22 09:30:21,794][24648] Updated weights for policy 1, policy_version 30880 (0.0016) [2023-09-22 09:30:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15908864. Throughput: 0: 737.5, 1: 737.5. Samples: 3951450. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:23,543][23569] Avg episode reward: [(0, '28.680'), (1, '29.500')] [2023-09-22 09:30:28,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15933440. Throughput: 0: 737.4, 1: 738.5. Samples: 3960521. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:28,544][23569] Avg episode reward: [(0, '29.100'), (1, '29.280')] [2023-09-22 09:30:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 15966208. Throughput: 0: 733.1, 1: 732.2. Samples: 3969155. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:30:33,543][23569] Avg episode reward: [(0, '29.260'), (1, '29.340')] [2023-09-22 09:30:35,510][24647] Updated weights for policy 0, policy_version 31384 (0.0016) [2023-09-22 09:30:35,510][24648] Updated weights for policy 1, policy_version 31040 (0.0016) [2023-09-22 09:30:38,542][23569] Fps is (10 sec: 6553.9, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 15998976. Throughput: 0: 736.9, 1: 735.8. Samples: 3973842. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:30:38,542][23569] Avg episode reward: [(0, '29.490'), (1, '29.290')] [2023-09-22 09:30:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16023552. Throughput: 0: 741.4, 1: 739.9. Samples: 3982832. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:30:43,543][23569] Avg episode reward: [(0, '29.170'), (1, '28.800')] [2023-09-22 09:30:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16056320. Throughput: 0: 736.7, 1: 736.6. Samples: 3991554. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-22 09:30:48,543][23569] Avg episode reward: [(0, '29.200'), (1, '28.950')] [2023-09-22 09:30:49,524][24647] Updated weights for policy 0, policy_version 31544 (0.0017) [2023-09-22 09:30:49,524][24648] Updated weights for policy 1, policy_version 31200 (0.0018) [2023-09-22 09:30:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16080896. Throughput: 0: 732.5, 1: 732.9. Samples: 3995685. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:53,543][23569] Avg episode reward: [(0, '29.310'), (1, '27.670')] [2023-09-22 09:30:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 16113664. Throughput: 0: 734.0, 1: 734.5. Samples: 4004498. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:30:58,543][23569] Avg episode reward: [(0, '29.540'), (1, '28.200')] [2023-09-22 09:31:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16138240. Throughput: 0: 734.2, 1: 733.4. Samples: 4012994. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:03,543][23569] Avg episode reward: [(0, '29.230'), (1, '28.210')] [2023-09-22 09:31:03,821][24648] Updated weights for policy 1, policy_version 31360 (0.0017) [2023-09-22 09:31:03,821][24647] Updated weights for policy 0, policy_version 31704 (0.0017) [2023-09-22 09:31:08,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16171008. Throughput: 0: 732.7, 1: 733.2. Samples: 4017418. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:08,544][23569] Avg episode reward: [(0, '30.140'), (1, '28.050')] [2023-09-22 09:31:13,542][23569] Fps is (10 sec: 6553.6, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16203776. Throughput: 0: 731.5, 1: 731.8. Samples: 4026368. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:31:13,543][23569] Avg episode reward: [(0, '30.320'), (1, '28.320')] [2023-09-22 09:31:13,554][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000031472_8056832.pth... [2023-09-22 09:31:13,554][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000031816_8146944.pth... [2023-09-22 09:31:13,589][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000029048_7438336.pth [2023-09-22 09:31:13,592][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000028704_7348224.pth [2023-09-22 09:31:17,544][24648] Updated weights for policy 1, policy_version 31520 (0.0017) [2023-09-22 09:31:17,544][24647] Updated weights for policy 0, policy_version 31864 (0.0018) [2023-09-22 09:31:18,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 16228352. Throughput: 0: 734.7, 1: 734.6. Samples: 4035275. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:31:18,543][23569] Avg episode reward: [(0, '30.230'), (1, '28.950')] [2023-09-22 09:31:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16261120. Throughput: 0: 732.1, 1: 732.7. Samples: 4039758. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:31:23,543][23569] Avg episode reward: [(0, '30.200'), (1, '28.230')] [2023-09-22 09:31:28,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 16293888. Throughput: 0: 733.5, 1: 734.6. Samples: 4048896. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:31:28,543][23569] Avg episode reward: [(0, '30.110'), (1, '28.510')] [2023-09-22 09:31:31,302][24648] Updated weights for policy 1, policy_version 31680 (0.0018) [2023-09-22 09:31:31,303][24647] Updated weights for policy 0, policy_version 32024 (0.0014) [2023-09-22 09:31:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16318464. Throughput: 0: 731.1, 1: 730.8. Samples: 4057339. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:33,542][23569] Avg episode reward: [(0, '29.790'), (1, '28.930')] [2023-09-22 09:31:38,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16351232. Throughput: 0: 734.5, 1: 734.6. Samples: 4061792. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:38,543][23569] Avg episode reward: [(0, '30.290'), (1, '28.890')] [2023-09-22 09:31:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16375808. Throughput: 0: 740.4, 1: 739.9. Samples: 4071110. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:43,543][23569] Avg episode reward: [(0, '30.200'), (1, '29.340')] [2023-09-22 09:31:45,111][24648] Updated weights for policy 1, policy_version 31840 (0.0013) [2023-09-22 09:31:45,111][24647] Updated weights for policy 0, policy_version 32184 (0.0015) [2023-09-22 09:31:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16408576. Throughput: 0: 742.8, 1: 742.5. Samples: 4079832. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:48,543][23569] Avg episode reward: [(0, '30.420'), (1, '28.750')] [2023-09-22 09:31:53,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5942.7). Total num frames: 16441344. Throughput: 0: 742.6, 1: 741.5. Samples: 4084202. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:53,542][23569] Avg episode reward: [(0, '29.690'), (1, '29.270')] [2023-09-22 09:31:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16465920. Throughput: 0: 744.5, 1: 744.8. Samples: 4093385. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:31:58,543][23569] Avg episode reward: [(0, '30.300'), (1, '29.990')] [2023-09-22 09:31:58,551][24495] Saving new best policy, reward=29.990! [2023-09-22 09:31:58,964][24647] Updated weights for policy 0, policy_version 32344 (0.0018) [2023-09-22 09:31:58,965][24648] Updated weights for policy 1, policy_version 32000 (0.0019) [2023-09-22 09:32:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 16498688. Throughput: 0: 741.3, 1: 741.1. Samples: 4101982. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:03,543][23569] Avg episode reward: [(0, '30.030'), (1, '30.030')] [2023-09-22 09:32:03,544][24495] Saving new best policy, reward=30.030! [2023-09-22 09:32:08,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16523264. Throughput: 0: 738.3, 1: 739.0. Samples: 4106240. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:08,544][23569] Avg episode reward: [(0, '30.090'), (1, '29.800')] [2023-09-22 09:32:12,885][24648] Updated weights for policy 1, policy_version 32160 (0.0016) [2023-09-22 09:32:12,886][24647] Updated weights for policy 0, policy_version 32504 (0.0018) [2023-09-22 09:32:13,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16556032. Throughput: 0: 737.4, 1: 736.4. Samples: 4115217. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:13,543][23569] Avg episode reward: [(0, '30.410'), (1, '30.540')] [2023-09-22 09:32:13,553][24495] Saving new best policy, reward=30.540! [2023-09-22 09:32:18,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5939.2, 300 sec: 5901.0). Total num frames: 16584704. Throughput: 0: 740.5, 1: 741.1. Samples: 4124009. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:18,543][23569] Avg episode reward: [(0, '30.080'), (1, '30.370')] [2023-09-22 09:32:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 16613376. Throughput: 0: 738.6, 1: 740.3. Samples: 4128341. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:23,542][23569] Avg episode reward: [(0, '30.090'), (1, '31.020')] [2023-09-22 09:32:23,543][24495] Saving new best policy, reward=31.020! [2023-09-22 09:32:27,035][24647] Updated weights for policy 0, policy_version 32664 (0.0017) [2023-09-22 09:32:27,035][24648] Updated weights for policy 1, policy_version 32320 (0.0016) [2023-09-22 09:32:28,542][23569] Fps is (10 sec: 6144.1, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16646144. Throughput: 0: 731.2, 1: 732.1. Samples: 4136960. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:28,543][23569] Avg episode reward: [(0, '30.000'), (1, '30.560')] [2023-09-22 09:32:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16670720. Throughput: 0: 732.5, 1: 732.0. Samples: 4145736. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:33,543][23569] Avg episode reward: [(0, '29.540'), (1, '29.460')] [2023-09-22 09:32:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16703488. Throughput: 0: 736.1, 1: 736.6. Samples: 4150473. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:38,543][23569] Avg episode reward: [(0, '29.750'), (1, '30.610')] [2023-09-22 09:32:40,679][24647] Updated weights for policy 0, policy_version 32824 (0.0016) [2023-09-22 09:32:40,679][24648] Updated weights for policy 1, policy_version 32480 (0.0016) [2023-09-22 09:32:43,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 16736256. Throughput: 0: 734.7, 1: 734.3. Samples: 4159490. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:43,543][23569] Avg episode reward: [(0, '29.030'), (1, '30.940')] [2023-09-22 09:32:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16760832. Throughput: 0: 735.9, 1: 735.9. Samples: 4168212. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:48,543][23569] Avg episode reward: [(0, '29.600'), (1, '31.330')] [2023-09-22 09:32:48,543][24495] Saving new best policy, reward=31.330! [2023-09-22 09:32:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 16793600. Throughput: 0: 740.4, 1: 738.0. Samples: 4172768. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:53,543][23569] Avg episode reward: [(0, '29.300'), (1, '30.390')] [2023-09-22 09:32:54,434][24647] Updated weights for policy 0, policy_version 32984 (0.0017) [2023-09-22 09:32:54,436][24648] Updated weights for policy 1, policy_version 32640 (0.0019) [2023-09-22 09:32:58,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16818176. Throughput: 0: 739.3, 1: 740.9. Samples: 4181826. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:32:58,544][23569] Avg episode reward: [(0, '29.140'), (1, '30.860')] [2023-09-22 09:33:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16850944. Throughput: 0: 735.7, 1: 735.4. Samples: 4190208. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:33:03,543][23569] Avg episode reward: [(0, '29.850'), (1, '30.800')] [2023-09-22 09:33:08,472][24648] Updated weights for policy 1, policy_version 32800 (0.0016) [2023-09-22 09:33:08,473][24647] Updated weights for policy 0, policy_version 33144 (0.0018) [2023-09-22 09:33:08,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 16883712. Throughput: 0: 735.9, 1: 734.1. Samples: 4194492. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:33:08,543][23569] Avg episode reward: [(0, '29.940'), (1, '30.410')] [2023-09-22 09:33:13,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16908288. Throughput: 0: 741.2, 1: 740.2. Samples: 4203622. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:33:13,543][23569] Avg episode reward: [(0, '30.390'), (1, '30.870')] [2023-09-22 09:33:13,552][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000033192_8499200.pth... [2023-09-22 09:33:13,552][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000032848_8409088.pth... [2023-09-22 09:33:13,592][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000030440_7794688.pth [2023-09-22 09:33:13,593][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000030096_7704576.pth [2023-09-22 09:33:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5939.2, 300 sec: 5887.1). Total num frames: 16941056. Throughput: 0: 743.8, 1: 745.0. Samples: 4212733. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:33:18,543][23569] Avg episode reward: [(0, '30.730'), (1, '30.210')] [2023-09-22 09:33:18,543][24306] Saving new best policy, reward=30.730! [2023-09-22 09:33:22,361][24648] Updated weights for policy 1, policy_version 32960 (0.0016) [2023-09-22 09:33:22,361][24647] Updated weights for policy 0, policy_version 33304 (0.0017) [2023-09-22 09:33:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16965632. Throughput: 0: 737.7, 1: 737.5. Samples: 4216857. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-22 09:33:23,543][23569] Avg episode reward: [(0, '29.900'), (1, '30.870')] [2023-09-22 09:33:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 16998400. Throughput: 0: 736.1, 1: 735.0. Samples: 4225688. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:33:28,543][23569] Avg episode reward: [(0, '29.430'), (1, '29.590')] [2023-09-22 09:33:33,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5887.1). Total num frames: 17031168. Throughput: 0: 739.7, 1: 740.7. Samples: 4234829. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:33:33,543][23569] Avg episode reward: [(0, '29.830'), (1, '29.950')] [2023-09-22 09:33:36,139][24648] Updated weights for policy 1, policy_version 33120 (0.0017) [2023-09-22 09:33:36,139][24647] Updated weights for policy 0, policy_version 33464 (0.0015) [2023-09-22 09:33:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17055744. Throughput: 0: 738.7, 1: 741.1. Samples: 4239360. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:33:38,543][23569] Avg episode reward: [(0, '29.920'), (1, '29.710')] [2023-09-22 09:33:43,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17088512. Throughput: 0: 734.7, 1: 733.9. Samples: 4247911. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:33:43,543][23569] Avg episode reward: [(0, '29.600'), (1, '29.720')] [2023-09-22 09:33:48,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17121280. Throughput: 0: 744.8, 1: 745.5. Samples: 4257272. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:33:48,543][23569] Avg episode reward: [(0, '29.810'), (1, '29.500')] [2023-09-22 09:33:49,939][24647] Updated weights for policy 0, policy_version 33624 (0.0017) [2023-09-22 09:33:49,939][24648] Updated weights for policy 1, policy_version 33280 (0.0015) [2023-09-22 09:33:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17145856. Throughput: 0: 747.2, 1: 744.6. Samples: 4261621. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:33:53,543][23569] Avg episode reward: [(0, '29.500'), (1, '28.940')] [2023-09-22 09:33:58,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17178624. Throughput: 0: 738.3, 1: 739.0. Samples: 4270099. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:33:58,543][23569] Avg episode reward: [(0, '29.350'), (1, '29.500')] [2023-09-22 09:34:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17203200. Throughput: 0: 737.2, 1: 736.8. Samples: 4279060. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:34:03,543][23569] Avg episode reward: [(0, '29.590'), (1, '29.360')] [2023-09-22 09:34:03,863][24647] Updated weights for policy 0, policy_version 33784 (0.0014) [2023-09-22 09:34:03,863][24648] Updated weights for policy 1, policy_version 33440 (0.0016) [2023-09-22 09:34:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 17235968. Throughput: 0: 742.3, 1: 741.7. Samples: 4283636. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:34:08,542][23569] Avg episode reward: [(0, '29.670'), (1, '30.480')] [2023-09-22 09:34:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17260544. Throughput: 0: 741.4, 1: 742.3. Samples: 4292452. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:34:13,543][23569] Avg episode reward: [(0, '30.230'), (1, '29.980')] [2023-09-22 09:34:17,760][24647] Updated weights for policy 0, policy_version 33944 (0.0018) [2023-09-22 09:34:17,760][24648] Updated weights for policy 1, policy_version 33600 (0.0016) [2023-09-22 09:34:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17293312. Throughput: 0: 736.7, 1: 736.3. Samples: 4301114. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:34:18,543][23569] Avg episode reward: [(0, '29.740'), (1, '30.140')] [2023-09-22 09:34:23,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17326080. Throughput: 0: 736.7, 1: 736.4. Samples: 4305647. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:34:23,542][23569] Avg episode reward: [(0, '29.340'), (1, '29.980')] [2023-09-22 09:34:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17350656. Throughput: 0: 742.0, 1: 743.2. Samples: 4314747. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-22 09:34:28,542][23569] Avg episode reward: [(0, '29.890'), (1, '29.780')] [2023-09-22 09:34:31,479][24647] Updated weights for policy 0, policy_version 34104 (0.0017) [2023-09-22 09:34:31,479][24648] Updated weights for policy 1, policy_version 33760 (0.0017) [2023-09-22 09:34:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 17383424. Throughput: 0: 734.9, 1: 733.7. Samples: 4323358. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:34:33,543][23569] Avg episode reward: [(0, '30.670'), (1, '29.450')] [2023-09-22 09:34:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17408000. Throughput: 0: 731.7, 1: 734.0. Samples: 4327576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:34:38,543][23569] Avg episode reward: [(0, '30.130'), (1, '31.180')] [2023-09-22 09:34:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17440768. Throughput: 0: 736.0, 1: 735.6. Samples: 4336322. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:34:43,543][23569] Avg episode reward: [(0, '30.350'), (1, '30.750')] [2023-09-22 09:34:45,630][24648] Updated weights for policy 1, policy_version 33920 (0.0019) [2023-09-22 09:34:45,630][24647] Updated weights for policy 0, policy_version 34264 (0.0019) [2023-09-22 09:34:48,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 17473536. Throughput: 0: 734.9, 1: 735.2. Samples: 4345213. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:34:48,544][23569] Avg episode reward: [(0, '30.020'), (1, '30.900')] [2023-09-22 09:34:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17498112. Throughput: 0: 734.5, 1: 734.0. Samples: 4349718. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:34:53,543][23569] Avg episode reward: [(0, '30.100'), (1, '29.790')] [2023-09-22 09:34:58,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 17530880. Throughput: 0: 730.6, 1: 730.2. Samples: 4358187. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:34:58,542][23569] Avg episode reward: [(0, '30.070'), (1, '30.200')] [2023-09-22 09:34:59,570][24647] Updated weights for policy 0, policy_version 34424 (0.0015) [2023-09-22 09:34:59,570][24648] Updated weights for policy 1, policy_version 34080 (0.0015) [2023-09-22 09:35:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17555456. Throughput: 0: 739.8, 1: 739.0. Samples: 4367658. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:03,543][23569] Avg episode reward: [(0, '29.260'), (1, '30.010')] [2023-09-22 09:35:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17588224. Throughput: 0: 740.7, 1: 740.9. Samples: 4372319. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:08,543][23569] Avg episode reward: [(0, '29.640'), (1, '30.150')] [2023-09-22 09:35:13,004][24647] Updated weights for policy 0, policy_version 34584 (0.0014) [2023-09-22 09:35:13,004][24648] Updated weights for policy 1, policy_version 34240 (0.0016) [2023-09-22 09:35:13,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17620992. Throughput: 0: 739.7, 1: 737.8. Samples: 4381235. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:13,543][23569] Avg episode reward: [(0, '29.450'), (1, '29.810')] [2023-09-22 09:35:13,552][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000034240_8765440.pth... [2023-09-22 09:35:13,552][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000034584_8855552.pth... [2023-09-22 09:35:13,587][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000031472_8056832.pth [2023-09-22 09:35:13,589][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000031816_8146944.pth [2023-09-22 09:35:18,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17653760. Throughput: 0: 745.2, 1: 745.2. Samples: 4390426. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:18,543][23569] Avg episode reward: [(0, '29.330'), (1, '29.780')] [2023-09-22 09:35:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 17678336. Throughput: 0: 748.8, 1: 749.0. Samples: 4394977. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:23,543][23569] Avg episode reward: [(0, '29.310'), (1, '29.860')] [2023-09-22 09:35:26,810][24647] Updated weights for policy 0, policy_version 34744 (0.0017) [2023-09-22 09:35:26,810][24648] Updated weights for policy 1, policy_version 34400 (0.0018) [2023-09-22 09:35:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17711104. Throughput: 0: 745.7, 1: 745.4. Samples: 4403422. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:28,543][23569] Avg episode reward: [(0, '29.260'), (1, '29.990')] [2023-09-22 09:35:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 17735680. Throughput: 0: 747.3, 1: 746.7. Samples: 4412444. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:33,543][23569] Avg episode reward: [(0, '28.910'), (1, '29.780')] [2023-09-22 09:35:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17768448. Throughput: 0: 748.4, 1: 748.1. Samples: 4417062. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:35:38,543][23569] Avg episode reward: [(0, '29.640'), (1, '29.350')] [2023-09-22 09:35:40,619][24647] Updated weights for policy 0, policy_version 34904 (0.0017) [2023-09-22 09:35:40,619][24648] Updated weights for policy 1, policy_version 34560 (0.0017) [2023-09-22 09:35:43,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17801216. Throughput: 0: 750.2, 1: 750.7. Samples: 4425728. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:35:43,543][23569] Avg episode reward: [(0, '29.550'), (1, '29.190')] [2023-09-22 09:35:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 17825792. Throughput: 0: 745.3, 1: 745.8. Samples: 4434757. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:35:48,543][23569] Avg episode reward: [(0, '29.740'), (1, '30.040')] [2023-09-22 09:35:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17858560. Throughput: 0: 742.0, 1: 742.3. Samples: 4439111. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:35:53,543][23569] Avg episode reward: [(0, '30.070'), (1, '30.730')] [2023-09-22 09:35:54,438][24647] Updated weights for policy 0, policy_version 35064 (0.0017) [2023-09-22 09:35:54,438][24648] Updated weights for policy 1, policy_version 34720 (0.0015) [2023-09-22 09:35:58,542][23569] Fps is (10 sec: 6553.8, 60 sec: 6007.5, 300 sec: 5942.7). Total num frames: 17891328. Throughput: 0: 743.7, 1: 744.5. Samples: 4448202. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-22 09:35:58,542][23569] Avg episode reward: [(0, '29.610'), (1, '30.220')] [2023-09-22 09:36:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17915904. Throughput: 0: 739.2, 1: 738.7. Samples: 4456934. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:36:03,543][23569] Avg episode reward: [(0, '29.300'), (1, '30.610')] [2023-09-22 09:36:08,125][24647] Updated weights for policy 0, policy_version 35224 (0.0015) [2023-09-22 09:36:08,125][24648] Updated weights for policy 1, policy_version 34880 (0.0016) [2023-09-22 09:36:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 17948672. Throughput: 0: 740.3, 1: 740.7. Samples: 4461622. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:36:08,543][23569] Avg episode reward: [(0, '30.120'), (1, '30.150')] [2023-09-22 09:36:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 17973248. Throughput: 0: 746.6, 1: 746.1. Samples: 4470592. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:36:13,543][23569] Avg episode reward: [(0, '29.770'), (1, '29.790')] [2023-09-22 09:36:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18006016. Throughput: 0: 742.4, 1: 742.7. Samples: 4479274. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:36:18,543][23569] Avg episode reward: [(0, '29.470'), (1, '29.700')] [2023-09-22 09:36:22,025][24647] Updated weights for policy 0, policy_version 35384 (0.0018) [2023-09-22 09:36:22,025][24648] Updated weights for policy 1, policy_version 35040 (0.0016) [2023-09-22 09:36:23,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 18038784. Throughput: 0: 739.1, 1: 740.0. Samples: 4483623. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:36:23,543][23569] Avg episode reward: [(0, '29.930'), (1, '30.200')] [2023-09-22 09:36:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18063360. Throughput: 0: 743.7, 1: 745.1. Samples: 4492725. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:36:28,542][23569] Avg episode reward: [(0, '30.360'), (1, '29.670')] [2023-09-22 09:36:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 18096128. Throughput: 0: 737.8, 1: 738.2. Samples: 4501178. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:36:33,543][23569] Avg episode reward: [(0, '29.660'), (1, '28.990')] [2023-09-22 09:36:36,074][24647] Updated weights for policy 0, policy_version 35544 (0.0017) [2023-09-22 09:36:36,074][24648] Updated weights for policy 1, policy_version 35200 (0.0016) [2023-09-22 09:36:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18120704. Throughput: 0: 738.9, 1: 738.7. Samples: 4505600. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:36:38,543][23569] Avg episode reward: [(0, '29.910'), (1, '29.810')] [2023-09-22 09:36:43,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18153472. Throughput: 0: 734.4, 1: 733.7. Samples: 4514267. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:36:43,543][23569] Avg episode reward: [(0, '29.890'), (1, '28.700')] [2023-09-22 09:36:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5887.1). Total num frames: 18178048. Throughput: 0: 735.4, 1: 735.7. Samples: 4523134. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-22 09:36:48,543][23569] Avg episode reward: [(0, '30.140'), (1, '29.540')] [2023-09-22 09:36:50,120][24648] Updated weights for policy 1, policy_version 35360 (0.0017) [2023-09-22 09:36:50,120][24647] Updated weights for policy 0, policy_version 35704 (0.0015) [2023-09-22 09:36:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18210816. Throughput: 0: 733.3, 1: 730.7. Samples: 4527500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:36:53,543][23569] Avg episode reward: [(0, '30.730'), (1, '28.820')] [2023-09-22 09:36:58,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18243584. Throughput: 0: 729.7, 1: 731.0. Samples: 4536320. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:36:58,543][23569] Avg episode reward: [(0, '30.510'), (1, '29.960')] [2023-09-22 09:37:03,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18268160. Throughput: 0: 733.1, 1: 732.7. Samples: 4545234. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:03,543][23569] Avg episode reward: [(0, '29.950'), (1, '30.540')] [2023-09-22 09:37:03,920][24647] Updated weights for policy 0, policy_version 35864 (0.0016) [2023-09-22 09:37:03,921][24648] Updated weights for policy 1, policy_version 35520 (0.0016) [2023-09-22 09:37:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18300928. Throughput: 0: 735.8, 1: 735.4. Samples: 4549826. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:08,543][23569] Avg episode reward: [(0, '29.860'), (1, '30.490')] [2023-09-22 09:37:13,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.4, 300 sec: 5928.8). Total num frames: 18333696. Throughput: 0: 735.4, 1: 734.0. Samples: 4558848. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:13,543][23569] Avg episode reward: [(0, '30.160'), (1, '29.730')] [2023-09-22 09:37:13,555][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000035976_9211904.pth... [2023-09-22 09:37:13,555][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000035632_9121792.pth... [2023-09-22 09:37:13,584][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000033192_8499200.pth [2023-09-22 09:37:13,590][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000032848_8409088.pth [2023-09-22 09:37:17,485][24648] Updated weights for policy 1, policy_version 35680 (0.0016) [2023-09-22 09:37:17,486][24647] Updated weights for policy 0, policy_version 36024 (0.0016) [2023-09-22 09:37:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18358272. Throughput: 0: 741.4, 1: 741.3. Samples: 4567900. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:18,543][23569] Avg episode reward: [(0, '29.470'), (1, '29.380')] [2023-09-22 09:37:23,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18391040. Throughput: 0: 740.4, 1: 740.6. Samples: 4572243. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:23,543][23569] Avg episode reward: [(0, '29.340'), (1, '29.850')] [2023-09-22 09:37:28,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18415616. Throughput: 0: 743.1, 1: 744.3. Samples: 4581198. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:28,543][23569] Avg episode reward: [(0, '28.740'), (1, '29.450')] [2023-09-22 09:37:31,431][24648] Updated weights for policy 1, policy_version 35840 (0.0016) [2023-09-22 09:37:31,432][24647] Updated weights for policy 0, policy_version 36184 (0.0017) [2023-09-22 09:37:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5914.9). Total num frames: 18448384. Throughput: 0: 740.9, 1: 741.0. Samples: 4589819. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-22 09:37:33,544][23569] Avg episode reward: [(0, '30.230'), (1, '29.320')] [2023-09-22 09:37:38,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 18481152. Throughput: 0: 739.5, 1: 742.5. Samples: 4594187. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:37:38,543][23569] Avg episode reward: [(0, '29.390'), (1, '30.120')] [2023-09-22 09:37:43,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5871.0, 300 sec: 5914.9). Total num frames: 18505728. Throughput: 0: 739.7, 1: 740.3. Samples: 4602918. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:37:43,543][23569] Avg episode reward: [(0, '30.160'), (1, '29.150')] [2023-09-22 09:37:45,947][24647] Updated weights for policy 0, policy_version 36344 (0.0015) [2023-09-22 09:37:45,948][24648] Updated weights for policy 1, policy_version 36000 (0.0017) [2023-09-22 09:37:48,542][23569] Fps is (10 sec: 4915.2, 60 sec: 5871.0, 300 sec: 5887.1). Total num frames: 18530304. Throughput: 0: 727.5, 1: 727.8. Samples: 4610724. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:37:48,543][23569] Avg episode reward: [(0, '29.900'), (1, '30.240')] [2023-09-22 09:37:53,542][23569] Fps is (10 sec: 4915.2, 60 sec: 5734.4, 300 sec: 5887.1). Total num frames: 18554880. Throughput: 0: 719.3, 1: 720.0. Samples: 4614593. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:37:53,543][23569] Avg episode reward: [(0, '30.140'), (1, '30.110')] [2023-09-22 09:37:58,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5734.4, 300 sec: 5887.1). Total num frames: 18587648. Throughput: 0: 709.5, 1: 709.0. Samples: 4622678. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:37:58,544][23569] Avg episode reward: [(0, '30.980'), (1, '29.030')] [2023-09-22 09:37:58,551][24306] Saving new best policy, reward=30.980! [2023-09-22 09:38:00,933][24648] Updated weights for policy 1, policy_version 36160 (0.0011) [2023-09-22 09:38:00,934][24647] Updated weights for policy 0, policy_version 36504 (0.0014) [2023-09-22 09:38:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 18612224. Throughput: 0: 709.4, 1: 709.2. Samples: 4631737. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:03,543][23569] Avg episode reward: [(0, '31.170'), (1, '29.660')] [2023-09-22 09:38:03,705][24306] Saving new best policy, reward=31.170! [2023-09-22 09:38:08,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5734.4, 300 sec: 5887.1). Total num frames: 18644992. Throughput: 0: 704.8, 1: 703.5. Samples: 4635614. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:08,542][23569] Avg episode reward: [(0, '31.230'), (1, '29.010')] [2023-09-22 09:38:08,543][24306] Saving new best policy, reward=31.230! [2023-09-22 09:38:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5597.9, 300 sec: 5859.4). Total num frames: 18669568. Throughput: 0: 693.2, 1: 691.5. Samples: 4643510. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:13,543][23569] Avg episode reward: [(0, '30.540'), (1, '29.210')] [2023-09-22 09:38:16,100][24647] Updated weights for policy 0, policy_version 36664 (0.0013) [2023-09-22 09:38:16,101][24648] Updated weights for policy 1, policy_version 36320 (0.0011) [2023-09-22 09:38:18,542][23569] Fps is (10 sec: 4915.1, 60 sec: 5597.9, 300 sec: 5859.4). Total num frames: 18694144. Throughput: 0: 686.1, 1: 686.6. Samples: 4651594. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:18,543][23569] Avg episode reward: [(0, '30.890'), (1, '28.960')] [2023-09-22 09:38:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5597.9, 300 sec: 5859.4). Total num frames: 18726912. Throughput: 0: 685.2, 1: 685.7. Samples: 4655879. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:23,542][23569] Avg episode reward: [(0, '30.800'), (1, '28.330')] [2023-09-22 09:38:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5597.9, 300 sec: 5831.6). Total num frames: 18751488. Throughput: 0: 684.6, 1: 684.9. Samples: 4664545. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:28,543][23569] Avg episode reward: [(0, '30.600'), (1, '28.050')] [2023-09-22 09:38:30,489][24648] Updated weights for policy 1, policy_version 36480 (0.0018) [2023-09-22 09:38:30,489][24647] Updated weights for policy 0, policy_version 36824 (0.0018) [2023-09-22 09:38:33,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5597.9, 300 sec: 5859.4). Total num frames: 18784256. Throughput: 0: 696.5, 1: 697.5. Samples: 4673454. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:33,543][23569] Avg episode reward: [(0, '30.790'), (1, '27.760')] [2023-09-22 09:38:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5461.3, 300 sec: 5831.6). Total num frames: 18808832. Throughput: 0: 700.3, 1: 700.6. Samples: 4677633. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:38,543][23569] Avg episode reward: [(0, '30.600'), (1, '27.370')] [2023-09-22 09:38:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5597.9, 300 sec: 5831.6). Total num frames: 18841600. Throughput: 0: 706.1, 1: 706.6. Samples: 4686250. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:43,543][23569] Avg episode reward: [(0, '30.790'), (1, '27.330')] [2023-09-22 09:38:44,563][24647] Updated weights for policy 0, policy_version 36984 (0.0015) [2023-09-22 09:38:44,564][24648] Updated weights for policy 1, policy_version 36640 (0.0017) [2023-09-22 09:38:48,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5597.9, 300 sec: 5831.6). Total num frames: 18866176. Throughput: 0: 702.2, 1: 701.6. Samples: 4694906. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:38:48,542][23569] Avg episode reward: [(0, '31.110'), (1, '28.210')] [2023-09-22 09:38:53,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 18898944. Throughput: 0: 708.6, 1: 708.1. Samples: 4699364. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 09:38:53,543][23569] Avg episode reward: [(0, '31.020'), (1, '27.480')] [2023-09-22 09:38:58,406][24647] Updated weights for policy 0, policy_version 37144 (0.0015) [2023-09-22 09:38:58,406][24648] Updated weights for policy 1, policy_version 36800 (0.0016) [2023-09-22 09:38:58,542][23569] Fps is (10 sec: 6553.4, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 18931712. Throughput: 0: 719.9, 1: 721.0. Samples: 4708352. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 09:38:58,543][23569] Avg episode reward: [(0, '31.060'), (1, '28.280')] [2023-09-22 09:39:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5831.6). Total num frames: 18956288. Throughput: 0: 726.9, 1: 727.1. Samples: 4717024. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 09:39:03,543][23569] Avg episode reward: [(0, '30.200'), (1, '27.950')] [2023-09-22 09:39:08,542][23569] Fps is (10 sec: 5734.6, 60 sec: 5734.4, 300 sec: 5859.4). Total num frames: 18989056. Throughput: 0: 731.5, 1: 730.3. Samples: 4721660. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-22 09:39:08,543][23569] Avg episode reward: [(0, '30.060'), (1, '29.280')] [2023-09-22 09:39:12,145][24647] Updated weights for policy 0, policy_version 37304 (0.0016) [2023-09-22 09:39:12,145][24648] Updated weights for policy 1, policy_version 36960 (0.0017) [2023-09-22 09:39:13,542][23569] Fps is (10 sec: 6553.8, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 19021824. Throughput: 0: 737.5, 1: 736.4. Samples: 4730869. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:13,542][23569] Avg episode reward: [(0, '28.970'), (1, '29.230')] [2023-09-22 09:39:13,551][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000037320_9555968.pth... [2023-09-22 09:39:13,552][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000036976_9465856.pth... [2023-09-22 09:39:13,587][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000034240_8765440.pth [2023-09-22 09:39:13,590][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000034584_8855552.pth [2023-09-22 09:39:18,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19046400. Throughput: 0: 737.0, 1: 735.3. Samples: 4739707. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:18,543][23569] Avg episode reward: [(0, '29.490'), (1, '28.260')] [2023-09-22 09:39:23,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 19079168. Throughput: 0: 741.5, 1: 741.4. Samples: 4744360. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:23,543][23569] Avg episode reward: [(0, '29.670'), (1, '28.160')] [2023-09-22 09:39:25,628][24648] Updated weights for policy 1, policy_version 37120 (0.0015) [2023-09-22 09:39:25,629][24647] Updated weights for policy 0, policy_version 37464 (0.0016) [2023-09-22 09:39:28,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 19111936. Throughput: 0: 746.3, 1: 746.2. Samples: 4753413. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:28,543][23569] Avg episode reward: [(0, '29.070'), (1, '27.620')] [2023-09-22 09:39:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 19136512. Throughput: 0: 750.3, 1: 751.0. Samples: 4762462. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:33,543][23569] Avg episode reward: [(0, '30.530'), (1, '25.850')] [2023-09-22 09:39:38,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 19169280. Throughput: 0: 751.1, 1: 751.4. Samples: 4766978. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:38,543][23569] Avg episode reward: [(0, '30.520'), (1, '26.110')] [2023-09-22 09:39:39,439][24648] Updated weights for policy 1, policy_version 37280 (0.0017) [2023-09-22 09:39:39,439][24647] Updated weights for policy 0, policy_version 37624 (0.0015) [2023-09-22 09:39:43,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19193856. Throughput: 0: 746.1, 1: 747.4. Samples: 4775560. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:43,542][23569] Avg episode reward: [(0, '30.380'), (1, '26.520')] [2023-09-22 09:39:48,542][23569] Fps is (10 sec: 5734.3, 60 sec: 6007.4, 300 sec: 5859.4). Total num frames: 19226624. Throughput: 0: 745.7, 1: 745.5. Samples: 4784128. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:48,543][23569] Avg episode reward: [(0, '30.770'), (1, '26.340')] [2023-09-22 09:39:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19251200. Throughput: 0: 739.3, 1: 739.9. Samples: 4788225. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:53,543][23569] Avg episode reward: [(0, '30.240'), (1, '27.280')] [2023-09-22 09:39:53,821][24647] Updated weights for policy 0, policy_version 37784 (0.0014) [2023-09-22 09:39:53,821][24648] Updated weights for policy 1, policy_version 37440 (0.0016) [2023-09-22 09:39:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 19283968. Throughput: 0: 735.5, 1: 735.6. Samples: 4797072. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:39:58,544][23569] Avg episode reward: [(0, '30.630'), (1, '26.650')] [2023-09-22 09:40:03,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 19316736. Throughput: 0: 741.0, 1: 741.2. Samples: 4806407. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:03,543][23569] Avg episode reward: [(0, '30.660'), (1, '26.740')] [2023-09-22 09:40:07,471][24647] Updated weights for policy 0, policy_version 37944 (0.0015) [2023-09-22 09:40:07,473][24648] Updated weights for policy 1, policy_version 37600 (0.0017) [2023-09-22 09:40:08,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19341312. Throughput: 0: 737.6, 1: 737.8. Samples: 4810752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:08,544][23569] Avg episode reward: [(0, '29.400'), (1, '27.380')] [2023-09-22 09:40:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19374080. Throughput: 0: 735.5, 1: 733.8. Samples: 4819531. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:13,543][23569] Avg episode reward: [(0, '29.350'), (1, '26.860')] [2023-09-22 09:40:18,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19398656. Throughput: 0: 733.4, 1: 732.2. Samples: 4828412. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:18,544][23569] Avg episode reward: [(0, '29.770'), (1, '27.410')] [2023-09-22 09:40:21,267][24647] Updated weights for policy 0, policy_version 38104 (0.0017) [2023-09-22 09:40:21,267][24648] Updated weights for policy 1, policy_version 37760 (0.0017) [2023-09-22 09:40:23,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19431424. Throughput: 0: 734.6, 1: 734.9. Samples: 4833105. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:23,543][23569] Avg episode reward: [(0, '29.690'), (1, '26.650')] [2023-09-22 09:40:28,542][23569] Fps is (10 sec: 6553.7, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 19464192. Throughput: 0: 736.9, 1: 734.7. Samples: 4841783. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:28,543][23569] Avg episode reward: [(0, '30.960'), (1, '26.620')] [2023-09-22 09:40:33,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19488768. Throughput: 0: 738.9, 1: 738.1. Samples: 4850592. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:33,542][23569] Avg episode reward: [(0, '30.560'), (1, '26.660')] [2023-09-22 09:40:35,108][24648] Updated weights for policy 1, policy_version 37920 (0.0017) [2023-09-22 09:40:35,109][24647] Updated weights for policy 0, policy_version 38264 (0.0016) [2023-09-22 09:40:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19521536. Throughput: 0: 744.7, 1: 744.9. Samples: 4855259. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:38,543][23569] Avg episode reward: [(0, '30.600'), (1, '27.130')] [2023-09-22 09:40:43,542][23569] Fps is (10 sec: 6553.5, 60 sec: 6007.4, 300 sec: 5859.4). Total num frames: 19554304. Throughput: 0: 743.6, 1: 743.7. Samples: 4864000. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:43,543][23569] Avg episode reward: [(0, '30.380'), (1, '28.400')] [2023-09-22 09:40:48,542][23569] Fps is (10 sec: 5734.5, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 19578880. Throughput: 0: 740.7, 1: 741.0. Samples: 4873084. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-22 09:40:48,542][23569] Avg episode reward: [(0, '30.120'), (1, '28.820')] [2023-09-22 09:40:48,766][24648] Updated weights for policy 1, policy_version 38080 (0.0016) [2023-09-22 09:40:48,767][24647] Updated weights for policy 0, policy_version 38424 (0.0018) [2023-09-22 09:40:53,542][23569] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 5831.6). Total num frames: 19611648. Throughput: 0: 743.0, 1: 743.2. Samples: 4877632. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:40:53,543][23569] Avg episode reward: [(0, '29.770'), (1, '28.740')] [2023-09-22 09:40:58,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5871.0, 300 sec: 5831.6). Total num frames: 19636224. Throughput: 0: 743.0, 1: 743.4. Samples: 4886415. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:40:58,543][23569] Avg episode reward: [(0, '30.520'), (1, '28.440')] [2023-09-22 09:41:02,706][24648] Updated weights for policy 1, policy_version 38240 (0.0018) [2023-09-22 09:41:02,706][24647] Updated weights for policy 0, policy_version 38584 (0.0016) [2023-09-22 09:41:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19668992. Throughput: 0: 741.0, 1: 741.8. Samples: 4895140. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:03,543][23569] Avg episode reward: [(0, '29.990'), (1, '28.850')] [2023-09-22 09:41:08,542][23569] Fps is (10 sec: 6553.6, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 19701760. Throughput: 0: 740.7, 1: 740.4. Samples: 4899751. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:08,543][23569] Avg episode reward: [(0, '28.610'), (1, '28.320')] [2023-09-22 09:41:13,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19726336. Throughput: 0: 744.9, 1: 747.1. Samples: 4908923. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:13,543][23569] Avg episode reward: [(0, '29.960'), (1, '28.000')] [2023-09-22 09:41:13,557][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000038352_9818112.pth... [2023-09-22 09:41:13,558][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000038696_9908224.pth... [2023-09-22 09:41:13,588][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000035632_9121792.pth [2023-09-22 09:41:13,595][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000035976_9211904.pth [2023-09-22 09:41:16,641][24647] Updated weights for policy 0, policy_version 38744 (0.0017) [2023-09-22 09:41:16,641][24648] Updated weights for policy 1, policy_version 38400 (0.0018) [2023-09-22 09:41:18,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5831.6). Total num frames: 19759104. Throughput: 0: 740.2, 1: 741.0. Samples: 4917248. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:18,543][23569] Avg episode reward: [(0, '29.840'), (1, '27.070')] [2023-09-22 09:41:23,542][23569] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 19791872. Throughput: 0: 738.9, 1: 738.1. Samples: 4921723. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:23,543][23569] Avg episode reward: [(0, '30.420'), (1, '27.290')] [2023-09-22 09:41:28,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19816448. Throughput: 0: 742.5, 1: 743.4. Samples: 4930865. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:28,542][23569] Avg episode reward: [(0, '30.650'), (1, '27.380')] [2023-09-22 09:41:30,487][24647] Updated weights for policy 0, policy_version 38904 (0.0016) [2023-09-22 09:41:30,487][24648] Updated weights for policy 1, policy_version 38560 (0.0017) [2023-09-22 09:41:33,542][23569] Fps is (10 sec: 5734.4, 60 sec: 6007.4, 300 sec: 5859.4). Total num frames: 19849216. Throughput: 0: 739.8, 1: 740.8. Samples: 4939714. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:33,543][23569] Avg episode reward: [(0, '30.980'), (1, '27.940')] [2023-09-22 09:41:38,542][23569] Fps is (10 sec: 5734.3, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19873792. Throughput: 0: 736.1, 1: 735.9. Samples: 4943873. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-22 09:41:38,543][23569] Avg episode reward: [(0, '30.950'), (1, '27.240')] [2023-09-22 09:41:43,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 19906560. Throughput: 0: 733.9, 1: 734.4. Samples: 4952490. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:41:43,543][23569] Avg episode reward: [(0, '30.800'), (1, '28.560')] [2023-09-22 09:41:44,524][24647] Updated weights for policy 0, policy_version 39064 (0.0016) [2023-09-22 09:41:44,525][24648] Updated weights for policy 1, policy_version 38720 (0.0018) [2023-09-22 09:41:48,542][23569] Fps is (10 sec: 6143.9, 60 sec: 5939.2, 300 sec: 5845.5). Total num frames: 19935232. Throughput: 0: 737.6, 1: 738.3. Samples: 4961557. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:41:48,544][23569] Avg episode reward: [(0, '30.230'), (1, '27.730')] [2023-09-22 09:41:53,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 19963904. Throughput: 0: 736.9, 1: 737.4. Samples: 4966094. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:41:53,543][23569] Avg episode reward: [(0, '30.170'), (1, '28.180')] [2023-09-22 09:41:58,436][24648] Updated weights for policy 1, policy_version 38880 (0.0020) [2023-09-22 09:41:58,436][24647] Updated weights for policy 0, policy_version 39224 (0.0018) [2023-09-22 09:41:58,542][23569] Fps is (10 sec: 6144.2, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 19996672. Throughput: 0: 730.3, 1: 729.0. Samples: 4974593. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:41:58,542][23569] Avg episode reward: [(0, '30.850'), (1, '28.040')] [2023-09-22 09:42:03,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 20021248. Throughput: 0: 732.8, 1: 731.7. Samples: 4983152. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-22 09:42:03,543][23569] Avg episode reward: [(0, '31.190'), (1, '27.800')] [2023-09-22 09:42:08,542][23569] Fps is (10 sec: 5734.2, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 20054016. Throughput: 0: 732.1, 1: 732.4. Samples: 4987622. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:42:08,544][23569] Avg episode reward: [(0, '31.570'), (1, '27.990')] [2023-09-22 09:42:08,545][24306] Saving new best policy, reward=31.570! [2023-09-22 09:42:12,721][24647] Updated weights for policy 0, policy_version 39384 (0.0016) [2023-09-22 09:42:12,721][24648] Updated weights for policy 1, policy_version 39040 (0.0016) [2023-09-22 09:42:13,542][23569] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 5831.6). Total num frames: 20078592. Throughput: 0: 727.4, 1: 726.8. Samples: 4996308. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-22 09:42:13,543][23569] Avg episode reward: [(0, '31.230'), (1, '27.900')] [2023-09-22 09:42:15,365][24653] Stopping RolloutWorker_w4... [2023-09-22 09:42:15,365][23569] Component RolloutWorker_w2 stopped! [2023-09-22 09:42:15,365][24651] Stopping RolloutWorker_w2... [2023-09-22 09:42:15,366][24652] Stopping RolloutWorker_w3... [2023-09-22 09:42:15,365][24649] Stopping RolloutWorker_w0... [2023-09-22 09:42:15,365][24650] Stopping RolloutWorker_w1... [2023-09-22 09:42:15,366][24495] Stopping Batcher_1... [2023-09-22 09:42:15,365][24656] Stopping RolloutWorker_w7... [2023-09-22 09:42:15,365][24655] Stopping RolloutWorker_w6... [2023-09-22 09:42:15,365][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000039416_10092544.pth... [2023-09-22 09:42:15,366][24652] Loop rollout_proc3_evt_loop terminating... [2023-09-22 09:42:15,366][24654] Stopping RolloutWorker_w5... [2023-09-22 09:42:15,366][24495] Loop batcher_evt_loop terminating... [2023-09-22 09:42:15,366][24649] Loop rollout_proc0_evt_loop terminating... [2023-09-22 09:42:15,366][24653] Loop rollout_proc4_evt_loop terminating... [2023-09-22 09:42:15,366][23569] Component RolloutWorker_w1 stopped! [2023-09-22 09:42:15,366][24656] Loop rollout_proc7_evt_loop terminating... [2023-09-22 09:42:15,366][24651] Loop rollout_proc2_evt_loop terminating... [2023-09-22 09:42:15,366][24650] Loop rollout_proc1_evt_loop terminating... [2023-09-22 09:42:15,366][23569] Component RolloutWorker_w4 stopped! [2023-09-22 09:42:15,366][24655] Loop rollout_proc6_evt_loop terminating... [2023-09-22 09:42:15,366][24654] Loop rollout_proc5_evt_loop terminating... [2023-09-22 09:42:15,367][23569] Component RolloutWorker_w6 stopped! [2023-09-22 09:42:15,367][23569] Component RolloutWorker_w0 stopped! [2023-09-22 09:42:15,367][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000039072_10002432.pth... [2023-09-22 09:42:15,367][23569] Component Batcher_1 stopped! [2023-09-22 09:42:15,368][23569] Component RolloutWorker_w7 stopped! [2023-09-22 09:42:15,368][23569] Component RolloutWorker_w3 stopped! [2023-09-22 09:42:15,368][23569] Component RolloutWorker_w5 stopped! [2023-09-22 09:42:15,369][23569] Component Batcher_0 stopped! [2023-09-22 09:42:15,366][24306] Stopping Batcher_0... [2023-09-22 09:42:15,377][24306] Loop batcher_evt_loop terminating... [2023-09-22 09:42:15,390][24648] Weights refcount: 2 0 [2023-09-22 09:42:15,393][24648] Stopping InferenceWorker_p1-w0... [2023-09-22 09:42:15,394][24648] Loop inference_proc1-0_evt_loop terminating... [2023-09-22 09:42:15,394][23569] Component InferenceWorker_p1-w0 stopped! [2023-09-22 09:42:15,405][24495] Removing ./train_atari/Alien/checkpoint_p1/checkpoint_000036976_9465856.pth [2023-09-22 09:42:15,409][24495] Saving ./train_atari/Alien/checkpoint_p1/checkpoint_000039072_10002432.pth... [2023-09-22 09:42:15,412][24306] Removing ./train_atari/Alien/checkpoint_p0/checkpoint_000037320_9555968.pth [2023-09-22 09:42:15,418][24306] Saving ./train_atari/Alien/checkpoint_p0/checkpoint_000039416_10092544.pth... [2023-09-22 09:42:15,424][24647] Weights refcount: 2 0 [2023-09-22 09:42:15,425][24647] Stopping InferenceWorker_p0-w0... [2023-09-22 09:42:15,426][24647] Loop inference_proc0-0_evt_loop terminating... [2023-09-22 09:42:15,426][23569] Component InferenceWorker_p0-w0 stopped! [2023-09-22 09:42:15,444][24495] Stopping LearnerWorker_p1... [2023-09-22 09:42:15,445][24495] Loop learner_proc1_evt_loop terminating... [2023-09-22 09:42:15,446][23569] Component LearnerWorker_p1 stopped! [2023-09-22 09:42:15,476][24306] Stopping LearnerWorker_p0... [2023-09-22 09:42:15,477][24306] Loop learner_proc0_evt_loop terminating... [2023-09-22 09:42:15,476][23569] Component LearnerWorker_p0 stopped! [2023-09-22 09:42:15,477][23569] Waiting for process learner_proc0 to stop... [2023-09-22 09:42:16,141][23569] Waiting for process learner_proc1 to stop... [2023-09-22 09:42:16,142][23569] Waiting for process inference_proc0-0 to join... [2023-09-22 09:42:16,143][23569] Waiting for process inference_proc1-0 to join... [2023-09-22 09:42:16,143][23569] Waiting for process rollout_proc0 to join... [2023-09-22 09:42:16,144][23569] Waiting for process rollout_proc1 to join... [2023-09-22 09:42:16,145][23569] Waiting for process rollout_proc2 to join... [2023-09-22 09:42:16,145][23569] Waiting for process rollout_proc3 to join... [2023-09-22 09:42:16,146][23569] Waiting for process rollout_proc4 to join... [2023-09-22 09:42:16,146][23569] Waiting for process rollout_proc5 to join... [2023-09-22 09:42:16,147][23569] Waiting for process rollout_proc6 to join... [2023-09-22 09:42:16,147][23569] Waiting for process rollout_proc7 to join... [2023-09-22 09:42:16,148][23569] Batcher 0 profile tree view: batching: 21.5074, releasing_batches: 1.9182 [2023-09-22 09:42:16,148][23569] Batcher 1 profile tree view: batching: 21.5164, releasing_batches: 1.7715 [2023-09-22 09:42:16,149][23569] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0051 wait_policy_total: 736.2776 update_model: 38.6498 weight_update: 0.0016 one_step: 0.0011 handle_policy_step: 2420.1080 deserialize: 71.4038, stack: 16.9443, obs_to_device_normalize: 582.2603, forward: 1176.6841, send_messages: 95.9360 prepare_outputs: 323.0556 to_cpu: 163.2684 [2023-09-22 09:42:16,149][23569] InferenceWorker_p1-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 756.9346 update_model: 38.3347 weight_update: 0.0018 one_step: 0.0031 handle_policy_step: 2402.8154 deserialize: 70.4933, stack: 17.1500, obs_to_device_normalize: 580.9364, forward: 1161.8300, send_messages: 98.1208 prepare_outputs: 323.4622 to_cpu: 162.8099 [2023-09-22 09:42:16,150][23569] Learner 0 profile tree view: misc: 0.0157, prepare_batch: 32.2497 train: 470.0413 epoch_init: 0.1081, minibatch_init: 3.4140, losses_postprocess: 59.0580, kl_divergence: 5.8504, after_optimizer: 10.9285 calculate_losses: 48.7565 losses_init: 0.1222, forward_head: 15.4865, bptt_initial: 0.4617, bptt: 0.5046, tail: 11.1559, advantages_returns: 3.3404, losses: 13.8130 update: 337.5762 clip: 166.0634 [2023-09-22 09:42:16,150][23569] Learner 1 profile tree view: misc: 0.0155, prepare_batch: 32.2798 train: 457.9783 epoch_init: 0.1103, minibatch_init: 3.4006, losses_postprocess: 60.0260, kl_divergence: 5.6940, after_optimizer: 20.7317 calculate_losses: 48.0630 losses_init: 0.1100, forward_head: 15.3393, bptt_initial: 0.4498, bptt: 0.4820, tail: 11.0357, advantages_returns: 3.2280, losses: 13.5973 update: 315.5947 clip: 165.0670 [2023-09-22 09:42:16,151][23569] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.4005, enqueue_policy_requests: 44.6030, env_step: 1363.8596, overhead: 30.5022, complete_rollouts: 1.0880 save_policy_outputs: 57.6201 split_output_tensors: 20.0241 [2023-09-22 09:42:16,151][23569] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3940, enqueue_policy_requests: 43.8740, env_step: 1309.2058, overhead: 30.0510, complete_rollouts: 1.0683 save_policy_outputs: 57.3626 split_output_tensors: 19.4889 [2023-09-22 09:42:16,152][23569] Loop Runner_EvtLoop terminating... [2023-09-22 09:42:16,152][23569] Runner profile tree view: main_loop: 3418.3920 [2023-09-22 09:42:16,153][23569] Collected {0: 10092544, 1: 10002432}, FPS: 5852.1