[2023-09-21 13:23:17,827][52220] Saving configuration to ./train_dir/DoublePendulum/config.json...
[2023-09-21 13:23:17,892][52220] Rollout worker 0 uses device cpu
[2023-09-21 13:23:17,893][52220] Rollout worker 1 uses device cpu
[2023-09-21 13:23:17,894][52220] Rollout worker 2 uses device cpu
[2023-09-21 13:23:17,894][52220] Rollout worker 3 uses device cpu
[2023-09-21 13:23:17,895][52220] Rollout worker 4 uses device cpu
[2023-09-21 13:23:17,896][52220] Rollout worker 5 uses device cpu
[2023-09-21 13:23:17,896][52220] Rollout worker 6 uses device cpu
[2023-09-21 13:23:17,897][52220] Rollout worker 7 uses device cpu
[2023-09-21 13:23:17,897][52220] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1
[2023-09-21 13:23:17,949][52220] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:17,950][52220] InferenceWorker_p0-w0: min num requests: 1
[2023-09-21 13:23:17,953][52220] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:17,953][52220] InferenceWorker_p1-w0: min num requests: 1
[2023-09-21 13:23:17,977][52220] Starting all processes...
[2023-09-21 13:23:17,977][52220] Starting process learner_proc0
[2023-09-21 13:23:17,980][52220] Starting process learner_proc1
[2023-09-21 13:23:18,027][52220] Starting all processes...
[2023-09-21 13:23:18,033][52220] Starting process inference_proc0-0
[2023-09-21 13:23:18,034][52220] Starting process inference_proc1-0
[2023-09-21 13:23:18,034][52220] Starting process rollout_proc0
[2023-09-21 13:23:18,034][52220] Starting process rollout_proc1
[2023-09-21 13:23:18,035][52220] Starting process rollout_proc2
[2023-09-21 13:23:18,035][52220] Starting process rollout_proc3
[2023-09-21 13:23:18,039][52220] Starting process rollout_proc4
[2023-09-21 13:23:18,039][52220] Starting process rollout_proc5
[2023-09-21 13:23:18,040][52220] Starting process rollout_proc6
[2023-09-21 13:23:18,043][52220] Starting process rollout_proc7
[2023-09-21 13:23:19,835][52884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:19,835][52884] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-21 13:23:19,840][52885] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:19,840][52885] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-21 13:23:19,853][52884] Num visible devices: 1
[2023-09-21 13:23:19,858][52885] Num visible devices: 1
[2023-09-21 13:23:19,875][52884] Starting seed is not provided
[2023-09-21 13:23:19,875][52884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:19,875][52884] Initializing actor-critic model on device cuda:0
[2023-09-21 13:23:19,876][52884] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:19,876][52884] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:19,896][52885] Starting seed is not provided
[2023-09-21 13:23:19,896][52885] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:19,896][52885] Initializing actor-critic model on device cuda:0
[2023-09-21 13:23:19,897][52885] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:19,897][52885] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:19,912][52984] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-21 13:23:19,917][52979] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:19,917][52979] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-21 13:23:19,925][52884] Created Actor Critic model with architecture:
[2023-09-21 13:23:19,925][52884] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): MultiInputEncoder(
(encoders): ModuleDict(
(obs): MlpEncoder(
(mlp_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=Tanh)
(2): RecursiveScriptModule(original_name=Linear)
(3): RecursiveScriptModule(original_name=Tanh)
)
)
)
)
(core): ModelCoreIdentity()
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=64, out_features=1, bias=True)
(action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
(distribution_linear): Linear(in_features=64, out_features=1, bias=True)
)
)
[2023-09-21 13:23:19,932][52986] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-21 13:23:19,962][52979] Num visible devices: 1
[2023-09-21 13:23:19,984][52990] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-21 13:23:19,986][52985] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-21 13:23:19,986][52885] Created Actor Critic model with architecture:
[2023-09-21 13:23:19,986][52885] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): MultiInputEncoder(
(encoders): ModuleDict(
(obs): MlpEncoder(
(mlp_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=Tanh)
(2): RecursiveScriptModule(original_name=Linear)
(3): RecursiveScriptModule(original_name=Tanh)
)
)
)
)
(core): ModelCoreIdentity()
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=64, out_features=1, bias=True)
(action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev(
(distribution_linear): Linear(in_features=64, out_features=1, bias=True)
)
)
[2023-09-21 13:23:20,065][52982] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-21 13:23:20,079][52980] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:20,079][52980] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-21 13:23:20,093][52988] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-21 13:23:20,097][52980] Num visible devices: 1
[2023-09-21 13:23:20,121][52987] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-21 13:23:20,153][52989] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-21 13:23:20,537][52884] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-21 13:23:20,537][52884] No checkpoints found
[2023-09-21 13:23:20,538][52884] Did not load from checkpoint, starting from scratch!
[2023-09-21 13:23:20,538][52884] Initialized policy 0 weights for model version 0
[2023-09-21 13:23:20,539][52884] LearnerWorker_p0 finished initialization!
[2023-09-21 13:23:20,540][52884] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-21 13:23:20,583][52885] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-21 13:23:20,584][52885] No checkpoints found
[2023-09-21 13:23:20,584][52885] Did not load from checkpoint, starting from scratch!
[2023-09-21 13:23:20,584][52885] Initialized policy 1 weights for model version 0
[2023-09-21 13:23:20,586][52885] LearnerWorker_p1 finished initialization!
[2023-09-21 13:23:20,586][52885] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-21 13:23:21,118][52980] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:21,118][52980] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:21,129][52979] RunningMeanStd input shape: (11,)
[2023-09-21 13:23:21,130][52979] RunningMeanStd input shape: (1,)
[2023-09-21 13:23:21,151][52220] Inference worker 0-0 is ready!
[2023-09-21 13:23:21,163][52220] Inference worker 1-0 is ready!
[2023-09-21 13:23:21,164][52220] All inference workers are ready! Signal rollout workers to start!
[2023-09-21 13:23:21,247][52986] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,248][52986] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,248][52990] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,249][52990] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,252][52985] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,253][52985] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,256][52989] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,257][52982] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,257][52989] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,257][52982] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,260][52987] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,260][52987] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,262][52986] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,262][52990] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,266][52985] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,270][52989] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,271][52982] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,273][52987] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,287][52986] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,288][52990] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,290][52985] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,295][52989] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,296][52982] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,295][52984] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,295][52988] Decorrelating experience for 0 frames...
[2023-09-21 13:23:21,296][52984] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,296][52988] Decorrelating experience for 64 frames...
[2023-09-21 13:23:21,299][52987] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,320][52984] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,320][52988] Decorrelating experience for 128 frames...
[2023-09-21 13:23:21,332][52990] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,333][52986] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,334][52985] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,339][52989] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,342][52982] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,344][52987] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,366][52984] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,367][52988] Decorrelating experience for 192 frames...
[2023-09-21 13:23:21,381][52990] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,382][52985] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,382][52986] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,388][52989] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,390][52982] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,394][52987] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,433][52988] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,438][52984] Decorrelating experience for 256 frames...
[2023-09-21 13:23:21,441][52990] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,441][52985] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,444][52986] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,449][52989] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,451][52982] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,457][52987] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,483][52988] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,486][52984] Decorrelating experience for 320 frames...
[2023-09-21 13:23:21,513][52985] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,514][52990] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,519][52986] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,522][52989] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,526][52982] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,532][52987] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,544][52988] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,547][52984] Decorrelating experience for 384 frames...
[2023-09-21 13:23:21,618][52988] Decorrelating experience for 448 frames...
[2023-09-21 13:23:21,621][52984] Decorrelating experience for 448 frames...
[2023-09-21 13:23:24,286][52220] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8192. Throughput: 0: nan, 1: nan. Samples: 12962. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:23:24,287][52220] Avg episode reward: [(0, '58.179'), (1, '48.837')]
[2023-09-21 13:23:29,286][52220] Fps is (10 sec: 9830.6, 60 sec: 9830.6, 300 sec: 9830.6). Total num frames: 57344. Throughput: 0: 2432.8, 1: 2433.2. Samples: 37292. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:23:29,287][52220] Avg episode reward: [(0, '112.394'), (1, '100.528')]
[2023-09-21 13:23:29,291][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000056_28672.pth...
[2023-09-21 13:23:29,291][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000056_28672.pth...
[2023-09-21 13:23:30,770][52980] Updated weights for policy 0, policy_version 80 (0.0015)
[2023-09-21 13:23:30,770][52979] Updated weights for policy 1, policy_version 80 (0.0015)
[2023-09-21 13:23:34,287][52220] Fps is (10 sec: 11468.5, 60 sec: 11468.5, 300 sec: 11468.5). Total num frames: 122880. Throughput: 0: 5028.3, 1: 5027.7. Samples: 113524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:23:34,288][52220] Avg episode reward: [(0, '177.707'), (1, '178.329')]
[2023-09-21 13:23:37,163][52980] Updated weights for policy 0, policy_version 160 (0.0012)
[2023-09-21 13:23:37,164][52979] Updated weights for policy 1, policy_version 160 (0.0014)
[2023-09-21 13:23:37,937][52220] Heartbeat connected on Batcher_0
[2023-09-21 13:23:37,940][52220] Heartbeat connected on LearnerWorker_p0
[2023-09-21 13:23:37,943][52220] Heartbeat connected on Batcher_1
[2023-09-21 13:23:37,946][52220] Heartbeat connected on LearnerWorker_p1
[2023-09-21 13:23:37,952][52220] Heartbeat connected on InferenceWorker_p0-w0
[2023-09-21 13:23:37,956][52220] Heartbeat connected on RolloutWorker_w0
[2023-09-21 13:23:37,957][52220] Heartbeat connected on InferenceWorker_p1-w0
[2023-09-21 13:23:37,961][52220] Heartbeat connected on RolloutWorker_w1
[2023-09-21 13:23:37,963][52220] Heartbeat connected on RolloutWorker_w2
[2023-09-21 13:23:37,966][52220] Heartbeat connected on RolloutWorker_w3
[2023-09-21 13:23:37,968][52220] Heartbeat connected on RolloutWorker_w4
[2023-09-21 13:23:37,970][52220] Heartbeat connected on RolloutWorker_w5
[2023-09-21 13:23:37,973][52220] Heartbeat connected on RolloutWorker_w6
[2023-09-21 13:23:37,975][52220] Heartbeat connected on RolloutWorker_w7
[2023-09-21 13:23:39,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12014.7, 300 sec: 12014.7). Total num frames: 188416. Throughput: 0: 5892.0, 1: 5893.5. Samples: 189748. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:23:39,288][52220] Avg episode reward: [(0, '224.041'), (1, '220.758')]
[2023-09-21 13:23:43,737][52980] Updated weights for policy 0, policy_version 240 (0.0012)
[2023-09-21 13:23:43,738][52979] Updated weights for policy 1, policy_version 240 (0.0015)
[2023-09-21 13:23:44,287][52220] Fps is (10 sec: 12288.1, 60 sec: 11878.3, 300 sec: 11878.3). Total num frames: 245760. Throughput: 0: 5328.7, 1: 5329.1. Samples: 226122. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:23:44,287][52220] Avg episode reward: [(0, '297.844'), (1, '299.326')]
[2023-09-21 13:23:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000240_122880.pth...
[2023-09-21 13:23:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000240_122880.pth...
[2023-09-21 13:23:44,303][52885] Saving new best policy, reward=299.326!
[2023-09-21 13:23:44,304][52884] Saving new best policy, reward=297.844!
[2023-09-21 13:23:49,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12124.1, 300 sec: 12124.1). Total num frames: 311296. Throughput: 0: 5801.9, 1: 5799.6. Samples: 303000. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:23:49,287][52220] Avg episode reward: [(0, '333.954'), (1, '360.469')]
[2023-09-21 13:23:49,289][52885] Saving new best policy, reward=360.469!
[2023-09-21 13:23:49,289][52884] Saving new best policy, reward=333.954!
[2023-09-21 13:23:50,207][52980] Updated weights for policy 0, policy_version 320 (0.0014)
[2023-09-21 13:23:50,207][52979] Updated weights for policy 1, policy_version 320 (0.0014)
[2023-09-21 13:23:54,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12287.9, 300 sec: 12287.9). Total num frames: 376832. Throughput: 0: 6077.8, 1: 6078.5. Samples: 377654. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:23:54,288][52220] Avg episode reward: [(0, '405.556'), (1, '490.471')]
[2023-09-21 13:23:54,289][52885] Saving new best policy, reward=490.471!
[2023-09-21 13:23:54,289][52884] Saving new best policy, reward=405.556!
[2023-09-21 13:23:56,804][52980] Updated weights for policy 0, policy_version 400 (0.0015)
[2023-09-21 13:23:56,804][52979] Updated weights for policy 1, policy_version 400 (0.0013)
[2023-09-21 13:23:59,287][52220] Fps is (10 sec: 12287.5, 60 sec: 12170.8, 300 sec: 12170.8). Total num frames: 434176. Throughput: 0: 5736.0, 1: 5736.4. Samples: 414500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:23:59,288][52220] Avg episode reward: [(0, '530.836'), (1, '625.912')]
[2023-09-21 13:23:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000424_217088.pth...
[2023-09-21 13:23:59,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000424_217088.pth...
[2023-09-21 13:23:59,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000056_28672.pth
[2023-09-21 13:23:59,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000056_28672.pth
[2023-09-21 13:23:59,305][52885] Saving new best policy, reward=625.912!
[2023-09-21 13:23:59,305][52884] Saving new best policy, reward=530.836!
[2023-09-21 13:24:03,331][52980] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-09-21 13:24:03,331][52979] Updated weights for policy 1, policy_version 480 (0.0013)
[2023-09-21 13:24:04,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12288.0, 300 sec: 12288.0). Total num frames: 499712. Throughput: 0: 5977.2, 1: 5976.6. Samples: 491118. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:04,287][52220] Avg episode reward: [(0, '811.461'), (1, '833.450')]
[2023-09-21 13:24:04,288][52884] Saving new best policy, reward=811.461!
[2023-09-21 13:24:04,288][52885] Saving new best policy, reward=833.450!
[2023-09-21 13:24:09,286][52220] Fps is (10 sec: 13107.8, 60 sec: 12379.0, 300 sec: 12379.0). Total num frames: 565248. Throughput: 0: 6138.2, 1: 6136.3. Samples: 565318. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:09,287][52220] Avg episode reward: [(0, '1132.216'), (1, '1290.527')]
[2023-09-21 13:24:09,288][52884] Saving new best policy, reward=1132.216!
[2023-09-21 13:24:09,288][52885] Saving new best policy, reward=1290.527!
[2023-09-21 13:24:09,772][52980] Updated weights for policy 0, policy_version 560 (0.0014)
[2023-09-21 13:24:09,772][52979] Updated weights for policy 1, policy_version 560 (0.0014)
[2023-09-21 13:24:14,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12451.8, 300 sec: 12451.8). Total num frames: 630784. Throughput: 0: 6322.2, 1: 6320.4. Samples: 606212. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:14,287][52220] Avg episode reward: [(0, '1820.030'), (1, '2335.145')]
[2023-09-21 13:24:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000616_315392.pth...
[2023-09-21 13:24:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000616_315392.pth...
[2023-09-21 13:24:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000240_122880.pth
[2023-09-21 13:24:14,302][52885] Saving new best policy, reward=2335.145!
[2023-09-21 13:24:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000240_122880.pth
[2023-09-21 13:24:14,305][52884] Saving new best policy, reward=1820.030!
[2023-09-21 13:24:15,764][52979] Updated weights for policy 1, policy_version 640 (0.0014)
[2023-09-21 13:24:15,765][52980] Updated weights for policy 0, policy_version 640 (0.0013)
[2023-09-21 13:24:19,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12511.4, 300 sec: 12511.4). Total num frames: 696320. Throughput: 0: 6386.1, 1: 6384.4. Samples: 688194. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:24:19,288][52220] Avg episode reward: [(0, '3468.930'), (1, '4412.869')]
[2023-09-21 13:24:19,289][52884] Saving new best policy, reward=3468.930!
[2023-09-21 13:24:19,289][52885] Saving new best policy, reward=4412.869!
[2023-09-21 13:24:21,997][52979] Updated weights for policy 1, policy_version 720 (0.0013)
[2023-09-21 13:24:21,998][52980] Updated weights for policy 0, policy_version 720 (0.0016)
[2023-09-21 13:24:24,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12561.0, 300 sec: 12561.0). Total num frames: 761856. Throughput: 0: 6405.2, 1: 6405.3. Samples: 766218. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:24:24,288][52220] Avg episode reward: [(0, '5840.580'), (1, '5940.595')]
[2023-09-21 13:24:24,289][52885] Saving new best policy, reward=5940.595!
[2023-09-21 13:24:24,289][52884] Saving new best policy, reward=5840.580!
[2023-09-21 13:24:28,294][52980] Updated weights for policy 0, policy_version 800 (0.0012)
[2023-09-21 13:24:28,294][52979] Updated weights for policy 1, policy_version 800 (0.0013)
[2023-09-21 13:24:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.0, 300 sec: 12603.0). Total num frames: 827392. Throughput: 0: 6422.8, 1: 6422.5. Samples: 804164. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:24:29,288][52220] Avg episode reward: [(0, '7623.753'), (1, '6997.287')]
[2023-09-21 13:24:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000808_413696.pth...
[2023-09-21 13:24:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000808_413696.pth...
[2023-09-21 13:24:29,298][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000424_217088.pth
[2023-09-21 13:24:29,298][52884] Saving new best policy, reward=7623.753!
[2023-09-21 13:24:29,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000424_217088.pth
[2023-09-21 13:24:29,303][52885] Saving new best policy, reward=6997.287!
[2023-09-21 13:24:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12639.1). Total num frames: 892928. Throughput: 0: 6415.3, 1: 6417.3. Samples: 880470. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:24:34,287][52220] Avg episode reward: [(0, '8302.445'), (1, '7779.651')]
[2023-09-21 13:24:34,288][52884] Saving new best policy, reward=8302.445!
[2023-09-21 13:24:34,288][52885] Saving new best policy, reward=7779.651!
[2023-09-21 13:24:34,804][52980] Updated weights for policy 0, policy_version 880 (0.0013)
[2023-09-21 13:24:34,806][52979] Updated weights for policy 1, policy_version 880 (0.0016)
[2023-09-21 13:24:39,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12561.0). Total num frames: 950272. Throughput: 0: 6449.9, 1: 6449.0. Samples: 958104. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:39,288][52220] Avg episode reward: [(0, '8446.668'), (1, '8495.056')]
[2023-09-21 13:24:39,295][52885] Saving new best policy, reward=8495.056!
[2023-09-21 13:24:39,302][52884] Saving new best policy, reward=8446.668!
[2023-09-21 13:24:41,205][52980] Updated weights for policy 0, policy_version 960 (0.0014)
[2023-09-21 13:24:41,205][52979] Updated weights for policy 1, policy_version 960 (0.0015)
[2023-09-21 13:24:44,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.2, 300 sec: 12595.2). Total num frames: 1015808. Throughput: 0: 6460.1, 1: 6460.2. Samples: 995908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:24:44,287][52220] Avg episode reward: [(0, '8409.842'), (1, '8561.365')]
[2023-09-21 13:24:44,291][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000992_507904.pth...
[2023-09-21 13:24:44,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000992_507904.pth...
[2023-09-21 13:24:44,296][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000616_315392.pth
[2023-09-21 13:24:44,296][52885] Saving new best policy, reward=8561.365!
[2023-09-21 13:24:44,297][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000616_315392.pth
[2023-09-21 13:24:47,855][52979] Updated weights for policy 1, policy_version 1040 (0.0009)
[2023-09-21 13:24:47,856][52980] Updated weights for policy 0, policy_version 1040 (0.0015)
[2023-09-21 13:24:49,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12625.3). Total num frames: 1081344. Throughput: 0: 6419.3, 1: 6419.3. Samples: 1068854. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:49,287][52220] Avg episode reward: [(0, '8587.504'), (1, '8914.431')]
[2023-09-21 13:24:49,288][52885] Saving new best policy, reward=8914.431!
[2023-09-21 13:24:49,288][52884] Saving new best policy, reward=8587.504!
[2023-09-21 13:24:54,166][52979] Updated weights for policy 1, policy_version 1120 (0.0011)
[2023-09-21 13:24:54,166][52980] Updated weights for policy 0, policy_version 1120 (0.0015)
[2023-09-21 13:24:54,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.2, 300 sec: 12652.1). Total num frames: 1146880. Throughput: 0: 6462.6, 1: 6462.5. Samples: 1146948. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:54,288][52220] Avg episode reward: [(0, '8752.820'), (1, '8903.087')]
[2023-09-21 13:24:54,289][52884] Saving new best policy, reward=8752.820!
[2023-09-21 13:24:59,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 12676.0). Total num frames: 1212416. Throughput: 0: 6445.4, 1: 6448.0. Samples: 1186416. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:24:59,287][52220] Avg episode reward: [(0, '8885.027'), (1, '8883.166')]
[2023-09-21 13:24:59,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001184_606208.pth...
[2023-09-21 13:24:59,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001184_606208.pth...
[2023-09-21 13:24:59,296][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000808_413696.pth
[2023-09-21 13:24:59,296][52884] Saving new best policy, reward=8885.027!
[2023-09-21 13:24:59,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000808_413696.pth
[2023-09-21 13:25:00,488][52979] Updated weights for policy 1, policy_version 1200 (0.0013)
[2023-09-21 13:25:00,488][52980] Updated weights for policy 0, policy_version 1200 (0.0011)
[2023-09-21 13:25:04,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12697.6). Total num frames: 1277952. Throughput: 0: 6396.5, 1: 6398.3. Samples: 1263958. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:25:04,287][52220] Avg episode reward: [(0, '8887.110'), (1, '8976.377')]
[2023-09-21 13:25:04,288][52885] Saving new best policy, reward=8976.377!
[2023-09-21 13:25:04,288][52884] Saving new best policy, reward=8887.110!
[2023-09-21 13:25:06,706][52980] Updated weights for policy 0, policy_version 1280 (0.0014)
[2023-09-21 13:25:06,706][52979] Updated weights for policy 1, policy_version 1280 (0.0009)
[2023-09-21 13:25:09,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12717.1). Total num frames: 1343488. Throughput: 0: 6416.0, 1: 6413.6. Samples: 1343552. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:09,287][52220] Avg episode reward: [(0, '9073.447'), (1, '9070.914')]
[2023-09-21 13:25:09,289][52884] Saving new best policy, reward=9073.447!
[2023-09-21 13:25:09,289][52885] Saving new best policy, reward=9070.914!
[2023-09-21 13:25:12,903][52980] Updated weights for policy 0, policy_version 1360 (0.0012)
[2023-09-21 13:25:12,903][52979] Updated weights for policy 1, policy_version 1360 (0.0015)
[2023-09-21 13:25:14,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12734.8). Total num frames: 1409024. Throughput: 0: 6430.3, 1: 6430.1. Samples: 1382876. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:25:14,287][52220] Avg episode reward: [(0, '9257.818'), (1, '9254.347')]
[2023-09-21 13:25:14,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001376_704512.pth...
[2023-09-21 13:25:14,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001376_704512.pth...
[2023-09-21 13:25:14,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000000992_507904.pth
[2023-09-21 13:25:14,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000000992_507904.pth
[2023-09-21 13:25:14,300][52884] Saving new best policy, reward=9257.818!
[2023-09-21 13:25:14,300][52885] Saving new best policy, reward=9254.347!
[2023-09-21 13:25:19,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12679.8). Total num frames: 1466368. Throughput: 0: 6424.4, 1: 6423.9. Samples: 1458646. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:19,287][52220] Avg episode reward: [(0, '9349.989'), (1, '9256.270')]
[2023-09-21 13:25:19,289][52884] Saving new best policy, reward=9349.989!
[2023-09-21 13:25:19,289][52885] Saving new best policy, reward=9256.270!
[2023-09-21 13:25:19,413][52980] Updated weights for policy 0, policy_version 1440 (0.0010)
[2023-09-21 13:25:19,414][52979] Updated weights for policy 1, policy_version 1440 (0.0016)
[2023-09-21 13:25:24,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12765.9). Total num frames: 1540096. Throughput: 0: 6464.2, 1: 6463.7. Samples: 1539856. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:25:24,287][52220] Avg episode reward: [(0, '9351.115'), (1, '9350.083')]
[2023-09-21 13:25:24,288][52885] Saving new best policy, reward=9350.083!
[2023-09-21 13:25:24,288][52884] Saving new best policy, reward=9351.115!
[2023-09-21 13:25:25,490][52979] Updated weights for policy 1, policy_version 1520 (0.0012)
[2023-09-21 13:25:25,492][52980] Updated weights for policy 0, policy_version 1520 (0.0014)
[2023-09-21 13:25:29,287][52220] Fps is (10 sec: 13926.2, 60 sec: 12970.7, 300 sec: 12779.5). Total num frames: 1605632. Throughput: 0: 6502.8, 1: 6501.2. Samples: 1581092. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:29,288][52220] Avg episode reward: [(0, '8733.420'), (1, '9351.004')]
[2023-09-21 13:25:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001568_802816.pth...
[2023-09-21 13:25:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001568_802816.pth...
[2023-09-21 13:25:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001184_606208.pth
[2023-09-21 13:25:29,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001184_606208.pth
[2023-09-21 13:25:29,304][52885] Saving new best policy, reward=9351.004!
[2023-09-21 13:25:31,687][52979] Updated weights for policy 1, policy_version 1600 (0.0016)
[2023-09-21 13:25:31,687][52980] Updated weights for policy 0, policy_version 1600 (0.0018)
[2023-09-21 13:25:34,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12792.1). Total num frames: 1671168. Throughput: 0: 6560.7, 1: 6561.7. Samples: 1659366. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:34,287][52220] Avg episode reward: [(0, '8198.340'), (1, '9351.573')]
[2023-09-21 13:25:34,289][52885] Saving new best policy, reward=9351.573!
[2023-09-21 13:25:37,878][52980] Updated weights for policy 0, policy_version 1680 (0.0015)
[2023-09-21 13:25:37,878][52979] Updated weights for policy 1, policy_version 1680 (0.0014)
[2023-09-21 13:25:39,286][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.3, 300 sec: 12803.8). Total num frames: 1736704. Throughput: 0: 6552.9, 1: 6552.9. Samples: 1736706. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:25:39,287][52220] Avg episode reward: [(0, '7933.386'), (1, '9259.461')]
[2023-09-21 13:25:44,260][52979] Updated weights for policy 1, policy_version 1760 (0.0013)
[2023-09-21 13:25:44,260][52980] Updated weights for policy 0, policy_version 1760 (0.0013)
[2023-09-21 13:25:44,287][52220] Fps is (10 sec: 13106.7, 60 sec: 13107.1, 300 sec: 12814.6). Total num frames: 1802240. Throughput: 0: 6544.5, 1: 6544.1. Samples: 1775406. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:25:44,288][52220] Avg episode reward: [(0, '8092.241'), (1, '9260.352')]
[2023-09-21 13:25:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001760_901120.pth...
[2023-09-21 13:25:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001760_901120.pth...
[2023-09-21 13:25:44,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001376_704512.pth
[2023-09-21 13:25:44,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001376_704512.pth
[2023-09-21 13:25:49,286][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12824.7). Total num frames: 1867776. Throughput: 0: 6560.9, 1: 6561.2. Samples: 1854456. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:25:49,287][52220] Avg episode reward: [(0, '8264.465'), (1, '9261.111')]
[2023-09-21 13:25:50,550][52980] Updated weights for policy 0, policy_version 1840 (0.0014)
[2023-09-21 13:25:50,550][52979] Updated weights for policy 1, policy_version 1840 (0.0014)
[2023-09-21 13:25:54,287][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 12834.1). Total num frames: 1933312. Throughput: 0: 6553.7, 1: 6553.7. Samples: 1933384. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:25:54,287][52220] Avg episode reward: [(0, '7817.697'), (1, '9354.608')]
[2023-09-21 13:25:54,289][52885] Saving new best policy, reward=9354.608!
[2023-09-21 13:25:56,795][52979] Updated weights for policy 1, policy_version 1920 (0.0012)
[2023-09-21 13:25:56,795][52980] Updated weights for policy 0, policy_version 1920 (0.0014)
[2023-09-21 13:25:59,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12790.1). Total num frames: 1990656. Throughput: 0: 6544.5, 1: 6545.2. Samples: 1971916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:25:59,287][52220] Avg episode reward: [(0, '6932.151'), (1, '9354.989')]
[2023-09-21 13:25:59,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001944_995328.pth...
[2023-09-21 13:25:59,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001944_995328.pth...
[2023-09-21 13:25:59,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001568_802816.pth
[2023-09-21 13:25:59,298][52885] Saving new best policy, reward=9354.989!
[2023-09-21 13:25:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001568_802816.pth
[2023-09-21 13:26:03,240][52979] Updated weights for policy 1, policy_version 2000 (0.0013)
[2023-09-21 13:26:03,240][52980] Updated weights for policy 0, policy_version 2000 (0.0013)
[2023-09-21 13:26:04,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12970.7, 300 sec: 12800.0). Total num frames: 2056192. Throughput: 0: 6549.4, 1: 6547.3. Samples: 2047996. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:04,287][52220] Avg episode reward: [(0, '6922.979'), (1, '8869.455')]
[2023-09-21 13:26:09,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12809.3). Total num frames: 2121728. Throughput: 0: 6532.9, 1: 6534.0. Samples: 2127868. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:09,288][52220] Avg episode reward: [(0, '6832.589'), (1, '1018.921')]
[2023-09-21 13:26:09,399][52979] Updated weights for policy 1, policy_version 2080 (0.0015)
[2023-09-21 13:26:09,400][52980] Updated weights for policy 0, policy_version 2080 (0.0015)
[2023-09-21 13:26:14,286][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12818.1). Total num frames: 2187264. Throughput: 0: 6488.6, 1: 6489.9. Samples: 2165124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:14,287][52220] Avg episode reward: [(0, '6299.769'), (1, '2917.314')]
[2023-09-21 13:26:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002136_1093632.pth...
[2023-09-21 13:26:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002136_1093632.pth...
[2023-09-21 13:26:14,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001760_901120.pth
[2023-09-21 13:26:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001760_901120.pth
[2023-09-21 13:26:15,925][52980] Updated weights for policy 0, policy_version 2160 (0.0012)
[2023-09-21 13:26:15,925][52979] Updated weights for policy 1, policy_version 2160 (0.0015)
[2023-09-21 13:26:19,286][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12826.3). Total num frames: 2252800. Throughput: 0: 6467.7, 1: 6467.4. Samples: 2241444. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:26:19,287][52220] Avg episode reward: [(0, '417.647'), (1, '5806.103')]
[2023-09-21 13:26:22,247][52980] Updated weights for policy 0, policy_version 2240 (0.0014)
[2023-09-21 13:26:22,247][52979] Updated weights for policy 1, policy_version 2240 (0.0015)
[2023-09-21 13:26:24,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12834.1). Total num frames: 2318336. Throughput: 0: 6473.8, 1: 6476.3. Samples: 2319464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:24,288][52220] Avg episode reward: [(0, '674.690'), (1, '8673.284')]
[2023-09-21 13:26:28,391][52979] Updated weights for policy 1, policy_version 2320 (0.0014)
[2023-09-21 13:26:28,391][52980] Updated weights for policy 0, policy_version 2320 (0.0015)
[2023-09-21 13:26:29,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12841.5). Total num frames: 2383872. Throughput: 0: 6492.4, 1: 6492.4. Samples: 2359718. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:29,287][52220] Avg episode reward: [(0, '3037.838'), (1, '7623.096')]
[2023-09-21 13:26:29,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002328_1191936.pth...
[2023-09-21 13:26:29,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002328_1191936.pth...
[2023-09-21 13:26:29,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000001944_995328.pth
[2023-09-21 13:26:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000001944_995328.pth
[2023-09-21 13:26:34,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12848.5). Total num frames: 2449408. Throughput: 0: 6479.8, 1: 6480.2. Samples: 2437656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:34,287][52220] Avg episode reward: [(0, '5721.071'), (1, '463.553')]
[2023-09-21 13:26:34,774][52979] Updated weights for policy 1, policy_version 2400 (0.0014)
[2023-09-21 13:26:34,774][52980] Updated weights for policy 0, policy_version 2400 (0.0014)
[2023-09-21 13:26:39,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12855.1). Total num frames: 2514944. Throughput: 0: 6461.5, 1: 6461.8. Samples: 2514930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:39,287][52220] Avg episode reward: [(0, '6041.492'), (1, '619.276')]
[2023-09-21 13:26:41,066][52980] Updated weights for policy 0, policy_version 2480 (0.0013)
[2023-09-21 13:26:41,066][52979] Updated weights for policy 1, policy_version 2480 (0.0012)
[2023-09-21 13:26:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12861.4). Total num frames: 2580480. Throughput: 0: 6489.9, 1: 6487.6. Samples: 2555906. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:26:44,287][52220] Avg episode reward: [(0, '5855.563'), (1, '3436.696')]
[2023-09-21 13:26:44,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002520_1290240.pth...
[2023-09-21 13:26:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002520_1290240.pth...
[2023-09-21 13:26:44,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002136_1093632.pth
[2023-09-21 13:26:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002136_1093632.pth
[2023-09-21 13:26:47,194][52980] Updated weights for policy 0, policy_version 2560 (0.0013)
[2023-09-21 13:26:47,194][52979] Updated weights for policy 1, policy_version 2560 (0.0015)
[2023-09-21 13:26:49,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12867.4). Total num frames: 2646016. Throughput: 0: 6546.0, 1: 6547.4. Samples: 2637204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:49,287][52220] Avg episode reward: [(0, '6457.100'), (1, '6204.589')]
[2023-09-21 13:26:53,289][52980] Updated weights for policy 0, policy_version 2640 (0.0014)
[2023-09-21 13:26:53,289][52979] Updated weights for policy 1, policy_version 2640 (0.0013)
[2023-09-21 13:26:54,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12873.1). Total num frames: 2711552. Throughput: 0: 6526.7, 1: 6526.7. Samples: 2715270. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:26:54,288][52220] Avg episode reward: [(0, '6807.248'), (1, '7389.904')]
[2023-09-21 13:26:59,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.1, 300 sec: 12878.6). Total num frames: 2777088. Throughput: 0: 6551.0, 1: 6551.5. Samples: 2754738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:26:59,288][52220] Avg episode reward: [(0, '6351.091'), (1, '7023.754')]
[2023-09-21 13:26:59,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002712_1388544.pth...
[2023-09-21 13:26:59,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002712_1388544.pth...
[2023-09-21 13:26:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002328_1191936.pth
[2023-09-21 13:26:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002328_1191936.pth
[2023-09-21 13:26:59,506][52980] Updated weights for policy 0, policy_version 2720 (0.0013)
[2023-09-21 13:26:59,506][52979] Updated weights for policy 1, policy_version 2720 (0.0013)
[2023-09-21 13:27:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.1, 300 sec: 12883.8). Total num frames: 2842624. Throughput: 0: 6555.4, 1: 6555.6. Samples: 2831442. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:04,288][52220] Avg episode reward: [(0, '6102.821'), (1, '6401.047')]
[2023-09-21 13:27:06,082][52980] Updated weights for policy 0, policy_version 2800 (0.0015)
[2023-09-21 13:27:06,082][52979] Updated weights for policy 1, policy_version 2800 (0.0014)
[2023-09-21 13:27:09,287][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12888.7). Total num frames: 2908160. Throughput: 0: 6544.5, 1: 6543.5. Samples: 2908426. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:27:09,288][52220] Avg episode reward: [(0, '5230.132'), (1, '5837.226')]
[2023-09-21 13:27:12,422][52980] Updated weights for policy 0, policy_version 2880 (0.0014)
[2023-09-21 13:27:12,424][52979] Updated weights for policy 1, policy_version 2880 (0.0014)
[2023-09-21 13:27:14,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12857.9). Total num frames: 2965504. Throughput: 0: 6520.6, 1: 6520.3. Samples: 2946558. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:27:14,287][52220] Avg episode reward: [(0, '1027.219'), (1, '2280.409')]
[2023-09-21 13:27:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002896_1482752.pth...
[2023-09-21 13:27:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002896_1482752.pth...
[2023-09-21 13:27:14,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002520_1290240.pth
[2023-09-21 13:27:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002520_1290240.pth
[2023-09-21 13:27:18,730][52980] Updated weights for policy 0, policy_version 2960 (0.0014)
[2023-09-21 13:27:18,730][52979] Updated weights for policy 1, policy_version 2960 (0.0015)
[2023-09-21 13:27:19,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 12863.2). Total num frames: 3031040. Throughput: 0: 6506.2, 1: 6506.0. Samples: 3023208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:27:19,287][52220] Avg episode reward: [(0, '2016.999'), (1, '1212.339')]
[2023-09-21 13:27:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12868.3). Total num frames: 3096576. Throughput: 0: 6541.0, 1: 6542.6. Samples: 3103690. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:24,287][52220] Avg episode reward: [(0, '743.100'), (1, '3084.323')]
[2023-09-21 13:27:24,974][52979] Updated weights for policy 1, policy_version 3040 (0.0013)
[2023-09-21 13:27:24,974][52980] Updated weights for policy 0, policy_version 3040 (0.0011)
[2023-09-21 13:27:29,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12873.1). Total num frames: 3162112. Throughput: 0: 6505.3, 1: 6507.4. Samples: 3141476. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:29,287][52220] Avg episode reward: [(0, '1835.633'), (1, '1015.822')]
[2023-09-21 13:27:29,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003088_1581056.pth...
[2023-09-21 13:27:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003088_1581056.pth...
[2023-09-21 13:27:29,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002712_1388544.pth
[2023-09-21 13:27:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002712_1388544.pth
[2023-09-21 13:27:31,482][52979] Updated weights for policy 1, policy_version 3120 (0.0016)
[2023-09-21 13:27:31,482][52980] Updated weights for policy 0, policy_version 3120 (0.0015)
[2023-09-21 13:27:34,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12877.8). Total num frames: 3227648. Throughput: 0: 6447.3, 1: 6447.9. Samples: 3217490. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:34,287][52220] Avg episode reward: [(0, '1079.976'), (1, '919.988')]
[2023-09-21 13:27:37,734][52980] Updated weights for policy 0, policy_version 3200 (0.0014)
[2023-09-21 13:27:37,734][52979] Updated weights for policy 1, policy_version 3200 (0.0016)
[2023-09-21 13:27:39,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12882.3). Total num frames: 3293184. Throughput: 0: 6436.5, 1: 6436.7. Samples: 3294560. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:39,287][52220] Avg episode reward: [(0, '815.120'), (1, '554.483')]
[2023-09-21 13:27:44,167][52980] Updated weights for policy 0, policy_version 3280 (0.0014)
[2023-09-21 13:27:44,167][52979] Updated weights for policy 1, policy_version 3280 (0.0014)
[2023-09-21 13:27:44,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12886.6). Total num frames: 3358720. Throughput: 0: 6432.0, 1: 6431.4. Samples: 3333588. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:27:44,288][52220] Avg episode reward: [(0, '1107.621'), (1, '3291.924')]
[2023-09-21 13:27:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003280_1679360.pth...
[2023-09-21 13:27:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003280_1679360.pth...
[2023-09-21 13:27:44,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000002896_1482752.pth
[2023-09-21 13:27:44,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000002896_1482752.pth
[2023-09-21 13:27:49,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12859.9). Total num frames: 3416064. Throughput: 0: 6431.5, 1: 6431.1. Samples: 3410258. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:27:49,288][52220] Avg episode reward: [(0, '1492.838'), (1, '4366.645')]
[2023-09-21 13:27:50,625][52980] Updated weights for policy 0, policy_version 3360 (0.0010)
[2023-09-21 13:27:50,625][52979] Updated weights for policy 1, policy_version 3360 (0.0012)
[2023-09-21 13:27:54,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12864.5). Total num frames: 3481600. Throughput: 0: 6446.5, 1: 6447.1. Samples: 3488638. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:54,287][52220] Avg episode reward: [(0, '2174.187'), (1, '5011.108')]
[2023-09-21 13:27:56,841][52980] Updated weights for policy 0, policy_version 3440 (0.0013)
[2023-09-21 13:27:56,841][52979] Updated weights for policy 1, policy_version 3440 (0.0012)
[2023-09-21 13:27:59,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12868.9). Total num frames: 3547136. Throughput: 0: 6460.1, 1: 6459.6. Samples: 3527942. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:27:59,287][52220] Avg episode reward: [(0, '467.350'), (1, '600.526')]
[2023-09-21 13:27:59,291][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003464_1773568.pth...
[2023-09-21 13:27:59,295][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003088_1581056.pth
[2023-09-21 13:27:59,337][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003472_1777664.pth...
[2023-09-21 13:27:59,340][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003088_1581056.pth
[2023-09-21 13:28:03,068][52980] Updated weights for policy 0, policy_version 3520 (0.0015)
[2023-09-21 13:28:03,068][52979] Updated weights for policy 1, policy_version 3520 (0.0013)
[2023-09-21 13:28:04,286][52220] Fps is (10 sec: 13926.5, 60 sec: 12970.7, 300 sec: 12902.4). Total num frames: 3620864. Throughput: 0: 6476.0, 1: 6476.1. Samples: 3606052. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:04,287][52220] Avg episode reward: [(0, '1380.872'), (1, '3287.548')]
[2023-09-21 13:28:09,163][52980] Updated weights for policy 0, policy_version 3600 (0.0015)
[2023-09-21 13:28:09,163][52979] Updated weights for policy 1, policy_version 3600 (0.0012)
[2023-09-21 13:28:09,286][52220] Fps is (10 sec: 13926.3, 60 sec: 12970.7, 300 sec: 12906.0). Total num frames: 3686400. Throughput: 0: 6476.3, 1: 6474.4. Samples: 3686468. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:28:09,288][52220] Avg episode reward: [(0, '958.101'), (1, '5374.573')]
[2023-09-21 13:28:14,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12909.5). Total num frames: 3751936. Throughput: 0: 6506.9, 1: 6505.6. Samples: 3727042. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:28:14,288][52220] Avg episode reward: [(0, '828.671'), (1, '5105.236')]
[2023-09-21 13:28:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003664_1875968.pth...
[2023-09-21 13:28:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003664_1875968.pth...
[2023-09-21 13:28:14,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003280_1679360.pth
[2023-09-21 13:28:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003280_1679360.pth
[2023-09-21 13:28:15,450][52979] Updated weights for policy 1, policy_version 3680 (0.0014)
[2023-09-21 13:28:15,450][52980] Updated weights for policy 0, policy_version 3680 (0.0013)
[2023-09-21 13:28:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 3817472. Throughput: 0: 6547.3, 1: 6547.5. Samples: 3806760. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:28:19,287][52220] Avg episode reward: [(0, '927.280'), (1, '6190.526')]
[2023-09-21 13:28:21,753][52980] Updated weights for policy 0, policy_version 3760 (0.0012)
[2023-09-21 13:28:21,754][52979] Updated weights for policy 1, policy_version 3760 (0.0016)
[2023-09-21 13:28:24,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 3874816. Throughput: 0: 6534.3, 1: 6532.7. Samples: 3882572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:24,287][52220] Avg episode reward: [(0, '1593.569'), (1, '6727.270')]
[2023-09-21 13:28:28,119][52980] Updated weights for policy 0, policy_version 3840 (0.0013)
[2023-09-21 13:28:28,120][52979] Updated weights for policy 1, policy_version 3840 (0.0011)
[2023-09-21 13:28:29,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 3940352. Throughput: 0: 6511.5, 1: 6511.5. Samples: 3919624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:29,287][52220] Avg episode reward: [(0, '430.176'), (1, '6822.899')]
[2023-09-21 13:28:29,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003848_1970176.pth...
[2023-09-21 13:28:29,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003848_1970176.pth...
[2023-09-21 13:28:29,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003472_1777664.pth
[2023-09-21 13:28:29,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003464_1773568.pth
[2023-09-21 13:28:34,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 4005888. Throughput: 0: 6528.7, 1: 6526.9. Samples: 3997760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:34,287][52220] Avg episode reward: [(0, '1148.194'), (1, '6026.845')]
[2023-09-21 13:28:34,403][52979] Updated weights for policy 1, policy_version 3920 (0.0011)
[2023-09-21 13:28:34,403][52980] Updated weights for policy 0, policy_version 3920 (0.0015)
[2023-09-21 13:28:39,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 4071424. Throughput: 0: 6515.7, 1: 6515.4. Samples: 4075036. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:28:39,288][52220] Avg episode reward: [(0, '285.021'), (1, '6219.559')]
[2023-09-21 13:28:40,922][52980] Updated weights for policy 0, policy_version 4000 (0.0014)
[2023-09-21 13:28:40,923][52979] Updated weights for policy 1, policy_version 4000 (0.0016)
[2023-09-21 13:28:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12968.4). Total num frames: 4136960. Throughput: 0: 6494.6, 1: 6493.8. Samples: 4112418. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:44,287][52220] Avg episode reward: [(0, '644.599'), (1, '6672.147')]
[2023-09-21 13:28:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004040_2068480.pth...
[2023-09-21 13:28:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004040_2068480.pth...
[2023-09-21 13:28:44,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003664_1875968.pth
[2023-09-21 13:28:44,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003664_1875968.pth
[2023-09-21 13:28:46,847][52980] Updated weights for policy 0, policy_version 4080 (0.0011)
[2023-09-21 13:28:46,848][52979] Updated weights for policy 1, policy_version 4080 (0.0013)
[2023-09-21 13:28:49,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 4202496. Throughput: 0: 6555.2, 1: 6554.8. Samples: 4196006. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:28:49,288][52220] Avg episode reward: [(0, '918.545'), (1, '7838.863')]
[2023-09-21 13:28:53,047][52979] Updated weights for policy 1, policy_version 4160 (0.0013)
[2023-09-21 13:28:53,048][52980] Updated weights for policy 0, policy_version 4160 (0.0012)
[2023-09-21 13:28:54,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 4268032. Throughput: 0: 6543.2, 1: 6545.3. Samples: 4275446. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:28:54,287][52220] Avg episode reward: [(0, '706.591'), (1, '8731.683')]
[2023-09-21 13:28:59,170][52979] Updated weights for policy 1, policy_version 4240 (0.0013)
[2023-09-21 13:28:59,170][52980] Updated weights for policy 0, policy_version 4240 (0.0012)
[2023-09-21 13:28:59,286][52220] Fps is (10 sec: 13926.6, 60 sec: 13243.7, 300 sec: 13023.9). Total num frames: 4341760. Throughput: 0: 6545.1, 1: 6546.6. Samples: 4316168. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:28:59,287][52220] Avg episode reward: [(0, '367.257'), (1, '9270.249')]
[2023-09-21 13:28:59,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004240_2170880.pth...
[2023-09-21 13:28:59,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004240_2170880.pth...
[2023-09-21 13:28:59,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000003848_1970176.pth
[2023-09-21 13:28:59,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000003848_1970176.pth
[2023-09-21 13:29:04,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 4399104. Throughput: 0: 6518.0, 1: 6518.1. Samples: 4393388. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:29:04,288][52220] Avg episode reward: [(0, '1165.006'), (1, '9356.618')]
[2023-09-21 13:29:04,298][52885] Saving new best policy, reward=9356.618!
[2023-09-21 13:29:05,674][52979] Updated weights for policy 1, policy_version 4320 (0.0013)
[2023-09-21 13:29:05,674][52980] Updated weights for policy 0, policy_version 4320 (0.0014)
[2023-09-21 13:29:09,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 4464640. Throughput: 0: 6485.0, 1: 6486.5. Samples: 4466292. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:29:09,288][52220] Avg episode reward: [(0, '3001.533'), (1, '9003.651')]
[2023-09-21 13:29:12,403][52980] Updated weights for policy 0, policy_version 4400 (0.0013)
[2023-09-21 13:29:12,403][52979] Updated weights for policy 1, policy_version 4400 (0.0013)
[2023-09-21 13:29:14,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 4530176. Throughput: 0: 6484.3, 1: 6483.9. Samples: 4503198. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:29:14,288][52220] Avg episode reward: [(0, '3265.888'), (1, '8376.869')]
[2023-09-21 13:29:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004424_2265088.pth...
[2023-09-21 13:29:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004424_2265088.pth...
[2023-09-21 13:29:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004040_2068480.pth
[2023-09-21 13:29:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004040_2068480.pth
[2023-09-21 13:29:18,776][52980] Updated weights for policy 0, policy_version 4480 (0.0013)
[2023-09-21 13:29:18,777][52979] Updated weights for policy 1, policy_version 4480 (0.0014)
[2023-09-21 13:29:19,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12834.2, 300 sec: 12968.4). Total num frames: 4587520. Throughput: 0: 6469.4, 1: 6470.9. Samples: 4580072. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:19,287][52220] Avg episode reward: [(0, '3622.771'), (1, '6926.232')]
[2023-09-21 13:29:24,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 4653056. Throughput: 0: 6488.6, 1: 6489.2. Samples: 4659038. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:29:24,288][52220] Avg episode reward: [(0, '3888.915'), (1, '6733.320')]
[2023-09-21 13:29:24,975][52979] Updated weights for policy 1, policy_version 4560 (0.0014)
[2023-09-21 13:29:24,975][52980] Updated weights for policy 0, policy_version 4560 (0.0015)
[2023-09-21 13:29:29,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 12968.3). Total num frames: 4718592. Throughput: 0: 6528.4, 1: 6529.7. Samples: 4700036. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:29,287][52220] Avg episode reward: [(0, '4531.947'), (1, '7272.334')]
[2023-09-21 13:29:29,335][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004616_2363392.pth...
[2023-09-21 13:29:29,338][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004240_2170880.pth
[2023-09-21 13:29:29,343][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004616_2363392.pth...
[2023-09-21 13:29:29,347][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004240_2170880.pth
[2023-09-21 13:29:31,071][52980] Updated weights for policy 0, policy_version 4640 (0.0015)
[2023-09-21 13:29:31,071][52979] Updated weights for policy 1, policy_version 4640 (0.0011)
[2023-09-21 13:29:34,287][52220] Fps is (10 sec: 13926.4, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 4792320. Throughput: 0: 6489.5, 1: 6489.9. Samples: 4780080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:34,287][52220] Avg episode reward: [(0, '3506.421'), (1, '7730.738')]
[2023-09-21 13:29:37,116][52980] Updated weights for policy 0, policy_version 4720 (0.0014)
[2023-09-21 13:29:37,116][52979] Updated weights for policy 1, policy_version 4720 (0.0014)
[2023-09-21 13:29:39,286][52220] Fps is (10 sec: 13926.7, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 4857856. Throughput: 0: 6523.8, 1: 6524.4. Samples: 4862616. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:39,287][52220] Avg episode reward: [(0, '3920.986'), (1, '3772.676')]
[2023-09-21 13:29:42,959][52980] Updated weights for policy 0, policy_version 4800 (0.0014)
[2023-09-21 13:29:42,959][52979] Updated weights for policy 1, policy_version 4800 (0.0015)
[2023-09-21 13:29:44,286][52220] Fps is (10 sec: 13926.5, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 4931584. Throughput: 0: 6553.8, 1: 6553.1. Samples: 4905978. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:44,287][52220] Avg episode reward: [(0, '4582.768'), (1, '1280.545')]
[2023-09-21 13:29:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004816_2465792.pth...
[2023-09-21 13:29:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004816_2465792.pth...
[2023-09-21 13:29:44,306][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004424_2265088.pth
[2023-09-21 13:29:44,306][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004424_2265088.pth
[2023-09-21 13:29:49,227][52980] Updated weights for policy 0, policy_version 4880 (0.0012)
[2023-09-21 13:29:49,227][52979] Updated weights for policy 1, policy_version 4880 (0.0012)
[2023-09-21 13:29:49,286][52220] Fps is (10 sec: 13926.3, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 4997120. Throughput: 0: 6577.6, 1: 6576.7. Samples: 4985328. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:29:49,287][52220] Avg episode reward: [(0, '350.718'), (1, '1158.513')]
[2023-09-21 13:29:54,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 5062656. Throughput: 0: 6628.2, 1: 6625.8. Samples: 5062720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:29:54,287][52220] Avg episode reward: [(0, '1454.240'), (1, '377.826')]
[2023-09-21 13:29:55,414][52980] Updated weights for policy 0, policy_version 4960 (0.0011)
[2023-09-21 13:29:55,416][52979] Updated weights for policy 1, policy_version 4960 (0.0013)
[2023-09-21 13:29:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 5128192. Throughput: 0: 6669.3, 1: 6668.4. Samples: 5103396. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:29:59,288][52220] Avg episode reward: [(0, '1881.211'), (1, '516.736')]
[2023-09-21 13:29:59,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005008_2564096.pth...
[2023-09-21 13:29:59,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005008_2564096.pth...
[2023-09-21 13:29:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004616_2363392.pth
[2023-09-21 13:29:59,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004616_2363392.pth
[2023-09-21 13:30:01,428][52980] Updated weights for policy 0, policy_version 5040 (0.0014)
[2023-09-21 13:30:01,428][52979] Updated weights for policy 1, policy_version 5040 (0.0015)
[2023-09-21 13:30:04,286][52220] Fps is (10 sec: 13516.6, 60 sec: 13312.0, 300 sec: 13065.5). Total num frames: 5197824. Throughput: 0: 6729.2, 1: 6729.8. Samples: 5185728. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:30:04,287][52220] Avg episode reward: [(0, '294.281'), (1, '2939.865')]
[2023-09-21 13:30:07,369][52980] Updated weights for policy 0, policy_version 5120 (0.0012)
[2023-09-21 13:30:07,370][52979] Updated weights for policy 1, policy_version 5120 (0.0014)
[2023-09-21 13:30:09,286][52220] Fps is (10 sec: 13926.7, 60 sec: 13380.3, 300 sec: 13079.4). Total num frames: 5267456. Throughput: 0: 6762.1, 1: 6759.8. Samples: 5267520. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:30:09,287][52220] Avg episode reward: [(0, '824.120'), (1, '4656.765')]
[2023-09-21 13:30:13,682][52979] Updated weights for policy 1, policy_version 5200 (0.0013)
[2023-09-21 13:30:13,683][52980] Updated weights for policy 0, policy_version 5200 (0.0014)
[2023-09-21 13:30:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13312.0, 300 sec: 13093.3). Total num frames: 5328896. Throughput: 0: 6705.6, 1: 6706.1. Samples: 5303564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:30:14,288][52220] Avg episode reward: [(0, '444.057'), (1, '5805.610')]
[2023-09-21 13:30:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005208_2666496.pth...
[2023-09-21 13:30:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005208_2666496.pth...
[2023-09-21 13:30:14,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000004816_2465792.pth
[2023-09-21 13:30:14,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000004816_2465792.pth
[2023-09-21 13:30:19,286][52220] Fps is (10 sec: 12287.9, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 5390336. Throughput: 0: 6702.2, 1: 6701.1. Samples: 5383228. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:30:19,287][52220] Avg episode reward: [(0, '376.089'), (1, '4874.545')]
[2023-09-21 13:30:20,037][52979] Updated weights for policy 1, policy_version 5280 (0.0012)
[2023-09-21 13:30:20,037][52980] Updated weights for policy 0, policy_version 5280 (0.0014)
[2023-09-21 13:30:24,286][52220] Fps is (10 sec: 12697.7, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 5455872. Throughput: 0: 6656.3, 1: 6655.6. Samples: 5461656. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:30:24,287][52220] Avg episode reward: [(0, '578.371'), (1, '3591.697')]
[2023-09-21 13:30:26,486][52979] Updated weights for policy 1, policy_version 5360 (0.0014)
[2023-09-21 13:30:26,487][52980] Updated weights for policy 0, policy_version 5360 (0.0015)
[2023-09-21 13:30:29,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 5521408. Throughput: 0: 6565.9, 1: 6564.9. Samples: 5496866. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:30:29,287][52220] Avg episode reward: [(0, '365.739'), (1, '4877.659')]
[2023-09-21 13:30:29,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005392_2760704.pth...
[2023-09-21 13:30:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005392_2760704.pth...
[2023-09-21 13:30:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005008_2564096.pth
[2023-09-21 13:30:29,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005008_2564096.pth
[2023-09-21 13:30:32,916][52979] Updated weights for policy 1, policy_version 5440 (0.0012)
[2023-09-21 13:30:32,917][52980] Updated weights for policy 0, policy_version 5440 (0.0015)
[2023-09-21 13:30:34,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 5586944. Throughput: 0: 6545.7, 1: 6546.7. Samples: 5574486. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:30:34,287][52220] Avg episode reward: [(0, '372.220'), (1, '1004.327')]
[2023-09-21 13:30:39,286][52220] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 5644288. Throughput: 0: 6526.7, 1: 6528.9. Samples: 5650224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:30:39,287][52220] Avg episode reward: [(0, '335.218'), (1, '2058.538')]
[2023-09-21 13:30:39,406][52979] Updated weights for policy 1, policy_version 5520 (0.0011)
[2023-09-21 13:30:39,407][52980] Updated weights for policy 0, policy_version 5520 (0.0015)
[2023-09-21 13:30:44,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 5709824. Throughput: 0: 6490.5, 1: 6491.4. Samples: 5687580. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:30:44,287][52220] Avg episode reward: [(0, '480.115'), (1, '3705.854')]
[2023-09-21 13:30:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005576_2854912.pth...
[2023-09-21 13:30:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005576_2854912.pth...
[2023-09-21 13:30:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005208_2666496.pth
[2023-09-21 13:30:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005208_2666496.pth
[2023-09-21 13:30:45,754][52980] Updated weights for policy 0, policy_version 5600 (0.0015)
[2023-09-21 13:30:45,754][52979] Updated weights for policy 1, policy_version 5600 (0.0013)
[2023-09-21 13:30:49,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 5775360. Throughput: 0: 6443.5, 1: 6443.0. Samples: 5765616. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:30:49,287][52220] Avg episode reward: [(0, '184.421'), (1, '5297.616')]
[2023-09-21 13:30:52,259][52979] Updated weights for policy 1, policy_version 5680 (0.0010)
[2023-09-21 13:30:52,259][52980] Updated weights for policy 0, policy_version 5680 (0.0014)
[2023-09-21 13:30:54,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 5840896. Throughput: 0: 6370.7, 1: 6370.9. Samples: 5840892. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:30:54,287][52220] Avg episode reward: [(0, '306.843'), (1, '6361.076')]
[2023-09-21 13:30:58,840][52980] Updated weights for policy 0, policy_version 5760 (0.0013)
[2023-09-21 13:30:58,841][52979] Updated weights for policy 1, policy_version 5760 (0.0013)
[2023-09-21 13:30:59,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 5898240. Throughput: 0: 6380.6, 1: 6379.5. Samples: 5877764. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:30:59,287][52220] Avg episode reward: [(0, '69.112'), (1, '6973.074')]
[2023-09-21 13:30:59,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005760_2949120.pth...
[2023-09-21 13:30:59,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005760_2949120.pth...
[2023-09-21 13:30:59,297][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005392_2760704.pth
[2023-09-21 13:30:59,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005392_2760704.pth
[2023-09-21 13:31:04,287][52220] Fps is (10 sec: 12287.7, 60 sec: 12765.9, 300 sec: 13023.9). Total num frames: 5963776. Throughput: 0: 6306.4, 1: 6307.1. Samples: 5950838. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:04,287][52220] Avg episode reward: [(0, '914.490'), (1, '6972.267')]
[2023-09-21 13:31:05,449][52979] Updated weights for policy 1, policy_version 5840 (0.0014)
[2023-09-21 13:31:05,450][52980] Updated weights for policy 0, policy_version 5840 (0.0015)
[2023-09-21 13:31:09,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12697.6, 300 sec: 13023.9). Total num frames: 6029312. Throughput: 0: 6315.0, 1: 6315.2. Samples: 6030016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:09,287][52220] Avg episode reward: [(0, '85.445'), (1, '7152.617')]
[2023-09-21 13:31:11,738][52979] Updated weights for policy 1, policy_version 5920 (0.0013)
[2023-09-21 13:31:11,738][52980] Updated weights for policy 0, policy_version 5920 (0.0013)
[2023-09-21 13:31:14,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12629.3, 300 sec: 12996.1). Total num frames: 6086656. Throughput: 0: 6354.4, 1: 6355.8. Samples: 6068826. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:14,288][52220] Avg episode reward: [(0, '1651.748'), (1, '6784.776')]
[2023-09-21 13:31:14,313][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005952_3047424.pth...
[2023-09-21 13:31:14,316][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005576_2854912.pth
[2023-09-21 13:31:14,318][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005952_3047424.pth...
[2023-09-21 13:31:14,321][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005576_2854912.pth
[2023-09-21 13:31:17,748][52979] Updated weights for policy 1, policy_version 6000 (0.0014)
[2023-09-21 13:31:17,748][52980] Updated weights for policy 0, policy_version 6000 (0.0013)
[2023-09-21 13:31:19,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 6160384. Throughput: 0: 6401.9, 1: 6401.6. Samples: 6150644. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:31:19,287][52220] Avg episode reward: [(0, '413.373'), (1, '6414.272')]
[2023-09-21 13:31:23,981][52979] Updated weights for policy 1, policy_version 6080 (0.0013)
[2023-09-21 13:31:23,982][52980] Updated weights for policy 0, policy_version 6080 (0.0014)
[2023-09-21 13:31:24,286][52220] Fps is (10 sec: 13926.7, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 6225920. Throughput: 0: 6422.3, 1: 6421.6. Samples: 6228200. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:31:24,287][52220] Avg episode reward: [(0, '355.671'), (1, '6870.944')]
[2023-09-21 13:31:29,286][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 6291456. Throughput: 0: 6437.4, 1: 6436.0. Samples: 6266880. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:31:29,287][52220] Avg episode reward: [(0, '113.754'), (1, '7331.743')]
[2023-09-21 13:31:29,291][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006144_3145728.pth...
[2023-09-21 13:31:29,291][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006144_3145728.pth...
[2023-09-21 13:31:29,294][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005760_2949120.pth
[2023-09-21 13:31:29,296][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005760_2949120.pth
[2023-09-21 13:31:30,453][52979] Updated weights for policy 1, policy_version 6160 (0.0014)
[2023-09-21 13:31:30,454][52980] Updated weights for policy 0, policy_version 6160 (0.0015)
[2023-09-21 13:31:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 6356992. Throughput: 0: 6416.6, 1: 6417.7. Samples: 6343160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:34,287][52220] Avg episode reward: [(0, '54.877'), (1, '8621.272')]
[2023-09-21 13:31:36,645][52979] Updated weights for policy 1, policy_version 6240 (0.0011)
[2023-09-21 13:31:36,646][52980] Updated weights for policy 0, policy_version 6240 (0.0015)
[2023-09-21 13:31:39,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 6422528. Throughput: 0: 6462.8, 1: 6463.2. Samples: 6422564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:39,287][52220] Avg episode reward: [(0, '2101.810'), (1, '9082.155')]
[2023-09-21 13:31:43,038][52980] Updated weights for policy 0, policy_version 6320 (0.0015)
[2023-09-21 13:31:43,039][52979] Updated weights for policy 1, policy_version 6320 (0.0014)
[2023-09-21 13:31:44,287][52220] Fps is (10 sec: 12287.7, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 6479872. Throughput: 0: 6474.8, 1: 6475.7. Samples: 6460540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:44,288][52220] Avg episode reward: [(0, '2284.566'), (1, '9265.721')]
[2023-09-21 13:31:44,315][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006336_3244032.pth...
[2023-09-21 13:31:44,318][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000005952_3047424.pth
[2023-09-21 13:31:44,319][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006336_3244032.pth...
[2023-09-21 13:31:44,322][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000005952_3047424.pth
[2023-09-21 13:31:49,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 6545408. Throughput: 0: 6517.1, 1: 6515.3. Samples: 6537296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:49,287][52220] Avg episode reward: [(0, '2004.841'), (1, '9265.750')]
[2023-09-21 13:31:49,456][52979] Updated weights for policy 1, policy_version 6400 (0.0014)
[2023-09-21 13:31:49,456][52980] Updated weights for policy 0, policy_version 6400 (0.0016)
[2023-09-21 13:31:54,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 6610944. Throughput: 0: 6499.7, 1: 6499.5. Samples: 6614976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:54,287][52220] Avg episode reward: [(0, '3119.889'), (1, '9082.109')]
[2023-09-21 13:31:55,681][52979] Updated weights for policy 1, policy_version 6480 (0.0013)
[2023-09-21 13:31:55,681][52980] Updated weights for policy 0, policy_version 6480 (0.0010)
[2023-09-21 13:31:59,287][52220] Fps is (10 sec: 13925.9, 60 sec: 13107.1, 300 sec: 13023.9). Total num frames: 6684672. Throughput: 0: 6568.6, 1: 6567.9. Samples: 6659966. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:31:59,288][52220] Avg episode reward: [(0, '5542.930'), (1, '8899.402')]
[2023-09-21 13:31:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006528_3342336.pth...
[2023-09-21 13:31:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006528_3342336.pth...
[2023-09-21 13:31:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006144_3145728.pth
[2023-09-21 13:31:59,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006144_3145728.pth
[2023-09-21 13:32:01,656][52980] Updated weights for policy 0, policy_version 6560 (0.0013)
[2023-09-21 13:32:01,656][52979] Updated weights for policy 1, policy_version 6560 (0.0014)
[2023-09-21 13:32:04,286][52220] Fps is (10 sec: 13926.4, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 6750208. Throughput: 0: 6538.2, 1: 6538.6. Samples: 6739100. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:04,287][52220] Avg episode reward: [(0, '5077.679'), (1, '8902.310')]
[2023-09-21 13:32:07,582][52979] Updated weights for policy 1, policy_version 6640 (0.0015)
[2023-09-21 13:32:07,582][52980] Updated weights for policy 0, policy_version 6640 (0.0015)
[2023-09-21 13:32:09,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 6815744. Throughput: 0: 6563.0, 1: 6563.8. Samples: 6818904. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:09,287][52220] Avg episode reward: [(0, '5082.925'), (1, '8460.693')]
[2023-09-21 13:32:13,354][52884] KL-divergence is very high: 114.0149
[2023-09-21 13:32:13,359][52884] KL-divergence is very high: 121.4593
[2023-09-21 13:32:14,015][52979] Updated weights for policy 1, policy_version 6720 (0.0013)
[2023-09-21 13:32:14,015][52980] Updated weights for policy 0, policy_version 6720 (0.0011)
[2023-09-21 13:32:14,287][52220] Fps is (10 sec: 13106.7, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 6881280. Throughput: 0: 6559.7, 1: 6562.1. Samples: 6857364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:14,288][52220] Avg episode reward: [(0, '5185.753'), (1, '8294.328')]
[2023-09-21 13:32:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006720_3440640.pth...
[2023-09-21 13:32:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006720_3440640.pth...
[2023-09-21 13:32:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006336_3244032.pth
[2023-09-21 13:32:14,308][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006336_3244032.pth
[2023-09-21 13:32:19,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 6946816. Throughput: 0: 6617.5, 1: 6615.0. Samples: 6938624. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:19,287][52220] Avg episode reward: [(0, '5546.651'), (1, '7885.490')]
[2023-09-21 13:32:20,121][52979] Updated weights for policy 1, policy_version 6800 (0.0014)
[2023-09-21 13:32:20,121][52980] Updated weights for policy 0, policy_version 6800 (0.0013)
[2023-09-21 13:32:24,287][52220] Fps is (10 sec: 13926.6, 60 sec: 13243.7, 300 sec: 13079.4). Total num frames: 7020544. Throughput: 0: 6644.5, 1: 6643.9. Samples: 7020546. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:24,287][52220] Avg episode reward: [(0, '5820.198'), (1, '7771.782')]
[2023-09-21 13:32:26,116][52980] Updated weights for policy 0, policy_version 6880 (0.0012)
[2023-09-21 13:32:26,116][52979] Updated weights for policy 1, policy_version 6880 (0.0012)
[2023-09-21 13:32:28,055][52884] KL-divergence is very high: 174.7079
[2023-09-21 13:32:28,060][52884] KL-divergence is very high: 127.8993
[2023-09-21 13:32:28,752][52884] KL-divergence is very high: 108.4238
[2023-09-21 13:32:29,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7077888. Throughput: 0: 6638.7, 1: 6638.4. Samples: 7058014. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:29,288][52220] Avg episode reward: [(0, '6021.697'), (1, '6924.544')]
[2023-09-21 13:32:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006912_3538944.pth...
[2023-09-21 13:32:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006912_3538944.pth...
[2023-09-21 13:32:29,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006528_3342336.pth
[2023-09-21 13:32:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006528_3342336.pth
[2023-09-21 13:32:32,786][52980] Updated weights for policy 0, policy_version 6960 (0.0017)
[2023-09-21 13:32:32,786][52979] Updated weights for policy 1, policy_version 6960 (0.0016)
[2023-09-21 13:32:34,043][52885] KL-divergence is very high: 162.5900
[2023-09-21 13:32:34,286][52220] Fps is (10 sec: 12288.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7143424. Throughput: 0: 6608.1, 1: 6609.9. Samples: 7132108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:34,287][52220] Avg episode reward: [(0, '5944.587'), (1, '6984.766')]
[2023-09-21 13:32:38,501][52885] KL-divergence is very high: 112.2562
[2023-09-21 13:32:38,506][52885] KL-divergence is very high: 182.1912
[2023-09-21 13:32:38,517][52885] KL-divergence is very high: 237.8661
[2023-09-21 13:32:39,123][52884] KL-divergence is very high: 111.6947
[2023-09-21 13:32:39,128][52884] KL-divergence is very high: 195.2763
[2023-09-21 13:32:39,143][52979] Updated weights for policy 1, policy_version 7040 (0.0013)
[2023-09-21 13:32:39,144][52980] Updated weights for policy 0, policy_version 7040 (0.0014)
[2023-09-21 13:32:39,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7208960. Throughput: 0: 6602.2, 1: 6601.8. Samples: 7209156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:39,287][52220] Avg episode reward: [(0, '4853.285'), (1, '6706.128')]
[2023-09-21 13:32:44,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13243.7, 300 sec: 13079.4). Total num frames: 7274496. Throughput: 0: 6555.0, 1: 6554.1. Samples: 7249874. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:44,288][52220] Avg episode reward: [(0, '1385.633'), (1, '6167.174')]
[2023-09-21 13:32:44,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007104_3637248.pth...
[2023-09-21 13:32:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007104_3637248.pth...
[2023-09-21 13:32:44,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006720_3440640.pth
[2023-09-21 13:32:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006720_3440640.pth
[2023-09-21 13:32:45,396][52980] Updated weights for policy 0, policy_version 7120 (0.0014)
[2023-09-21 13:32:45,396][52979] Updated weights for policy 1, policy_version 7120 (0.0016)
[2023-09-21 13:32:49,286][52220] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7331840. Throughput: 0: 6499.0, 1: 6498.0. Samples: 7323970. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:32:49,288][52220] Avg episode reward: [(0, '1190.850'), (1, '5245.301')]
[2023-09-21 13:32:51,877][52980] Updated weights for policy 0, policy_version 7200 (0.0014)
[2023-09-21 13:32:51,877][52979] Updated weights for policy 1, policy_version 7200 (0.0016)
[2023-09-21 13:32:54,286][52220] Fps is (10 sec: 12288.3, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 7397376. Throughput: 0: 6472.2, 1: 6471.4. Samples: 7401364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:54,287][52220] Avg episode reward: [(0, '1688.535'), (1, '3485.425')]
[2023-09-21 13:32:58,292][52979] Updated weights for policy 1, policy_version 7280 (0.0013)
[2023-09-21 13:32:58,294][52980] Updated weights for policy 0, policy_version 7280 (0.0015)
[2023-09-21 13:32:59,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 7462912. Throughput: 0: 6470.2, 1: 6470.1. Samples: 7439676. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:32:59,287][52220] Avg episode reward: [(0, '428.059'), (1, '3129.384')]
[2023-09-21 13:32:59,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007288_3731456.pth...
[2023-09-21 13:32:59,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007288_3731456.pth...
[2023-09-21 13:32:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000006912_3538944.pth
[2023-09-21 13:32:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000006912_3538944.pth
[2023-09-21 13:33:04,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 13023.9). Total num frames: 7528448. Throughput: 0: 6456.7, 1: 6457.7. Samples: 7519776. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:04,288][52220] Avg episode reward: [(0, '1320.466'), (1, '2583.518')]
[2023-09-21 13:33:04,644][52979] Updated weights for policy 1, policy_version 7360 (0.0013)
[2023-09-21 13:33:04,646][52980] Updated weights for policy 0, policy_version 7360 (0.0016)
[2023-09-21 13:33:09,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 7585792. Throughput: 0: 6362.0, 1: 6364.0. Samples: 7593220. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:33:09,287][52220] Avg episode reward: [(0, '3285.713'), (1, '4140.794')]
[2023-09-21 13:33:11,235][52979] Updated weights for policy 1, policy_version 7440 (0.0014)
[2023-09-21 13:33:11,236][52980] Updated weights for policy 0, policy_version 7440 (0.0013)
[2023-09-21 13:33:14,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 7659520. Throughput: 0: 6372.5, 1: 6373.0. Samples: 7631556. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:14,287][52220] Avg episode reward: [(0, '327.184'), (1, '5978.137')]
[2023-09-21 13:33:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007480_3829760.pth...
[2023-09-21 13:33:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007480_3829760.pth...
[2023-09-21 13:33:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007104_3637248.pth
[2023-09-21 13:33:14,323][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007104_3637248.pth
[2023-09-21 13:33:17,369][52980] Updated weights for policy 0, policy_version 7520 (0.0014)
[2023-09-21 13:33:17,370][52979] Updated weights for policy 1, policy_version 7520 (0.0013)
[2023-09-21 13:33:19,112][52885] KL-divergence is very high: 127.2461
[2023-09-21 13:33:19,121][52885] KL-divergence is very high: 160.4157
[2023-09-21 13:33:19,125][52885] KL-divergence is very high: 220.9313
[2023-09-21 13:33:19,130][52885] KL-divergence is very high: 173.6443
[2023-09-21 13:33:19,286][52220] Fps is (10 sec: 13926.5, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 7725056. Throughput: 0: 6457.0, 1: 6456.9. Samples: 7713232. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:19,287][52220] Avg episode reward: [(0, '1051.801'), (1, '6242.662')]
[2023-09-21 13:33:23,620][52979] Updated weights for policy 1, policy_version 7600 (0.0015)
[2023-09-21 13:33:23,620][52980] Updated weights for policy 0, policy_version 7600 (0.0014)
[2023-09-21 13:33:24,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 13051.7). Total num frames: 7790592. Throughput: 0: 6458.1, 1: 6457.9. Samples: 7790376. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:24,287][52220] Avg episode reward: [(0, '948.608'), (1, '5786.479')]
[2023-09-21 13:33:29,287][52220] Fps is (10 sec: 13106.8, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 7856128. Throughput: 0: 6453.8, 1: 6455.7. Samples: 7830804. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:29,288][52220] Avg episode reward: [(0, '850.104'), (1, '5709.365')]
[2023-09-21 13:33:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007672_3928064.pth...
[2023-09-21 13:33:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007672_3928064.pth...
[2023-09-21 13:33:29,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007288_3731456.pth
[2023-09-21 13:33:29,308][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007288_3731456.pth
[2023-09-21 13:33:29,739][52979] Updated weights for policy 1, policy_version 7680 (0.0012)
[2023-09-21 13:33:29,740][52980] Updated weights for policy 0, policy_version 7680 (0.0013)
[2023-09-21 13:33:34,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 7921664. Throughput: 0: 6545.9, 1: 6546.1. Samples: 7913108. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:33:34,287][52220] Avg episode reward: [(0, '1505.381'), (1, '4029.727')]
[2023-09-21 13:33:35,672][52980] Updated weights for policy 0, policy_version 7760 (0.0014)
[2023-09-21 13:33:35,673][52979] Updated weights for policy 1, policy_version 7760 (0.0014)
[2023-09-21 13:33:39,287][52220] Fps is (10 sec: 13926.6, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 7995392. Throughput: 0: 6601.0, 1: 6599.6. Samples: 7995392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:39,287][52220] Avg episode reward: [(0, '3677.659'), (1, '625.058')]
[2023-09-21 13:33:41,818][52979] Updated weights for policy 1, policy_version 7840 (0.0013)
[2023-09-21 13:33:41,819][52980] Updated weights for policy 0, policy_version 7840 (0.0014)
[2023-09-21 13:33:44,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12970.6, 300 sec: 13051.6). Total num frames: 8052736. Throughput: 0: 6597.9, 1: 6597.9. Samples: 8033494. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:33:44,288][52220] Avg episode reward: [(0, '6019.572'), (1, '632.349')]
[2023-09-21 13:33:44,340][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007872_4030464.pth...
[2023-09-21 13:33:44,343][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007480_3829760.pth
[2023-09-21 13:33:44,346][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007872_4030464.pth...
[2023-09-21 13:33:44,349][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007480_3829760.pth
[2023-09-21 13:33:48,318][52979] Updated weights for policy 1, policy_version 7920 (0.0011)
[2023-09-21 13:33:48,319][52980] Updated weights for policy 0, policy_version 7920 (0.0016)
[2023-09-21 13:33:49,286][52220] Fps is (10 sec: 12288.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 8118272. Throughput: 0: 6549.2, 1: 6549.5. Samples: 8109216. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:33:49,287][52220] Avg episode reward: [(0, '6462.962'), (1, '653.049')]
[2023-09-21 13:33:54,287][52220] Fps is (10 sec: 13107.7, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 8183808. Throughput: 0: 6578.4, 1: 6577.8. Samples: 8185248. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:33:54,288][52220] Avg episode reward: [(0, '6169.284'), (1, '937.395')]
[2023-09-21 13:33:54,732][52979] Updated weights for policy 1, policy_version 8000 (0.0013)
[2023-09-21 13:33:54,732][52980] Updated weights for policy 0, policy_version 8000 (0.0014)
[2023-09-21 13:33:59,286][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 8249344. Throughput: 0: 6593.2, 1: 6590.7. Samples: 8224832. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:33:59,287][52220] Avg episode reward: [(0, '6363.465'), (1, '2655.908')]
[2023-09-21 13:33:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008056_4124672.pth...
[2023-09-21 13:33:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008056_4124672.pth...
[2023-09-21 13:33:59,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007672_3928064.pth
[2023-09-21 13:33:59,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007672_3928064.pth
[2023-09-21 13:34:00,953][52980] Updated weights for policy 0, policy_version 8080 (0.0014)
[2023-09-21 13:34:00,953][52979] Updated weights for policy 1, policy_version 8080 (0.0017)
[2023-09-21 13:34:04,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 8314880. Throughput: 0: 6567.0, 1: 6567.3. Samples: 8304278. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:34:04,287][52220] Avg episode reward: [(0, '6284.006'), (1, '3940.957')]
[2023-09-21 13:34:07,249][52979] Updated weights for policy 1, policy_version 8160 (0.0012)
[2023-09-21 13:34:07,250][52980] Updated weights for policy 0, policy_version 8160 (0.0012)
[2023-09-21 13:34:09,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13243.7, 300 sec: 13051.7). Total num frames: 8380416. Throughput: 0: 6565.4, 1: 6566.1. Samples: 8381298. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:34:09,288][52220] Avg episode reward: [(0, '6191.470'), (1, '3735.090')]
[2023-09-21 13:34:13,319][52979] Updated weights for policy 1, policy_version 8240 (0.0011)
[2023-09-21 13:34:13,320][52980] Updated weights for policy 0, policy_version 8240 (0.0015)
[2023-09-21 13:34:14,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 8445952. Throughput: 0: 6568.1, 1: 6568.2. Samples: 8421936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:14,287][52220] Avg episode reward: [(0, '5728.107'), (1, '5522.777')]
[2023-09-21 13:34:14,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008248_4222976.pth...
[2023-09-21 13:34:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008248_4222976.pth...
[2023-09-21 13:34:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000007872_4030464.pth
[2023-09-21 13:34:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000007872_4030464.pth
[2023-09-21 13:34:19,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 8511488. Throughput: 0: 6558.5, 1: 6557.5. Samples: 8503330. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:19,287][52220] Avg episode reward: [(0, '6006.540'), (1, '7048.366')]
[2023-09-21 13:34:19,511][52980] Updated weights for policy 0, policy_version 8320 (0.0015)
[2023-09-21 13:34:19,511][52979] Updated weights for policy 1, policy_version 8320 (0.0015)
[2023-09-21 13:34:24,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 8577024. Throughput: 0: 6460.6, 1: 6461.9. Samples: 8576902. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:34:24,287][52220] Avg episode reward: [(0, '6099.005'), (1, '795.161')]
[2023-09-21 13:34:26,332][52980] Updated weights for policy 0, policy_version 8400 (0.0013)
[2023-09-21 13:34:26,332][52979] Updated weights for policy 1, policy_version 8400 (0.0015)
[2023-09-21 13:34:29,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12970.8, 300 sec: 13023.9). Total num frames: 8634368. Throughput: 0: 6428.6, 1: 6428.5. Samples: 8612056. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:34:29,287][52220] Avg episode reward: [(0, '6844.153'), (1, '1821.965')]
[2023-09-21 13:34:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008432_4317184.pth...
[2023-09-21 13:34:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008432_4317184.pth...
[2023-09-21 13:34:29,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008056_4124672.pth
[2023-09-21 13:34:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008056_4124672.pth
[2023-09-21 13:34:32,574][52979] Updated weights for policy 1, policy_version 8480 (0.0014)
[2023-09-21 13:34:32,575][52980] Updated weights for policy 0, policy_version 8480 (0.0015)
[2023-09-21 13:34:34,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 8699904. Throughput: 0: 6473.7, 1: 6474.0. Samples: 8691864. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:34:34,287][52220] Avg episode reward: [(0, '7216.463'), (1, '1158.496')]
[2023-09-21 13:34:38,729][52979] Updated weights for policy 1, policy_version 8560 (0.0011)
[2023-09-21 13:34:38,731][52980] Updated weights for policy 0, policy_version 8560 (0.0013)
[2023-09-21 13:34:39,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 8765440. Throughput: 0: 6524.3, 1: 6525.4. Samples: 8772482. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:34:39,287][52220] Avg episode reward: [(0, '8054.671'), (1, '784.632')]
[2023-09-21 13:34:44,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.8, 300 sec: 12996.1). Total num frames: 8830976. Throughput: 0: 6500.3, 1: 6501.8. Samples: 8809924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:44,287][52220] Avg episode reward: [(0, '7314.468'), (1, '1509.728')]
[2023-09-21 13:34:44,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008624_4415488.pth...
[2023-09-21 13:34:44,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008624_4415488.pth...
[2023-09-21 13:34:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008248_4222976.pth
[2023-09-21 13:34:44,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008248_4222976.pth
[2023-09-21 13:34:45,087][52979] Updated weights for policy 1, policy_version 8640 (0.0012)
[2023-09-21 13:34:45,088][52980] Updated weights for policy 0, policy_version 8640 (0.0015)
[2023-09-21 13:34:49,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 8896512. Throughput: 0: 6490.6, 1: 6488.2. Samples: 8888324. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:49,288][52220] Avg episode reward: [(0, '7315.971'), (1, '3849.357')]
[2023-09-21 13:34:51,377][52980] Updated weights for policy 0, policy_version 8720 (0.0012)
[2023-09-21 13:34:51,378][52979] Updated weights for policy 1, policy_version 8720 (0.0011)
[2023-09-21 13:34:54,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 8962048. Throughput: 0: 6460.1, 1: 6459.8. Samples: 8962688. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:34:54,288][52220] Avg episode reward: [(0, '7406.052'), (1, '4832.839')]
[2023-09-21 13:34:57,777][52980] Updated weights for policy 0, policy_version 8800 (0.0010)
[2023-09-21 13:34:57,777][52979] Updated weights for policy 1, policy_version 8800 (0.0012)
[2023-09-21 13:34:59,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12982.2). Total num frames: 9027584. Throughput: 0: 6457.5, 1: 6455.9. Samples: 9003042. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:34:59,288][52220] Avg episode reward: [(0, '7689.771'), (1, '6585.865')]
[2023-09-21 13:34:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008816_4513792.pth...
[2023-09-21 13:34:59,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008816_4513792.pth...
[2023-09-21 13:34:59,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008432_4317184.pth
[2023-09-21 13:34:59,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008432_4317184.pth
[2023-09-21 13:35:04,093][52979] Updated weights for policy 1, policy_version 8880 (0.0014)
[2023-09-21 13:35:04,093][52980] Updated weights for policy 0, policy_version 8880 (0.0011)
[2023-09-21 13:35:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12968.3). Total num frames: 9093120. Throughput: 0: 6405.5, 1: 6407.7. Samples: 9079926. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:04,287][52220] Avg episode reward: [(0, '6344.982'), (1, '8266.991')]
[2023-09-21 13:35:09,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12982.2). Total num frames: 9158656. Throughput: 0: 6465.3, 1: 6464.0. Samples: 9158724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:09,287][52220] Avg episode reward: [(0, '5886.401'), (1, '8260.217')]
[2023-09-21 13:35:10,541][52980] Updated weights for policy 0, policy_version 8960 (0.0014)
[2023-09-21 13:35:10,542][52979] Updated weights for policy 1, policy_version 8960 (0.0014)
[2023-09-21 13:35:14,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 9224192. Throughput: 0: 6501.5, 1: 6501.1. Samples: 9197176. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:14,287][52220] Avg episode reward: [(0, '6321.409'), (1, '7973.051')]
[2023-09-21 13:35:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009008_4612096.pth...
[2023-09-21 13:35:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009008_4612096.pth...
[2023-09-21 13:35:14,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008624_4415488.pth
[2023-09-21 13:35:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008624_4415488.pth
[2023-09-21 13:35:16,683][52884] KL-divergence is very high: 147.5208
[2023-09-21 13:35:16,702][52979] Updated weights for policy 1, policy_version 9040 (0.0013)
[2023-09-21 13:35:16,703][52980] Updated weights for policy 0, policy_version 9040 (0.0014)
[2023-09-21 13:35:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 9289728. Throughput: 0: 6503.1, 1: 6503.6. Samples: 9277164. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:19,288][52220] Avg episode reward: [(0, '6772.299'), (1, '8248.228')]
[2023-09-21 13:35:23,189][52979] Updated weights for policy 1, policy_version 9120 (0.0013)
[2023-09-21 13:35:23,189][52980] Updated weights for policy 0, policy_version 9120 (0.0015)
[2023-09-21 13:35:24,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9347072. Throughput: 0: 6450.2, 1: 6450.3. Samples: 9353006. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:35:24,288][52220] Avg episode reward: [(0, '6876.287'), (1, '8247.524')]
[2023-09-21 13:35:29,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 12968.4). Total num frames: 9412608. Throughput: 0: 6452.2, 1: 6453.6. Samples: 9390684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:29,287][52220] Avg episode reward: [(0, '6970.759'), (1, '8433.254')]
[2023-09-21 13:35:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009192_4706304.pth...
[2023-09-21 13:35:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009192_4706304.pth...
[2023-09-21 13:35:29,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000008816_4513792.pth
[2023-09-21 13:35:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000008816_4513792.pth
[2023-09-21 13:35:29,690][52979] Updated weights for policy 1, policy_version 9200 (0.0014)
[2023-09-21 13:35:29,691][52980] Updated weights for policy 0, policy_version 9200 (0.0013)
[2023-09-21 13:35:34,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 9478144. Throughput: 0: 6412.3, 1: 6414.5. Samples: 9465524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:34,287][52220] Avg episode reward: [(0, '7790.873'), (1, '8158.990')]
[2023-09-21 13:35:36,222][52980] Updated weights for policy 0, policy_version 9280 (0.0014)
[2023-09-21 13:35:36,222][52979] Updated weights for policy 1, policy_version 9280 (0.0014)
[2023-09-21 13:35:39,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 9535488. Throughput: 0: 6439.4, 1: 6439.4. Samples: 9542234. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:39,287][52220] Avg episode reward: [(0, '7686.583'), (1, '8435.588')]
[2023-09-21 13:35:42,494][52980] Updated weights for policy 0, policy_version 9360 (0.0010)
[2023-09-21 13:35:42,495][52979] Updated weights for policy 1, policy_version 9360 (0.0015)
[2023-09-21 13:35:44,287][52220] Fps is (10 sec: 12287.6, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9601024. Throughput: 0: 6419.4, 1: 6420.9. Samples: 9580856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:44,288][52220] Avg episode reward: [(0, '7499.111'), (1, '8810.491')]
[2023-09-21 13:35:44,306][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009384_4804608.pth...
[2023-09-21 13:35:44,310][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009008_4612096.pth
[2023-09-21 13:35:44,312][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009384_4804608.pth...
[2023-09-21 13:35:44,315][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009008_4612096.pth
[2023-09-21 13:35:48,737][52979] Updated weights for policy 1, policy_version 9440 (0.0015)
[2023-09-21 13:35:48,737][52980] Updated weights for policy 0, policy_version 9440 (0.0013)
[2023-09-21 13:35:49,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 9666560. Throughput: 0: 6442.4, 1: 6441.5. Samples: 9659702. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:49,287][52220] Avg episode reward: [(0, '7780.731'), (1, '8995.249')]
[2023-09-21 13:35:54,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 9732096. Throughput: 0: 6434.7, 1: 6436.9. Samples: 9737944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:54,287][52220] Avg episode reward: [(0, '7785.500'), (1, '8905.386')]
[2023-09-21 13:35:55,014][52980] Updated weights for policy 0, policy_version 9520 (0.0010)
[2023-09-21 13:35:55,014][52979] Updated weights for policy 1, policy_version 9520 (0.0015)
[2023-09-21 13:35:59,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 9797632. Throughput: 0: 6421.8, 1: 6421.5. Samples: 9775124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:35:59,287][52220] Avg episode reward: [(0, '8345.138'), (1, '4258.503')]
[2023-09-21 13:35:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009568_4898816.pth...
[2023-09-21 13:35:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009568_4898816.pth...
[2023-09-21 13:35:59,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009192_4706304.pth
[2023-09-21 13:35:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009192_4706304.pth
[2023-09-21 13:36:01,481][52979] Updated weights for policy 1, policy_version 9600 (0.0011)
[2023-09-21 13:36:01,481][52980] Updated weights for policy 0, policy_version 9600 (0.0014)
[2023-09-21 13:36:04,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 9863168. Throughput: 0: 6397.5, 1: 6397.3. Samples: 9852930. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:36:04,288][52220] Avg episode reward: [(0, '6398.098'), (1, '3352.109')]
[2023-09-21 13:36:07,758][52980] Updated weights for policy 0, policy_version 9680 (0.0014)
[2023-09-21 13:36:07,758][52979] Updated weights for policy 1, policy_version 9680 (0.0013)
[2023-09-21 13:36:09,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 9928704. Throughput: 0: 6421.9, 1: 6421.4. Samples: 9930952. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:36:09,287][52220] Avg episode reward: [(0, '5843.385'), (1, '3512.802')]
[2023-09-21 13:36:13,897][52979] Updated weights for policy 1, policy_version 9760 (0.0012)
[2023-09-21 13:36:13,897][52980] Updated weights for policy 0, policy_version 9760 (0.0013)
[2023-09-21 13:36:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 9994240. Throughput: 0: 6449.6, 1: 6449.6. Samples: 9971150. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:36:14,288][52220] Avg episode reward: [(0, '7050.183'), (1, '1346.937')]
[2023-09-21 13:36:14,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009760_4997120.pth...
[2023-09-21 13:36:14,299][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009760_4997120.pth...
[2023-09-21 13:36:14,306][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009384_4804608.pth
[2023-09-21 13:36:14,307][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009384_4804608.pth
[2023-09-21 13:36:19,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 10059776. Throughput: 0: 6508.6, 1: 6507.3. Samples: 10051246. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:36:19,287][52220] Avg episode reward: [(0, '8058.993'), (1, '413.387')]
[2023-09-21 13:36:20,323][52980] Updated weights for policy 0, policy_version 9840 (0.0015)
[2023-09-21 13:36:20,324][52979] Updated weights for policy 1, policy_version 9840 (0.0014)
[2023-09-21 13:36:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 10125312. Throughput: 0: 6488.9, 1: 6489.0. Samples: 10126240. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:36:24,287][52220] Avg episode reward: [(0, '8148.723'), (1, '2379.849')]
[2023-09-21 13:36:26,694][52979] Updated weights for policy 1, policy_version 9920 (0.0014)
[2023-09-21 13:36:26,694][52980] Updated weights for policy 0, policy_version 9920 (0.0013)
[2023-09-21 13:36:29,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 10182656. Throughput: 0: 6498.2, 1: 6497.2. Samples: 10165648. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:36:29,287][52220] Avg episode reward: [(0, '8428.183'), (1, '4691.143')]
[2023-09-21 13:36:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009944_5091328.pth...
[2023-09-21 13:36:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009944_5091328.pth...
[2023-09-21 13:36:29,305][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009568_4898816.pth
[2023-09-21 13:36:29,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009568_4898816.pth
[2023-09-21 13:36:33,318][52980] Updated weights for policy 0, policy_version 10000 (0.0016)
[2023-09-21 13:36:33,319][52979] Updated weights for policy 1, policy_version 10000 (0.0014)
[2023-09-21 13:36:34,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 10248192. Throughput: 0: 6432.7, 1: 6432.5. Samples: 10238636. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:36:34,287][52220] Avg episode reward: [(0, '6512.251'), (1, '5747.198')]
[2023-09-21 13:36:39,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 10313728. Throughput: 0: 6457.1, 1: 6457.1. Samples: 10319088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:36:39,287][52220] Avg episode reward: [(0, '6034.451'), (1, '6649.274')]
[2023-09-21 13:36:39,421][52979] Updated weights for policy 1, policy_version 10080 (0.0013)
[2023-09-21 13:36:39,421][52980] Updated weights for policy 0, policy_version 10080 (0.0015)
[2023-09-21 13:36:44,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 10379264. Throughput: 0: 6465.2, 1: 6465.9. Samples: 10357024. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:36:44,287][52220] Avg episode reward: [(0, '6474.585'), (1, '7822.539')]
[2023-09-21 13:36:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010136_5189632.pth...
[2023-09-21 13:36:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010136_5189632.pth...
[2023-09-21 13:36:44,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009760_4997120.pth
[2023-09-21 13:36:44,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009760_4997120.pth
[2023-09-21 13:36:45,746][52980] Updated weights for policy 0, policy_version 10160 (0.0007)
[2023-09-21 13:36:45,746][52979] Updated weights for policy 1, policy_version 10160 (0.0011)
[2023-09-21 13:36:49,286][52220] Fps is (10 sec: 13926.5, 60 sec: 13107.3, 300 sec: 13023.9). Total num frames: 10452992. Throughput: 0: 6500.3, 1: 6500.8. Samples: 10437978. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:36:49,287][52220] Avg episode reward: [(0, '5752.454'), (1, '8363.105')]
[2023-09-21 13:36:51,686][52979] Updated weights for policy 1, policy_version 10240 (0.0014)
[2023-09-21 13:36:51,687][52980] Updated weights for policy 0, policy_version 10240 (0.0011)
[2023-09-21 13:36:54,286][52220] Fps is (10 sec: 13926.2, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10518528. Throughput: 0: 6530.3, 1: 6528.3. Samples: 10518592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:36:54,287][52220] Avg episode reward: [(0, '5499.023'), (1, '8265.828')]
[2023-09-21 13:36:57,890][52979] Updated weights for policy 1, policy_version 10320 (0.0013)
[2023-09-21 13:36:57,890][52980] Updated weights for policy 0, policy_version 10320 (0.0013)
[2023-09-21 13:36:59,287][52220] Fps is (10 sec: 13106.8, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10584064. Throughput: 0: 6531.3, 1: 6530.0. Samples: 10558912. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:36:59,288][52220] Avg episode reward: [(0, '2764.041'), (1, '8536.378')]
[2023-09-21 13:36:59,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010336_5292032.pth...
[2023-09-21 13:36:59,299][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010336_5292032.pth...
[2023-09-21 13:36:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000009944_5091328.pth
[2023-09-21 13:36:59,305][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000009944_5091328.pth
[2023-09-21 13:37:04,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12968.3). Total num frames: 10641408. Throughput: 0: 6475.6, 1: 6476.1. Samples: 10634076. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:37:04,288][52220] Avg episode reward: [(0, '2261.539'), (1, '8898.543')]
[2023-09-21 13:37:04,341][52979] Updated weights for policy 1, policy_version 10400 (0.0015)
[2023-09-21 13:37:04,341][52980] Updated weights for policy 0, policy_version 10400 (0.0015)
[2023-09-21 13:37:09,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10715136. Throughput: 0: 6563.3, 1: 6563.3. Samples: 10716940. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:37:09,287][52220] Avg episode reward: [(0, '4187.421'), (1, '9086.295')]
[2023-09-21 13:37:10,335][52980] Updated weights for policy 0, policy_version 10480 (0.0010)
[2023-09-21 13:37:10,336][52979] Updated weights for policy 1, policy_version 10480 (0.0014)
[2023-09-21 13:37:14,287][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10780672. Throughput: 0: 6558.8, 1: 6558.5. Samples: 10755928. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:37:14,288][52220] Avg episode reward: [(0, '5168.250'), (1, '8814.144')]
[2023-09-21 13:37:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010528_5390336.pth...
[2023-09-21 13:37:14,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010528_5390336.pth...
[2023-09-21 13:37:14,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010136_5189632.pth
[2023-09-21 13:37:14,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010136_5189632.pth
[2023-09-21 13:37:16,736][52979] Updated weights for policy 1, policy_version 10560 (0.0015)
[2023-09-21 13:37:16,736][52980] Updated weights for policy 0, policy_version 10560 (0.0015)
[2023-09-21 13:37:19,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 10846208. Throughput: 0: 6626.3, 1: 6626.6. Samples: 10835016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:37:19,288][52220] Avg episode reward: [(0, '6051.410'), (1, '7824.859')]
[2023-09-21 13:37:22,912][52980] Updated weights for policy 0, policy_version 10640 (0.0015)
[2023-09-21 13:37:22,912][52979] Updated weights for policy 1, policy_version 10640 (0.0014)
[2023-09-21 13:37:24,286][52220] Fps is (10 sec: 13107.7, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 10911744. Throughput: 0: 6602.1, 1: 6601.4. Samples: 10913244. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:37:24,287][52220] Avg episode reward: [(0, '6501.744'), (1, '7278.530')]
[2023-09-21 13:37:27,810][52885] KL-divergence is very high: 143.8693
[2023-09-21 13:37:29,190][52980] Updated weights for policy 0, policy_version 10720 (0.0014)
[2023-09-21 13:37:29,190][52979] Updated weights for policy 1, policy_version 10720 (0.0012)
[2023-09-21 13:37:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13243.7, 300 sec: 12996.1). Total num frames: 10977280. Throughput: 0: 6619.7, 1: 6618.3. Samples: 10952738. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:37:29,288][52220] Avg episode reward: [(0, '6683.727'), (1, '7822.196')]
[2023-09-21 13:37:29,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010720_5488640.pth...
[2023-09-21 13:37:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010720_5488640.pth...
[2023-09-21 13:37:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010336_5292032.pth
[2023-09-21 13:37:29,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010336_5292032.pth
[2023-09-21 13:37:34,287][52220] Fps is (10 sec: 12287.7, 60 sec: 13107.2, 300 sec: 12968.3). Total num frames: 11034624. Throughput: 0: 6518.6, 1: 6518.2. Samples: 11024638. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:37:34,287][52220] Avg episode reward: [(0, '6778.857'), (1, '6895.518')]
[2023-09-21 13:37:35,908][52980] Updated weights for policy 0, policy_version 10800 (0.0012)
[2023-09-21 13:37:35,909][52979] Updated weights for policy 1, policy_version 10800 (0.0016)
[2023-09-21 13:37:39,287][52220] Fps is (10 sec: 12288.0, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 11100160. Throughput: 0: 6486.9, 1: 6488.4. Samples: 11102482. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:37:39,288][52220] Avg episode reward: [(0, '7150.504'), (1, '6519.836')]
[2023-09-21 13:37:42,034][52980] Updated weights for policy 0, policy_version 10880 (0.0013)
[2023-09-21 13:37:42,035][52979] Updated weights for policy 1, policy_version 10880 (0.0014)
[2023-09-21 13:37:42,606][52885] KL-divergence is very high: 138.0579
[2023-09-21 13:37:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 11165696. Throughput: 0: 6489.3, 1: 6489.9. Samples: 11142974. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
[2023-09-21 13:37:44,287][52220] Avg episode reward: [(0, '6963.117'), (1, '7246.856')]
[2023-09-21 13:37:44,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010904_5582848.pth...
[2023-09-21 13:37:44,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010904_5582848.pth...
[2023-09-21 13:37:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010528_5390336.pth
[2023-09-21 13:37:44,306][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010528_5390336.pth
[2023-09-21 13:37:48,122][52980] Updated weights for policy 0, policy_version 10960 (0.0012)
[2023-09-21 13:37:48,122][52979] Updated weights for policy 1, policy_version 10960 (0.0013)
[2023-09-21 13:37:49,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 11231232. Throughput: 0: 6550.0, 1: 6551.1. Samples: 11223624. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:37:49,288][52220] Avg episode reward: [(0, '7147.782'), (1, '8438.793')]
[2023-09-21 13:37:54,252][52980] Updated weights for policy 0, policy_version 11040 (0.0012)
[2023-09-21 13:37:54,252][52979] Updated weights for policy 1, policy_version 11040 (0.0014)
[2023-09-21 13:37:54,286][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 11304960. Throughput: 0: 6531.1, 1: 6530.9. Samples: 11304730. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:37:54,287][52220] Avg episode reward: [(0, '7782.139'), (1, '8537.772')]
[2023-09-21 13:37:57,931][52885] KL-divergence is very high: 115.5991
[2023-09-21 13:37:59,286][52220] Fps is (10 sec: 13926.6, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 11370496. Throughput: 0: 6548.2, 1: 6549.4. Samples: 11345316. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:37:59,287][52220] Avg episode reward: [(0, '7417.448'), (1, '7447.728')]
[2023-09-21 13:37:59,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011104_5685248.pth...
[2023-09-21 13:37:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011104_5685248.pth...
[2023-09-21 13:37:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010720_5488640.pth
[2023-09-21 13:37:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010720_5488640.pth
[2023-09-21 13:38:00,539][52980] Updated weights for policy 0, policy_version 11120 (0.0015)
[2023-09-21 13:38:00,539][52979] Updated weights for policy 1, policy_version 11120 (0.0014)
[2023-09-21 13:38:01,151][52884] KL-divergence is very high: 230.7247
[2023-09-21 13:38:04,286][52220] Fps is (10 sec: 12697.7, 60 sec: 13175.5, 300 sec: 13037.8). Total num frames: 11431936. Throughput: 0: 6512.6, 1: 6512.1. Samples: 11421128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:04,288][52220] Avg episode reward: [(0, '6873.014'), (1, '6362.016')]
[2023-09-21 13:38:06,743][52979] Updated weights for policy 1, policy_version 11200 (0.0013)
[2023-09-21 13:38:06,744][52980] Updated weights for policy 0, policy_version 11200 (0.0014)
[2023-09-21 13:38:07,990][52885] KL-divergence is very high: 212.7733
[2023-09-21 13:38:08,000][52885] KL-divergence is very high: 139.2826
[2023-09-21 13:38:09,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 11493376. Throughput: 0: 6517.9, 1: 6518.6. Samples: 11499888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:09,287][52220] Avg episode reward: [(0, '6786.092'), (1, '6250.348')]
[2023-09-21 13:38:13,076][52885] KL-divergence is very high: 109.2257
[2023-09-21 13:38:13,086][52885] KL-divergence is very high: 150.3513
[2023-09-21 13:38:13,109][52980] Updated weights for policy 0, policy_version 11280 (0.0012)
[2023-09-21 13:38:13,110][52979] Updated weights for policy 1, policy_version 11280 (0.0016)
[2023-09-21 13:38:14,286][52220] Fps is (10 sec: 12697.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 11558912. Throughput: 0: 6512.6, 1: 6514.2. Samples: 11538944. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:14,287][52220] Avg episode reward: [(0, '7521.599'), (1, '5948.663')]
[2023-09-21 13:38:14,293][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011288_5779456.pth...
[2023-09-21 13:38:14,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011288_5779456.pth...
[2023-09-21 13:38:14,296][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000010904_5582848.pth
[2023-09-21 13:38:14,296][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000010904_5582848.pth
[2023-09-21 13:38:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 11624448. Throughput: 0: 6574.6, 1: 6572.5. Samples: 11616260. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:38:19,287][52220] Avg episode reward: [(0, '8163.203'), (1, '6327.272')]
[2023-09-21 13:38:19,616][52980] Updated weights for policy 0, policy_version 11360 (0.0016)
[2023-09-21 13:38:19,616][52979] Updated weights for policy 1, policy_version 11360 (0.0014)
[2023-09-21 13:38:24,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 11689984. Throughput: 0: 6565.7, 1: 6566.3. Samples: 11693422. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:24,287][52220] Avg episode reward: [(0, '8707.112'), (1, '7533.837')]
[2023-09-21 13:38:25,719][52980] Updated weights for policy 0, policy_version 11440 (0.0012)
[2023-09-21 13:38:25,719][52979] Updated weights for policy 1, policy_version 11440 (0.0014)
[2023-09-21 13:38:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 11755520. Throughput: 0: 6542.6, 1: 6542.6. Samples: 11731810. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:29,288][52220] Avg episode reward: [(0, '8334.238'), (1, '7161.629')]
[2023-09-21 13:38:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011480_5877760.pth...
[2023-09-21 13:38:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011480_5877760.pth...
[2023-09-21 13:38:29,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011104_5685248.pth
[2023-09-21 13:38:29,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011104_5685248.pth
[2023-09-21 13:38:32,316][52979] Updated weights for policy 1, policy_version 11520 (0.0014)
[2023-09-21 13:38:32,316][52980] Updated weights for policy 0, policy_version 11520 (0.0014)
[2023-09-21 13:38:34,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 12968.3). Total num frames: 11821056. Throughput: 0: 6506.0, 1: 6505.5. Samples: 11809142. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:38:34,288][52220] Avg episode reward: [(0, '8241.042'), (1, '7426.216')]
[2023-09-21 13:38:38,500][52980] Updated weights for policy 0, policy_version 11600 (0.0014)
[2023-09-21 13:38:38,500][52979] Updated weights for policy 1, policy_version 11600 (0.0014)
[2023-09-21 13:38:39,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 11886592. Throughput: 0: 6467.5, 1: 6467.5. Samples: 11886808. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:38:39,288][52220] Avg episode reward: [(0, '8147.919'), (1, '6684.867')]
[2023-09-21 13:38:44,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 11952128. Throughput: 0: 6434.1, 1: 6434.2. Samples: 11924392. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:38:44,288][52220] Avg episode reward: [(0, '8057.555'), (1, '4198.564')]
[2023-09-21 13:38:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011672_5976064.pth...
[2023-09-21 13:38:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011672_5976064.pth...
[2023-09-21 13:38:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011288_5779456.pth
[2023-09-21 13:38:44,307][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011288_5779456.pth
[2023-09-21 13:38:44,662][52979] Updated weights for policy 1, policy_version 11680 (0.0015)
[2023-09-21 13:38:44,663][52980] Updated weights for policy 0, policy_version 11680 (0.0015)
[2023-09-21 13:38:49,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12017664. Throughput: 0: 6514.3, 1: 6514.4. Samples: 12007422. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:38:49,288][52220] Avg episode reward: [(0, '7413.246'), (1, '3924.282')]
[2023-09-21 13:38:50,814][52979] Updated weights for policy 1, policy_version 11760 (0.0010)
[2023-09-21 13:38:50,815][52980] Updated weights for policy 0, policy_version 11760 (0.0014)
[2023-09-21 13:38:54,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12083200. Throughput: 0: 6511.6, 1: 6510.8. Samples: 12085896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:54,287][52220] Avg episode reward: [(0, '7505.948'), (1, '5026.154')]
[2023-09-21 13:38:57,041][52980] Updated weights for policy 0, policy_version 11840 (0.0011)
[2023-09-21 13:38:57,042][52979] Updated weights for policy 1, policy_version 11840 (0.0014)
[2023-09-21 13:38:59,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12148736. Throughput: 0: 6525.1, 1: 6525.3. Samples: 12126212. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:38:59,287][52220] Avg episode reward: [(0, '7775.590'), (1, '5951.310')]
[2023-09-21 13:38:59,292][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011864_6074368.pth...
[2023-09-21 13:38:59,292][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011864_6074368.pth...
[2023-09-21 13:38:59,296][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011480_5877760.pth
[2023-09-21 13:38:59,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011480_5877760.pth
[2023-09-21 13:39:03,339][52979] Updated weights for policy 1, policy_version 11920 (0.0014)
[2023-09-21 13:39:03,339][52980] Updated weights for policy 0, policy_version 11920 (0.0011)
[2023-09-21 13:39:04,286][52220] Fps is (10 sec: 13107.0, 60 sec: 13038.9, 300 sec: 12996.1). Total num frames: 12214272. Throughput: 0: 6540.5, 1: 6542.7. Samples: 12205004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:39:04,287][52220] Avg episode reward: [(0, '7961.348'), (1, '6505.100')]
[2023-09-21 13:39:09,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12279808. Throughput: 0: 6536.7, 1: 6537.0. Samples: 12281738. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:39:09,287][52220] Avg episode reward: [(0, '7495.365'), (1, '6685.848')]
[2023-09-21 13:39:09,636][52980] Updated weights for policy 0, policy_version 12000 (0.0016)
[2023-09-21 13:39:09,636][52979] Updated weights for policy 1, policy_version 12000 (0.0014)
[2023-09-21 13:39:14,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12345344. Throughput: 0: 6563.3, 1: 6563.6. Samples: 12322520. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:39:14,288][52220] Avg episode reward: [(0, '5639.209'), (1, '7243.011')]
[2023-09-21 13:39:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012056_6172672.pth...
[2023-09-21 13:39:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012056_6172672.pth...
[2023-09-21 13:39:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011672_5976064.pth
[2023-09-21 13:39:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011672_5976064.pth
[2023-09-21 13:39:15,687][52979] Updated weights for policy 1, policy_version 12080 (0.0011)
[2023-09-21 13:39:15,688][52980] Updated weights for policy 0, policy_version 12080 (0.0010)
[2023-09-21 13:39:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12996.1). Total num frames: 12410880. Throughput: 0: 6590.5, 1: 6589.8. Samples: 12402256. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:39:19,288][52220] Avg episode reward: [(0, '5827.376'), (1, '7798.181')]
[2023-09-21 13:39:22,172][52884] KL-divergence is very high: 129.4762
[2023-09-21 13:39:22,191][52979] Updated weights for policy 1, policy_version 12160 (0.0008)
[2023-09-21 13:39:22,192][52980] Updated weights for policy 0, policy_version 12160 (0.0015)
[2023-09-21 13:39:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 12476416. Throughput: 0: 6586.4, 1: 6587.3. Samples: 12479620. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0)
[2023-09-21 13:39:24,287][52220] Avg episode reward: [(0, '5543.639'), (1, '8535.429')]
[2023-09-21 13:39:28,185][52980] Updated weights for policy 0, policy_version 12240 (0.0014)
[2023-09-21 13:39:28,185][52979] Updated weights for policy 1, policy_version 12240 (0.0015)
[2023-09-21 13:39:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 12541952. Throughput: 0: 6631.4, 1: 6631.0. Samples: 12521200. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:39:29,288][52220] Avg episode reward: [(0, '5169.488'), (1, '8902.665')]
[2023-09-21 13:39:29,298][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012248_6270976.pth...
[2023-09-21 13:39:29,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012248_6270976.pth...
[2023-09-21 13:39:29,310][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000011864_6074368.pth
[2023-09-21 13:39:29,314][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000011864_6074368.pth
[2023-09-21 13:39:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.3, 300 sec: 13023.9). Total num frames: 12607488. Throughput: 0: 6538.6, 1: 6538.5. Samples: 12595888. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:39:34,287][52220] Avg episode reward: [(0, '5544.157'), (1, '8813.978')]
[2023-09-21 13:39:34,748][52979] Updated weights for policy 1, policy_version 12320 (0.0015)
[2023-09-21 13:39:34,748][52980] Updated weights for policy 0, policy_version 12320 (0.0011)
[2023-09-21 13:39:35,923][52885] KL-divergence is very high: 109.0132
[2023-09-21 13:39:39,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 12673024. Throughput: 0: 6524.4, 1: 6523.6. Samples: 12673058. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:39:39,287][52220] Avg episode reward: [(0, '5093.072'), (1, '8631.845')]
[2023-09-21 13:39:41,072][52979] Updated weights for policy 1, policy_version 12400 (0.0014)
[2023-09-21 13:39:41,072][52980] Updated weights for policy 0, policy_version 12400 (0.0011)
[2023-09-21 13:39:44,287][52220] Fps is (10 sec: 12287.6, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12730368. Throughput: 0: 6501.5, 1: 6501.3. Samples: 12711342. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:39:44,288][52220] Avg episode reward: [(0, '3407.951'), (1, '8258.386')]
[2023-09-21 13:39:44,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012432_6365184.pth...
[2023-09-21 13:39:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012432_6365184.pth...
[2023-09-21 13:39:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012056_6172672.pth
[2023-09-21 13:39:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012056_6172672.pth
[2023-09-21 13:39:47,504][52980] Updated weights for policy 0, policy_version 12480 (0.0014)
[2023-09-21 13:39:47,504][52979] Updated weights for policy 1, policy_version 12480 (0.0012)
[2023-09-21 13:39:49,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12795904. Throughput: 0: 6491.6, 1: 6491.5. Samples: 12789246. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:39:49,287][52220] Avg episode reward: [(0, '2658.060'), (1, '2674.528')]
[2023-09-21 13:39:50,492][52885] KL-divergence is very high: 115.5594
[2023-09-21 13:39:53,672][52980] Updated weights for policy 0, policy_version 12560 (0.0013)
[2023-09-21 13:39:53,674][52979] Updated weights for policy 1, policy_version 12560 (0.0013)
[2023-09-21 13:39:54,286][52220] Fps is (10 sec: 13517.1, 60 sec: 13038.9, 300 sec: 13010.0). Total num frames: 12865536. Throughput: 0: 6524.7, 1: 6523.8. Samples: 12868920. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:39:54,287][52220] Avg episode reward: [(0, '3869.231'), (1, '1042.752')]
[2023-09-21 13:39:59,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 12996.1). Total num frames: 12926976. Throughput: 0: 6491.7, 1: 6491.7. Samples: 12906770. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:39:59,287][52220] Avg episode reward: [(0, '4981.854'), (1, '1124.933')]
[2023-09-21 13:39:59,340][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012632_6467584.pth...
[2023-09-21 13:39:59,342][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012632_6467584.pth...
[2023-09-21 13:39:59,343][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012248_6270976.pth
[2023-09-21 13:39:59,345][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012248_6270976.pth
[2023-09-21 13:39:59,892][52980] Updated weights for policy 0, policy_version 12640 (0.0012)
[2023-09-21 13:39:59,892][52979] Updated weights for policy 1, policy_version 12640 (0.0015)
[2023-09-21 13:40:02,465][52885] KL-divergence is very high: 110.5634
[2023-09-21 13:40:04,286][52220] Fps is (10 sec: 12697.5, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 12992512. Throughput: 0: 6473.3, 1: 6473.4. Samples: 12984858. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:04,287][52220] Avg episode reward: [(0, '4702.406'), (1, '1019.731')]
[2023-09-21 13:40:06,275][52980] Updated weights for policy 0, policy_version 12720 (0.0014)
[2023-09-21 13:40:06,275][52979] Updated weights for policy 1, policy_version 12720 (0.0014)
[2023-09-21 13:40:09,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 13058048. Throughput: 0: 6496.7, 1: 6495.8. Samples: 13064286. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:09,288][52220] Avg episode reward: [(0, '5727.771'), (1, '851.461')]
[2023-09-21 13:40:12,550][52980] Updated weights for policy 0, policy_version 12800 (0.0015)
[2023-09-21 13:40:12,550][52979] Updated weights for policy 1, policy_version 12800 (0.0015)
[2023-09-21 13:40:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12996.1). Total num frames: 13123584. Throughput: 0: 6459.3, 1: 6459.3. Samples: 13102536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:14,287][52220] Avg episode reward: [(0, '6286.904'), (1, '2510.005')]
[2023-09-21 13:40:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012816_6561792.pth...
[2023-09-21 13:40:14,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012816_6561792.pth...
[2023-09-21 13:40:14,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012432_6365184.pth
[2023-09-21 13:40:14,306][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012432_6365184.pth
[2023-09-21 13:40:18,934][52980] Updated weights for policy 0, policy_version 12880 (0.0014)
[2023-09-21 13:40:18,934][52979] Updated weights for policy 1, policy_version 12880 (0.0010)
[2023-09-21 13:40:19,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 13189120. Throughput: 0: 6495.7, 1: 6496.0. Samples: 13180516. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:40:19,287][52220] Avg episode reward: [(0, '6403.772'), (1, '1094.830')]
[2023-09-21 13:40:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 13023.9). Total num frames: 13254656. Throughput: 0: 6480.1, 1: 6480.6. Samples: 13256292. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:40:24,287][52220] Avg episode reward: [(0, '6589.868'), (1, '3891.651')]
[2023-09-21 13:40:25,346][52979] Updated weights for policy 1, policy_version 12960 (0.0011)
[2023-09-21 13:40:25,346][52980] Updated weights for policy 0, policy_version 12960 (0.0014)
[2023-09-21 13:40:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 13320192. Throughput: 0: 6493.9, 1: 6491.5. Samples: 13295684. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:29,288][52220] Avg episode reward: [(0, '5357.029'), (1, '4019.919')]
[2023-09-21 13:40:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013008_6660096.pth...
[2023-09-21 13:40:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013008_6660096.pth...
[2023-09-21 13:40:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012632_6467584.pth
[2023-09-21 13:40:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012632_6467584.pth
[2023-09-21 13:40:31,604][52980] Updated weights for policy 0, policy_version 13040 (0.0015)
[2023-09-21 13:40:31,605][52979] Updated weights for policy 1, policy_version 13040 (0.0014)
[2023-09-21 13:40:34,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 13385728. Throughput: 0: 6513.6, 1: 6512.9. Samples: 13375438. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:34,288][52220] Avg episode reward: [(0, '4798.645'), (1, '1322.321')]
[2023-09-21 13:40:37,698][52980] Updated weights for policy 0, policy_version 13120 (0.0012)
[2023-09-21 13:40:37,698][52979] Updated weights for policy 1, policy_version 13120 (0.0013)
[2023-09-21 13:40:39,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 13051.7). Total num frames: 13451264. Throughput: 0: 6515.2, 1: 6516.0. Samples: 13455328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:39,287][52220] Avg episode reward: [(0, '4894.226'), (1, '1733.578')]
[2023-09-21 13:40:43,916][52980] Updated weights for policy 0, policy_version 13200 (0.0012)
[2023-09-21 13:40:43,917][52979] Updated weights for policy 1, policy_version 13200 (0.0015)
[2023-09-21 13:40:44,286][52220] Fps is (10 sec: 13107.5, 60 sec: 13107.3, 300 sec: 13051.7). Total num frames: 13516800. Throughput: 0: 6536.4, 1: 6536.6. Samples: 13495052. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:44,287][52220] Avg episode reward: [(0, '5821.654'), (1, '2273.474')]
[2023-09-21 13:40:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013200_6758400.pth...
[2023-09-21 13:40:44,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013200_6758400.pth...
[2023-09-21 13:40:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000012816_6561792.pth
[2023-09-21 13:40:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000012816_6561792.pth
[2023-09-21 13:40:49,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13582336. Throughput: 0: 6536.3, 1: 6537.0. Samples: 13573156. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:49,288][52220] Avg episode reward: [(0, '6473.417'), (1, '2034.321')]
[2023-09-21 13:40:50,262][52980] Updated weights for policy 0, policy_version 13280 (0.0011)
[2023-09-21 13:40:50,263][52979] Updated weights for policy 1, policy_version 13280 (0.0014)
[2023-09-21 13:40:54,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13038.9, 300 sec: 13051.7). Total num frames: 13647872. Throughput: 0: 6534.6, 1: 6535.8. Samples: 13652452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:54,287][52220] Avg episode reward: [(0, '6846.870'), (1, '4313.712')]
[2023-09-21 13:40:56,350][52979] Updated weights for policy 1, policy_version 13360 (0.0015)
[2023-09-21 13:40:56,351][52980] Updated weights for policy 0, policy_version 13360 (0.0015)
[2023-09-21 13:40:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13713408. Throughput: 0: 6553.2, 1: 6553.4. Samples: 13692332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:40:59,287][52220] Avg episode reward: [(0, '6939.912'), (1, '6488.300')]
[2023-09-21 13:40:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013392_6856704.pth...
[2023-09-21 13:40:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013392_6856704.pth...
[2023-09-21 13:40:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013008_6660096.pth
[2023-09-21 13:40:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013008_6660096.pth
[2023-09-21 13:41:02,541][52979] Updated weights for policy 1, policy_version 13440 (0.0012)
[2023-09-21 13:41:02,541][52980] Updated weights for policy 0, policy_version 13440 (0.0014)
[2023-09-21 13:41:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13778944. Throughput: 0: 6570.4, 1: 6571.0. Samples: 13771876. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:41:04,287][52220] Avg episode reward: [(0, '6845.519'), (1, '6178.206')]
[2023-09-21 13:41:08,761][52979] Updated weights for policy 1, policy_version 13520 (0.0012)
[2023-09-21 13:41:08,761][52980] Updated weights for policy 0, policy_version 13520 (0.0010)
[2023-09-21 13:41:09,287][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13844480. Throughput: 0: 6601.5, 1: 6602.3. Samples: 13850466. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:41:09,288][52220] Avg episode reward: [(0, '6752.829'), (1, '1113.059')]
[2023-09-21 13:41:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13910016. Throughput: 0: 6612.3, 1: 6614.8. Samples: 13890904. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:41:14,288][52220] Avg episode reward: [(0, '6938.709'), (1, '775.866')]
[2023-09-21 13:41:14,314][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013592_6959104.pth...
[2023-09-21 13:41:14,318][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013592_6959104.pth...
[2023-09-21 13:41:14,321][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013200_6758400.pth
[2023-09-21 13:41:14,321][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013200_6758400.pth
[2023-09-21 13:41:14,951][52979] Updated weights for policy 1, policy_version 13600 (0.0014)
[2023-09-21 13:41:14,951][52980] Updated weights for policy 0, policy_version 13600 (0.0014)
[2023-09-21 13:41:19,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 13975552. Throughput: 0: 6578.3, 1: 6577.0. Samples: 13967426. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:41:19,287][52220] Avg episode reward: [(0, '7684.580'), (1, '875.549')]
[2023-09-21 13:41:21,564][52979] Updated weights for policy 1, policy_version 13680 (0.0015)
[2023-09-21 13:41:21,565][52980] Updated weights for policy 0, policy_version 13680 (0.0014)
[2023-09-21 13:41:24,286][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14041088. Throughput: 0: 6522.3, 1: 6522.0. Samples: 14042316. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:41:24,287][52220] Avg episode reward: [(0, '7871.616'), (1, '901.659')]
[2023-09-21 13:41:27,942][52980] Updated weights for policy 0, policy_version 13760 (0.0016)
[2023-09-21 13:41:27,942][52979] Updated weights for policy 1, policy_version 13760 (0.0013)
[2023-09-21 13:41:29,287][52220] Fps is (10 sec: 13106.8, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14106624. Throughput: 0: 6510.1, 1: 6509.6. Samples: 14080944. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:41:29,288][52220] Avg episode reward: [(0, '8058.714'), (1, '986.991')]
[2023-09-21 13:41:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013776_7053312.pth...
[2023-09-21 13:41:29,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013776_7053312.pth...
[2023-09-21 13:41:29,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013392_6856704.pth
[2023-09-21 13:41:29,307][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013392_6856704.pth
[2023-09-21 13:41:34,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 14163968. Throughput: 0: 6475.9, 1: 6475.1. Samples: 14155952. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:41:34,287][52220] Avg episode reward: [(0, '7422.855'), (1, '973.493')]
[2023-09-21 13:41:34,417][52980] Updated weights for policy 0, policy_version 13840 (0.0011)
[2023-09-21 13:41:34,418][52979] Updated weights for policy 1, policy_version 13840 (0.0011)
[2023-09-21 13:41:39,286][52220] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14237696. Throughput: 0: 6505.0, 1: 6503.5. Samples: 14237834. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:41:39,288][52220] Avg episode reward: [(0, '6406.380'), (1, '944.796')]
[2023-09-21 13:41:40,439][52980] Updated weights for policy 0, policy_version 13920 (0.0013)
[2023-09-21 13:41:40,440][52979] Updated weights for policy 1, policy_version 13920 (0.0015)
[2023-09-21 13:41:44,287][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.1, 300 sec: 13051.6). Total num frames: 14303232. Throughput: 0: 6514.6, 1: 6512.8. Samples: 14278562. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:41:44,287][52220] Avg episode reward: [(0, '5668.432'), (1, '1637.056')]
[2023-09-21 13:41:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013968_7151616.pth...
[2023-09-21 13:41:44,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013968_7151616.pth...
[2023-09-21 13:41:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013592_6959104.pth
[2023-09-21 13:41:44,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013592_6959104.pth
[2023-09-21 13:41:46,779][52980] Updated weights for policy 0, policy_version 14000 (0.0010)
[2023-09-21 13:41:46,779][52979] Updated weights for policy 1, policy_version 14000 (0.0014)
[2023-09-21 13:41:49,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14368768. Throughput: 0: 6501.0, 1: 6500.4. Samples: 14356936. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:41:49,287][52220] Avg episode reward: [(0, '5751.232'), (1, '1077.826')]
[2023-09-21 13:41:52,943][52980] Updated weights for policy 0, policy_version 14080 (0.0014)
[2023-09-21 13:41:52,943][52979] Updated weights for policy 1, policy_version 14080 (0.0014)
[2023-09-21 13:41:54,286][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14434304. Throughput: 0: 6487.6, 1: 6486.1. Samples: 14434282. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:41:54,287][52220] Avg episode reward: [(0, '5288.614'), (1, '1222.203')]
[2023-09-21 13:41:59,074][52979] Updated weights for policy 1, policy_version 14160 (0.0012)
[2023-09-21 13:41:59,075][52980] Updated weights for policy 0, policy_version 14160 (0.0011)
[2023-09-21 13:41:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 14499840. Throughput: 0: 6491.3, 1: 6489.6. Samples: 14475046. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:41:59,288][52220] Avg episode reward: [(0, '5381.397'), (1, '4453.209')]
[2023-09-21 13:41:59,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014160_7249920.pth...
[2023-09-21 13:41:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014160_7249920.pth...
[2023-09-21 13:41:59,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013776_7053312.pth
[2023-09-21 13:41:59,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013776_7053312.pth
[2023-09-21 13:42:04,287][52220] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14565376. Throughput: 0: 6495.6, 1: 6497.6. Samples: 14552122. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:04,287][52220] Avg episode reward: [(0, '5913.161'), (1, '6389.361')]
[2023-09-21 13:42:05,472][52979] Updated weights for policy 1, policy_version 14240 (0.0009)
[2023-09-21 13:42:05,473][52980] Updated weights for policy 0, policy_version 14240 (0.0012)
[2023-09-21 13:42:09,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14622720. Throughput: 0: 6529.7, 1: 6529.5. Samples: 14629980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:09,288][52220] Avg episode reward: [(0, '4700.708'), (1, '1918.955')]
[2023-09-21 13:42:11,933][52979] Updated weights for policy 1, policy_version 14320 (0.0012)
[2023-09-21 13:42:11,934][52980] Updated weights for policy 0, policy_version 14320 (0.0013)
[2023-09-21 13:42:14,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14688256. Throughput: 0: 6520.9, 1: 6521.3. Samples: 14667840. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:14,287][52220] Avg episode reward: [(0, '4606.087'), (1, '4103.747')]
[2023-09-21 13:42:14,337][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014352_7348224.pth...
[2023-09-21 13:42:14,339][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014352_7348224.pth...
[2023-09-21 13:42:14,340][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000013968_7151616.pth
[2023-09-21 13:42:14,342][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000013968_7151616.pth
[2023-09-21 13:42:18,049][52980] Updated weights for policy 0, policy_version 14400 (0.0013)
[2023-09-21 13:42:18,050][52979] Updated weights for policy 1, policy_version 14400 (0.0009)
[2023-09-21 13:42:19,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14753792. Throughput: 0: 6568.6, 1: 6569.8. Samples: 14747182. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:19,287][52220] Avg episode reward: [(0, '5632.569'), (1, '1881.283')]
[2023-09-21 13:42:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 14819328. Throughput: 0: 6508.1, 1: 6509.0. Samples: 14823600. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:42:24,287][52220] Avg episode reward: [(0, '6099.656'), (1, '2057.043')]
[2023-09-21 13:42:24,532][52980] Updated weights for policy 0, policy_version 14480 (0.0016)
[2023-09-21 13:42:24,532][52979] Updated weights for policy 1, policy_version 14480 (0.0011)
[2023-09-21 13:42:29,287][52220] Fps is (10 sec: 13106.8, 60 sec: 12970.7, 300 sec: 13051.7). Total num frames: 14884864. Throughput: 0: 6464.5, 1: 6465.3. Samples: 14860408. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:42:29,288][52220] Avg episode reward: [(0, '6192.287'), (1, '1154.242')]
[2023-09-21 13:42:29,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014536_7442432.pth...
[2023-09-21 13:42:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014536_7442432.pth...
[2023-09-21 13:42:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014160_7249920.pth
[2023-09-21 13:42:29,305][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014160_7249920.pth
[2023-09-21 13:42:30,892][52979] Updated weights for policy 1, policy_version 14560 (0.0014)
[2023-09-21 13:42:30,892][52980] Updated weights for policy 0, policy_version 14560 (0.0014)
[2023-09-21 13:42:34,286][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13051.7). Total num frames: 14950400. Throughput: 0: 6484.9, 1: 6483.7. Samples: 14940524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:34,287][52220] Avg episode reward: [(0, '6564.418'), (1, '776.134')]
[2023-09-21 13:42:37,348][52979] Updated weights for policy 1, policy_version 14640 (0.0011)
[2023-09-21 13:42:37,350][52980] Updated weights for policy 0, policy_version 14640 (0.0012)
[2023-09-21 13:42:39,286][52220] Fps is (10 sec: 12288.4, 60 sec: 12834.2, 300 sec: 13023.9). Total num frames: 15007744. Throughput: 0: 6449.1, 1: 6450.1. Samples: 15014748. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:39,287][52220] Avg episode reward: [(0, '6752.231'), (1, '798.317')]
[2023-09-21 13:42:43,911][52979] Updated weights for policy 1, policy_version 14720 (0.0011)
[2023-09-21 13:42:43,911][52980] Updated weights for policy 0, policy_version 14720 (0.0013)
[2023-09-21 13:42:44,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 15073280. Throughput: 0: 6414.2, 1: 6414.8. Samples: 15052352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:44,288][52220] Avg episode reward: [(0, '7218.806'), (1, '952.032')]
[2023-09-21 13:42:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014720_7536640.pth...
[2023-09-21 13:42:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014720_7536640.pth...
[2023-09-21 13:42:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014352_7348224.pth
[2023-09-21 13:42:44,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014352_7348224.pth
[2023-09-21 13:42:49,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15138816. Throughput: 0: 6432.9, 1: 6433.1. Samples: 15131090. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:49,287][52220] Avg episode reward: [(0, '7591.005'), (1, '1153.393')]
[2023-09-21 13:42:50,055][52979] Updated weights for policy 1, policy_version 14800 (0.0013)
[2023-09-21 13:42:50,055][52980] Updated weights for policy 0, policy_version 14800 (0.0014)
[2023-09-21 13:42:54,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15204352. Throughput: 0: 6436.1, 1: 6436.5. Samples: 15209242. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:54,288][52220] Avg episode reward: [(0, '7315.933'), (1, '505.392')]
[2023-09-21 13:42:56,471][52980] Updated weights for policy 0, policy_version 14880 (0.0016)
[2023-09-21 13:42:56,471][52979] Updated weights for policy 1, policy_version 14880 (0.0015)
[2023-09-21 13:42:59,286][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 13010.0). Total num frames: 15269888. Throughput: 0: 6419.2, 1: 6418.8. Samples: 15245548. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:42:59,287][52220] Avg episode reward: [(0, '7593.644'), (1, '585.633')]
[2023-09-21 13:42:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014912_7634944.pth...
[2023-09-21 13:42:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014912_7634944.pth...
[2023-09-21 13:42:59,308][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014536_7442432.pth
[2023-09-21 13:42:59,308][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014536_7442432.pth
[2023-09-21 13:43:02,766][52980] Updated weights for policy 0, policy_version 14960 (0.0015)
[2023-09-21 13:43:02,766][52979] Updated weights for policy 1, policy_version 14960 (0.0015)
[2023-09-21 13:43:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 15335424. Throughput: 0: 6411.9, 1: 6411.4. Samples: 15324232. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:43:04,288][52220] Avg episode reward: [(0, '7965.551'), (1, '611.235')]
[2023-09-21 13:43:09,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15392768. Throughput: 0: 6396.2, 1: 6396.4. Samples: 15399270. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:43:09,287][52220] Avg episode reward: [(0, '7776.205'), (1, '863.828')]
[2023-09-21 13:43:09,400][52979] Updated weights for policy 1, policy_version 15040 (0.0013)
[2023-09-21 13:43:09,401][52980] Updated weights for policy 0, policy_version 15040 (0.0016)
[2023-09-21 13:43:14,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.6, 300 sec: 13023.9). Total num frames: 15466496. Throughput: 0: 6432.4, 1: 6433.6. Samples: 15439374. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:43:14,287][52220] Avg episode reward: [(0, '7590.204'), (1, '841.888')]
[2023-09-21 13:43:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015104_7733248.pth...
[2023-09-21 13:43:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015104_7733248.pth...
[2023-09-21 13:43:14,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014720_7536640.pth
[2023-09-21 13:43:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014720_7536640.pth
[2023-09-21 13:43:15,499][52884] KL-divergence is very high: 348.9812
[2023-09-21 13:43:15,531][52980] Updated weights for policy 0, policy_version 15120 (0.0013)
[2023-09-21 13:43:15,531][52979] Updated weights for policy 1, policy_version 15120 (0.0014)
[2023-09-21 13:43:19,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15523840. Throughput: 0: 6417.9, 1: 6419.4. Samples: 15518202. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:43:19,288][52220] Avg episode reward: [(0, '7496.744'), (1, '2087.163')]
[2023-09-21 13:43:21,834][52980] Updated weights for policy 0, policy_version 15200 (0.0009)
[2023-09-21 13:43:21,835][52979] Updated weights for policy 1, policy_version 15200 (0.0014)
[2023-09-21 13:43:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 13023.9). Total num frames: 15597568. Throughput: 0: 6471.3, 1: 6472.0. Samples: 15597196. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:43:24,287][52220] Avg episode reward: [(0, '7870.136'), (1, '3205.815')]
[2023-09-21 13:43:28,151][52980] Updated weights for policy 0, policy_version 15280 (0.0016)
[2023-09-21 13:43:28,151][52979] Updated weights for policy 1, policy_version 15280 (0.0016)
[2023-09-21 13:43:29,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.2, 300 sec: 12996.1). Total num frames: 15654912. Throughput: 0: 6473.3, 1: 6473.8. Samples: 15634972. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:43:29,287][52220] Avg episode reward: [(0, '7966.122'), (1, '2217.232')]
[2023-09-21 13:43:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015288_7827456.pth...
[2023-09-21 13:43:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015288_7827456.pth...
[2023-09-21 13:43:29,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000014912_7634944.pth
[2023-09-21 13:43:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000014912_7634944.pth
[2023-09-21 13:43:34,286][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 15720448. Throughput: 0: 6427.9, 1: 6427.4. Samples: 15709580. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:43:34,287][52220] Avg episode reward: [(0, '7873.016'), (1, '1097.525')]
[2023-09-21 13:43:34,707][52979] Updated weights for policy 1, policy_version 15360 (0.0013)
[2023-09-21 13:43:34,708][52980] Updated weights for policy 0, policy_version 15360 (0.0015)
[2023-09-21 13:43:39,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12834.2, 300 sec: 12968.4). Total num frames: 15777792. Throughput: 0: 6392.9, 1: 6392.1. Samples: 15784562. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:43:39,287][52220] Avg episode reward: [(0, '7228.105'), (1, '3462.300')]
[2023-09-21 13:43:41,356][52980] Updated weights for policy 0, policy_version 15440 (0.0015)
[2023-09-21 13:43:41,356][52979] Updated weights for policy 1, policy_version 15440 (0.0010)
[2023-09-21 13:43:44,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 15843328. Throughput: 0: 6399.0, 1: 6398.8. Samples: 15821450. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:43:44,288][52220] Avg episode reward: [(0, '6114.203'), (1, '5123.172')]
[2023-09-21 13:43:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015472_7921664.pth...
[2023-09-21 13:43:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015472_7921664.pth...
[2023-09-21 13:43:44,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015104_7733248.pth
[2023-09-21 13:43:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015104_7733248.pth
[2023-09-21 13:43:47,434][52979] Updated weights for policy 1, policy_version 15520 (0.0013)
[2023-09-21 13:43:47,435][52980] Updated weights for policy 0, policy_version 15520 (0.0014)
[2023-09-21 13:43:49,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 15908864. Throughput: 0: 6427.6, 1: 6427.6. Samples: 15902720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:43:49,288][52220] Avg episode reward: [(0, '6109.852'), (1, '5100.013')]
[2023-09-21 13:43:53,865][52979] Updated weights for policy 1, policy_version 15600 (0.0015)
[2023-09-21 13:43:53,865][52980] Updated weights for policy 0, policy_version 15600 (0.0016)
[2023-09-21 13:43:54,287][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 15974400. Throughput: 0: 6445.1, 1: 6445.4. Samples: 15979340. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:43:54,288][52220] Avg episode reward: [(0, '6944.572'), (1, '3918.626')]
[2023-09-21 13:43:59,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 16039936. Throughput: 0: 6401.8, 1: 6399.4. Samples: 16015424. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:43:59,287][52220] Avg episode reward: [(0, '6940.350'), (1, '4835.653')]
[2023-09-21 13:43:59,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015664_8019968.pth...
[2023-09-21 13:43:59,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015664_8019968.pth...
[2023-09-21 13:43:59,299][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015288_7827456.pth
[2023-09-21 13:43:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015288_7827456.pth
[2023-09-21 13:44:00,598][52979] Updated weights for policy 1, policy_version 15680 (0.0016)
[2023-09-21 13:44:00,598][52980] Updated weights for policy 0, policy_version 15680 (0.0017)
[2023-09-21 13:44:04,286][52220] Fps is (10 sec: 12288.2, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 16097280. Throughput: 0: 6339.2, 1: 6338.0. Samples: 16088674. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:44:04,287][52220] Avg episode reward: [(0, '7590.234'), (1, '5486.204')]
[2023-09-21 13:44:07,116][52980] Updated weights for policy 0, policy_version 15760 (0.0013)
[2023-09-21 13:44:07,117][52979] Updated weights for policy 1, policy_version 15760 (0.0013)
[2023-09-21 13:44:09,287][52220] Fps is (10 sec: 12287.6, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 16162816. Throughput: 0: 6310.4, 1: 6310.2. Samples: 16165130. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:44:09,288][52220] Avg episode reward: [(0, '7685.314'), (1, '5132.896')]
[2023-09-21 13:44:13,630][52979] Updated weights for policy 1, policy_version 15840 (0.0016)
[2023-09-21 13:44:13,630][52980] Updated weights for policy 0, policy_version 15840 (0.0012)
[2023-09-21 13:44:14,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12697.6, 300 sec: 12940.6). Total num frames: 16228352. Throughput: 0: 6316.1, 1: 6315.9. Samples: 16203410. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:14,287][52220] Avg episode reward: [(0, '7316.430'), (1, '5339.681')]
[2023-09-21 13:44:14,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015848_8114176.pth...
[2023-09-21 13:44:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015848_8114176.pth...
[2023-09-21 13:44:14,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015472_7921664.pth
[2023-09-21 13:44:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015472_7921664.pth
[2023-09-21 13:44:19,287][52220] Fps is (10 sec: 12288.2, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16285696. Throughput: 0: 6334.7, 1: 6335.3. Samples: 16279732. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:19,288][52220] Avg episode reward: [(0, '6666.683'), (1, '5483.146')]
[2023-09-21 13:44:19,984][52980] Updated weights for policy 0, policy_version 15920 (0.0015)
[2023-09-21 13:44:19,984][52979] Updated weights for policy 1, policy_version 15920 (0.0017)
[2023-09-21 13:44:24,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12561.0, 300 sec: 12912.8). Total num frames: 16351232. Throughput: 0: 6353.3, 1: 6354.3. Samples: 16356406. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:24,288][52220] Avg episode reward: [(0, '6300.850'), (1, '5569.612')]
[2023-09-21 13:44:26,416][52979] Updated weights for policy 1, policy_version 16000 (0.0014)
[2023-09-21 13:44:26,417][52980] Updated weights for policy 0, policy_version 16000 (0.0012)
[2023-09-21 13:44:29,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16416768. Throughput: 0: 6350.6, 1: 6350.9. Samples: 16393020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:29,288][52220] Avg episode reward: [(0, '6671.813'), (1, '4860.758')]
[2023-09-21 13:44:29,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016032_8208384.pth...
[2023-09-21 13:44:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016032_8208384.pth...
[2023-09-21 13:44:29,305][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015664_8019968.pth
[2023-09-21 13:44:29,306][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015664_8019968.pth
[2023-09-21 13:44:32,893][52980] Updated weights for policy 0, policy_version 16080 (0.0015)
[2023-09-21 13:44:32,893][52979] Updated weights for policy 1, policy_version 16080 (0.0016)
[2023-09-21 13:44:34,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16482304. Throughput: 0: 6305.5, 1: 6305.4. Samples: 16470210. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:34,287][52220] Avg episode reward: [(0, '7041.262'), (1, '4043.707')]
[2023-09-21 13:44:39,117][52979] Updated weights for policy 1, policy_version 16160 (0.0015)
[2023-09-21 13:44:39,117][52980] Updated weights for policy 0, policy_version 16160 (0.0014)
[2023-09-21 13:44:39,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 16547840. Throughput: 0: 6323.3, 1: 6323.0. Samples: 16548424. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:44:39,287][52220] Avg episode reward: [(0, '7127.936'), (1, '4871.255')]
[2023-09-21 13:44:44,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12834.1, 300 sec: 12940.6). Total num frames: 16613376. Throughput: 0: 6362.1, 1: 6364.0. Samples: 16588100. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:44:44,288][52220] Avg episode reward: [(0, '7220.699'), (1, '5130.083')]
[2023-09-21 13:44:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016224_8306688.pth...
[2023-09-21 13:44:44,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016224_8306688.pth...
[2023-09-21 13:44:44,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000015848_8114176.pth
[2023-09-21 13:44:44,308][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000015848_8114176.pth
[2023-09-21 13:44:45,530][52979] Updated weights for policy 1, policy_version 16240 (0.0013)
[2023-09-21 13:44:45,530][52980] Updated weights for policy 0, policy_version 16240 (0.0012)
[2023-09-21 13:44:49,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12898.9). Total num frames: 16670720. Throughput: 0: 6389.4, 1: 6390.5. Samples: 16663772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:49,287][52220] Avg episode reward: [(0, '6941.879'), (1, '5579.268')]
[2023-09-21 13:44:51,904][52979] Updated weights for policy 1, policy_version 16320 (0.0009)
[2023-09-21 13:44:51,905][52980] Updated weights for policy 0, policy_version 16320 (0.0014)
[2023-09-21 13:44:54,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12912.8). Total num frames: 16736256. Throughput: 0: 6430.6, 1: 6430.0. Samples: 16743856. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:54,288][52220] Avg episode reward: [(0, '6757.726'), (1, '6578.346')]
[2023-09-21 13:44:57,876][52979] Updated weights for policy 1, policy_version 16400 (0.0009)
[2023-09-21 13:44:57,876][52980] Updated weights for policy 0, policy_version 16400 (0.0012)
[2023-09-21 13:44:59,287][52220] Fps is (10 sec: 13925.8, 60 sec: 12834.0, 300 sec: 12940.6). Total num frames: 16809984. Throughput: 0: 6467.2, 1: 6465.7. Samples: 16785394. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:44:59,288][52220] Avg episode reward: [(0, '7129.192'), (1, '6559.431')]
[2023-09-21 13:44:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016416_8404992.pth...
[2023-09-21 13:44:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016416_8404992.pth...
[2023-09-21 13:44:59,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016032_8208384.pth
[2023-09-21 13:44:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016032_8208384.pth
[2023-09-21 13:45:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12912.8). Total num frames: 16867328. Throughput: 0: 6440.1, 1: 6438.8. Samples: 16859282. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:45:04,288][52220] Avg episode reward: [(0, '7497.588'), (1, '6464.558')]
[2023-09-21 13:45:04,465][52980] Updated weights for policy 0, policy_version 16480 (0.0012)
[2023-09-21 13:45:04,466][52979] Updated weights for policy 1, policy_version 16480 (0.0013)
[2023-09-21 13:45:09,286][52220] Fps is (10 sec: 12288.6, 60 sec: 12834.2, 300 sec: 12912.8). Total num frames: 16932864. Throughput: 0: 6431.1, 1: 6430.9. Samples: 16935194. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:45:09,287][52220] Avg episode reward: [(0, '7218.519'), (1, '6990.016')]
[2023-09-21 13:45:10,959][52979] Updated weights for policy 1, policy_version 16560 (0.0016)
[2023-09-21 13:45:10,960][52980] Updated weights for policy 0, policy_version 16560 (0.0013)
[2023-09-21 13:45:14,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12885.0). Total num frames: 16990208. Throughput: 0: 6453.1, 1: 6451.8. Samples: 16973736. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:45:14,287][52220] Avg episode reward: [(0, '7314.189'), (1, '6807.945')]
[2023-09-21 13:45:14,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016600_8499200.pth...
[2023-09-21 13:45:14,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016224_8306688.pth
[2023-09-21 13:45:14,301][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016600_8499200.pth...
[2023-09-21 13:45:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016224_8306688.pth
[2023-09-21 13:45:17,374][52980] Updated weights for policy 0, policy_version 16640 (0.0017)
[2023-09-21 13:45:17,375][52979] Updated weights for policy 1, policy_version 16640 (0.0013)
[2023-09-21 13:45:19,287][52220] Fps is (10 sec: 13106.7, 60 sec: 12970.6, 300 sec: 12912.8). Total num frames: 17063936. Throughput: 0: 6445.4, 1: 6445.6. Samples: 17050306. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0)
[2023-09-21 13:45:19,288][52220] Avg episode reward: [(0, '7129.868'), (1, '4730.448')]
[2023-09-21 13:45:23,757][52980] Updated weights for policy 0, policy_version 16720 (0.0014)
[2023-09-21 13:45:23,757][52979] Updated weights for policy 1, policy_version 16720 (0.0013)
[2023-09-21 13:45:24,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12885.0). Total num frames: 17121280. Throughput: 0: 6444.9, 1: 6445.2. Samples: 17128480. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:45:24,288][52220] Avg episode reward: [(0, '7221.769'), (1, '4524.132')]
[2023-09-21 13:45:29,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17195008. Throughput: 0: 6471.3, 1: 6469.4. Samples: 17170430. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:45:29,287][52220] Avg episode reward: [(0, '6757.209'), (1, '5528.581')]
[2023-09-21 13:45:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016792_8597504.pth...
[2023-09-21 13:45:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016792_8597504.pth...
[2023-09-21 13:45:29,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016416_8404992.pth
[2023-09-21 13:45:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016416_8404992.pth
[2023-09-21 13:45:29,823][52979] Updated weights for policy 1, policy_version 16800 (0.0012)
[2023-09-21 13:45:29,823][52980] Updated weights for policy 0, policy_version 16800 (0.0013)
[2023-09-21 13:45:34,286][52220] Fps is (10 sec: 13926.7, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17260544. Throughput: 0: 6484.9, 1: 6485.3. Samples: 17247432. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:45:34,287][52220] Avg episode reward: [(0, '6760.660'), (1, '6089.937')]
[2023-09-21 13:45:36,145][52980] Updated weights for policy 0, policy_version 16880 (0.0011)
[2023-09-21 13:45:36,146][52979] Updated weights for policy 1, policy_version 16880 (0.0012)
[2023-09-21 13:45:39,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17326080. Throughput: 0: 6468.8, 1: 6468.0. Samples: 17326014. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:45:39,288][52220] Avg episode reward: [(0, '6953.498'), (1, '6911.194')]
[2023-09-21 13:45:42,392][52979] Updated weights for policy 1, policy_version 16960 (0.0010)
[2023-09-21 13:45:42,393][52980] Updated weights for policy 0, policy_version 16960 (0.0013)
[2023-09-21 13:45:44,286][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.2, 300 sec: 12885.0). Total num frames: 17383424. Throughput: 0: 6443.1, 1: 6445.0. Samples: 17365356. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:45:44,287][52220] Avg episode reward: [(0, '6680.874'), (1, '7279.392')]
[2023-09-21 13:45:44,330][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016984_8695808.pth...
[2023-09-21 13:45:44,333][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016600_8499200.pth
[2023-09-21 13:45:44,336][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016984_8695808.pth...
[2023-09-21 13:45:44,340][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016600_8499200.pth
[2023-09-21 13:45:48,771][52980] Updated weights for policy 0, policy_version 17040 (0.0011)
[2023-09-21 13:45:48,772][52979] Updated weights for policy 1, policy_version 17040 (0.0012)
[2023-09-21 13:45:49,287][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 12885.0). Total num frames: 17448960. Throughput: 0: 6463.4, 1: 6464.4. Samples: 17441036. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:45:49,287][52220] Avg episode reward: [(0, '6579.048'), (1, '7368.600')]
[2023-09-21 13:45:54,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 17514496. Throughput: 0: 6449.2, 1: 6449.1. Samples: 17515620. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:45:54,287][52220] Avg episode reward: [(0, '7315.701'), (1, '7733.072')]
[2023-09-21 13:45:55,452][52980] Updated weights for policy 0, policy_version 17120 (0.0010)
[2023-09-21 13:45:55,452][52979] Updated weights for policy 1, policy_version 17120 (0.0014)
[2023-09-21 13:45:59,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12697.7, 300 sec: 12857.3). Total num frames: 17571840. Throughput: 0: 6446.7, 1: 6448.4. Samples: 17554018. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:45:59,287][52220] Avg episode reward: [(0, '7225.743'), (1, '7824.554')]
[2023-09-21 13:45:59,336][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017168_8790016.pth...
[2023-09-21 13:45:59,339][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016792_8597504.pth
[2023-09-21 13:45:59,347][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017168_8790016.pth...
[2023-09-21 13:45:59,350][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016792_8597504.pth
[2023-09-21 13:46:01,961][52979] Updated weights for policy 1, policy_version 17200 (0.0012)
[2023-09-21 13:46:01,961][52980] Updated weights for policy 0, policy_version 17200 (0.0015)
[2023-09-21 13:46:04,286][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 17645568. Throughput: 0: 6437.0, 1: 6436.7. Samples: 17629618. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:04,287][52220] Avg episode reward: [(0, '7133.540'), (1, '7552.845')]
[2023-09-21 13:46:07,869][52979] Updated weights for policy 1, policy_version 17280 (0.0013)
[2023-09-21 13:46:07,869][52980] Updated weights for policy 0, policy_version 17280 (0.0014)
[2023-09-21 13:46:09,286][52220] Fps is (10 sec: 13926.4, 60 sec: 12970.6, 300 sec: 12885.0). Total num frames: 17711104. Throughput: 0: 6499.3, 1: 6499.1. Samples: 17713410. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:09,287][52220] Avg episode reward: [(0, '7224.133'), (1, '7191.170')]
[2023-09-21 13:46:14,028][52979] Updated weights for policy 1, policy_version 17360 (0.0014)
[2023-09-21 13:46:14,028][52980] Updated weights for policy 0, policy_version 17360 (0.0013)
[2023-09-21 13:46:14,287][52220] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 17776640. Throughput: 0: 6461.5, 1: 6461.9. Samples: 17751988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:14,288][52220] Avg episode reward: [(0, '7406.381'), (1, '6908.129')]
[2023-09-21 13:46:14,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017360_8888320.pth...
[2023-09-21 13:46:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017360_8888320.pth...
[2023-09-21 13:46:14,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000016984_8695808.pth
[2023-09-21 13:46:14,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000016984_8695808.pth
[2023-09-21 13:46:19,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 17842176. Throughput: 0: 6503.4, 1: 6502.8. Samples: 17832712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:19,288][52220] Avg episode reward: [(0, '7033.404'), (1, '6989.562')]
[2023-09-21 13:46:20,321][52979] Updated weights for policy 1, policy_version 17440 (0.0011)
[2023-09-21 13:46:20,321][52980] Updated weights for policy 0, policy_version 17440 (0.0016)
[2023-09-21 13:46:24,287][52220] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12885.1). Total num frames: 17907712. Throughput: 0: 6464.3, 1: 6463.8. Samples: 17907776. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:46:24,287][52220] Avg episode reward: [(0, '7033.707'), (1, '7080.749')]
[2023-09-21 13:46:26,690][52980] Updated weights for policy 0, policy_version 17520 (0.0013)
[2023-09-21 13:46:26,690][52979] Updated weights for policy 1, policy_version 17520 (0.0012)
[2023-09-21 13:46:29,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 17973248. Throughput: 0: 6475.4, 1: 6475.6. Samples: 17948152. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:46:29,287][52220] Avg episode reward: [(0, '7032.291'), (1, '7447.172')]
[2023-09-21 13:46:29,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017552_8986624.pth...
[2023-09-21 13:46:29,294][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017552_8986624.pth...
[2023-09-21 13:46:29,301][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017168_8790016.pth
[2023-09-21 13:46:29,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017168_8790016.pth
[2023-09-21 13:46:33,080][52980] Updated weights for policy 0, policy_version 17600 (0.0011)
[2023-09-21 13:46:33,080][52979] Updated weights for policy 1, policy_version 17600 (0.0013)
[2023-09-21 13:46:34,287][52220] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18030592. Throughput: 0: 6474.1, 1: 6474.6. Samples: 18023728. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:46:34,288][52220] Avg episode reward: [(0, '7311.416'), (1, '7624.408')]
[2023-09-21 13:46:39,287][52220] Fps is (10 sec: 12287.9, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18096128. Throughput: 0: 6526.3, 1: 6526.3. Samples: 18102990. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:46:39,288][52220] Avg episode reward: [(0, '7127.357'), (1, '7631.744')]
[2023-09-21 13:46:39,332][52979] Updated weights for policy 1, policy_version 17680 (0.0014)
[2023-09-21 13:46:39,332][52980] Updated weights for policy 0, policy_version 17680 (0.0011)
[2023-09-21 13:46:44,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 18161664. Throughput: 0: 6517.1, 1: 6516.0. Samples: 18140504. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:46:44,287][52220] Avg episode reward: [(0, '7406.257'), (1, '7722.659')]
[2023-09-21 13:46:44,293][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017736_9080832.pth...
[2023-09-21 13:46:44,294][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017736_9080832.pth...
[2023-09-21 13:46:44,298][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017360_8888320.pth
[2023-09-21 13:46:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017360_8888320.pth
[2023-09-21 13:46:45,896][52979] Updated weights for policy 1, policy_version 17760 (0.0015)
[2023-09-21 13:46:45,896][52980] Updated weights for policy 0, policy_version 17760 (0.0014)
[2023-09-21 13:46:49,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 18227200. Throughput: 0: 6497.4, 1: 6497.0. Samples: 18214364. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:49,287][52220] Avg episode reward: [(0, '7312.815'), (1, '7638.060')]
[2023-09-21 13:46:52,597][52979] Updated weights for policy 1, policy_version 17840 (0.0011)
[2023-09-21 13:46:52,598][52980] Updated weights for policy 0, policy_version 17840 (0.0014)
[2023-09-21 13:46:54,286][52220] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 12829.5). Total num frames: 18284544. Throughput: 0: 6381.2, 1: 6380.8. Samples: 18287702. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:54,287][52220] Avg episode reward: [(0, '7124.114'), (1, '7818.946')]
[2023-09-21 13:46:59,078][52979] Updated weights for policy 1, policy_version 17920 (0.0013)
[2023-09-21 13:46:59,079][52980] Updated weights for policy 0, policy_version 17920 (0.0014)
[2023-09-21 13:46:59,287][52220] Fps is (10 sec: 12287.8, 60 sec: 12970.6, 300 sec: 12829.5). Total num frames: 18350080. Throughput: 0: 6373.4, 1: 6372.9. Samples: 18325572. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:46:59,288][52220] Avg episode reward: [(0, '7589.081'), (1, '7720.539')]
[2023-09-21 13:46:59,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017920_9175040.pth...
[2023-09-21 13:46:59,296][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017920_9175040.pth...
[2023-09-21 13:46:59,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017552_8986624.pth
[2023-09-21 13:46:59,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017552_8986624.pth
[2023-09-21 13:47:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18415616. Throughput: 0: 6328.9, 1: 6329.4. Samples: 18402336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:04,287][52220] Avg episode reward: [(0, '7495.950'), (1, '8078.740')]
[2023-09-21 13:47:05,541][52980] Updated weights for policy 0, policy_version 18000 (0.0011)
[2023-09-21 13:47:05,542][52979] Updated weights for policy 1, policy_version 18000 (0.0012)
[2023-09-21 13:47:09,287][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18481152. Throughput: 0: 6377.3, 1: 6379.5. Samples: 18481830. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:09,288][52220] Avg episode reward: [(0, '7682.699'), (1, '8168.326')]
[2023-09-21 13:47:11,515][52980] Updated weights for policy 0, policy_version 18080 (0.0012)
[2023-09-21 13:47:11,515][52979] Updated weights for policy 1, policy_version 18080 (0.0014)
[2023-09-21 13:47:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18546688. Throughput: 0: 6380.0, 1: 6379.8. Samples: 18522342. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:14,288][52220] Avg episode reward: [(0, '7682.805'), (1, '8267.660')]
[2023-09-21 13:47:14,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018112_9273344.pth...
[2023-09-21 13:47:14,296][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018112_9273344.pth...
[2023-09-21 13:47:14,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017736_9080832.pth
[2023-09-21 13:47:14,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017736_9080832.pth
[2023-09-21 13:47:17,693][52980] Updated weights for policy 0, policy_version 18160 (0.0015)
[2023-09-21 13:47:17,693][52979] Updated weights for policy 1, policy_version 18160 (0.0011)
[2023-09-21 13:47:19,286][52220] Fps is (10 sec: 13107.4, 60 sec: 12834.2, 300 sec: 12857.3). Total num frames: 18612224. Throughput: 0: 6444.6, 1: 6442.6. Samples: 18603646. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:19,287][52220] Avg episode reward: [(0, '7684.587'), (1, '8452.451')]
[2023-09-21 13:47:24,195][52980] Updated weights for policy 0, policy_version 18240 (0.0014)
[2023-09-21 13:47:24,195][52979] Updated weights for policy 1, policy_version 18240 (0.0014)
[2023-09-21 13:47:24,286][52220] Fps is (10 sec: 13107.5, 60 sec: 12834.2, 300 sec: 12857.3). Total num frames: 18677760. Throughput: 0: 6387.4, 1: 6386.1. Samples: 18677794. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:24,287][52220] Avg episode reward: [(0, '7500.530'), (1, '8360.955')]
[2023-09-21 13:47:29,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 18743296. Throughput: 0: 6420.5, 1: 6420.1. Samples: 18718336. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:47:29,288][52220] Avg episode reward: [(0, '7414.395'), (1, '8452.178')]
[2023-09-21 13:47:29,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018304_9371648.pth...
[2023-09-21 13:47:29,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018304_9371648.pth...
[2023-09-21 13:47:29,303][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000017920_9175040.pth
[2023-09-21 13:47:29,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000017920_9175040.pth
[2023-09-21 13:47:30,246][52979] Updated weights for policy 1, policy_version 18320 (0.0012)
[2023-09-21 13:47:30,246][52980] Updated weights for policy 0, policy_version 18320 (0.0014)
[2023-09-21 13:47:34,287][52220] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 18808832. Throughput: 0: 6471.3, 1: 6471.8. Samples: 18796804. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:47:34,287][52220] Avg episode reward: [(0, '6953.468'), (1, '8446.490')]
[2023-09-21 13:47:36,669][52980] Updated weights for policy 0, policy_version 18400 (0.0012)
[2023-09-21 13:47:36,670][52979] Updated weights for policy 1, policy_version 18400 (0.0016)
[2023-09-21 13:47:39,286][52220] Fps is (10 sec: 13107.6, 60 sec: 12970.7, 300 sec: 12885.1). Total num frames: 18874368. Throughput: 0: 6524.5, 1: 6524.3. Samples: 18874896. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-09-21 13:47:39,287][52220] Avg episode reward: [(0, '6953.002'), (1, '8442.549')]
[2023-09-21 13:47:43,014][52980] Updated weights for policy 0, policy_version 18480 (0.0014)
[2023-09-21 13:47:43,014][52979] Updated weights for policy 1, policy_version 18480 (0.0011)
[2023-09-21 13:47:44,286][52220] Fps is (10 sec: 13107.3, 60 sec: 12970.6, 300 sec: 12885.0). Total num frames: 18939904. Throughput: 0: 6533.9, 1: 6535.9. Samples: 18913714. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:44,287][52220] Avg episode reward: [(0, '6484.211'), (1, '8712.539')]
[2023-09-21 13:47:44,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018496_9469952.pth...
[2023-09-21 13:47:44,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018496_9469952.pth...
[2023-09-21 13:47:44,300][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018112_9273344.pth
[2023-09-21 13:47:44,302][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018112_9273344.pth
[2023-09-21 13:47:48,928][52979] Updated weights for policy 1, policy_version 18560 (0.0012)
[2023-09-21 13:47:48,929][52980] Updated weights for policy 0, policy_version 18560 (0.0013)
[2023-09-21 13:47:49,287][52220] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 19005440. Throughput: 0: 6599.1, 1: 6599.0. Samples: 18996248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:49,288][52220] Avg episode reward: [(0, '6296.169'), (1, '8897.060')]
[2023-09-21 13:47:54,286][52220] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 19070976. Throughput: 0: 6568.1, 1: 6568.0. Samples: 19072954. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:54,287][52220] Avg episode reward: [(0, '6668.544'), (1, '8988.684')]
[2023-09-21 13:47:55,314][52980] Updated weights for policy 0, policy_version 18640 (0.0014)
[2023-09-21 13:47:55,315][52979] Updated weights for policy 1, policy_version 18640 (0.0015)
[2023-09-21 13:47:59,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 19136512. Throughput: 0: 6552.9, 1: 6550.7. Samples: 19112004. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:47:59,288][52220] Avg episode reward: [(0, '7318.485'), (1, '8989.116')]
[2023-09-21 13:47:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018688_9568256.pth...
[2023-09-21 13:47:59,298][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018688_9568256.pth...
[2023-09-21 13:47:59,300][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018304_9371648.pth
[2023-09-21 13:47:59,302][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018304_9371648.pth
[2023-09-21 13:48:01,469][52979] Updated weights for policy 1, policy_version 18720 (0.0013)
[2023-09-21 13:48:01,469][52980] Updated weights for policy 0, policy_version 18720 (0.0013)
[2023-09-21 13:48:04,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 19202048. Throughput: 0: 6533.9, 1: 6535.2. Samples: 19191760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:04,288][52220] Avg episode reward: [(0, '6765.058'), (1, '9081.704')]
[2023-09-21 13:48:08,066][52980] Updated weights for policy 0, policy_version 18800 (0.0011)
[2023-09-21 13:48:08,068][52979] Updated weights for policy 1, policy_version 18800 (0.0016)
[2023-09-21 13:48:09,286][52220] Fps is (10 sec: 12288.3, 60 sec: 12970.7, 300 sec: 12857.3). Total num frames: 19259392. Throughput: 0: 6537.6, 1: 6538.8. Samples: 19266228. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:09,287][52220] Avg episode reward: [(0, '6766.459'), (1, '8991.223')]
[2023-09-21 13:48:14,232][52979] Updated weights for policy 1, policy_version 18880 (0.0010)
[2023-09-21 13:48:14,232][52980] Updated weights for policy 0, policy_version 18880 (0.0012)
[2023-09-21 13:48:14,287][52220] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 19333120. Throughput: 0: 6519.4, 1: 6520.5. Samples: 19305134. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:14,288][52220] Avg episode reward: [(0, '7039.722'), (1, '8621.066')]
[2023-09-21 13:48:14,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018880_9666560.pth...
[2023-09-21 13:48:14,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018880_9666560.pth...
[2023-09-21 13:48:14,304][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018496_9469952.pth
[2023-09-21 13:48:14,307][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018496_9469952.pth
[2023-09-21 13:48:19,286][52220] Fps is (10 sec: 13926.2, 60 sec: 13107.2, 300 sec: 12885.0). Total num frames: 19398656. Throughput: 0: 6548.1, 1: 6547.7. Samples: 19386116. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:48:19,287][52220] Avg episode reward: [(0, '7041.964'), (1, '8620.213')]
[2023-09-21 13:48:20,512][52979] Updated weights for policy 1, policy_version 18960 (0.0010)
[2023-09-21 13:48:20,513][52980] Updated weights for policy 0, policy_version 18960 (0.0014)
[2023-09-21 13:48:24,286][52220] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 19464192. Throughput: 0: 6545.2, 1: 6544.5. Samples: 19463938. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:48:24,287][52220] Avg episode reward: [(0, '6856.232'), (1, '8529.884')]
[2023-09-21 13:48:26,835][52980] Updated weights for policy 0, policy_version 19040 (0.0014)
[2023-09-21 13:48:26,835][52979] Updated weights for policy 1, policy_version 19040 (0.0014)
[2023-09-21 13:48:29,286][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12885.0). Total num frames: 19521536. Throughput: 0: 6538.3, 1: 6538.8. Samples: 19502180. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:48:29,287][52220] Avg episode reward: [(0, '6569.805'), (1, '8345.107')]
[2023-09-21 13:48:29,295][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019064_9760768.pth...
[2023-09-21 13:48:29,295][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019064_9760768.pth...
[2023-09-21 13:48:29,299][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018688_9568256.pth
[2023-09-21 13:48:29,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018688_9568256.pth
[2023-09-21 13:48:33,115][52980] Updated weights for policy 0, policy_version 19120 (0.0010)
[2023-09-21 13:48:33,115][52979] Updated weights for policy 1, policy_version 19120 (0.0016)
[2023-09-21 13:48:34,287][52220] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 19587072. Throughput: 0: 6481.6, 1: 6481.7. Samples: 19579596. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0)
[2023-09-21 13:48:34,287][52220] Avg episode reward: [(0, '6385.226'), (1, '8530.049')]
[2023-09-21 13:48:39,255][52979] Updated weights for policy 1, policy_version 19200 (0.0014)
[2023-09-21 13:48:39,256][52980] Updated weights for policy 0, policy_version 19200 (0.0013)
[2023-09-21 13:48:39,286][52220] Fps is (10 sec: 13926.5, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 19660800. Throughput: 0: 6531.7, 1: 6530.3. Samples: 19660742. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:48:39,287][52220] Avg episode reward: [(0, '6850.179'), (1, '8527.774')]
[2023-09-21 13:48:44,287][52220] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 12940.6). Total num frames: 19726336. Throughput: 0: 6547.3, 1: 6548.7. Samples: 19701322. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0)
[2023-09-21 13:48:44,288][52220] Avg episode reward: [(0, '7131.582'), (1, '8528.641')]
[2023-09-21 13:48:44,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019264_9863168.pth...
[2023-09-21 13:48:44,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019264_9863168.pth...
[2023-09-21 13:48:44,301][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000018880_9666560.pth
[2023-09-21 13:48:44,303][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000018880_9666560.pth
[2023-09-21 13:48:45,625][52979] Updated weights for policy 1, policy_version 19280 (0.0012)
[2023-09-21 13:48:45,625][52980] Updated weights for policy 0, policy_version 19280 (0.0015)
[2023-09-21 13:48:49,286][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 19783680. Throughput: 0: 6504.2, 1: 6504.5. Samples: 19777148. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:48:49,287][52220] Avg episode reward: [(0, '7595.058'), (1, '8712.437')]
[2023-09-21 13:48:51,930][52980] Updated weights for policy 0, policy_version 19360 (0.0014)
[2023-09-21 13:48:51,931][52979] Updated weights for policy 1, policy_version 19360 (0.0013)
[2023-09-21 13:48:54,287][52220] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12912.8). Total num frames: 19849216. Throughput: 0: 6521.0, 1: 6520.6. Samples: 19853104. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:48:54,287][52220] Avg episode reward: [(0, '7593.209'), (1, '8711.769')]
[2023-09-21 13:48:58,371][52980] Updated weights for policy 0, policy_version 19440 (0.0014)
[2023-09-21 13:48:58,371][52979] Updated weights for policy 1, policy_version 19440 (0.0013)
[2023-09-21 13:48:59,287][52220] Fps is (10 sec: 13106.8, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 19914752. Throughput: 0: 6514.9, 1: 6513.9. Samples: 19891434. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-09-21 13:48:59,288][52220] Avg episode reward: [(0, '7314.703'), (1, '8711.742')]
[2023-09-21 13:48:59,297][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019448_9957376.pth...
[2023-09-21 13:48:59,297][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019448_9957376.pth...
[2023-09-21 13:48:59,304][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019064_9760768.pth
[2023-09-21 13:48:59,314][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019064_9760768.pth
[2023-09-21 13:49:04,287][52220] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12940.6). Total num frames: 19980288. Throughput: 0: 6483.0, 1: 6484.0. Samples: 19969630. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0)
[2023-09-21 13:49:04,287][52220] Avg episode reward: [(0, '7501.187'), (1, '8711.551')]
[2023-09-21 13:49:04,683][52980] Updated weights for policy 0, policy_version 19520 (0.0012)
[2023-09-21 13:49:04,684][52979] Updated weights for policy 1, policy_version 19520 (0.0013)
[2023-09-21 13:49:06,521][52885] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000
[2023-09-21 13:49:06,522][52985] Stopping RolloutWorker_w2...
[2023-09-21 13:49:06,523][52985] Loop rollout_proc2_evt_loop terminating...
[2023-09-21 13:49:06,522][52982] Stopping RolloutWorker_w0...
[2023-09-21 13:49:06,522][52220] Component RolloutWorker_w2 stopped!
[2023-09-21 13:49:06,522][52986] Stopping RolloutWorker_w3...
[2023-09-21 13:49:06,522][52988] Stopping RolloutWorker_w5...
[2023-09-21 13:49:06,523][52990] Stopping RolloutWorker_w7...
[2023-09-21 13:49:06,523][52989] Stopping RolloutWorker_w6...
[2023-09-21 13:49:06,523][52987] Stopping RolloutWorker_w4...
[2023-09-21 13:49:06,523][52984] Stopping RolloutWorker_w1...
[2023-09-21 13:49:06,523][52982] Loop rollout_proc0_evt_loop terminating...
[2023-09-21 13:49:06,523][52986] Loop rollout_proc3_evt_loop terminating...
[2023-09-21 13:49:06,523][52220] Component RolloutWorker_w3 stopped!
[2023-09-21 13:49:06,523][52884] Stopping Batcher_0...
[2023-09-21 13:49:06,523][52990] Loop rollout_proc7_evt_loop terminating...
[2023-09-21 13:49:06,523][52988] Loop rollout_proc5_evt_loop terminating...
[2023-09-21 13:49:06,523][52220] Component RolloutWorker_w0 stopped!
[2023-09-21 13:49:06,523][52989] Loop rollout_proc6_evt_loop terminating...
[2023-09-21 13:49:06,523][52984] Loop rollout_proc1_evt_loop terminating...
[2023-09-21 13:49:06,523][52987] Loop rollout_proc4_evt_loop terminating...
[2023-09-21 13:49:06,524][52220] Component RolloutWorker_w5 stopped!
[2023-09-21 13:49:06,524][52884] Loop batcher_evt_loop terminating...
[2023-09-21 13:49:06,524][52220] Component RolloutWorker_w4 stopped!
[2023-09-21 13:49:06,524][52220] Component RolloutWorker_w6 stopped!
[2023-09-21 13:49:06,525][52220] Component RolloutWorker_w1 stopped!
[2023-09-21 13:49:06,524][52885] Stopping Batcher_1...
[2023-09-21 13:49:06,525][52220] Component RolloutWorker_w7 stopped!
[2023-09-21 13:49:06,525][52220] Component Batcher_0 stopped!
[2023-09-21 13:49:06,525][52220] Component Batcher_1 stopped!
[2023-09-21 13:49:06,526][52884] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000
[2023-09-21 13:49:06,525][52885] Loop batcher_evt_loop terminating...
[2023-09-21 13:49:06,526][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,526][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,529][52884] Removing ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019264_9863168.pth
[2023-09-21 13:49:06,530][52884] Saving ./train_dir/DoublePendulum/checkpoint_p0/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,533][52884] Stopping LearnerWorker_p0...
[2023-09-21 13:49:06,533][52884] Loop learner_proc0_evt_loop terminating...
[2023-09-21 13:49:06,533][52220] Component LearnerWorker_p0 stopped!
[2023-09-21 13:49:06,534][52885] Removing ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019264_9863168.pth
[2023-09-21 13:49:06,535][52885] Saving ./train_dir/DoublePendulum/checkpoint_p1/checkpoint_000019544_10006528.pth...
[2023-09-21 13:49:06,539][52885] Stopping LearnerWorker_p1...
[2023-09-21 13:49:06,539][52885] Loop learner_proc1_evt_loop terminating...
[2023-09-21 13:49:06,539][52220] Component LearnerWorker_p1 stopped!
[2023-09-21 13:49:06,578][52979] Weights refcount: 2 0
[2023-09-21 13:49:06,579][52979] Stopping InferenceWorker_p1-w0...
[2023-09-21 13:49:06,579][52980] Weights refcount: 2 0
[2023-09-21 13:49:06,579][52979] Loop inference_proc1-0_evt_loop terminating...
[2023-09-21 13:49:06,579][52220] Component InferenceWorker_p1-w0 stopped!
[2023-09-21 13:49:06,580][52980] Stopping InferenceWorker_p0-w0...
[2023-09-21 13:49:06,580][52980] Loop inference_proc0-0_evt_loop terminating...
[2023-09-21 13:49:06,580][52220] Component InferenceWorker_p0-w0 stopped!
[2023-09-21 13:49:06,580][52220] Waiting for process learner_proc0 to stop...
[2023-09-21 13:49:07,107][52220] Waiting for process learner_proc1 to stop...
[2023-09-21 13:49:07,131][52220] Waiting for process inference_proc0-0 to join...
[2023-09-21 13:49:07,189][52220] Waiting for process inference_proc1-0 to join...
[2023-09-21 13:49:07,190][52220] Waiting for process rollout_proc0 to join...
[2023-09-21 13:49:07,190][52220] Waiting for process rollout_proc1 to join...
[2023-09-21 13:49:07,191][52220] Waiting for process rollout_proc2 to join...
[2023-09-21 13:49:07,192][52220] Waiting for process rollout_proc3 to join...
[2023-09-21 13:49:07,192][52220] Waiting for process rollout_proc4 to join...
[2023-09-21 13:49:07,193][52220] Waiting for process rollout_proc5 to join...
[2023-09-21 13:49:07,193][52220] Waiting for process rollout_proc6 to join...
[2023-09-21 13:49:07,194][52220] Waiting for process rollout_proc7 to join...
[2023-09-21 13:49:07,194][52220] Batcher 0 profile tree view:
batching: 40.1539, releasing_batches: 3.4137
[2023-09-21 13:49:07,195][52220] Batcher 1 profile tree view:
batching: 39.8640, releasing_batches: 3.4962
[2023-09-21 13:49:07,195][52220] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0052
wait_policy_total: 206.1246
update_model: 19.4573
weight_update: 0.0014
one_step: 0.0012
handle_policy_step: 1227.2543
deserialize: 33.0974, stack: 7.3991, obs_to_device_normalize: 246.6086, forward: 612.8423, send_messages: 100.2734
prepare_outputs: 157.0522
to_cpu: 78.7695
[2023-09-21 13:49:07,196][52220] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0052
wait_policy_total: 206.1224
update_model: 19.1857
weight_update: 0.0015
one_step: 0.0011
handle_policy_step: 1228.1887
deserialize: 33.5167, stack: 7.7549, obs_to_device_normalize: 247.0163, forward: 616.7188, send_messages: 97.8848
prepare_outputs: 154.8795
to_cpu: 78.1772
[2023-09-21 13:49:07,197][52220] Learner 0 profile tree view:
misc: 0.0148, prepare_batch: 21.9078
train: 106.3967
epoch_init: 0.0623, minibatch_init: 1.7514, losses_postprocess: 2.8950, kl_divergence: 1.3439, after_optimizer: 1.6311
calculate_losses: 31.6922
losses_init: 0.0586, forward_head: 3.6187, bptt_initial: 0.2110, bptt: 0.2228, tail: 11.9533, advantages_returns: 1.6383, losses: 12.0433
update: 64.8146
clip: 8.3106
[2023-09-21 13:49:07,198][52220] Learner 1 profile tree view:
misc: 0.0145, prepare_batch: 21.7773
train: 106.0387
epoch_init: 0.0630, minibatch_init: 1.7017, losses_postprocess: 2.8528, kl_divergence: 1.3355, after_optimizer: 1.5977
calculate_losses: 31.4837
losses_init: 0.0557, forward_head: 3.6214, bptt_initial: 0.2100, bptt: 0.2298, tail: 11.8125, advantages_returns: 1.6091, losses: 11.9833
update: 64.7614
clip: 8.2701
[2023-09-21 13:49:07,198][52220] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 1.6159, enqueue_policy_requests: 75.0153, complete_rollouts: 2.5135, env_step: 470.5115, overhead: 91.9338
save_policy_outputs: 175.1953
split_output_tensors: 59.5794
[2023-09-21 13:49:07,199][52220] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 1.5498, enqueue_policy_requests: 71.7281, complete_rollouts: 2.4267, env_step: 455.7171, overhead: 88.3418
save_policy_outputs: 170.8656
split_output_tensors: 57.4403
[2023-09-21 13:49:07,200][52220] Loop Runner_EvtLoop terminating...
[2023-09-21 13:49:07,201][52220] Runner profile tree view:
main_loop: 1549.2240
[2023-09-21 13:49:07,201][52220] Collected {0: 10006528, 1: 10006528}, FPS: 12918.1