vizdoom_battle / sf_log.txt
[2023-09-14 13:12:24,601][61308] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-14 13:12:24,602][61308] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-09-14 13:12:24,620][61308] Num visible devices: 1
[2023-09-14 13:12:24,647][61308] Starting seed is not provided
[2023-09-14 13:12:24,647][61308] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-14 13:12:24,647][61308] Initializing actor-critic model on device cuda:0
[2023-09-14 13:12:24,648][61308] RunningMeanStd input shape: (23,)
[2023-09-14 13:12:24,648][61308] RunningMeanStd input shape: (3, 72, 128)
[2023-09-14 13:12:24,649][61308] RunningMeanStd input shape: (1,)
[2023-09-14 13:12:24,661][61308] ConvEncoder: input_channels=3
[2023-09-14 13:12:24,777][61308] Conv encoder output size: 512
[2023-09-14 13:12:24,778][61308] Policy head output size: 640
[2023-09-14 13:12:24,796][61308] Created Actor Critic model with architecture:
[2023-09-14 13:12:24,797][61308] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(measurements): RunningMeanStdInPlace()
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
(measurements_head): Sequential(
(0): Linear(in_features=23, out_features=128, bias=True)
(1): ELU(alpha=1.0)
(2): Linear(in_features=128, out_features=128, bias=True)
(3): ELU(alpha=1.0)
)
)
(core): ModelCoreRNN(
(core): GRU(640, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=21, bias=True)
)
)
[2023-09-14 13:12:25,706][61308] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-14 13:12:25,707][61308] No checkpoints found
[2023-09-14 13:12:25,707][61308] Did not load from checkpoint, starting from scratch!
[2023-09-14 13:12:25,707][61308] Initialized policy 0 weights for model version 0
[2023-09-14 13:12:25,710][61308] LearnerWorker_p0 finished initialization!
[2023-09-14 13:12:25,710][61308] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-14 13:12:26,305][61425] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-14 13:12:26,306][61425] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1
[2023-09-14 13:12:26,341][61425] Num visible devices: 1
[2023-09-14 13:12:26,385][61425] Starting seed is not provided
[2023-09-14 13:12:26,385][61425] Using GPUs [0] for process 1 (actually maps to GPUs [1])
[2023-09-14 13:12:26,385][61425] Initializing actor-critic model on device cuda:0
[2023-09-14 13:12:26,385][61425] RunningMeanStd input shape: (23,)
[2023-09-14 13:12:26,386][61425] RunningMeanStd input shape: (3, 72, 128)
[2023-09-14 13:12:26,387][61425] RunningMeanStd input shape: (1,)
[2023-09-14 13:12:26,407][61425] ConvEncoder: input_channels=3
[2023-09-14 13:12:26,581][61425] Conv encoder output size: 512
[2023-09-14 13:12:26,582][61425] Policy head output size: 640
[2023-09-14 13:12:26,601][61425] Created Actor Critic model with architecture:
[2023-09-14 13:12:26,601][61425] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(measurements): RunningMeanStdInPlace()
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
(measurements_head): Sequential(
(0): Linear(in_features=23, out_features=128, bias=True)
(1): ELU(alpha=1.0)
(2): Linear(in_features=128, out_features=128, bias=True)
(3): ELU(alpha=1.0)
)
)
(core): ModelCoreRNN(
(core): GRU(640, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=21, bias=True)
)
)
[2023-09-14 13:12:27,798][61425] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-09-14 13:12:27,799][61425] No checkpoints found
[2023-09-14 13:12:27,799][61425] Did not load from checkpoint, starting from scratch!
[2023-09-14 13:12:27,799][61425] Initialized policy 1 weights for model version 0
[2023-09-14 13:12:27,801][61425] LearnerWorker_p1 finished initialization!
[2023-09-14 13:12:27,801][61425] Using GPUs [0] for process 1 (actually maps to GPUs [1])
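The seemingly contradictory "Using GPUs [0] for process 1 (actually maps to GPUs [1])" lines come from CUDA device renumbering: once `CUDA_VISIBLE_DEVICES='1'` is exported, physical GPU 1 is the only visible device and is addressed as `cuda:0` inside that process. A minimal sketch of the remapping (variable names here are illustrative, not from Sample Factory):

```python
import os

# Setting CUDA_VISIBLE_DEVICES before CUDA initializes renumbers the visible
# GPUs, so physical GPU 1 becomes local device 0 inside the process -- hence
# the log initializing "on device cuda:0" while actually using GPU 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
local_index = 0                       # index as seen inside the process
physical_index = int(visible[local_index])
print(f"cuda:{local_index} -> physical GPU {physical_index}")
```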
[2023-09-14 13:12:28,404][61593] Using GPUs [1] for process 1 (actually maps to GPUs [1])
[2023-09-14 13:12:28,405][61593] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1
[2023-09-14 13:12:28,423][61593] Num visible devices: 1
[2023-09-14 13:12:28,497][61692] Worker 7 uses CPU cores [28, 29, 30, 31]
[2023-09-14 13:12:28,511][61594] Worker 1 uses CPU cores [4, 5, 6, 7]
[2023-09-14 13:12:28,567][61592] Worker 0 uses CPU cores [0, 1, 2, 3]
[2023-09-14 13:12:28,607][61595] Worker 2 uses CPU cores [8, 9, 10, 11]
[2023-09-14 13:12:28,748][61591] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-09-14 13:12:28,748][61591] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-09-14 13:12:28,766][61591] Num visible devices: 1
[2023-09-14 13:12:28,783][61690] Worker 5 uses CPU cores [20, 21, 22, 23]
[2023-09-14 13:12:28,788][61633] Worker 4 uses CPU cores [16, 17, 18, 19]
[2023-09-14 13:12:28,869][61691] Worker 6 uses CPU cores [24, 25, 26, 27]
[2023-09-14 13:12:28,916][61631] Worker 3 uses CPU cores [12, 13, 14, 15]
[2023-09-14 13:12:29,210][61593] RunningMeanStd input shape: (23,)
[2023-09-14 13:12:29,210][61593] RunningMeanStd input shape: (3, 72, 128)
[2023-09-14 13:12:29,211][61593] RunningMeanStd input shape: (1,)
[2023-09-14 13:12:29,223][61593] ConvEncoder: input_channels=3
[2023-09-14 13:12:29,325][61593] Conv encoder output size: 512
[2023-09-14 13:12:29,326][61593] Policy head output size: 640
[2023-09-14 13:12:29,466][61591] RunningMeanStd input shape: (23,)
[2023-09-14 13:12:29,466][61591] RunningMeanStd input shape: (3, 72, 128)
[2023-09-14 13:12:29,467][61591] RunningMeanStd input shape: (1,)
[2023-09-14 13:12:29,478][61591] ConvEncoder: input_channels=3
[2023-09-14 13:12:29,580][61591] Conv encoder output size: 512
[2023-09-14 13:12:29,581][61591] Policy head output size: 640
[2023-09-14 13:12:29,918][61692] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,918][61631] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,925][61690] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,925][61592] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,925][61633] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,925][61594] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,925][61595] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:29,925][61691] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-09-14 13:12:30,253][61692] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,256][61592] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,339][61594] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,344][61690] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,374][61595] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,375][61691] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,386][61631] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,529][61692] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,646][61595] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,663][61633] Decorrelating experience for 0 frames...
[2023-09-14 13:12:30,674][61691] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,675][61631] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,680][61592] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,694][61594] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,697][61690] Decorrelating experience for 32 frames...
[2023-09-14 13:12:30,992][61692] Decorrelating experience for 64 frames...
[2023-09-14 13:12:31,097][61595] Decorrelating experience for 64 frames...
[2023-09-14 13:12:31,114][61592] Decorrelating experience for 64 frames...
[2023-09-14 13:12:31,209][61631] Decorrelating experience for 64 frames...
[2023-09-14 13:12:31,354][61633] Decorrelating experience for 32 frames...
[2023-09-14 13:12:31,429][61692] Decorrelating experience for 96 frames...
[2023-09-14 13:12:31,482][61595] Decorrelating experience for 96 frames...
[2023-09-14 13:12:31,515][61592] Decorrelating experience for 96 frames...
[2023-09-14 13:12:31,655][61631] Decorrelating experience for 96 frames...
[2023-09-14 13:12:31,796][61691] Decorrelating experience for 64 frames...
[2023-09-14 13:12:31,797][61633] Decorrelating experience for 64 frames...
[2023-09-14 13:12:32,157][61691] Decorrelating experience for 96 frames...
[2023-09-14 13:12:32,161][61690] Decorrelating experience for 64 frames...
[2023-09-14 13:12:32,184][61633] Decorrelating experience for 96 frames...
[2023-09-14 13:12:32,525][61594] Decorrelating experience for 64 frames...
[2023-09-14 13:12:32,598][61691] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:32,633][61631] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:32,637][61592] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:32,654][61690] Decorrelating experience for 96 frames...
[2023-09-14 13:12:32,723][61595] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:32,911][61594] Decorrelating experience for 96 frames...
[2023-09-14 13:12:33,057][61690] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:33,200][61633] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:33,330][61692] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:33,432][61308] Signal inference workers to stop experience collection...
[2023-09-14 13:12:33,439][61593] InferenceWorker_p1-w0: stopping experience collection
[2023-09-14 13:12:33,439][61591] InferenceWorker_p0-w0: stopping experience collection
[2023-09-14 13:12:36,388][61308] Signal inference workers to resume experience collection...
[2023-09-14 13:12:36,389][61593] InferenceWorker_p1-w0: resuming experience collection
[2023-09-14 13:12:36,389][61591] InferenceWorker_p0-w0: resuming experience collection
[2023-09-14 13:12:37,538][61425] Signal inference workers to stop experience collection...
[2023-09-14 13:12:37,838][61425] Signal inference workers to resume experience collection...
[2023-09-14 13:12:38,047][61594] Multiple policies in trajectory buffer: [0 1] (-1 means inactive agent)
[2023-09-14 13:12:41,548][61591] Updated weights for policy 0, policy_version 10 (0.0714)
[2023-09-14 13:12:41,690][61593] Updated weights for policy 1, policy_version 10 (0.0011)
[2023-09-14 13:12:46,899][61591] Updated weights for policy 0, policy_version 20 (0.0011)
[2023-09-14 13:12:47,719][61593] Updated weights for policy 1, policy_version 20 (0.0011)
[2023-09-14 13:12:50,659][61425] Saving new best policy, reward=1.568!
[2023-09-14 13:12:50,659][61308] Saving new best policy, reward=1.701!
[2023-09-14 13:12:52,646][61593] Updated weights for policy 1, policy_version 30 (0.0011)
[2023-09-14 13:12:53,384][61591] Updated weights for policy 0, policy_version 30 (0.0011)
[2023-09-14 13:12:55,654][61425] Saving new best policy, reward=1.757!
[2023-09-14 13:12:57,378][61593] Updated weights for policy 1, policy_version 40 (0.0011)
[2023-09-14 13:12:59,848][61591] Updated weights for policy 0, policy_version 40 (0.0010)
[2023-09-14 13:13:00,660][61308] Saving new best policy, reward=1.732!
[2023-09-14 13:13:00,690][61425] Saving new best policy, reward=1.838!
[2023-09-14 13:13:02,946][61593] Updated weights for policy 1, policy_version 50 (0.0011)
[2023-09-14 13:13:05,545][61591] Updated weights for policy 0, policy_version 50 (0.0010)
[2023-09-14 13:13:05,653][61425] Saving new best policy, reward=1.930!
[2023-09-14 13:13:09,533][61593] Updated weights for policy 1, policy_version 60 (0.0010)
[2023-09-14 13:13:10,275][61591] Updated weights for policy 0, policy_version 60 (0.0010)
[2023-09-14 13:13:10,703][61308] Saving new best policy, reward=1.743!
[2023-09-14 13:13:14,763][61591] Updated weights for policy 0, policy_version 70 (0.0010)
[2023-09-14 13:13:15,654][61425] Saving new best policy, reward=2.051!
[2023-09-14 13:13:15,669][61308] Saving new best policy, reward=1.798!
[2023-09-14 13:13:17,287][61593] Updated weights for policy 1, policy_version 70 (0.0010)
[2023-09-14 13:13:19,116][61591] Updated weights for policy 0, policy_version 80 (0.0010)
[2023-09-14 13:13:20,663][61308] Saving new best policy, reward=1.899!
[2023-09-14 13:13:20,663][61425] Saving new best policy, reward=2.150!
[2023-09-14 13:13:23,798][61591] Updated weights for policy 0, policy_version 90 (0.0010)
[2023-09-14 13:13:24,328][61593] Updated weights for policy 1, policy_version 80 (0.0011)
[2023-09-14 13:13:28,416][61591] Updated weights for policy 0, policy_version 100 (0.0010)
[2023-09-14 13:13:32,086][61593] Updated weights for policy 1, policy_version 90 (0.0011)
[2023-09-14 13:13:32,649][61591] Updated weights for policy 0, policy_version 110 (0.0010)
[2023-09-14 13:13:35,653][61308] Saving new best policy, reward=2.141!
[2023-09-14 13:13:36,712][61591] Updated weights for policy 0, policy_version 120 (0.0011)
[2023-09-14 13:13:40,448][61593] Updated weights for policy 1, policy_version 100 (0.0010)
[2023-09-14 13:13:41,184][61591] Updated weights for policy 0, policy_version 130 (0.0011)
[2023-09-14 13:13:46,179][61591] Updated weights for policy 0, policy_version 140 (0.0010)
[2023-09-14 13:13:46,902][61593] Updated weights for policy 1, policy_version 110 (0.0011)
[2023-09-14 13:13:50,659][61308] Saving new best policy, reward=2.312!
[2023-09-14 13:13:51,794][61591] Updated weights for policy 0, policy_version 150 (0.0010)
[2023-09-14 13:13:52,435][61593] Updated weights for policy 1, policy_version 120 (0.0010)
[2023-09-14 13:13:55,655][61308] Saving new best policy, reward=2.555!
[2023-09-14 13:13:55,655][61425] Saving new best policy, reward=2.178!
[2023-09-14 13:13:57,050][61591] Updated weights for policy 0, policy_version 160 (0.0010)
[2023-09-14 13:13:58,050][61593] Updated weights for policy 1, policy_version 130 (0.0010)
[2023-09-14 13:14:00,660][61425] Saving new best policy, reward=2.428!
[2023-09-14 13:14:02,238][61591] Updated weights for policy 0, policy_version 170 (0.0011)
[2023-09-14 13:14:03,847][61593] Updated weights for policy 1, policy_version 140 (0.0010)
[2023-09-14 13:14:05,654][61425] Saving new best policy, reward=2.621!
[2023-09-14 13:14:08,189][61591] Updated weights for policy 0, policy_version 180 (0.0010)
[2023-09-14 13:14:09,071][61593] Updated weights for policy 1, policy_version 150 (0.0010)
[2023-09-14 13:14:10,658][61425] Saving new best policy, reward=2.866!
[2023-09-14 13:14:14,339][61593] Updated weights for policy 1, policy_version 160 (0.0010)
[2023-09-14 13:14:14,664][61591] Updated weights for policy 0, policy_version 190 (0.0010)
[2023-09-14 13:14:19,609][61593] Updated weights for policy 1, policy_version 170 (0.0011)
[2023-09-14 13:14:20,663][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000172_704512.pth...
[2023-09-14 13:14:20,663][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth...
[2023-09-14 13:14:20,721][61308] Saving new best policy, reward=2.611!
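The checkpoint filenames appear to follow `checkpoint_{policy_version:09d}_{env_steps}.pth`, and in every name saved in this log the env-step count is exactly 4096 times the policy version, suggesting each policy version corresponds to one 4096-sample training batch. That ratio is an inference from the numbers in this log, not from documentation; the parsing sketch below checks it against the names the log actually prints:

```python
# (policy_version, env_steps) pairs taken verbatim from checkpoint names
# saved later in this log.
names = [
    ("checkpoint_000000172_704512.pth", 172, 704512),
    ("checkpoint_000000199_815104.pth", 199, 815104),
    ("checkpoint_000000374_1531904.pth", 374, 1531904),
    ("checkpoint_000000629_2576384.pth", 629, 2576384),
]

for name, version, steps in names:
    # Strip the "checkpoint_" prefix and ".pth" suffix, split the two fields.
    stem = name[len("checkpoint_"):-len(".pth")]
    ver_field, steps_field = stem.split("_")
    assert (int(ver_field), int(steps_field)) == (version, steps)
    print(name, "->", steps // version)   # 4096 for every checkpoint here
```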
[2023-09-14 13:14:21,209][61591] Updated weights for policy 0, policy_version 200 (0.0012)
[2023-09-14 13:14:24,704][61593] Updated weights for policy 1, policy_version 180 (0.0010)
[2023-09-14 13:14:27,244][61591] Updated weights for policy 0, policy_version 210 (0.0011)
[2023-09-14 13:14:30,587][61593] Updated weights for policy 1, policy_version 190 (0.0010)
[2023-09-14 13:14:30,658][61425] Saving new best policy, reward=2.919!
[2023-09-14 13:14:30,658][61308] Saving new best policy, reward=2.711!
[2023-09-14 13:14:32,592][61591] Updated weights for policy 0, policy_version 220 (0.0011)
[2023-09-14 13:14:35,654][61425] Saving new best policy, reward=3.198!
[2023-09-14 13:14:36,583][61593] Updated weights for policy 1, policy_version 200 (0.0011)
[2023-09-14 13:14:38,046][61591] Updated weights for policy 0, policy_version 230 (0.0011)
[2023-09-14 13:14:43,000][61593] Updated weights for policy 1, policy_version 210 (0.0011)
[2023-09-14 13:14:43,022][61591] Updated weights for policy 0, policy_version 240 (0.0011)
[2023-09-14 13:14:45,654][61308] Saving new best policy, reward=3.061!
[2023-09-14 13:14:47,658][61591] Updated weights for policy 0, policy_version 250 (0.0011)
[2023-09-14 13:14:50,182][61593] Updated weights for policy 1, policy_version 220 (0.0011)
[2023-09-14 13:14:52,469][61591] Updated weights for policy 0, policy_version 260 (0.0010)
[2023-09-14 13:14:55,670][61425] Saving new best policy, reward=3.379!
[2023-09-14 13:14:56,099][61593] Updated weights for policy 1, policy_version 230 (0.0010)
[2023-09-14 13:14:58,583][61591] Updated weights for policy 0, policy_version 270 (0.0010)
[2023-09-14 13:15:00,660][61425] Saving new best policy, reward=3.570!
[2023-09-14 13:15:01,259][61593] Updated weights for policy 1, policy_version 240 (0.0011)
[2023-09-14 13:15:05,681][61591] Updated weights for policy 0, policy_version 280 (0.0010)
[2023-09-14 13:15:05,917][61593] Updated weights for policy 1, policy_version 250 (0.0010)
[2023-09-14 13:15:10,398][61593] Updated weights for policy 1, policy_version 260 (0.0011)
[2023-09-14 13:15:12,949][61591] Updated weights for policy 0, policy_version 290 (0.0010)
[2023-09-14 13:15:15,007][61593] Updated weights for policy 1, policy_version 270 (0.0011)
[2023-09-14 13:15:18,602][61591] Updated weights for policy 0, policy_version 300 (0.0011)
[2023-09-14 13:15:20,342][61593] Updated weights for policy 1, policy_version 280 (0.0011)
[2023-09-14 13:15:20,658][61308] Saving new best policy, reward=3.073!
[2023-09-14 13:15:24,268][61591] Updated weights for policy 0, policy_version 310 (0.0010)
[2023-09-14 13:15:25,564][61593] Updated weights for policy 1, policy_version 290 (0.0010)
[2023-09-14 13:15:25,653][61308] Saving new best policy, reward=3.279!
[2023-09-14 13:15:30,056][61591] Updated weights for policy 0, policy_version 320 (0.0011)
[2023-09-14 13:15:30,660][61308] Saving new best policy, reward=3.495!
[2023-09-14 13:15:31,195][61593] Updated weights for policy 1, policy_version 300 (0.0011)
[2023-09-14 13:15:35,654][61308] Saving new best policy, reward=3.774!
[2023-09-14 13:15:35,861][61591] Updated weights for policy 0, policy_version 330 (0.0011)
[2023-09-14 13:15:36,802][61593] Updated weights for policy 1, policy_version 310 (0.0010)
[2023-09-14 13:15:40,660][61308] Saving new best policy, reward=3.980!
[2023-09-14 13:15:41,389][61591] Updated weights for policy 0, policy_version 340 (0.0010)
[2023-09-14 13:15:42,806][61593] Updated weights for policy 1, policy_version 320 (0.0011)
[2023-09-14 13:15:45,653][61308] Saving new best policy, reward=4.191!
[2023-09-14 13:15:47,257][61591] Updated weights for policy 0, policy_version 350 (0.0012)
[2023-09-14 13:15:48,454][61593] Updated weights for policy 1, policy_version 330 (0.0011)
[2023-09-14 13:15:50,660][61308] Saving new best policy, reward=4.391!
[2023-09-14 13:15:52,549][61591] Updated weights for policy 0, policy_version 360 (0.0011)
[2023-09-14 13:15:55,054][61593] Updated weights for policy 1, policy_version 340 (0.0011)
[2023-09-14 13:15:55,654][61308] Saving new best policy, reward=4.615!
[2023-09-14 13:15:57,230][61591] Updated weights for policy 0, policy_version 370 (0.0011)
[2023-09-14 13:16:00,660][61308] Saving new best policy, reward=5.092!
[2023-09-14 13:16:02,084][61591] Updated weights for policy 0, policy_version 380 (0.0011)
[2023-09-14 13:16:02,180][61593] Updated weights for policy 1, policy_version 350 (0.0011)
[2023-09-14 13:16:06,513][61591] Updated weights for policy 0, policy_version 390 (0.0010)
[2023-09-14 13:16:10,230][61593] Updated weights for policy 1, policy_version 360 (0.0011)
[2023-09-14 13:16:10,905][61591] Updated weights for policy 0, policy_version 400 (0.0011)
[2023-09-14 13:16:15,609][61591] Updated weights for policy 0, policy_version 410 (0.0010)
[2023-09-14 13:16:17,646][61593] Updated weights for policy 1, policy_version 370 (0.0010)
[2023-09-14 13:16:20,200][61591] Updated weights for policy 0, policy_version 420 (0.0011)
[2023-09-14 13:16:20,661][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000374_1531904.pth...
[2023-09-14 13:16:20,693][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000421_1724416.pth...
[2023-09-14 13:16:24,669][61593] Updated weights for policy 1, policy_version 380 (0.0011)
[2023-09-14 13:16:25,074][61591] Updated weights for policy 0, policy_version 430 (0.0011)
[2023-09-14 13:16:25,654][61308] Saving new best policy, reward=5.576!
[2023-09-14 13:16:30,349][61591] Updated weights for policy 0, policy_version 440 (0.0010)
[2023-09-14 13:16:30,659][61308] Saving new best policy, reward=5.753!
[2023-09-14 13:16:31,166][61593] Updated weights for policy 1, policy_version 390 (0.0011)
[2023-09-14 13:16:36,169][61591] Updated weights for policy 0, policy_version 450 (0.0011)
[2023-09-14 13:16:36,971][61593] Updated weights for policy 1, policy_version 400 (0.0011)
[2023-09-14 13:16:40,659][61308] Saving new best policy, reward=6.160!
[2023-09-14 13:16:41,936][61591] Updated weights for policy 0, policy_version 460 (0.0011)
[2023-09-14 13:16:42,816][61593] Updated weights for policy 1, policy_version 410 (0.0011)
[2023-09-14 13:16:45,654][61308] Saving new best policy, reward=6.381!
[2023-09-14 13:16:45,654][61425] Saving new best policy, reward=3.883!
[2023-09-14 13:16:47,009][61591] Updated weights for policy 0, policy_version 470 (0.0010)
[2023-09-14 13:16:48,873][61593] Updated weights for policy 1, policy_version 420 (0.0010)
[2023-09-14 13:16:50,662][61308] Saving new best policy, reward=6.694!
[2023-09-14 13:16:52,287][61591] Updated weights for policy 0, policy_version 480 (0.0011)
[2023-09-14 13:16:54,610][61593] Updated weights for policy 1, policy_version 430 (0.0010)
[2023-09-14 13:16:55,654][61308] Saving new best policy, reward=6.700!
[2023-09-14 13:16:55,716][61425] Saving new best policy, reward=4.000!
[2023-09-14 13:16:58,316][61591] Updated weights for policy 0, policy_version 490 (0.0011)
[2023-09-14 13:17:00,146][61593] Updated weights for policy 1, policy_version 440 (0.0010)
[2023-09-14 13:17:00,682][61425] Saving new best policy, reward=4.312!
[2023-09-14 13:17:03,911][61591] Updated weights for policy 0, policy_version 500 (0.0010)
[2023-09-14 13:17:05,655][61308] Saving new best policy, reward=7.021!
[2023-09-14 13:17:05,655][61425] Saving new best policy, reward=4.591!
[2023-09-14 13:17:06,452][61593] Updated weights for policy 1, policy_version 450 (0.0011)
[2023-09-14 13:17:08,849][61591] Updated weights for policy 0, policy_version 510 (0.0010)
[2023-09-14 13:17:10,662][61308] Saving new best policy, reward=7.294!
[2023-09-14 13:17:10,662][61425] Saving new best policy, reward=5.094!
[2023-09-14 13:17:13,196][61593] Updated weights for policy 1, policy_version 460 (0.0010)
[2023-09-14 13:17:14,059][61591] Updated weights for policy 0, policy_version 520 (0.0010)
[2023-09-14 13:17:15,654][61308] Saving new best policy, reward=7.511!
[2023-09-14 13:17:19,215][61591] Updated weights for policy 0, policy_version 530 (0.0011)
[2023-09-14 13:17:19,835][61593] Updated weights for policy 1, policy_version 470 (0.0011)
[2023-09-14 13:17:20,665][61425] Saving new best policy, reward=5.277!
[2023-09-14 13:17:20,668][61308] Saving new best policy, reward=7.600!
[2023-09-14 13:17:23,995][61591] Updated weights for policy 0, policy_version 540 (0.0011)
[2023-09-14 13:17:25,654][61308] Saving new best policy, reward=8.067!
[2023-09-14 13:17:25,654][61425] Saving new best policy, reward=5.683!
[2023-09-14 13:17:26,722][61593] Updated weights for policy 1, policy_version 480 (0.0010)
[2023-09-14 13:17:28,985][61591] Updated weights for policy 0, policy_version 550 (0.0011)
[2023-09-14 13:17:30,658][61308] Saving new best policy, reward=8.294!
[2023-09-14 13:17:32,676][61593] Updated weights for policy 1, policy_version 490 (0.0010)
[2023-09-14 13:17:34,887][61591] Updated weights for policy 0, policy_version 560 (0.0011)
[2023-09-14 13:17:35,664][61308] Saving new best policy, reward=8.372!
[2023-09-14 13:17:35,664][61425] Saving new best policy, reward=6.140!
[2023-09-14 13:17:38,260][61593] Updated weights for policy 1, policy_version 500 (0.0011)
[2023-09-14 13:17:40,660][61425] Saving new best policy, reward=6.446!
[2023-09-14 13:17:41,374][61591] Updated weights for policy 0, policy_version 570 (0.0010)
[2023-09-14 13:17:43,239][61593] Updated weights for policy 1, policy_version 510 (0.0010)
[2023-09-14 13:17:45,653][61425] Saving new best policy, reward=7.121!
[2023-09-14 13:17:48,130][61593] Updated weights for policy 1, policy_version 520 (0.0011)
[2023-09-14 13:17:48,412][61591] Updated weights for policy 0, policy_version 580 (0.0011)
[2023-09-14 13:17:50,659][61425] Saving new best policy, reward=7.850!
[2023-09-14 13:17:52,934][61593] Updated weights for policy 1, policy_version 530 (0.0010)
[2023-09-14 13:17:55,072][61591] Updated weights for policy 0, policy_version 590 (0.0009)
[2023-09-14 13:17:55,654][61425] Saving new best policy, reward=8.746!
[2023-09-14 13:17:55,716][61308] Saving new best policy, reward=8.510!
[2023-09-14 13:17:58,055][61593] Updated weights for policy 1, policy_version 540 (0.0011)
[2023-09-14 13:18:00,661][61308] Saving new best policy, reward=8.864!
[2023-09-14 13:18:00,662][61425] Saving new best policy, reward=9.346!
[2023-09-14 13:18:01,731][61591] Updated weights for policy 0, policy_version 600 (0.0010)
[2023-09-14 13:18:03,215][61593] Updated weights for policy 1, policy_version 550 (0.0010)
[2023-09-14 13:18:05,653][61308] Saving new best policy, reward=9.335!
[2023-09-14 13:18:05,702][61425] Saving new best policy, reward=9.634!
[2023-09-14 13:18:08,131][61593] Updated weights for policy 1, policy_version 560 (0.0010)
[2023-09-14 13:18:08,670][61591] Updated weights for policy 0, policy_version 610 (0.0012)
[2023-09-14 13:18:10,664][61425] Saving new best policy, reward=10.724!
[2023-09-14 13:18:13,272][61593] Updated weights for policy 1, policy_version 570 (0.0011)
[2023-09-14 13:18:15,365][61591] Updated weights for policy 0, policy_version 620 (0.0011)
[2023-09-14 13:18:15,653][61425] Saving new best policy, reward=12.012!
[2023-09-14 13:18:19,125][61593] Updated weights for policy 1, policy_version 580 (0.0011)
[2023-09-14 13:18:20,661][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000582_2383872.pth...
[2023-09-14 13:18:20,661][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth...
[2023-09-14 13:18:20,717][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth
[2023-09-14 13:18:20,721][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000172_704512.pth
[2023-09-14 13:18:20,723][61308] Saving new best policy, reward=9.405!
[2023-09-14 13:18:20,728][61425] Saving new best policy, reward=12.716!
[2023-09-14 13:18:21,117][61591] Updated weights for policy 0, policy_version 630 (0.0011)
[2023-09-14 13:18:24,823][61593] Updated weights for policy 1, policy_version 590 (0.0010)
[2023-09-14 13:18:25,654][61308] Saving new best policy, reward=9.660!
[2023-09-14 13:18:25,654][61425] Saving new best policy, reward=12.968!
[2023-09-14 13:18:26,988][61591] Updated weights for policy 0, policy_version 640 (0.0010)
[2023-09-14 13:18:30,464][61593] Updated weights for policy 1, policy_version 600 (0.0010)
[2023-09-14 13:18:30,657][61308] Saving new best policy, reward=10.299!
[2023-09-14 13:18:30,658][61425] Saving new best policy, reward=13.610!
[2023-09-14 13:18:32,617][61591] Updated weights for policy 0, policy_version 650 (0.0012)
[2023-09-14 13:18:35,654][61308] Saving new best policy, reward=11.219!
[2023-09-14 13:18:35,654][61425] Saving new best policy, reward=14.262!
[2023-09-14 13:18:36,091][61593] Updated weights for policy 1, policy_version 610 (0.0010)
[2023-09-14 13:18:38,343][61591] Updated weights for policy 0, policy_version 660 (0.0011)
[2023-09-14 13:18:40,657][61308] Saving new best policy, reward=11.598!
[2023-09-14 13:18:40,657][61425] Saving new best policy, reward=14.300!
[2023-09-14 13:18:42,239][61593] Updated weights for policy 1, policy_version 620 (0.0010)
[2023-09-14 13:18:43,877][61591] Updated weights for policy 0, policy_version 670 (0.0010)
[2023-09-14 13:18:45,654][61308] Saving new best policy, reward=11.856!
[2023-09-14 13:18:45,654][61425] Saving new best policy, reward=14.623!
[2023-09-14 13:18:48,638][61593] Updated weights for policy 1, policy_version 630 (0.0011)
[2023-09-14 13:18:49,122][61591] Updated weights for policy 0, policy_version 680 (0.0010)
[2023-09-14 13:18:50,662][61308] Saving new best policy, reward=12.551!
[2023-09-14 13:18:50,662][61425] Saving new best policy, reward=14.814!
[2023-09-14 13:18:53,980][61591] Updated weights for policy 0, policy_version 690 (0.0010)
[2023-09-14 13:18:55,654][61308] Saving new best policy, reward=13.303!
[2023-09-14 13:18:55,654][61425] Saving new best policy, reward=15.427!
[2023-09-14 13:18:55,820][61593] Updated weights for policy 1, policy_version 640 (0.0012)
[2023-09-14 13:18:58,720][61591] Updated weights for policy 0, policy_version 700 (0.0011)
[2023-09-14 13:19:00,659][61308] Saving new best policy, reward=13.366!
[2023-09-14 13:19:00,660][61425] Saving new best policy, reward=15.930!
[2023-09-14 13:19:02,515][61593] Updated weights for policy 1, policy_version 650 (0.0010)
[2023-09-14 13:19:03,718][61591] Updated weights for policy 0, policy_version 710 (0.0011)
[2023-09-14 13:19:05,654][61425] Saving new best policy, reward=16.204!
[2023-09-14 13:19:05,666][61308] Saving new best policy, reward=13.500!
[2023-09-14 13:19:08,298][61591] Updated weights for policy 0, policy_version 720 (0.0011)
[2023-09-14 13:19:09,792][61593] Updated weights for policy 1, policy_version 660 (0.0010)
[2023-09-14 13:19:10,662][61425] Saving new best policy, reward=17.008!
[2023-09-14 13:19:10,662][61308] Saving new best policy, reward=13.680!
[2023-09-14 13:19:12,359][61591] Updated weights for policy 0, policy_version 730 (0.0011)
[2023-09-14 13:19:15,653][61425] Saving new best policy, reward=17.272!
[2023-09-14 13:19:15,666][61308] Saving new best policy, reward=13.863!
[2023-09-14 13:19:16,459][61591] Updated weights for policy 0, policy_version 740 (0.0010)
[2023-09-14 13:19:18,854][61593] Updated weights for policy 1, policy_version 670 (0.0010)
[2023-09-14 13:19:20,325][61591] Updated weights for policy 0, policy_version 750 (0.0010)
[2023-09-14 13:19:20,662][61425] Saving new best policy, reward=17.440!
[2023-09-14 13:19:20,701][61308] Saving new best policy, reward=14.222!
[2023-09-14 13:19:24,296][61591] Updated weights for policy 0, policy_version 760 (0.0010)
[2023-09-14 13:19:25,654][61308] Saving new best policy, reward=14.326!
[2023-09-14 13:19:25,654][61425] Saving new best policy, reward=17.873!
[2023-09-14 13:19:28,988][61593] Updated weights for policy 1, policy_version 680 (0.0010)
[2023-09-14 13:19:29,619][61591] Updated weights for policy 0, policy_version 770 (0.0015)
[2023-09-14 13:19:34,722][61591] Updated weights for policy 0, policy_version 780 (0.0013)
[2023-09-14 13:19:36,307][61593] Updated weights for policy 1, policy_version 690 (0.0011)
[2023-09-14 13:19:39,420][61591] Updated weights for policy 0, policy_version 790 (0.0011)
[2023-09-14 13:19:43,645][61593] Updated weights for policy 1, policy_version 700 (0.0011)
[2023-09-14 13:19:43,925][61591] Updated weights for policy 0, policy_version 800 (0.0011)
[2023-09-14 13:19:48,991][61591] Updated weights for policy 0, policy_version 810 (0.0011)
[2023-09-14 13:19:50,038][61593] Updated weights for policy 1, policy_version 710 (0.0010)
[2023-09-14 13:19:50,659][61308] Saving new best policy, reward=14.678!
[2023-09-14 13:19:54,688][61591] Updated weights for policy 0, policy_version 820 (0.0010)
[2023-09-14 13:19:55,654][61308] Saving new best policy, reward=15.735!
[2023-09-14 13:19:55,768][61593] Updated weights for policy 1, policy_version 720 (0.0011)
[2023-09-14 13:20:00,651][61591] Updated weights for policy 0, policy_version 830 (0.0010)
[2023-09-14 13:20:00,662][61308] Saving new best policy, reward=16.261!
[2023-09-14 13:20:00,907][61593] Updated weights for policy 1, policy_version 730 (0.0010)
[2023-09-14 13:20:05,951][61593] Updated weights for policy 1, policy_version 740 (0.0011)
[2023-09-14 13:20:07,070][61591] Updated weights for policy 0, policy_version 840 (0.0010)
[2023-09-14 13:20:11,130][61593] Updated weights for policy 1, policy_version 750 (0.0010)
[2023-09-14 13:20:13,158][61591] Updated weights for policy 0, policy_version 850 (0.0011)
[2023-09-14 13:20:16,387][61593] Updated weights for policy 1, policy_version 760 (0.0010)
[2023-09-14 13:20:18,781][61591] Updated weights for policy 0, policy_version 860 (0.0011)
[2023-09-14 13:20:20,658][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000767_3141632.pth...
[2023-09-14 13:20:20,659][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000863_3534848.pth...
[2023-09-14 13:20:20,713][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000374_1531904.pth
[2023-09-14 13:20:20,728][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000421_1724416.pth
[2023-09-14 13:20:22,380][61593] Updated weights for policy 1, policy_version 770 (0.0010)
[2023-09-14 13:20:23,699][61591] Updated weights for policy 0, policy_version 870 (0.0011)
[2023-09-14 13:20:25,653][61308] Saving new best policy, reward=17.233!
[2023-09-14 13:20:28,445][61591] Updated weights for policy 0, policy_version 880 (0.0010)
[2023-09-14 13:20:29,355][61593] Updated weights for policy 1, policy_version 780 (0.0011)
[2023-09-14 13:20:30,657][61308] Saving new best policy, reward=18.610!
[2023-09-14 13:20:33,201][61591] Updated weights for policy 0, policy_version 890 (0.0011)
[2023-09-14 13:20:35,654][61308] Saving new best policy, reward=19.357!
[2023-09-14 13:20:36,860][61593] Updated weights for policy 1, policy_version 790 (0.0011)
[2023-09-14 13:20:37,928][61591] Updated weights for policy 0, policy_version 900 (0.0011)
[2023-09-14 13:20:40,660][61308] Saving new best policy, reward=19.468!
[2023-09-14 13:20:42,807][61591] Updated weights for policy 0, policy_version 910 (0.0011)
[2023-09-14 13:20:43,964][61593] Updated weights for policy 1, policy_version 800 (0.0010)
[2023-09-14 13:20:47,899][61591] Updated weights for policy 0, policy_version 920 (0.0011)
[2023-09-14 13:20:50,450][61593] Updated weights for policy 1, policy_version 810 (0.0011)
[2023-09-14 13:20:52,611][61591] Updated weights for policy 0, policy_version 930 (0.0011)
[2023-09-14 13:20:57,412][61591] Updated weights for policy 0, policy_version 940 (0.0012)
[2023-09-14 13:20:57,788][61593] Updated weights for policy 1, policy_version 820 (0.0011)
[2023-09-14 13:21:02,593][61591] Updated weights for policy 0, policy_version 950 (0.0011)
[2023-09-14 13:21:04,265][61593] Updated weights for policy 1, policy_version 830 (0.0011)
[2023-09-14 13:21:08,156][61591] Updated weights for policy 0, policy_version 960 (0.0010)
[2023-09-14 13:21:09,976][61593] Updated weights for policy 1, policy_version 840 (0.0011)
[2023-09-14 13:21:13,913][61591] Updated weights for policy 0, policy_version 970 (0.0010)
[2023-09-14 13:21:15,822][61593] Updated weights for policy 1, policy_version 850 (0.0010)
[2023-09-14 13:21:19,727][61591] Updated weights for policy 0, policy_version 980 (0.0011)
[2023-09-14 13:21:21,630][61593] Updated weights for policy 1, policy_version 860 (0.0010)
[2023-09-14 13:21:25,479][61591] Updated weights for policy 0, policy_version 990 (0.0011)
[2023-09-14 13:21:27,345][61593] Updated weights for policy 1, policy_version 870 (0.0011)
[2023-09-14 13:21:30,657][61425] Saving new best policy, reward=18.052!
[2023-09-14 13:21:31,721][61591] Updated weights for policy 0, policy_version 1000 (0.0009)
[2023-09-14 13:21:32,837][61593] Updated weights for policy 1, policy_version 880 (0.0010)
[2023-09-14 13:21:35,654][61425] Saving new best policy, reward=18.405!
[2023-09-14 13:21:37,864][61593] Updated weights for policy 1, policy_version 890 (0.0010)
[2023-09-14 13:21:38,755][61591] Updated weights for policy 0, policy_version 1010 (0.0010)
[2023-09-14 13:21:40,661][61425] Saving new best policy, reward=18.650!
[2023-09-14 13:21:42,950][61593] Updated weights for policy 1, policy_version 900 (0.0011)
[2023-09-14 13:21:45,587][61591] Updated weights for policy 0, policy_version 1020 (0.0010)
[2023-09-14 13:21:45,654][61425] Saving new best policy, reward=18.817!
[2023-09-14 13:21:47,607][61593] Updated weights for policy 1, policy_version 910 (0.0011)
[2023-09-14 13:21:52,238][61593] Updated weights for policy 1, policy_version 920 (0.0011)
[2023-09-14 13:21:52,472][61591] Updated weights for policy 0, policy_version 1030 (0.0011)
[2023-09-14 13:21:55,654][61425] Saving new best policy, reward=19.190!
[2023-09-14 13:21:57,225][61593] Updated weights for policy 1, policy_version 930 (0.0011)
[2023-09-14 13:21:58,707][61591] Updated weights for policy 0, policy_version 1040 (0.0010)
[2023-09-14 13:22:02,120][61593] Updated weights for policy 1, policy_version 940 (0.0010)
[2023-09-14 13:22:04,697][61591] Updated weights for policy 0, policy_version 1050 (0.0011)
[2023-09-14 13:22:07,247][61593] Updated weights for policy 1, policy_version 950 (0.0010)
[2023-09-14 13:22:10,609][61591] Updated weights for policy 0, policy_version 1060 (0.0011)
[2023-09-14 13:22:10,660][61425] Saving new best policy, reward=19.201!
[2023-09-14 13:22:12,493][61593] Updated weights for policy 1, policy_version 960 (0.0011)
[2023-09-14 13:22:15,654][61425] Saving new best policy, reward=19.511!
[2023-09-14 13:22:16,223][61591] Updated weights for policy 0, policy_version 1070 (0.0011)
[2023-09-14 13:22:17,859][61593] Updated weights for policy 1, policy_version 970 (0.0010)
[2023-09-14 13:22:20,662][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001077_4411392.pth...
[2023-09-14 13:22:20,662][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000975_3993600.pth...
[2023-09-14 13:22:20,715][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth
[2023-09-14 13:22:20,724][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000582_2383872.pth
[2023-09-14 13:22:20,731][61425] Saving new best policy, reward=19.571!
[2023-09-14 13:22:22,566][61591] Updated weights for policy 0, policy_version 1080 (0.0011)
[2023-09-14 13:22:22,858][61593] Updated weights for policy 1, policy_version 980 (0.0010)
[2023-09-14 13:22:25,654][61425] Saving new best policy, reward=19.889!
[2023-09-14 13:22:27,946][61593] Updated weights for policy 1, policy_version 990 (0.0010)
[2023-09-14 13:22:28,832][61591] Updated weights for policy 0, policy_version 1090 (0.0010)
[2023-09-14 13:22:30,660][61425] Saving new best policy, reward=20.649!
[2023-09-14 13:22:32,614][61593] Updated weights for policy 1, policy_version 1000 (0.0010)
[2023-09-14 13:22:35,855][61591] Updated weights for policy 0, policy_version 1100 (0.0010)
[2023-09-14 13:22:37,566][61593] Updated weights for policy 1, policy_version 1010 (0.0010)
[2023-09-14 13:22:42,443][61593] Updated weights for policy 1, policy_version 1020 (0.0010)
[2023-09-14 13:22:42,596][61591] Updated weights for policy 0, policy_version 1110 (0.0010)
[2023-09-14 13:22:46,917][61593] Updated weights for policy 1, policy_version 1030 (0.0010)
[2023-09-14 13:22:49,888][61591] Updated weights for policy 0, policy_version 1120 (0.0010)
[2023-09-14 13:22:51,463][61593] Updated weights for policy 1, policy_version 1040 (0.0011)
[2023-09-14 13:22:55,915][61593] Updated weights for policy 1, policy_version 1050 (0.0011)
[2023-09-14 13:22:57,379][61591] Updated weights for policy 0, policy_version 1130 (0.0011)
[2023-09-14 13:23:00,430][61593] Updated weights for policy 1, policy_version 1060 (0.0011)
[2023-09-14 13:23:04,487][61591] Updated weights for policy 0, policy_version 1140 (0.0012)
[2023-09-14 13:23:05,284][61593] Updated weights for policy 1, policy_version 1070 (0.0011)
[2023-09-14 13:23:10,472][61593] Updated weights for policy 1, policy_version 1080 (0.0011)
[2023-09-14 13:23:10,608][61591] Updated weights for policy 0, policy_version 1150 (0.0010)
[2023-09-14 13:23:16,047][61593] Updated weights for policy 1, policy_version 1090 (0.0011)
[2023-09-14 13:23:16,720][61591] Updated weights for policy 0, policy_version 1160 (0.0012)
[2023-09-14 13:23:21,064][61593] Updated weights for policy 1, policy_version 1100 (0.0011)
[2023-09-14 13:23:23,134][61591] Updated weights for policy 0, policy_version 1170 (0.0011)
[2023-09-14 13:23:26,089][61593] Updated weights for policy 1, policy_version 1110 (0.0011)
[2023-09-14 13:23:30,489][61593] Updated weights for policy 1, policy_version 1120 (0.0011)
[2023-09-14 13:23:30,555][61591] Updated weights for policy 0, policy_version 1180 (0.0010)
[2023-09-14 13:23:34,829][61593] Updated weights for policy 1, policy_version 1130 (0.0011)
[2023-09-14 13:23:35,699][61425] Saving new best policy, reward=20.760!
[2023-09-14 13:23:39,002][61591] Updated weights for policy 0, policy_version 1190 (0.0010)
[2023-09-14 13:23:39,058][61593] Updated weights for policy 1, policy_version 1140 (0.0011)
[2023-09-14 13:23:40,708][61425] Saving new best policy, reward=21.056!
[2023-09-14 13:23:43,287][61593] Updated weights for policy 1, policy_version 1150 (0.0010)
[2023-09-14 13:23:45,654][61425] Saving new best policy, reward=21.360!
[2023-09-14 13:23:46,803][61591] Updated weights for policy 0, policy_version 1200 (0.0010)
[2023-09-14 13:23:47,617][61593] Updated weights for policy 1, policy_version 1160 (0.0011)
[2023-09-14 13:23:50,659][61425] Saving new best policy, reward=22.059!
[2023-09-14 13:23:51,754][61593] Updated weights for policy 1, policy_version 1170 (0.0011)
[2023-09-14 13:23:55,654][61425] Saving new best policy, reward=22.473!
[2023-09-14 13:23:55,796][61593] Updated weights for policy 1, policy_version 1180 (0.0011)
[2023-09-14 13:23:56,443][61591] Updated weights for policy 0, policy_version 1210 (0.0010)
[2023-09-14 13:24:00,295][61593] Updated weights for policy 1, policy_version 1190 (0.0010)
[2023-09-14 13:24:04,348][61591] Updated weights for policy 0, policy_version 1220 (0.0010)
[2023-09-14 13:24:04,912][61593] Updated weights for policy 1, policy_version 1200 (0.0010)
[2023-09-14 13:24:09,704][61593] Updated weights for policy 1, policy_version 1210 (0.0011)
[2023-09-14 13:24:11,222][61591] Updated weights for policy 0, policy_version 1230 (0.0011)
[2023-09-14 13:24:15,324][61593] Updated weights for policy 1, policy_version 1220 (0.0010)
[2023-09-14 13:24:15,654][61425] Saving new best policy, reward=22.847!
[2023-09-14 13:24:17,142][61591] Updated weights for policy 0, policy_version 1240 (0.0011)
[2023-09-14 13:24:20,660][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001245_5099520.pth...
[2023-09-14 13:24:20,660][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001229_5033984.pth...
[2023-09-14 13:24:20,718][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000000863_3534848.pth
[2023-09-14 13:24:20,728][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000767_3141632.pth
[2023-09-14 13:24:21,129][61593] Updated weights for policy 1, policy_version 1230 (0.0011)
[2023-09-14 13:24:23,300][61591] Updated weights for policy 0, policy_version 1250 (0.0010)
[2023-09-14 13:24:26,591][61593] Updated weights for policy 1, policy_version 1240 (0.0010)
[2023-09-14 13:24:29,893][61591] Updated weights for policy 0, policy_version 1260 (0.0010)
[2023-09-14 13:24:31,704][61593] Updated weights for policy 1, policy_version 1250 (0.0011)
[2023-09-14 13:24:36,916][61593] Updated weights for policy 1, policy_version 1260 (0.0010)
[2023-09-14 13:24:37,916][61591] Updated weights for policy 0, policy_version 1270 (0.0012)
[2023-09-14 13:24:42,956][61593] Updated weights for policy 1, policy_version 1270 (0.0010)
[2023-09-14 13:24:45,846][61591] Updated weights for policy 0, policy_version 1280 (0.0013)
[2023-09-14 13:24:48,308][61593] Updated weights for policy 1, policy_version 1280 (0.0010)
[2023-09-14 13:24:52,584][61591] Updated weights for policy 0, policy_version 1290 (0.0011)
[2023-09-14 13:24:53,429][61593] Updated weights for policy 1, policy_version 1290 (0.0011)
[2023-09-14 13:24:58,049][61593] Updated weights for policy 1, policy_version 1300 (0.0010)
[2023-09-14 13:24:59,376][61591] Updated weights for policy 0, policy_version 1300 (0.0009)
[2023-09-14 13:25:02,699][61593] Updated weights for policy 1, policy_version 1310 (0.0010)
[2023-09-14 13:25:06,286][61591] Updated weights for policy 0, policy_version 1310 (0.0011)
[2023-09-14 13:25:07,287][61593] Updated weights for policy 1, policy_version 1320 (0.0010)
[2023-09-14 13:25:11,962][61593] Updated weights for policy 1, policy_version 1330 (0.0010)
[2023-09-14 13:25:12,869][61591] Updated weights for policy 0, policy_version 1320 (0.0011)
[2023-09-14 13:25:17,496][61593] Updated weights for policy 1, policy_version 1340 (0.0011)
[2023-09-14 13:25:19,306][61591] Updated weights for policy 0, policy_version 1330 (0.0012)
[2023-09-14 13:25:22,954][61593] Updated weights for policy 1, policy_version 1350 (0.0011)
[2023-09-14 13:25:25,772][61591] Updated weights for policy 0, policy_version 1340 (0.0012)
[2023-09-14 13:25:29,114][61593] Updated weights for policy 1, policy_version 1360 (0.0011)
[2023-09-14 13:25:32,212][61591] Updated weights for policy 0, policy_version 1350 (0.0012)
[2023-09-14 13:25:35,754][61593] Updated weights for policy 1, policy_version 1370 (0.0012)
[2023-09-14 13:25:37,776][61591] Updated weights for policy 0, policy_version 1360 (0.0011)
[2023-09-14 13:25:42,527][61591] Updated weights for policy 0, policy_version 1370 (0.0011)
[2023-09-14 13:25:42,707][61593] Updated weights for policy 1, policy_version 1380 (0.0011)
[2023-09-14 13:25:47,653][61591] Updated weights for policy 0, policy_version 1380 (0.0012)
[2023-09-14 13:25:50,142][61593] Updated weights for policy 1, policy_version 1390 (0.0011)
[2023-09-14 13:25:52,856][61591] Updated weights for policy 0, policy_version 1390 (0.0011)
[2023-09-14 13:25:56,478][61593] Updated weights for policy 1, policy_version 1400 (0.0010)
[2023-09-14 13:25:58,381][61591] Updated weights for policy 0, policy_version 1400 (0.0010)
[2023-09-14 13:26:02,706][61593] Updated weights for policy 1, policy_version 1410 (0.0010)
[2023-09-14 13:26:03,814][61591] Updated weights for policy 0, policy_version 1410 (0.0010)
[2023-09-14 13:26:08,058][61593] Updated weights for policy 1, policy_version 1420 (0.0011)
[2023-09-14 13:26:10,007][61591] Updated weights for policy 0, policy_version 1420 (0.0010)
[2023-09-14 13:26:10,684][61308] Saving new best policy, reward=19.511!
[2023-09-14 13:26:13,046][61593] Updated weights for policy 1, policy_version 1430 (0.0010)
[2023-09-14 13:26:16,463][61591] Updated weights for policy 0, policy_version 1430 (0.0010)
[2023-09-14 13:26:18,211][61593] Updated weights for policy 1, policy_version 1440 (0.0010)
[2023-09-14 13:26:20,661][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001436_5881856.pth...
[2023-09-14 13:26:20,708][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001445_5918720.pth...
[2023-09-14 13:26:20,713][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001077_4411392.pth
[2023-09-14 13:26:20,765][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000000975_3993600.pth
[2023-09-14 13:26:22,761][61591] Updated weights for policy 0, policy_version 1440 (0.0010)
[2023-09-14 13:26:23,170][61593] Updated weights for policy 1, policy_version 1450 (0.0011)
[2023-09-14 13:26:28,220][61593] Updated weights for policy 1, policy_version 1460 (0.0011)
[2023-09-14 13:26:29,317][61591] Updated weights for policy 0, policy_version 1450 (0.0010)
[2023-09-14 13:26:30,657][61308] Saving new best policy, reward=19.954!
[2023-09-14 13:26:32,763][61593] Updated weights for policy 1, policy_version 1470 (0.0011)
[2023-09-14 13:26:36,401][61591] Updated weights for policy 0, policy_version 1460 (0.0011)
[2023-09-14 13:26:37,379][61593] Updated weights for policy 1, policy_version 1480 (0.0011)
[2023-09-14 13:26:41,898][61593] Updated weights for policy 1, policy_version 1490 (0.0011)
[2023-09-14 13:26:43,703][61591] Updated weights for policy 0, policy_version 1470 (0.0010)
[2023-09-14 13:26:46,776][61593] Updated weights for policy 1, policy_version 1500 (0.0011)
[2023-09-14 13:26:50,354][61591] Updated weights for policy 0, policy_version 1480 (0.0011)
[2023-09-14 13:26:51,424][61593] Updated weights for policy 1, policy_version 1510 (0.0010)
[2023-09-14 13:26:55,654][61308] Saving new best policy, reward=20.021!
[2023-09-14 13:26:56,105][61593] Updated weights for policy 1, policy_version 1520 (0.0011)
[2023-09-14 13:26:57,336][61591] Updated weights for policy 0, policy_version 1490 (0.0010)
[2023-09-14 13:27:00,992][61593] Updated weights for policy 1, policy_version 1530 (0.0012)
[2023-09-14 13:27:03,942][61591] Updated weights for policy 0, policy_version 1500 (0.0010)
[2023-09-14 13:27:06,071][61593] Updated weights for policy 1, policy_version 1540 (0.0010)
[2023-09-14 13:27:09,850][61591] Updated weights for policy 0, policy_version 1510 (0.0010)
[2023-09-14 13:27:11,711][61593] Updated weights for policy 1, policy_version 1550 (0.0011)
[2023-09-14 13:27:15,548][61591] Updated weights for policy 0, policy_version 1520 (0.0010)
[2023-09-14 13:27:17,125][61593] Updated weights for policy 1, policy_version 1560 (0.0010)
[2023-09-14 13:27:21,695][61591] Updated weights for policy 0, policy_version 1530 (0.0010)
[2023-09-14 13:27:22,524][61593] Updated weights for policy 1, policy_version 1570 (0.0011)
[2023-09-14 13:27:27,567][61591] Updated weights for policy 0, policy_version 1540 (0.0010)
[2023-09-14 13:27:28,327][61593] Updated weights for policy 1, policy_version 1580 (0.0011)
[2023-09-14 13:27:30,658][61425] Saving new best policy, reward=22.982!
[2023-09-14 13:27:33,381][61591] Updated weights for policy 0, policy_version 1550 (0.0010)
[2023-09-14 13:27:33,930][61593] Updated weights for policy 1, policy_version 1590 (0.0010)
[2023-09-14 13:27:35,654][61425] Saving new best policy, reward=23.390!
[2023-09-14 13:27:38,942][61591] Updated weights for policy 0, policy_version 1560 (0.0010)
[2023-09-14 13:27:40,300][61593] Updated weights for policy 1, policy_version 1600 (0.0010)
[2023-09-14 13:27:44,164][61591] Updated weights for policy 0, policy_version 1570 (0.0011)
[2023-09-14 13:27:46,928][61593] Updated weights for policy 1, policy_version 1610 (0.0010)
[2023-09-14 13:27:49,060][61591] Updated weights for policy 0, policy_version 1580 (0.0011)
[2023-09-14 13:27:50,660][61308] Saving new best policy, reward=20.925!
[2023-09-14 13:27:53,668][61593] Updated weights for policy 1, policy_version 1620 (0.0011)
[2023-09-14 13:27:53,736][61591] Updated weights for policy 0, policy_version 1590 (0.0010)
[2023-09-14 13:27:55,654][61308] Saving new best policy, reward=21.830!
[2023-09-14 13:27:58,451][61591] Updated weights for policy 0, policy_version 1600 (0.0010)
[2023-09-14 13:28:00,259][61593] Updated weights for policy 1, policy_version 1630 (0.0010)
[2023-09-14 13:28:03,591][61591] Updated weights for policy 0, policy_version 1610 (0.0011)
[2023-09-14 13:28:05,655][61308] Saving new best policy, reward=22.398!
[2023-09-14 13:28:06,216][61593] Updated weights for policy 1, policy_version 1640 (0.0010)
[2023-09-14 13:28:08,441][61591] Updated weights for policy 0, policy_version 1620 (0.0010)
[2023-09-14 13:28:12,974][61593] Updated weights for policy 1, policy_version 1650 (0.0011)
[2023-09-14 13:28:13,180][61591] Updated weights for policy 0, policy_version 1630 (0.0010)
[2023-09-14 13:28:17,924][61591] Updated weights for policy 0, policy_version 1640 (0.0010)
[2023-09-14 13:28:19,508][61593] Updated weights for policy 1, policy_version 1660 (0.0010)
[2023-09-14 13:28:20,660][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001661_6803456.pth...
[2023-09-14 13:28:20,660][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth...
[2023-09-14 13:28:20,715][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001229_5033984.pth
[2023-09-14 13:28:20,731][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001245_5099520.pth
[2023-09-14 13:28:22,682][61591] Updated weights for policy 0, policy_version 1650 (0.0011)
[2023-09-14 13:28:25,769][61593] Updated weights for policy 1, policy_version 1670 (0.0011)
[2023-09-14 13:28:27,596][61591] Updated weights for policy 0, policy_version 1660 (0.0010)
[2023-09-14 13:28:31,874][61593] Updated weights for policy 1, policy_version 1680 (0.0010)
[2023-09-14 13:28:32,339][61591] Updated weights for policy 0, policy_version 1670 (0.0012)
[2023-09-14 13:28:37,291][61591] Updated weights for policy 0, policy_version 1680 (0.0011)
[2023-09-14 13:28:37,547][61593] Updated weights for policy 1, policy_version 1690 (0.0009)
[2023-09-14 13:28:40,659][61308] Saving new best policy, reward=22.452!
[2023-09-14 13:28:42,004][61591] Updated weights for policy 0, policy_version 1690 (0.0011)
[2023-09-14 13:28:43,833][61593] Updated weights for policy 1, policy_version 1700 (0.0010)
[2023-09-14 13:28:46,816][61591] Updated weights for policy 0, policy_version 1700 (0.0011)
[2023-09-14 13:28:49,758][61593] Updated weights for policy 1, policy_version 1710 (0.0010)
[2023-09-14 13:28:50,694][61308] Saving new best policy, reward=23.224!
[2023-09-14 13:28:52,458][61591] Updated weights for policy 0, policy_version 1710 (0.0010)
[2023-09-14 13:28:54,873][61593] Updated weights for policy 1, policy_version 1720 (0.0010)
[2023-09-14 13:28:58,198][61591] Updated weights for policy 0, policy_version 1720 (0.0010)
[2023-09-14 13:29:00,120][61593] Updated weights for policy 1, policy_version 1730 (0.0011)
[2023-09-14 13:29:03,871][61591] Updated weights for policy 0, policy_version 1730 (0.0011)
[2023-09-14 13:29:05,310][61593] Updated weights for policy 1, policy_version 1740 (0.0011)
[2023-09-14 13:29:05,654][61425] Saving new best policy, reward=23.468!
[2023-09-14 13:29:09,668][61591] Updated weights for policy 0, policy_version 1740 (0.0011)
[2023-09-14 13:29:10,659][61425] Saving new best policy, reward=23.811!
[2023-09-14 13:29:10,809][61593] Updated weights for policy 1, policy_version 1750 (0.0011)
[2023-09-14 13:29:14,365][61591] Updated weights for policy 0, policy_version 1750 (0.0010)
[2023-09-14 13:29:15,656][61425] Saving new best policy, reward=24.501!
[2023-09-14 13:29:18,219][61593] Updated weights for policy 1, policy_version 1760 (0.0011)
[2023-09-14 13:29:18,909][61591] Updated weights for policy 0, policy_version 1760 (0.0011)
[2023-09-14 13:29:20,661][61425] Saving new best policy, reward=24.513!
[2023-09-14 13:29:23,422][61591] Updated weights for policy 0, policy_version 1770 (0.0011)
[2023-09-14 13:29:25,656][61425] Saving new best policy, reward=24.523!
[2023-09-14 13:29:25,809][61593] Updated weights for policy 1, policy_version 1770 (0.0010)
[2023-09-14 13:29:27,921][61591] Updated weights for policy 0, policy_version 1780 (0.0010)
[2023-09-14 13:29:30,659][61425] Saving new best policy, reward=24.659!
[2023-09-14 13:29:32,416][61591] Updated weights for policy 0, policy_version 1790 (0.0011)
[2023-09-14 13:29:33,390][61593] Updated weights for policy 1, policy_version 1780 (0.0011)
[2023-09-14 13:29:35,714][61425] Saving new best policy, reward=24.678!
[2023-09-14 13:29:36,853][61591] Updated weights for policy 0, policy_version 1800 (0.0010)
[2023-09-14 13:29:41,176][61593] Updated weights for policy 1, policy_version 1790 (0.0010)
[2023-09-14 13:29:41,299][61591] Updated weights for policy 0, policy_version 1810 (0.0010)
[2023-09-14 13:29:45,885][61591] Updated weights for policy 0, policy_version 1820 (0.0011)
[2023-09-14 13:29:47,849][61593] Updated weights for policy 1, policy_version 1800 (0.0011)
[2023-09-14 13:29:50,658][61308] Saving new best policy, reward=23.653!
[2023-09-14 13:29:51,519][61591] Updated weights for policy 0, policy_version 1830 (0.0012)
[2023-09-14 13:29:53,178][61593] Updated weights for policy 1, policy_version 1810 (0.0011)
[2023-09-14 13:29:55,654][61308] Saving new best policy, reward=23.767!
[2023-09-14 13:29:57,794][61591] Updated weights for policy 0, policy_version 1840 (0.0010)
[2023-09-14 13:29:58,315][61593] Updated weights for policy 1, policy_version 1820 (0.0011)
[2023-09-14 13:30:00,661][61308] Saving new best policy, reward=24.121!
[2023-09-14 13:30:03,041][61593] Updated weights for policy 1, policy_version 1830 (0.0011)
[2023-09-14 13:30:04,651][61591] Updated weights for policy 0, policy_version 1850 (0.0011)
[2023-09-14 13:30:07,740][61593] Updated weights for policy 1, policy_version 1840 (0.0011)
[2023-09-14 13:30:11,683][61591] Updated weights for policy 0, policy_version 1860 (0.0010)
[2023-09-14 13:30:12,521][61593] Updated weights for policy 1, policy_version 1850 (0.0011)
[2023-09-14 13:30:17,681][61593] Updated weights for policy 1, policy_version 1860 (0.0011)
[2023-09-14 13:30:18,560][61591] Updated weights for policy 0, policy_version 1870 (0.0010)
[2023-09-14 13:30:20,659][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001873_7671808.pth...
[2023-09-14 13:30:20,711][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001436_5881856.pth
[2023-09-14 13:30:20,716][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001866_7643136.pth...
[2023-09-14 13:30:20,777][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001445_5918720.pth
[2023-09-14 13:30:22,827][61593] Updated weights for policy 1, policy_version 1870 (0.0010)
[2023-09-14 13:30:25,332][61591] Updated weights for policy 0, policy_version 1880 (0.0010)
[2023-09-14 13:30:28,034][61593] Updated weights for policy 1, policy_version 1880 (0.0011)
[2023-09-14 13:30:31,367][61591] Updated weights for policy 0, policy_version 1890 (0.0011)
[2023-09-14 13:30:33,397][61593] Updated weights for policy 1, policy_version 1890 (0.0010)
[2023-09-14 13:30:37,323][61591] Updated weights for policy 0, policy_version 1900 (0.0010)
[2023-09-14 13:30:38,663][61593] Updated weights for policy 1, policy_version 1900 (0.0011)
[2023-09-14 13:30:42,985][61591] Updated weights for policy 0, policy_version 1910 (0.0010)
[2023-09-14 13:30:43,899][61593] Updated weights for policy 1, policy_version 1910 (0.0011)
[2023-09-14 13:30:48,824][61591] Updated weights for policy 0, policy_version 1920 (0.0011)
[2023-09-14 13:30:49,075][61593] Updated weights for policy 1, policy_version 1920 (0.0011)
[2023-09-14 13:30:54,486][61591] Updated weights for policy 0, policy_version 1930 (0.0010)
[2023-09-14 13:30:54,634][61593] Updated weights for policy 1, policy_version 1930 (0.0011)
[2023-09-14 13:31:00,166][61591] Updated weights for policy 0, policy_version 1940 (0.0012)
[2023-09-14 13:31:00,496][61593] Updated weights for policy 1, policy_version 1940 (0.0012)
[2023-09-14 13:31:00,659][61425] Saving new best policy, reward=24.706!
[2023-09-14 13:31:05,593][61591] Updated weights for policy 0, policy_version 1950 (0.0011)
[2023-09-14 13:31:06,418][61593] Updated weights for policy 1, policy_version 1950 (0.0011)
[2023-09-14 13:31:11,041][61591] Updated weights for policy 0, policy_version 1960 (0.0010)
[2023-09-14 13:31:12,101][61593] Updated weights for policy 1, policy_version 1960 (0.0011)
[2023-09-14 13:31:16,577][61591] Updated weights for policy 0, policy_version 1970 (0.0010)
[2023-09-14 13:31:17,635][61593] Updated weights for policy 1, policy_version 1970 (0.0011)
[2023-09-14 13:31:22,286][61591] Updated weights for policy 0, policy_version 1980 (0.0011)
[2023-09-14 13:31:23,597][61593] Updated weights for policy 1, policy_version 1980 (0.0010)
[2023-09-14 13:31:27,990][61591] Updated weights for policy 0, policy_version 1990 (0.0011)
[2023-09-14 13:31:29,115][61593] Updated weights for policy 1, policy_version 1990 (0.0010)
[2023-09-14 13:31:33,345][61591] Updated weights for policy 0, policy_version 2000 (0.0011)
[2023-09-14 13:31:35,124][61593] Updated weights for policy 1, policy_version 2000 (0.0010)
[2023-09-14 13:31:38,441][61591] Updated weights for policy 0, policy_version 2010 (0.0012)
[2023-09-14 13:31:40,949][61593] Updated weights for policy 1, policy_version 2010 (0.0010)
[2023-09-14 13:31:43,469][61591] Updated weights for policy 0, policy_version 2020 (0.0010)
[2023-09-14 13:31:46,266][61593] Updated weights for policy 1, policy_version 2020 (0.0010)
[2023-09-14 13:31:49,165][61591] Updated weights for policy 0, policy_version 2030 (0.0011)
[2023-09-14 13:31:51,400][61593] Updated weights for policy 1, policy_version 2030 (0.0011)
[2023-09-14 13:31:55,052][61591] Updated weights for policy 0, policy_version 2040 (0.0011)
[2023-09-14 13:31:56,559][61593] Updated weights for policy 1, policy_version 2040 (0.0010)
[2023-09-14 13:32:01,056][61591] Updated weights for policy 0, policy_version 2050 (0.0010)
[2023-09-14 13:32:01,740][61593] Updated weights for policy 1, policy_version 2050 (0.0011)
[2023-09-14 13:32:06,858][61593] Updated weights for policy 1, policy_version 2060 (0.0012)
[2023-09-14 13:32:07,742][61591] Updated weights for policy 0, policy_version 2060 (0.0011)
[2023-09-14 13:32:10,662][61425] Saving new best policy, reward=24.867!
[2023-09-14 13:32:11,925][61593] Updated weights for policy 1, policy_version 2070 (0.0011)
[2023-09-14 13:32:14,008][61591] Updated weights for policy 0, policy_version 2070 (0.0010)
[2023-09-14 13:32:15,654][61425] Saving new best policy, reward=25.495!
[2023-09-14 13:32:17,050][61593] Updated weights for policy 1, policy_version 2080 (0.0010)
[2023-09-14 13:32:20,518][61591] Updated weights for policy 0, policy_version 2080 (0.0011)
[2023-09-14 13:32:20,658][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000002087_8548352.pth...
[2023-09-14 13:32:20,660][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002080_8519680.pth...
[2023-09-14 13:32:20,712][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001661_6803456.pth
[2023-09-14 13:32:20,717][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth
[2023-09-14 13:32:20,719][61425] Saving new best policy, reward=25.847!
[2023-09-14 13:32:21,929][61593] Updated weights for policy 1, policy_version 2090 (0.0010)
[2023-09-14 13:32:26,618][61593] Updated weights for policy 1, policy_version 2100 (0.0010)
[2023-09-14 13:32:27,371][61591] Updated weights for policy 0, policy_version 2090 (0.0011)
[2023-09-14 13:32:31,374][61593] Updated weights for policy 1, policy_version 2110 (0.0011)
[2023-09-14 13:32:34,358][61591] Updated weights for policy 0, policy_version 2100 (0.0010)
[2023-09-14 13:32:36,104][61593] Updated weights for policy 1, policy_version 2120 (0.0010)
[2023-09-14 13:32:40,731][61593] Updated weights for policy 1, policy_version 2130 (0.0010)
[2023-09-14 13:32:41,605][61591] Updated weights for policy 0, policy_version 2110 (0.0013)
[2023-09-14 13:32:45,312][61593] Updated weights for policy 1, policy_version 2140 (0.0011)
[2023-09-14 13:32:48,580][61591] Updated weights for policy 0, policy_version 2120 (0.0010)
[2023-09-14 13:32:50,092][61593] Updated weights for policy 1, policy_version 2150 (0.0010)
[2023-09-14 13:32:55,074][61593] Updated weights for policy 1, policy_version 2160 (0.0011)
[2023-09-14 13:32:55,142][61591] Updated weights for policy 0, policy_version 2130 (0.0011)
[2023-09-14 13:32:55,654][61308] Saving new best policy, reward=24.602!
[2023-09-14 13:32:59,836][61593] Updated weights for policy 1, policy_version 2170 (0.0011)
[2023-09-14 13:33:00,659][61308] Saving new best policy, reward=24.900!
[2023-09-14 13:33:02,170][61591] Updated weights for policy 0, policy_version 2140 (0.0011)
[2023-09-14 13:33:04,847][61593] Updated weights for policy 1, policy_version 2180 (0.0011)
[2023-09-14 13:33:08,560][61591] Updated weights for policy 0, policy_version 2150 (0.0010)
[2023-09-14 13:33:09,877][61593] Updated weights for policy 1, policy_version 2190 (0.0011)
[2023-09-14 13:33:10,658][61308] Saving new best policy, reward=24.950!
[2023-09-14 13:33:14,498][61591] Updated weights for policy 0, policy_version 2160 (0.0011)
[2023-09-14 13:33:15,421][61593] Updated weights for policy 1, policy_version 2200 (0.0011)
[2023-09-14 13:33:15,654][61308] Saving new best policy, reward=25.172!
[2023-09-14 13:33:19,968][61591] Updated weights for policy 0, policy_version 2170 (0.0011)
[2023-09-14 13:33:20,657][61308] Saving new best policy, reward=25.605!
[2023-09-14 13:33:22,039][61593] Updated weights for policy 1, policy_version 2210 (0.0011)
[2023-09-14 13:33:24,923][61591] Updated weights for policy 0, policy_version 2180 (0.0011)
[2023-09-14 13:33:28,829][61593] Updated weights for policy 1, policy_version 2220 (0.0011)
[2023-09-14 13:33:29,949][61591] Updated weights for policy 0, policy_version 2190 (0.0010)
[2023-09-14 13:33:34,618][61591] Updated weights for policy 0, policy_version 2200 (0.0011)
[2023-09-14 13:33:35,979][61593] Updated weights for policy 1, policy_version 2230 (0.0011)
[2023-09-14 13:33:39,501][61591] Updated weights for policy 0, policy_version 2210 (0.0011)
[2023-09-14 13:33:40,658][61425] Saving new best policy, reward=25.879!
[2023-09-14 13:33:43,338][61593] Updated weights for policy 1, policy_version 2240 (0.0010)
[2023-09-14 13:33:44,435][61591] Updated weights for policy 0, policy_version 2220 (0.0011)
[2023-09-14 13:33:45,653][61425] Saving new best policy, reward=26.442!
[2023-09-14 13:33:49,516][61593] Updated weights for policy 1, policy_version 2250 (0.0011)
[2023-09-14 13:33:49,930][61591] Updated weights for policy 0, policy_version 2230 (0.0011)
[2023-09-14 13:33:50,662][61425] Saving new best policy, reward=26.541!
[2023-09-14 13:33:54,586][61593] Updated weights for policy 1, policy_version 2260 (0.0010)
[2023-09-14 13:33:56,605][61591] Updated weights for policy 0, policy_version 2240 (0.0011)
[2023-09-14 13:33:59,687][61593] Updated weights for policy 1, policy_version 2270 (0.0011)
[2023-09-14 13:34:03,210][61591] Updated weights for policy 0, policy_version 2250 (0.0012)
[2023-09-14 13:34:04,841][61593] Updated weights for policy 1, policy_version 2280 (0.0011)
[2023-09-14 13:34:09,429][61591] Updated weights for policy 0, policy_version 2260 (0.0010)
[2023-09-14 13:34:09,944][61593] Updated weights for policy 1, policy_version 2290 (0.0010)
[2023-09-14 13:34:14,711][61593] Updated weights for policy 1, policy_version 2300 (0.0011)
[2023-09-14 13:34:16,257][61591] Updated weights for policy 0, policy_version 2270 (0.0011)
[2023-09-14 13:34:19,513][61593] Updated weights for policy 1, policy_version 2310 (0.0011)
[2023-09-14 13:34:20,659][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000002312_9469952.pth...
[2023-09-14 13:34:20,660][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002276_9322496.pth...
[2023-09-14 13:34:20,716][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000001866_7643136.pth
[2023-09-14 13:34:20,722][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000001873_7671808.pth
[2023-09-14 13:34:23,316][61591] Updated weights for policy 0, policy_version 2280 (0.0011)
[2023-09-14 13:34:24,673][61593] Updated weights for policy 1, policy_version 2320 (0.0011)
[2023-09-14 13:34:29,291][61593] Updated weights for policy 1, policy_version 2330 (0.0010)
[2023-09-14 13:34:30,245][61591] Updated weights for policy 0, policy_version 2290 (0.0010)
[2023-09-14 13:34:35,082][61593] Updated weights for policy 1, policy_version 2340 (0.0010)
[2023-09-14 13:34:35,877][61591] Updated weights for policy 0, policy_version 2300 (0.0011)
[2023-09-14 13:34:40,677][61593] Updated weights for policy 1, policy_version 2350 (0.0011)
[2023-09-14 13:34:41,639][61591] Updated weights for policy 0, policy_version 2310 (0.0011)
[2023-09-14 13:34:45,813][61593] Updated weights for policy 1, policy_version 2360 (0.0010)
[2023-09-14 13:34:47,983][61591] Updated weights for policy 0, policy_version 2320 (0.0010)
[2023-09-14 13:34:50,639][61593] Updated weights for policy 1, policy_version 2370 (0.0010)
[2023-09-14 13:34:54,990][61591] Updated weights for policy 0, policy_version 2330 (0.0010)
[2023-09-14 13:34:55,158][61593] Updated weights for policy 1, policy_version 2380 (0.0011)
[2023-09-14 13:34:59,736][61593] Updated weights for policy 1, policy_version 2390 (0.0011)
[2023-09-14 13:35:02,226][61591] Updated weights for policy 0, policy_version 2340 (0.0011)
[2023-09-14 13:35:04,382][61593] Updated weights for policy 1, policy_version 2400 (0.0011)
[2023-09-14 13:35:09,063][61591] Updated weights for policy 0, policy_version 2350 (0.0011)
[2023-09-14 13:35:09,213][61593] Updated weights for policy 1, policy_version 2410 (0.0011)
[2023-09-14 13:35:13,386][61593] Updated weights for policy 1, policy_version 2420 (0.0011)
[2023-09-14 13:35:17,315][61591] Updated weights for policy 0, policy_version 2360 (0.0012)
[2023-09-14 13:35:17,399][61593] Updated weights for policy 1, policy_version 2430 (0.0011)
[2023-09-14 13:35:21,657][61593] Updated weights for policy 1, policy_version 2440 (0.0010)
[2023-09-14 13:35:24,791][61591] Updated weights for policy 0, policy_version 2370 (0.0011)
[2023-09-14 13:35:26,642][61593] Updated weights for policy 1, policy_version 2450 (0.0011)
[2023-09-14 13:35:31,249][61591] Updated weights for policy 0, policy_version 2380 (0.0010)
[2023-09-14 13:35:31,832][61593] Updated weights for policy 1, policy_version 2460 (0.0011)
[2023-09-14 13:35:36,806][61593] Updated weights for policy 1, policy_version 2470 (0.0010)
[2023-09-14 13:35:37,716][61591] Updated weights for policy 0, policy_version 2390 (0.0011)
[2023-09-14 13:35:42,681][61593] Updated weights for policy 1, policy_version 2480 (0.0011)
[2023-09-14 13:35:43,100][61591] Updated weights for policy 0, policy_version 2400 (0.0011)
[2023-09-14 13:35:45,654][61308] Saving new best policy, reward=25.641!
[2023-09-14 13:35:47,985][61591] Updated weights for policy 0, policy_version 2410 (0.0010)
[2023-09-14 13:35:49,900][61593] Updated weights for policy 1, policy_version 2490 (0.0011)
[2023-09-14 13:35:52,960][61591] Updated weights for policy 0, policy_version 2420 (0.0009)
[2023-09-14 13:35:56,522][61593] Updated weights for policy 1, policy_version 2500 (0.0010)
[2023-09-14 13:35:57,909][61591] Updated weights for policy 0, policy_version 2430 (0.0010)
[2023-09-14 13:36:02,364][61591] Updated weights for policy 0, policy_version 2440 (0.0011)
[2023-09-14 13:36:03,299][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000002508_10272768.pth...
[2023-09-14 13:36:03,300][61308] Stopping Batcher_0...
[2023-09-14 13:36:03,300][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002442_10002432.pth...
[2023-09-14 13:36:03,301][61308] Loop batcher_evt_loop terminating...
[2023-09-14 13:36:03,305][61425] Stopping Batcher_1...
[2023-09-14 13:36:03,316][61633] Stopping RolloutWorker_w4...
[2023-09-14 13:36:03,316][61633] Loop rollout_proc4_evt_loop terminating...
[2023-09-14 13:36:03,317][61690] Stopping RolloutWorker_w5...
[2023-09-14 13:36:03,318][61690] Loop rollout_proc5_evt_loop terminating...
[2023-09-14 13:36:03,317][61592] Stopping RolloutWorker_w0...
[2023-09-14 13:36:03,318][61592] Loop rollout_proc0_evt_loop terminating...
[2023-09-14 13:36:03,318][61595] Stopping RolloutWorker_w2...
[2023-09-14 13:36:03,319][61595] Loop rollout_proc2_evt_loop terminating...
[2023-09-14 13:36:03,319][61631] Stopping RolloutWorker_w3...
[2023-09-14 13:36:03,319][61631] Loop rollout_proc3_evt_loop terminating...
[2023-09-14 13:36:03,319][61692] Stopping RolloutWorker_w7...
[2023-09-14 13:36:03,320][61594] Stopping RolloutWorker_w1...
[2023-09-14 13:36:03,320][61425] Loop batcher_evt_loop terminating...
[2023-09-14 13:36:03,320][61692] Loop rollout_proc7_evt_loop terminating...
[2023-09-14 13:36:03,320][61593] Weights refcount: 2 0
[2023-09-14 13:36:03,320][61594] Loop rollout_proc1_evt_loop terminating...
[2023-09-14 13:36:03,321][61593] Stopping InferenceWorker_p1-w0...
[2023-09-14 13:36:03,321][61593] Loop inference_proc1-0_evt_loop terminating...
[2023-09-14 13:36:03,322][61591] Weights refcount: 2 0
[2023-09-14 13:36:03,323][61591] Stopping InferenceWorker_p0-w0...
[2023-09-14 13:36:03,324][61591] Loop inference_proc0-0_evt_loop terminating...
[2023-09-14 13:36:03,329][61691] Stopping RolloutWorker_w6...
[2023-09-14 13:36:03,329][61691] Loop rollout_proc6_evt_loop terminating...
[2023-09-14 13:36:03,358][61308] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002080_8519680.pth
[2023-09-14 13:36:03,365][61308] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p0/checkpoint_000002442_10002432.pth...
[2023-09-14 13:36:03,365][61425] Removing /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000002087_8548352.pth
[2023-09-14 13:36:03,375][61425] Saving /home/cogstack/Documents/optuna/environments/sample_factory/train_dir/default_experiment/checkpoint_p1/checkpoint_000002508_10272768.pth...
[2023-09-14 13:36:03,427][61308] Stopping LearnerWorker_p0...
[2023-09-14 13:36:03,428][61308] Loop learner_proc0_evt_loop terminating...
[2023-09-14 13:36:03,471][61425] Stopping LearnerWorker_p1...
[2023-09-14 13:36:03,472][61425] Loop learner_proc1_evt_loop terminating...