rl_course_vizdoom / sf_log.txt
[2024-11-07 17:51:58,286][00281] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-07 17:51:58,289][00281] Rollout worker 0 uses device cpu
[2024-11-07 17:51:58,290][00281] Rollout worker 1 uses device cpu
[2024-11-07 17:51:58,292][00281] Rollout worker 2 uses device cpu
[2024-11-07 17:51:58,293][00281] Rollout worker 3 uses device cpu
[2024-11-07 17:51:58,294][00281] Rollout worker 4 uses device cpu
[2024-11-07 17:51:58,295][00281] Rollout worker 5 uses device cpu
[2024-11-07 17:51:58,296][00281] Rollout worker 6 uses device cpu
[2024-11-07 17:51:58,297][00281] Rollout worker 7 uses device cpu
[2024-11-07 17:51:58,456][00281] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 17:51:58,458][00281] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 17:51:58,492][00281] Starting all processes...
[2024-11-07 17:51:58,495][00281] Starting process learner_proc0
[2024-11-07 17:51:58,540][00281] Starting all processes...
[2024-11-07 17:51:58,550][00281] Starting process inference_proc0-0
[2024-11-07 17:51:58,550][00281] Starting process rollout_proc0
[2024-11-07 17:51:58,552][00281] Starting process rollout_proc1
[2024-11-07 17:51:58,553][00281] Starting process rollout_proc2
[2024-11-07 17:51:58,553][00281] Starting process rollout_proc3
[2024-11-07 17:51:58,553][00281] Starting process rollout_proc4
[2024-11-07 17:51:58,553][00281] Starting process rollout_proc5
[2024-11-07 17:51:58,553][00281] Starting process rollout_proc6
[2024-11-07 17:51:58,553][00281] Starting process rollout_proc7
[2024-11-07 17:52:06,192][00281] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 281], exiting...
[2024-11-07 17:52:06,198][00281] Runner profile tree view:
main_loop: 7.7048
[2024-11-07 17:52:06,202][00281] Collected {}, FPS: 0.0
[2024-11-07 17:52:19,998][00281] Environment doom_basic already registered, overwriting...
[2024-11-07 17:52:20,000][00281] Environment doom_two_colors_easy already registered, overwriting...
[2024-11-07 17:52:20,002][00281] Environment doom_two_colors_hard already registered, overwriting...
[2024-11-07 17:52:20,003][00281] Environment doom_dm already registered, overwriting...
[2024-11-07 17:52:20,005][00281] Environment doom_dwango5 already registered, overwriting...
[2024-11-07 17:52:20,006][00281] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2024-11-07 17:52:20,008][00281] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2024-11-07 17:52:20,009][00281] Environment doom_my_way_home already registered, overwriting...
[2024-11-07 17:52:20,010][00281] Environment doom_deadly_corridor already registered, overwriting...
[2024-11-07 17:52:20,012][00281] Environment doom_defend_the_center already registered, overwriting...
[2024-11-07 17:52:20,013][00281] Environment doom_defend_the_line already registered, overwriting...
[2024-11-07 17:52:20,014][00281] Environment doom_health_gathering already registered, overwriting...
[2024-11-07 17:52:20,015][00281] Environment doom_health_gathering_supreme already registered, overwriting...
[2024-11-07 17:52:20,017][00281] Environment doom_battle already registered, overwriting...
[2024-11-07 17:52:20,018][00281] Environment doom_battle2 already registered, overwriting...
[2024-11-07 17:52:20,019][00281] Environment doom_duel_bots already registered, overwriting...
[2024-11-07 17:52:20,021][00281] Environment doom_deathmatch_bots already registered, overwriting...
[2024-11-07 17:52:20,022][00281] Environment doom_duel already registered, overwriting...
[2024-11-07 17:52:20,023][00281] Environment doom_deathmatch_full already registered, overwriting...
[2024-11-07 17:52:20,024][00281] Environment doom_benchmark already registered, overwriting...
[2024-11-07 17:52:20,026][00281] register_encoder_factory: <function make_vizdoom_encoder at 0x7cccac3fe290>
[2024-11-07 17:52:20,052][00281] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-07 17:52:20,053][00281] Overriding arg 'env' with value 'doom_deadly_corridor' passed from command line
[2024-11-07 17:52:20,059][00281] Experiment dir /content/train_dir/default_experiment already exists!
[2024-11-07 17:52:20,060][00281] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-11-07 17:52:20,062][00281] Weights and Biases integration disabled
[2024-11-07 17:52:20,066][00281] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-11-07 17:52:22,251][00281] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_deadly_corridor
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-11-07 17:52:22,253][00281] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-07 17:52:22,259][00281] Rollout worker 0 uses device cpu
[2024-11-07 17:52:22,260][00281] Rollout worker 1 uses device cpu
[2024-11-07 17:52:22,262][00281] Rollout worker 2 uses device cpu
[2024-11-07 17:52:22,263][00281] Rollout worker 3 uses device cpu
[2024-11-07 17:52:22,265][00281] Rollout worker 4 uses device cpu
[2024-11-07 17:52:22,266][00281] Rollout worker 5 uses device cpu
[2024-11-07 17:52:22,267][00281] Rollout worker 6 uses device cpu
[2024-11-07 17:52:22,268][00281] Rollout worker 7 uses device cpu
[2024-11-07 17:52:22,365][00281] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 17:52:22,367][00281] InferenceWorker_p0-w0: min num requests: 2
[2024-11-07 17:52:22,398][00281] Starting all processes...
[2024-11-07 17:52:22,400][00281] Starting process learner_proc0
[2024-11-07 17:52:22,448][00281] Starting all processes...
[2024-11-07 17:52:22,456][00281] Starting process inference_proc0-0
[2024-11-07 17:52:22,457][00281] Starting process rollout_proc0
[2024-11-07 17:52:22,457][00281] Starting process rollout_proc1
[2024-11-07 17:52:22,457][00281] Starting process rollout_proc2
[2024-11-07 17:52:22,458][00281] Starting process rollout_proc3
[2024-11-07 17:52:22,458][00281] Starting process rollout_proc4
[2024-11-07 17:52:22,458][00281] Starting process rollout_proc5
[2024-11-07 17:52:22,458][00281] Starting process rollout_proc6
[2024-11-07 17:52:22,458][00281] Starting process rollout_proc7
[2024-11-07 17:52:40,694][03002] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 17:52:40,704][03002] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-07 17:52:40,832][03002] Num visible devices: 1
[2024-11-07 17:52:40,888][03002] Starting seed is not provided
[2024-11-07 17:52:40,889][03002] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 17:52:40,889][03002] Initializing actor-critic model on device cuda:0
[2024-11-07 17:52:40,890][03002] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 17:52:40,893][03002] RunningMeanStd input shape: (1,)
[2024-11-07 17:52:41,089][03002] ConvEncoder: input_channels=3
[2024-11-07 17:52:41,213][03015] Worker 0 uses CPU cores [0]
[2024-11-07 17:52:41,274][03018] Worker 2 uses CPU cores [0]
[2024-11-07 17:52:41,656][03019] Worker 3 uses CPU cores [1]
[2024-11-07 17:52:41,677][03017] Worker 1 uses CPU cores [1]
[2024-11-07 17:52:41,760][03016] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 17:52:41,762][03016] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-07 17:52:41,773][03020] Worker 4 uses CPU cores [0]
[2024-11-07 17:52:41,775][03023] Worker 6 uses CPU cores [0]
[2024-11-07 17:52:41,814][03016] Num visible devices: 1
[2024-11-07 17:52:41,860][03021] Worker 5 uses CPU cores [1]
[2024-11-07 17:52:42,007][03002] Conv encoder output size: 512
[2024-11-07 17:52:42,008][03002] Policy head output size: 512
[2024-11-07 17:52:42,035][03022] Worker 7 uses CPU cores [1]
[2024-11-07 17:52:42,085][03002] Created Actor Critic model with architecture:
[2024-11-07 17:52:42,085][03002] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=11, bias=True)
)
)
[2024-11-07 17:52:42,357][00281] Heartbeat connected on Batcher_0
[2024-11-07 17:52:42,366][00281] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-07 17:52:42,374][00281] Heartbeat connected on RolloutWorker_w0
[2024-11-07 17:52:42,378][00281] Heartbeat connected on RolloutWorker_w1
[2024-11-07 17:52:42,381][00281] Heartbeat connected on RolloutWorker_w2
[2024-11-07 17:52:42,386][00281] Heartbeat connected on RolloutWorker_w3
[2024-11-07 17:52:42,388][00281] Heartbeat connected on RolloutWorker_w4
[2024-11-07 17:52:42,392][00281] Heartbeat connected on RolloutWorker_w5
[2024-11-07 17:52:42,396][00281] Heartbeat connected on RolloutWorker_w6
[2024-11-07 17:52:42,398][00281] Heartbeat connected on RolloutWorker_w7
[2024-11-07 17:52:42,436][03002] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-11-07 17:52:46,252][03002] No checkpoints found
[2024-11-07 17:52:46,252][03002] Did not load from checkpoint, starting from scratch!
[2024-11-07 17:52:46,253][03002] Initialized policy 0 weights for model version 0
[2024-11-07 17:52:46,257][03002] LearnerWorker_p0 finished initialization!
[2024-11-07 17:52:46,258][00281] Heartbeat connected on LearnerWorker_p0
[2024-11-07 17:52:46,260][03002] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-07 17:52:46,357][03016] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 17:52:46,358][03016] RunningMeanStd input shape: (1,)
[2024-11-07 17:52:46,370][03016] ConvEncoder: input_channels=3
[2024-11-07 17:52:46,474][03016] Conv encoder output size: 512
[2024-11-07 17:52:46,474][03016] Policy head output size: 512
[2024-11-07 17:52:46,527][00281] Inference worker 0-0 is ready!
[2024-11-07 17:52:46,529][00281] All inference workers are ready! Signal rollout workers to start!
[2024-11-07 17:52:46,737][03021] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,738][03019] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,741][03022] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,735][03017] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,756][03020] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,757][03018] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,758][03023] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:46,759][03015] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 17:52:47,739][03023] Decorrelating experience for 0 frames...
[2024-11-07 17:52:47,740][03018] Decorrelating experience for 0 frames...
[2024-11-07 17:52:48,114][03018] Decorrelating experience for 32 frames...
[2024-11-07 17:52:48,415][03017] Decorrelating experience for 0 frames...
[2024-11-07 17:52:48,419][03021] Decorrelating experience for 0 frames...
[2024-11-07 17:52:48,421][03019] Decorrelating experience for 0 frames...
[2024-11-07 17:52:48,425][03022] Decorrelating experience for 0 frames...
[2024-11-07 17:52:49,206][03015] Decorrelating experience for 0 frames...
[2024-11-07 17:52:49,222][03023] Decorrelating experience for 32 frames...
[2024-11-07 17:52:49,727][03023] Decorrelating experience for 64 frames...
[2024-11-07 17:52:49,853][03017] Decorrelating experience for 32 frames...
[2024-11-07 17:52:49,855][03022] Decorrelating experience for 32 frames...
[2024-11-07 17:52:49,857][03021] Decorrelating experience for 32 frames...
[2024-11-07 17:52:49,868][03019] Decorrelating experience for 32 frames...
[2024-11-07 17:52:50,067][00281] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 17:52:51,078][03022] Decorrelating experience for 64 frames...
[2024-11-07 17:52:51,080][03017] Decorrelating experience for 64 frames...
[2024-11-07 17:52:51,336][03015] Decorrelating experience for 32 frames...
[2024-11-07 17:52:51,441][03018] Decorrelating experience for 64 frames...
[2024-11-07 17:52:51,454][03020] Decorrelating experience for 0 frames...
[2024-11-07 17:52:51,479][03023] Decorrelating experience for 96 frames...
[2024-11-07 17:52:52,290][03020] Decorrelating experience for 32 frames...
[2024-11-07 17:52:52,638][03021] Decorrelating experience for 64 frames...
[2024-11-07 17:52:52,785][03017] Decorrelating experience for 96 frames...
[2024-11-07 17:52:52,815][03022] Decorrelating experience for 96 frames...
[2024-11-07 17:52:53,501][03019] Decorrelating experience for 64 frames...
[2024-11-07 17:52:54,750][03015] Decorrelating experience for 64 frames...
[2024-11-07 17:52:54,902][03020] Decorrelating experience for 64 frames...
[2024-11-07 17:52:55,067][00281] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.2. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 17:52:55,072][00281] Avg episode reward: [(0, '-1.014')]
[2024-11-07 17:52:56,068][03018] Decorrelating experience for 96 frames...
[2024-11-07 17:52:56,553][03019] Decorrelating experience for 96 frames...
[2024-11-07 17:52:57,774][03015] Decorrelating experience for 96 frames...
[2024-11-07 17:52:57,948][03020] Decorrelating experience for 96 frames...
[2024-11-07 17:52:59,777][03021] Decorrelating experience for 96 frames...
[2024-11-07 17:53:00,067][00281] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 127.2. Samples: 1272. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-11-07 17:53:00,069][00281] Avg episode reward: [(0, '-0.723')]
[2024-11-07 17:53:02,021][03002] Signal inference workers to stop experience collection...
[2024-11-07 17:53:02,059][03016] InferenceWorker_p0-w0: stopping experience collection
[2024-11-07 17:53:03,980][03002] Signal inference workers to resume experience collection...
[2024-11-07 17:53:03,981][03016] InferenceWorker_p0-w0: resuming experience collection
[2024-11-07 17:53:05,067][00281] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 218.1. Samples: 3272. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-11-07 17:53:05,073][00281] Avg episode reward: [(0, '-0.670')]
[2024-11-07 17:53:10,067][00281] Fps is (10 sec: 3276.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 314.1. Samples: 6282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-07 17:53:10,069][00281] Avg episode reward: [(0, '-0.334')]
[2024-11-07 17:53:13,719][03016] Updated weights for policy 0, policy_version 10 (0.0148)
[2024-11-07 17:53:15,067][00281] Fps is (10 sec: 2867.3, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 417.4. Samples: 10436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-07 17:53:15,071][00281] Avg episode reward: [(0, '-0.157')]
[2024-11-07 17:53:20,067][00281] Fps is (10 sec: 2457.6, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 501.1. Samples: 15034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:53:20,075][00281] Avg episode reward: [(0, '0.110')]
[2024-11-07 17:53:25,067][00281] Fps is (10 sec: 3686.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 520.5. Samples: 18216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-11-07 17:53:25,075][00281] Avg episode reward: [(0, '0.458')]
[2024-11-07 17:53:25,109][03016] Updated weights for policy 0, policy_version 20 (0.0021)
[2024-11-07 17:53:30,068][00281] Fps is (10 sec: 3685.8, 60 sec: 2355.1, 300 sec: 2355.1). Total num frames: 94208. Throughput: 0: 597.5. Samples: 23900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:53:30,076][00281] Avg episode reward: [(0, '0.695')]
[2024-11-07 17:53:35,067][00281] Fps is (10 sec: 2867.1, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 610.9. Samples: 27490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:53:35,069][00281] Avg episode reward: [(0, '0.869')]
[2024-11-07 17:53:35,075][03002] Saving new best policy, reward=0.869!
[2024-11-07 17:53:38,873][03016] Updated weights for policy 0, policy_version 30 (0.0035)
[2024-11-07 17:53:40,067][00281] Fps is (10 sec: 3277.4, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 672.6. Samples: 30284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:53:40,071][00281] Avg episode reward: [(0, '1.651')]
[2024-11-07 17:53:40,087][03002] Saving new best policy, reward=1.651!
[2024-11-07 17:53:45,067][00281] Fps is (10 sec: 3686.5, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 768.6. Samples: 35858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:53:45,072][00281] Avg episode reward: [(0, '2.141')]
[2024-11-07 17:53:45,079][03002] Saving new best policy, reward=2.141!
[2024-11-07 17:53:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 803.2. Samples: 39416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:53:50,071][00281] Avg episode reward: [(0, '2.351')]
[2024-11-07 17:53:50,087][03002] Saving new best policy, reward=2.351!
[2024-11-07 17:53:52,419][03016] Updated weights for policy 0, policy_version 40 (0.0021)
[2024-11-07 17:53:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2646.7). Total num frames: 172032. Throughput: 0: 786.4. Samples: 41672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:53:55,074][00281] Avg episode reward: [(0, '2.798')]
[2024-11-07 17:53:55,077][03002] Saving new best policy, reward=2.798!
[2024-11-07 17:54:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2750.2). Total num frames: 192512. Throughput: 0: 824.1. Samples: 47520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:54:00,068][00281] Avg episode reward: [(0, '2.929')]
[2024-11-07 17:54:00,076][03002] Saving new best policy, reward=2.929!
[2024-11-07 17:54:04,288][03016] Updated weights for policy 0, policy_version 50 (0.0014)
[2024-11-07 17:54:05,069][00281] Fps is (10 sec: 3276.0, 60 sec: 3208.4, 300 sec: 2730.6). Total num frames: 204800. Throughput: 0: 818.7. Samples: 51878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:54:05,073][00281] Avg episode reward: [(0, '2.944')]
[2024-11-07 17:54:05,077][03002] Saving new best policy, reward=2.944!
[2024-11-07 17:54:10,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2713.6). Total num frames: 217088. Throughput: 0: 785.7. Samples: 53574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:54:10,069][00281] Avg episode reward: [(0, '3.147')]
[2024-11-07 17:54:10,108][03002] Saving new best policy, reward=3.147!
[2024-11-07 17:54:15,067][00281] Fps is (10 sec: 3277.5, 60 sec: 3276.8, 300 sec: 2794.9). Total num frames: 237568. Throughput: 0: 779.7. Samples: 58984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:54:15,072][00281] Avg episode reward: [(0, '3.356')]
[2024-11-07 17:54:15,075][03002] Saving new best policy, reward=3.356!
[2024-11-07 17:54:16,758][03016] Updated weights for policy 0, policy_version 60 (0.0028)
[2024-11-07 17:54:20,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2821.7). Total num frames: 253952. Throughput: 0: 812.8. Samples: 64068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:54:20,073][00281] Avg episode reward: [(0, '3.423')]
[2024-11-07 17:54:20,089][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000062_253952.pth...
[2024-11-07 17:54:20,294][03002] Saving new best policy, reward=3.423!
[2024-11-07 17:54:25,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2759.4). Total num frames: 262144. Throughput: 0: 783.1. Samples: 65522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 17:54:25,072][00281] Avg episode reward: [(0, '3.323')]
[2024-11-07 17:54:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 2826.2). Total num frames: 282624. Throughput: 0: 767.0. Samples: 70374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 17:54:30,070][00281] Avg episode reward: [(0, '3.422')]
[2024-11-07 17:54:30,613][03016] Updated weights for policy 0, policy_version 70 (0.0037)
[2024-11-07 17:54:35,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3276.8, 300 sec: 2886.7). Total num frames: 303104. Throughput: 0: 814.8. Samples: 76084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 17:54:35,070][00281] Avg episode reward: [(0, '3.434')]
[2024-11-07 17:54:35,074][03002] Saving new best policy, reward=3.434!
[2024-11-07 17:54:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2830.0). Total num frames: 311296. Throughput: 0: 803.2. Samples: 77816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 17:54:40,073][00281] Avg episode reward: [(0, '3.544')]
[2024-11-07 17:54:40,085][03002] Saving new best policy, reward=3.544!
[2024-11-07 17:54:44,257][03016] Updated weights for policy 0, policy_version 80 (0.0040)
[2024-11-07 17:54:45,067][00281] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 2849.4). Total num frames: 327680. Throughput: 0: 764.0. Samples: 81902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-07 17:54:45,071][00281] Avg episode reward: [(0, '3.574')]
[2024-11-07 17:54:45,075][03002] Saving new best policy, reward=3.574!
[2024-11-07 17:54:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2901.3). Total num frames: 348160. Throughput: 0: 800.2. Samples: 87886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:54:50,071][00281] Avg episode reward: [(0, '3.639')]
[2024-11-07 17:54:50,081][03002] Saving new best policy, reward=3.639!
[2024-11-07 17:54:55,072][00281] Fps is (10 sec: 3684.5, 60 sec: 3208.3, 300 sec: 2916.2). Total num frames: 364544. Throughput: 0: 819.2. Samples: 90444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 17:54:55,079][00281] Avg episode reward: [(0, '3.667')]
[2024-11-07 17:54:55,084][03002] Saving new best policy, reward=3.667!
[2024-11-07 17:54:56,637][03016] Updated weights for policy 0, policy_version 90 (0.0024)
[2024-11-07 17:55:00,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2867.2). Total num frames: 372736. Throughput: 0: 775.5. Samples: 93882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 17:55:00,072][00281] Avg episode reward: [(0, '3.784')]
[2024-11-07 17:55:00,158][03002] Saving new best policy, reward=3.784!
[2024-11-07 17:55:05,067][00281] Fps is (10 sec: 2049.1, 60 sec: 3003.8, 300 sec: 2852.0). Total num frames: 385024. Throughput: 0: 734.7. Samples: 97128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:55:05,069][00281] Avg episode reward: [(0, '3.686')]
[2024-11-07 17:55:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2867.2). Total num frames: 401408. Throughput: 0: 759.6. Samples: 99704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:10,070][00281] Avg episode reward: [(0, '3.966')]
[2024-11-07 17:55:10,082][03002] Saving new best policy, reward=3.966!
[2024-11-07 17:55:12,587][03016] Updated weights for policy 0, policy_version 100 (0.0041)
[2024-11-07 17:55:15,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2853.1). Total num frames: 413696. Throughput: 0: 734.1. Samples: 103410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 17:55:15,069][00281] Avg episode reward: [(0, '3.618')]
[2024-11-07 17:55:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2867.2). Total num frames: 430080. Throughput: 0: 719.0. Samples: 108440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:20,073][00281] Avg episode reward: [(0, '4.124')]
[2024-11-07 17:55:20,089][03002] Saving new best policy, reward=4.124!
[2024-11-07 17:55:24,449][03016] Updated weights for policy 0, policy_version 110 (0.0020)
[2024-11-07 17:55:25,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2906.8). Total num frames: 450560. Throughput: 0: 742.1. Samples: 111210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:25,071][00281] Avg episode reward: [(0, '4.115')]
[2024-11-07 17:55:30,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3003.7, 300 sec: 2892.8). Total num frames: 462848. Throughput: 0: 753.1. Samples: 115790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:30,069][00281] Avg episode reward: [(0, '3.860')]
[2024-11-07 17:55:35,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2904.4). Total num frames: 479232. Throughput: 0: 715.2. Samples: 120072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:35,069][00281] Avg episode reward: [(0, '3.682')]
[2024-11-07 17:55:37,807][03016] Updated weights for policy 0, policy_version 120 (0.0028)
[2024-11-07 17:55:40,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2939.5). Total num frames: 499712. Throughput: 0: 724.3. Samples: 123032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:40,074][00281] Avg episode reward: [(0, '3.811')]
[2024-11-07 17:55:45,068][00281] Fps is (10 sec: 3276.2, 60 sec: 3071.9, 300 sec: 2925.7). Total num frames: 512000. Throughput: 0: 768.6. Samples: 128472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:45,071][00281] Avg episode reward: [(0, '3.994')]
[2024-11-07 17:55:50,067][00281] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2912.7). Total num frames: 524288. Throughput: 0: 774.8. Samples: 131994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:55:50,074][00281] Avg episode reward: [(0, '4.055')]
[2024-11-07 17:55:51,281][03016] Updated weights for policy 0, policy_version 130 (0.0019)
[2024-11-07 17:55:55,067][00281] Fps is (10 sec: 3277.4, 60 sec: 3004.0, 300 sec: 2944.7). Total num frames: 544768. Throughput: 0: 784.4. Samples: 135000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 17:55:55,069][00281] Avg episode reward: [(0, '4.020')]
[2024-11-07 17:56:00,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2975.0). Total num frames: 565248. Throughput: 0: 833.3. Samples: 140908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:00,072][00281] Avg episode reward: [(0, '3.830')]
[2024-11-07 17:56:03,132][03016] Updated weights for policy 0, policy_version 140 (0.0025)
[2024-11-07 17:56:05,075][00281] Fps is (10 sec: 3274.1, 60 sec: 3208.1, 300 sec: 2961.6). Total num frames: 577536. Throughput: 0: 805.8. Samples: 144706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:56:05,080][00281] Avg episode reward: [(0, '3.341')]
[2024-11-07 17:56:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2969.6). Total num frames: 593920. Throughput: 0: 794.4. Samples: 146956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:56:10,074][00281] Avg episode reward: [(0, '3.895')]
[2024-11-07 17:56:14,944][03016] Updated weights for policy 0, policy_version 150 (0.0028)
[2024-11-07 17:56:15,067][00281] Fps is (10 sec: 3689.4, 60 sec: 3345.1, 300 sec: 2997.1). Total num frames: 614400. Throughput: 0: 824.1. Samples: 152876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:15,074][00281] Avg episode reward: [(0, '3.932')]
[2024-11-07 17:56:20,075][00281] Fps is (10 sec: 3274.0, 60 sec: 3276.3, 300 sec: 2984.1). Total num frames: 626688. Throughput: 0: 828.3. Samples: 157352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:56:20,082][00281] Avg episode reward: [(0, '3.808')]
[2024-11-07 17:56:20,092][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth...
[2024-11-07 17:56:25,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 2972.0). Total num frames: 638976. Throughput: 0: 798.4. Samples: 158960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:25,074][00281] Avg episode reward: [(0, '3.827')]
[2024-11-07 17:56:28,261][03016] Updated weights for policy 0, policy_version 160 (0.0033)
[2024-11-07 17:56:30,067][00281] Fps is (10 sec: 3279.6, 60 sec: 3276.8, 300 sec: 2997.5). Total num frames: 659456. Throughput: 0: 802.8. Samples: 164598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:30,068][00281] Avg episode reward: [(0, '4.118')]
[2024-11-07 17:56:35,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3003.7). Total num frames: 675840. Throughput: 0: 845.8. Samples: 170054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:35,073][00281] Avg episode reward: [(0, '4.413')]
[2024-11-07 17:56:35,078][03002] Saving new best policy, reward=4.413!
[2024-11-07 17:56:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2991.9). Total num frames: 688128. Throughput: 0: 814.7. Samples: 171662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:40,069][00281] Avg episode reward: [(0, '4.091')]
[2024-11-07 17:56:41,734][03016] Updated weights for policy 0, policy_version 170 (0.0032)
[2024-11-07 17:56:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3015.4). Total num frames: 708608. Throughput: 0: 792.0. Samples: 176548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:56:45,075][00281] Avg episode reward: [(0, '4.218')]
[2024-11-07 17:56:50,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3037.9). Total num frames: 729088. Throughput: 0: 836.0. Samples: 182318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:56:50,069][00281] Avg episode reward: [(0, '3.906')]
[2024-11-07 17:56:53,350][03016] Updated weights for policy 0, policy_version 180 (0.0031)
[2024-11-07 17:56:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3009.3). Total num frames: 737280. Throughput: 0: 831.1. Samples: 184354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:56:55,077][00281] Avg episode reward: [(0, '4.618')]
[2024-11-07 17:56:55,085][03002] Saving new best policy, reward=4.618!
[2024-11-07 17:57:00,067][00281] Fps is (10 sec: 2457.5, 60 sec: 3140.3, 300 sec: 3014.7). Total num frames: 753664. Throughput: 0: 787.1. Samples: 188296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:57:00,070][00281] Avg episode reward: [(0, '4.551')]
[2024-11-07 17:57:05,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3277.2, 300 sec: 3035.9). Total num frames: 774144. Throughput: 0: 818.7. Samples: 194188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:57:05,072][00281] Avg episode reward: [(0, '4.727')]
[2024-11-07 17:57:05,077][03002] Saving new best policy, reward=4.727!
[2024-11-07 17:57:05,602][03016] Updated weights for policy 0, policy_version 190 (0.0014)
[2024-11-07 17:57:10,067][00281] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3040.5). Total num frames: 790528. Throughput: 0: 843.2. Samples: 196904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:57:10,069][00281] Avg episode reward: [(0, '5.531')]
[2024-11-07 17:57:10,085][03002] Saving new best policy, reward=5.531!
[2024-11-07 17:57:15,070][00281] Fps is (10 sec: 2866.3, 60 sec: 3140.1, 300 sec: 3029.5). Total num frames: 802816. Throughput: 0: 792.3. Samples: 200256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:57:15,078][00281] Avg episode reward: [(0, '4.646')]
[2024-11-07 17:57:18,877][03016] Updated weights for policy 0, policy_version 200 (0.0024)
[2024-11-07 17:57:20,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3277.3, 300 sec: 3049.2). Total num frames: 823296. Throughput: 0: 799.5. Samples: 206032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:57:20,069][00281] Avg episode reward: [(0, '4.570')]
[2024-11-07 17:57:25,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3345.1, 300 sec: 3053.4). Total num frames: 839680. Throughput: 0: 831.0. Samples: 209058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:57:25,075][00281] Avg episode reward: [(0, '4.460')]
[2024-11-07 17:57:30,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3042.7). Total num frames: 851968. Throughput: 0: 813.6. Samples: 213162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 17:57:30,071][00281] Avg episode reward: [(0, '4.797')]
[2024-11-07 17:57:31,955][03016] Updated weights for policy 0, policy_version 210 (0.0025)
[2024-11-07 17:57:35,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3061.2). Total num frames: 872448. Throughput: 0: 798.0. Samples: 218226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:57:35,069][00281] Avg episode reward: [(0, '4.835')]
[2024-11-07 17:57:40,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3079.1). Total num frames: 892928. Throughput: 0: 820.5. Samples: 221278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:57:40,073][00281] Avg episode reward: [(0, '4.581')]
[2024-11-07 17:57:42,943][03016] Updated weights for policy 0, policy_version 220 (0.0016)
[2024-11-07 17:57:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3068.5). Total num frames: 905216. Throughput: 0: 841.2. Samples: 226150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:57:45,068][00281] Avg episode reward: [(0, '4.272')]
[2024-11-07 17:57:50,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 917504. Throughput: 0: 798.3. Samples: 230110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:57:50,070][00281] Avg episode reward: [(0, '5.067')]
[2024-11-07 17:57:55,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3179.6). Total num frames: 937984. Throughput: 0: 803.3. Samples: 233054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:57:55,070][00281] Avg episode reward: [(0, '4.756')]
[2024-11-07 17:57:55,649][03016] Updated weights for policy 0, policy_version 230 (0.0015)
[2024-11-07 17:58:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 954368. Throughput: 0: 854.9. Samples: 238724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:58:00,073][00281] Avg episode reward: [(0, '5.091')]
[2024-11-07 17:58:05,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 966656. Throughput: 0: 803.2. Samples: 242176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 17:58:05,075][00281] Avg episode reward: [(0, '5.093')]
[2024-11-07 17:58:08,693][03016] Updated weights for policy 0, policy_version 240 (0.0018)
[2024-11-07 17:58:10,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 987136. Throughput: 0: 800.5. Samples: 245080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 17:58:10,074][00281] Avg episode reward: [(0, '5.170')]
[2024-11-07 17:58:15,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3413.5, 300 sec: 3221.3). Total num frames: 1007616. Throughput: 0: 841.7. Samples: 251038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:58:15,068][00281] Avg episode reward: [(0, '5.191')]
[2024-11-07 17:58:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1015808. Throughput: 0: 815.4. Samples: 254920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:58:20,072][00281] Avg episode reward: [(0, '5.476')]
[2024-11-07 17:58:20,089][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000248_1015808.pth...
[2024-11-07 17:58:20,262][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000062_253952.pth
[2024-11-07 17:58:22,233][03016] Updated weights for policy 0, policy_version 250 (0.0030)
[2024-11-07 17:58:25,067][00281] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1032192. Throughput: 0: 791.1. Samples: 256878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:58:25,073][00281] Avg episode reward: [(0, '5.026')]
[2024-11-07 17:58:30,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1052672. Throughput: 0: 817.2. Samples: 262924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 17:58:30,073][00281] Avg episode reward: [(0, '5.508')]
[2024-11-07 17:58:32,841][03016] Updated weights for policy 0, policy_version 260 (0.0025)
[2024-11-07 17:58:35,067][00281] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 1069056. Throughput: 0: 835.4. Samples: 267702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:58:35,070][00281] Avg episode reward: [(0, '4.455')]
[2024-11-07 17:58:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 1081344. Throughput: 0: 806.0. Samples: 269326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:58:40,070][00281] Avg episode reward: [(0, '4.395')]
[2024-11-07 17:58:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1101824. Throughput: 0: 801.9. Samples: 274810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:58:45,069][00281] Avg episode reward: [(0, '4.989')]
[2024-11-07 17:58:45,787][03016] Updated weights for policy 0, policy_version 270 (0.0024)
[2024-11-07 17:58:50,073][00281] Fps is (10 sec: 3684.1, 60 sec: 3344.7, 300 sec: 3207.3). Total num frames: 1118208. Throughput: 0: 848.7. Samples: 280374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:58:50,077][00281] Avg episode reward: [(0, '5.600')]
[2024-11-07 17:58:50,102][03002] Saving new best policy, reward=5.600!
[2024-11-07 17:58:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1130496. Throughput: 0: 822.0. Samples: 282072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:58:55,069][00281] Avg episode reward: [(0, '5.806')]
[2024-11-07 17:58:55,073][03002] Saving new best policy, reward=5.806!
[2024-11-07 17:58:59,074][03016] Updated weights for policy 0, policy_version 280 (0.0022)
[2024-11-07 17:59:00,070][00281] Fps is (10 sec: 2868.1, 60 sec: 3208.4, 300 sec: 3193.5). Total num frames: 1146880. Throughput: 0: 796.0. Samples: 286862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:59:00,079][00281] Avg episode reward: [(0, '5.671')]
[2024-11-07 17:59:05,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1167360. Throughput: 0: 844.8. Samples: 292936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:59:05,068][00281] Avg episode reward: [(0, '5.263')]
[2024-11-07 17:59:10,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1183744. Throughput: 0: 848.7. Samples: 295070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:59:10,068][00281] Avg episode reward: [(0, '5.343')]
[2024-11-07 17:59:11,602][03016] Updated weights for policy 0, policy_version 290 (0.0037)
[2024-11-07 17:59:15,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1200128. Throughput: 0: 804.9. Samples: 299146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:59:15,072][00281] Avg episode reward: [(0, '6.132')]
[2024-11-07 17:59:15,075][03002] Saving new best policy, reward=6.132!
[2024-11-07 17:59:20,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 1216512. Throughput: 0: 829.6. Samples: 305036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 17:59:20,072][00281] Avg episode reward: [(0, '5.342')]
[2024-11-07 17:59:22,111][03016] Updated weights for policy 0, policy_version 300 (0.0023)
[2024-11-07 17:59:25,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1232896. Throughput: 0: 859.0. Samples: 307980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 17:59:25,075][00281] Avg episode reward: [(0, '5.704')]
[2024-11-07 17:59:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1245184. Throughput: 0: 814.8. Samples: 311474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:59:30,073][00281] Avg episode reward: [(0, '5.726')]
[2024-11-07 17:59:35,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1265664. Throughput: 0: 813.0. Samples: 316952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 17:59:35,069][00281] Avg episode reward: [(0, '5.294')]
[2024-11-07 17:59:35,577][03016] Updated weights for policy 0, policy_version 310 (0.0022)
[2024-11-07 17:59:40,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 1286144. Throughput: 0: 842.2. Samples: 319970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:59:40,071][00281] Avg episode reward: [(0, '5.036')]
[2024-11-07 17:59:45,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1294336. Throughput: 0: 827.8. Samples: 324110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:59:45,072][00281] Avg episode reward: [(0, '5.421')]
[2024-11-07 17:59:48,834][03016] Updated weights for policy 0, policy_version 320 (0.0022)
[2024-11-07 17:59:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3277.1, 300 sec: 3221.3). Total num frames: 1314816. Throughput: 0: 798.4. Samples: 328864. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 17:59:50,073][00281] Avg episode reward: [(0, '4.972')]
[2024-11-07 17:59:55,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 1335296. Throughput: 0: 818.4. Samples: 331896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 17:59:55,069][00281] Avg episode reward: [(0, '5.902')]
[2024-11-07 18:00:00,067][00281] Fps is (10 sec: 3276.7, 60 sec: 3345.2, 300 sec: 3262.9). Total num frames: 1347584. Throughput: 0: 839.2. Samples: 336912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:00,069][00281] Avg episode reward: [(0, '5.596')]
[2024-11-07 18:00:01,260][03016] Updated weights for policy 0, policy_version 330 (0.0021)
[2024-11-07 18:00:05,067][00281] Fps is (10 sec: 2867.0, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1363968. Throughput: 0: 800.7. Samples: 341066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:05,070][00281] Avg episode reward: [(0, '5.846')]
[2024-11-07 18:00:10,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1384448. Throughput: 0: 803.5. Samples: 344138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:00:10,069][00281] Avg episode reward: [(0, '6.473')]
[2024-11-07 18:00:10,077][03002] Saving new best policy, reward=6.473!
[2024-11-07 18:00:12,262][03016] Updated weights for policy 0, policy_version 340 (0.0014)
[2024-11-07 18:00:15,067][00281] Fps is (10 sec: 3686.7, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1400832. Throughput: 0: 853.1. Samples: 349862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:15,069][00281] Avg episode reward: [(0, '5.302')]
[2024-11-07 18:00:20,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1409024. Throughput: 0: 809.5. Samples: 353378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:20,069][00281] Avg episode reward: [(0, '6.516')]
[2024-11-07 18:00:20,148][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_1413120.pth...
[2024-11-07 18:00:20,288][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth
[2024-11-07 18:00:20,303][03002] Saving new best policy, reward=6.516!
[2024-11-07 18:00:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1429504. Throughput: 0: 798.6. Samples: 355908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:25,076][00281] Avg episode reward: [(0, '6.422')]
[2024-11-07 18:00:25,549][03016] Updated weights for policy 0, policy_version 350 (0.0027)
[2024-11-07 18:00:30,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1449984. Throughput: 0: 842.4. Samples: 362018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:30,069][00281] Avg episode reward: [(0, '6.034')]
[2024-11-07 18:00:35,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1462272. Throughput: 0: 833.6. Samples: 366376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:00:35,072][00281] Avg episode reward: [(0, '5.829')]
[2024-11-07 18:00:38,566][03016] Updated weights for policy 0, policy_version 360 (0.0050)
[2024-11-07 18:00:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 1478656. Throughput: 0: 808.4. Samples: 368276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:00:40,072][00281] Avg episode reward: [(0, '6.529')]
[2024-11-07 18:00:40,084][03002] Saving new best policy, reward=6.529!
[2024-11-07 18:00:45,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1499136. Throughput: 0: 828.3. Samples: 374186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:00:45,074][00281] Avg episode reward: [(0, '6.295')]
[2024-11-07 18:00:49,816][03016] Updated weights for policy 0, policy_version 370 (0.0027)
[2024-11-07 18:00:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1515520. Throughput: 0: 847.1. Samples: 379186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:00:50,073][00281] Avg episode reward: [(0, '6.399')]
[2024-11-07 18:00:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1527808. Throughput: 0: 816.7. Samples: 380890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 18:00:55,074][00281] Avg episode reward: [(0, '6.051')]
[2024-11-07 18:01:00,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.8). Total num frames: 1548288. Throughput: 0: 807.6. Samples: 386206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:01:00,072][00281] Avg episode reward: [(0, '6.330')]
[2024-11-07 18:01:02,055][03016] Updated weights for policy 0, policy_version 380 (0.0027)
[2024-11-07 18:01:05,070][00281] Fps is (10 sec: 3685.2, 60 sec: 3344.9, 300 sec: 3290.7). Total num frames: 1564672. Throughput: 0: 859.9. Samples: 392074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:01:05,072][00281] Avg episode reward: [(0, '6.786')]
[2024-11-07 18:01:05,080][03002] Saving new best policy, reward=6.786!
[2024-11-07 18:01:10,070][00281] Fps is (10 sec: 2866.4, 60 sec: 3208.4, 300 sec: 3262.9). Total num frames: 1576960. Throughput: 0: 839.2. Samples: 393676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:01:10,072][00281] Avg episode reward: [(0, '6.082')]
[2024-11-07 18:01:15,067][00281] Fps is (10 sec: 2868.0, 60 sec: 3208.5, 300 sec: 3276.9). Total num frames: 1593344. Throughput: 0: 802.9. Samples: 398150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:01:15,072][00281] Avg episode reward: [(0, '7.565')]
[2024-11-07 18:01:15,077][03002] Saving new best policy, reward=7.565!
[2024-11-07 18:01:15,369][03016] Updated weights for policy 0, policy_version 390 (0.0027)
[2024-11-07 18:01:20,067][00281] Fps is (10 sec: 3687.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1613824. Throughput: 0: 836.1. Samples: 404002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:01:20,074][00281] Avg episode reward: [(0, '5.206')]
[2024-11-07 18:01:25,069][00281] Fps is (10 sec: 3276.3, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1626112. Throughput: 0: 840.8. Samples: 406114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:01:25,074][00281] Avg episode reward: [(0, '6.255')]
[2024-11-07 18:01:30,070][00281] Fps is (10 sec: 2047.3, 60 sec: 3071.8, 300 sec: 3249.0). Total num frames: 1634304. Throughput: 0: 773.0. Samples: 408974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 18:01:30,075][00281] Avg episode reward: [(0, '6.501')]
[2024-11-07 18:01:31,002][03016] Updated weights for policy 0, policy_version 400 (0.0055)
[2024-11-07 18:01:35,067][00281] Fps is (10 sec: 2458.0, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 1650688. Throughput: 0: 757.9. Samples: 413290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:01:35,074][00281] Avg episode reward: [(0, '6.530')]
[2024-11-07 18:01:40,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1671168. Throughput: 0: 786.5. Samples: 416284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:01:40,073][00281] Avg episode reward: [(0, '6.444')]
[2024-11-07 18:01:41,782][03016] Updated weights for policy 0, policy_version 410 (0.0014)
[2024-11-07 18:01:45,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 1683456. Throughput: 0: 781.2. Samples: 421362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:01:45,073][00281] Avg episode reward: [(0, '6.499')]
[2024-11-07 18:01:50,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3072.0, 300 sec: 3262.9). Total num frames: 1699840. Throughput: 0: 739.6. Samples: 425354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:01:50,069][00281] Avg episode reward: [(0, '6.315')]
[2024-11-07 18:01:54,943][03016] Updated weights for policy 0, policy_version 420 (0.0022)
[2024-11-07 18:01:55,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 1720320. Throughput: 0: 771.2. Samples: 428380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:01:55,074][00281] Avg episode reward: [(0, '5.734')]
[2024-11-07 18:02:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3140.2, 300 sec: 3262.9). Total num frames: 1736704. Throughput: 0: 800.4. Samples: 434168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:02:00,075][00281] Avg episode reward: [(0, '6.634')]
[2024-11-07 18:02:05,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.2, 300 sec: 3249.0). Total num frames: 1748992. Throughput: 0: 748.9. Samples: 437704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:02:05,069][00281] Avg episode reward: [(0, '5.629')]
[2024-11-07 18:02:07,812][03016] Updated weights for policy 0, policy_version 430 (0.0038)
[2024-11-07 18:02:10,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3208.7, 300 sec: 3276.8). Total num frames: 1769472. Throughput: 0: 767.5. Samples: 440652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:02:10,074][00281] Avg episode reward: [(0, '5.546')]
[2024-11-07 18:02:15,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1789952. Throughput: 0: 842.5. Samples: 446886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:02:15,073][00281] Avg episode reward: [(0, '6.760')]
[2024-11-07 18:02:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 1798144. Throughput: 0: 835.7. Samples: 450898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:02:20,070][00281] Avg episode reward: [(0, '6.148')]
[2024-11-07 18:02:20,091][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth...
[2024-11-07 18:02:20,103][03016] Updated weights for policy 0, policy_version 440 (0.0051)
[2024-11-07 18:02:20,299][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000248_1015808.pth
[2024-11-07 18:02:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3276.8). Total num frames: 1818624. Throughput: 0: 815.4. Samples: 452976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:02:25,069][00281] Avg episode reward: [(0, '5.750')]
[2024-11-07 18:02:30,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3276.8). Total num frames: 1839104. Throughput: 0: 838.4. Samples: 459088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:02:30,069][00281] Avg episode reward: [(0, '6.709')]
[2024-11-07 18:02:31,011][03016] Updated weights for policy 0, policy_version 450 (0.0023)
[2024-11-07 18:02:35,068][00281] Fps is (10 sec: 3276.5, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 1851392. Throughput: 0: 858.0. Samples: 463966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:02:35,070][00281] Avg episode reward: [(0, '6.694')]
[2024-11-07 18:02:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1867776. Throughput: 0: 829.7. Samples: 465716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:02:40,073][00281] Avg episode reward: [(0, '6.557')]
[2024-11-07 18:02:43,662][03016] Updated weights for policy 0, policy_version 460 (0.0022)
[2024-11-07 18:02:45,067][00281] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1888256. Throughput: 0: 829.0. Samples: 471474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:02:45,076][00281] Avg episode reward: [(0, '6.435')]
[2024-11-07 18:02:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 1904640. Throughput: 0: 878.0. Samples: 477214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:02:50,073][00281] Avg episode reward: [(0, '6.625')]
[2024-11-07 18:02:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1916928. Throughput: 0: 850.3. Samples: 478914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:02:55,072][00281] Avg episode reward: [(0, '6.462')]
[2024-11-07 18:02:56,803][03016] Updated weights for policy 0, policy_version 470 (0.0031)
[2024-11-07 18:03:00,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1937408. Throughput: 0: 820.0. Samples: 483784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:03:00,073][00281] Avg episode reward: [(0, '7.865')]
[2024-11-07 18:03:00,084][03002] Saving new best policy, reward=7.865!
[2024-11-07 18:03:05,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 1957888. Throughput: 0: 864.4. Samples: 489796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:03:05,074][00281] Avg episode reward: [(0, '6.252')]
[2024-11-07 18:03:07,493][03016] Updated weights for policy 0, policy_version 480 (0.0035)
[2024-11-07 18:03:10,068][00281] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 1970176. Throughput: 0: 867.6. Samples: 492018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:03:10,071][00281] Avg episode reward: [(0, '6.927')]
[2024-11-07 18:03:15,068][00281] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 1986560. Throughput: 0: 821.4. Samples: 496054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 18:03:15,073][00281] Avg episode reward: [(0, '6.090')]
[2024-11-07 18:03:19,704][03016] Updated weights for policy 0, policy_version 490 (0.0016)
[2024-11-07 18:03:20,067][00281] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 2007040. Throughput: 0: 848.9. Samples: 502168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:03:20,069][00281] Avg episode reward: [(0, '6.491')]
[2024-11-07 18:03:25,067][00281] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2023424. Throughput: 0: 878.3. Samples: 505240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:03:25,070][00281] Avg episode reward: [(0, '6.935')]
[2024-11-07 18:03:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2035712. Throughput: 0: 830.1. Samples: 508828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:03:30,076][00281] Avg episode reward: [(0, '5.571')]
[2024-11-07 18:03:32,386][03016] Updated weights for policy 0, policy_version 500 (0.0026)
[2024-11-07 18:03:35,067][00281] Fps is (10 sec: 3277.0, 60 sec: 3413.4, 300 sec: 3304.6). Total num frames: 2056192. Throughput: 0: 832.1. Samples: 514660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:03:35,074][00281] Avg episode reward: [(0, '5.801')]
[2024-11-07 18:03:40,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 2076672. Throughput: 0: 863.3. Samples: 517762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:03:40,073][00281] Avg episode reward: [(0, '6.208')]
[2024-11-07 18:03:45,021][03016] Updated weights for policy 0, policy_version 510 (0.0017)
[2024-11-07 18:03:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.8). Total num frames: 2088960. Throughput: 0: 849.5. Samples: 522010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:03:45,075][00281] Avg episode reward: [(0, '5.659')]
[2024-11-07 18:03:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2105344. Throughput: 0: 827.9. Samples: 527050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-07 18:03:50,069][00281] Avg episode reward: [(0, '5.965')]
[2024-11-07 18:03:55,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 2125824. Throughput: 0: 845.1. Samples: 530044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:03:55,069][00281] Avg episode reward: [(0, '6.629')]
[2024-11-07 18:03:55,495][03016] Updated weights for policy 0, policy_version 520 (0.0030)
[2024-11-07 18:04:00,070][00281] Fps is (10 sec: 3275.7, 60 sec: 3344.9, 300 sec: 3290.6). Total num frames: 2138112. Throughput: 0: 870.2. Samples: 535214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:04:00,074][00281] Avg episode reward: [(0, '7.086')]
[2024-11-07 18:04:05,068][00281] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 2154496. Throughput: 0: 829.2. Samples: 539484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:04:05,073][00281] Avg episode reward: [(0, '6.734')]
[2024-11-07 18:04:08,428][03016] Updated weights for policy 0, policy_version 530 (0.0033)
[2024-11-07 18:04:10,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3413.4, 300 sec: 3304.6). Total num frames: 2174976. Throughput: 0: 829.7. Samples: 542574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-11-07 18:04:10,071][00281] Avg episode reward: [(0, '8.119')]
[2024-11-07 18:04:10,079][03002] Saving new best policy, reward=8.119!
[2024-11-07 18:04:15,070][00281] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3304.5). Total num frames: 2191360. Throughput: 0: 874.7. Samples: 548192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:04:15,072][00281] Avg episode reward: [(0, '6.317')]
[2024-11-07 18:04:20,068][00281] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 2203648. Throughput: 0: 818.6. Samples: 551496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:04:20,071][00281] Avg episode reward: [(0, '7.052')]
[2024-11-07 18:04:20,080][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000538_2203648.pth...
[2024-11-07 18:04:20,227][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_1413120.pth
[2024-11-07 18:04:21,856][03016] Updated weights for policy 0, policy_version 540 (0.0037)
[2024-11-07 18:04:25,067][00281] Fps is (10 sec: 2868.1, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 2220032. Throughput: 0: 809.4. Samples: 554184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:04:25,073][00281] Avg episode reward: [(0, '5.885')]
[2024-11-07 18:04:30,067][00281] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 2240512. Throughput: 0: 840.8. Samples: 559848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:04:30,071][00281] Avg episode reward: [(0, '6.998')]
[2024-11-07 18:04:34,012][03016] Updated weights for policy 0, policy_version 550 (0.0020)
[2024-11-07 18:04:35,069][00281] Fps is (10 sec: 3276.1, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 2252800. Throughput: 0: 821.2. Samples: 564006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:04:35,073][00281] Avg episode reward: [(0, '6.398')]
[2024-11-07 18:04:40,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2269184. Throughput: 0: 799.5. Samples: 566020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 18:04:40,074][00281] Avg episode reward: [(0, '5.966')]
[2024-11-07 18:04:45,067][00281] Fps is (10 sec: 3687.1, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 2289664. Throughput: 0: 810.4. Samples: 571680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:04:45,074][00281] Avg episode reward: [(0, '7.404')]
[2024-11-07 18:04:45,816][03016] Updated weights for policy 0, policy_version 560 (0.0019)
[2024-11-07 18:04:50,068][00281] Fps is (10 sec: 3276.6, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 2301952. Throughput: 0: 827.2. Samples: 576708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:04:50,074][00281] Avg episode reward: [(0, '6.343')]
[2024-11-07 18:04:55,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2318336. Throughput: 0: 797.2. Samples: 578446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:04:55,071][00281] Avg episode reward: [(0, '6.418')]
[2024-11-07 18:04:58,775][03016] Updated weights for policy 0, policy_version 570 (0.0018)
[2024-11-07 18:05:00,067][00281] Fps is (10 sec: 3686.8, 60 sec: 3345.2, 300 sec: 3304.6). Total num frames: 2338816. Throughput: 0: 795.7. Samples: 583998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:00,074][00281] Avg episode reward: [(0, '6.027')]
[2024-11-07 18:05:05,067][00281] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2355200. Throughput: 0: 849.4. Samples: 589718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:05,071][00281] Avg episode reward: [(0, '6.631')]
[2024-11-07 18:05:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2367488. Throughput: 0: 826.7. Samples: 591384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:10,068][00281] Avg episode reward: [(0, '6.380')]
[2024-11-07 18:05:12,124][03016] Updated weights for policy 0, policy_version 580 (0.0033)
[2024-11-07 18:05:15,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3208.7, 300 sec: 3304.6). Total num frames: 2383872. Throughput: 0: 802.0. Samples: 595936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:15,074][00281] Avg episode reward: [(0, '5.926')]
[2024-11-07 18:05:20,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2404352. Throughput: 0: 842.0. Samples: 601894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:05:20,073][00281] Avg episode reward: [(0, '5.925')]
[2024-11-07 18:05:23,956][03016] Updated weights for policy 0, policy_version 590 (0.0030)
[2024-11-07 18:05:25,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2416640. Throughput: 0: 843.2. Samples: 603964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:25,072][00281] Avg episode reward: [(0, '5.737')]
[2024-11-07 18:05:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2433024. Throughput: 0: 801.5. Samples: 607746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:30,072][00281] Avg episode reward: [(0, '6.238')]
[2024-11-07 18:05:35,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3304.6). Total num frames: 2453504. Throughput: 0: 823.0. Samples: 613744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:35,075][00281] Avg episode reward: [(0, '5.967')]
[2024-11-07 18:05:35,895][03016] Updated weights for policy 0, policy_version 600 (0.0037)
[2024-11-07 18:05:40,069][00281] Fps is (10 sec: 3685.5, 60 sec: 3345.0, 300 sec: 3290.7). Total num frames: 2469888. Throughput: 0: 850.0. Samples: 616696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:05:40,071][00281] Avg episode reward: [(0, '7.711')]
[2024-11-07 18:05:45,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2482176. Throughput: 0: 806.0. Samples: 620268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:05:45,073][00281] Avg episode reward: [(0, '6.727')]
[2024-11-07 18:05:49,055][03016] Updated weights for policy 0, policy_version 610 (0.0025)
[2024-11-07 18:05:50,067][00281] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 2498560. Throughput: 0: 799.4. Samples: 625690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:05:50,069][00281] Avg episode reward: [(0, '7.309')]
[2024-11-07 18:05:55,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2519040. Throughput: 0: 828.2. Samples: 628652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:05:55,069][00281] Avg episode reward: [(0, '7.492')]
[2024-11-07 18:06:00,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2531328. Throughput: 0: 829.7. Samples: 633272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:06:00,071][00281] Avg episode reward: [(0, '7.174')]
[2024-11-07 18:06:02,106][03016] Updated weights for policy 0, policy_version 620 (0.0032)
[2024-11-07 18:06:05,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2547712. Throughput: 0: 802.3. Samples: 637998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:06:05,073][00281] Avg episode reward: [(0, '8.016')]
[2024-11-07 18:06:10,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 2572288. Throughput: 0: 825.9. Samples: 641128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:06:10,072][00281] Avg episode reward: [(0, '6.513')]
[2024-11-07 18:06:12,351][03016] Updated weights for policy 0, policy_version 630 (0.0026)
[2024-11-07 18:06:15,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2584576. Throughput: 0: 859.3. Samples: 646414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:06:15,073][00281] Avg episode reward: [(0, '6.580')]
[2024-11-07 18:06:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 2600960. Throughput: 0: 815.9. Samples: 650460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:06:20,074][00281] Avg episode reward: [(0, '6.416')]
[2024-11-07 18:06:20,090][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth...
[2024-11-07 18:06:20,215][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth
[2024-11-07 18:06:24,993][03016] Updated weights for policy 0, policy_version 640 (0.0031)
[2024-11-07 18:06:25,067][00281] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3346.3). Total num frames: 2621440. Throughput: 0: 814.7. Samples: 653358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-11-07 18:06:25,075][00281] Avg episode reward: [(0, '8.118')]
[2024-11-07 18:06:30,068][00281] Fps is (10 sec: 3686.1, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 2637824. Throughput: 0: 870.0. Samples: 659418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:06:30,070][00281] Avg episode reward: [(0, '6.139')]
[2024-11-07 18:06:35,067][00281] Fps is (10 sec: 2457.7, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2646016. Throughput: 0: 824.4. Samples: 662790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:06:35,073][00281] Avg episode reward: [(0, '6.812')]
[2024-11-07 18:06:38,495][03016] Updated weights for policy 0, policy_version 650 (0.0047)
[2024-11-07 18:06:40,067][00281] Fps is (10 sec: 2867.4, 60 sec: 3276.9, 300 sec: 3332.3). Total num frames: 2666496. Throughput: 0: 813.3. Samples: 665250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:06:40,074][00281] Avg episode reward: [(0, '6.825')]
[2024-11-07 18:06:45,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 2686976. Throughput: 0: 844.2. Samples: 671260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:06:45,069][00281] Avg episode reward: [(0, '6.721')]
[2024-11-07 18:06:50,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 2699264. Throughput: 0: 833.4. Samples: 675500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:06:50,070][00281] Avg episode reward: [(0, '6.722')]
[2024-11-07 18:06:51,185][03016] Updated weights for policy 0, policy_version 660 (0.0030)
[2024-11-07 18:06:55,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2711552. Throughput: 0: 800.8. Samples: 677162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:06:55,073][00281] Avg episode reward: [(0, '6.769')]
[2024-11-07 18:07:00,069][00281] Fps is (10 sec: 3276.2, 60 sec: 3345.0, 300 sec: 3332.3). Total num frames: 2732032. Throughput: 0: 813.9. Samples: 683040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:07:00,071][00281] Avg episode reward: [(0, '6.534')]
[2024-11-07 18:07:02,146][03016] Updated weights for policy 0, policy_version 670 (0.0014)
[2024-11-07 18:07:05,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 2752512. Throughput: 0: 845.2. Samples: 688492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:07:05,069][00281] Avg episode reward: [(0, '7.067')]
[2024-11-07 18:07:10,067][00281] Fps is (10 sec: 2867.7, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 2760704. Throughput: 0: 818.4. Samples: 690184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:07:10,071][00281] Avg episode reward: [(0, '7.947')]
[2024-11-07 18:07:15,029][03016] Updated weights for policy 0, policy_version 680 (0.0030)
[2024-11-07 18:07:15,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 2785280. Throughput: 0: 798.3. Samples: 695340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:07:15,074][00281] Avg episode reward: [(0, '6.146')]
[2024-11-07 18:07:20,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 2801664. Throughput: 0: 859.8. Samples: 701482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:07:20,069][00281] Avg episode reward: [(0, '6.033')]
[2024-11-07 18:07:25,067][00281] Fps is (10 sec: 2867.0, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2813952. Throughput: 0: 846.4. Samples: 703338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:07:25,072][00281] Avg episode reward: [(0, '7.557')]
[2024-11-07 18:07:27,978][03016] Updated weights for policy 0, policy_version 690 (0.0027)
[2024-11-07 18:07:30,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2834432. Throughput: 0: 810.9. Samples: 707750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:07:30,074][00281] Avg episode reward: [(0, '8.034')]
[2024-11-07 18:07:35,067][00281] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 2854912. Throughput: 0: 851.9. Samples: 713834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:07:35,069][00281] Avg episode reward: [(0, '7.520')]
[2024-11-07 18:07:38,572][03016] Updated weights for policy 0, policy_version 700 (0.0030)
[2024-11-07 18:07:40,067][00281] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 3318.4). Total num frames: 2867200. Throughput: 0: 878.3. Samples: 716686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 18:07:40,070][00281] Avg episode reward: [(0, '7.541')]
[2024-11-07 18:07:45,068][00281] Fps is (10 sec: 2457.3, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2879488. Throughput: 0: 822.2. Samples: 720038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:07:45,076][00281] Avg episode reward: [(0, '8.198')]
[2024-11-07 18:07:45,078][03002] Saving new best policy, reward=8.198!
[2024-11-07 18:07:50,067][00281] Fps is (10 sec: 2457.8, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2891776. Throughput: 0: 778.1. Samples: 723506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:07:50,073][00281] Avg episode reward: [(0, '7.468')]
[2024-11-07 18:07:53,789][03016] Updated weights for policy 0, policy_version 710 (0.0025)
[2024-11-07 18:07:55,067][00281] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2912256. Throughput: 0: 805.4. Samples: 726428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:07:55,073][00281] Avg episode reward: [(0, '7.355')]
[2024-11-07 18:08:00,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3140.4, 300 sec: 3262.9). Total num frames: 2920448. Throughput: 0: 784.3. Samples: 730632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:08:00,075][00281] Avg episode reward: [(0, '8.277')]
[2024-11-07 18:08:00,090][03002] Saving new best policy, reward=8.277!
[2024-11-07 18:08:05,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 2940928. Throughput: 0: 762.0. Samples: 735774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:08:05,074][00281] Avg episode reward: [(0, '7.450')]
[2024-11-07 18:08:06,611][03016] Updated weights for policy 0, policy_version 720 (0.0035)
[2024-11-07 18:08:10,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2961408. Throughput: 0: 789.6. Samples: 738870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:08:10,072][00281] Avg episode reward: [(0, '7.903')]
[2024-11-07 18:08:15,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2973696. Throughput: 0: 804.4. Samples: 743950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:08:15,069][00281] Avg episode reward: [(0, '7.661')]
[2024-11-07 18:08:19,203][03016] Updated weights for policy 0, policy_version 730 (0.0035)
[2024-11-07 18:08:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2990080. Throughput: 0: 767.2. Samples: 748356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:08:20,069][00281] Avg episode reward: [(0, '7.802')]
[2024-11-07 18:08:20,080][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth...
[2024-11-07 18:08:20,231][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000538_2203648.pth
[2024-11-07 18:08:25,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3010560. Throughput: 0: 768.1. Samples: 751248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:08:25,074][00281] Avg episode reward: [(0, '8.856')]
[2024-11-07 18:08:25,076][03002] Saving new best policy, reward=8.856!
[2024-11-07 18:08:30,070][00281] Fps is (10 sec: 3685.1, 60 sec: 3208.3, 300 sec: 3290.6). Total num frames: 3026944. Throughput: 0: 822.9. Samples: 757072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:08:30,078][00281] Avg episode reward: [(0, '8.110')]
[2024-11-07 18:08:30,643][03016] Updated weights for policy 0, policy_version 740 (0.0027)
[2024-11-07 18:08:35,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3262.9). Total num frames: 3039232. Throughput: 0: 824.1. Samples: 760592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:08:35,069][00281] Avg episode reward: [(0, '9.144')]
[2024-11-07 18:08:35,076][03002] Saving new best policy, reward=9.144!
[2024-11-07 18:08:40,067][00281] Fps is (10 sec: 3277.9, 60 sec: 3208.6, 300 sec: 3290.7). Total num frames: 3059712. Throughput: 0: 825.9. Samples: 763594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:08:40,073][00281] Avg episode reward: [(0, '7.777')]
[2024-11-07 18:08:42,264][03016] Updated weights for policy 0, policy_version 750 (0.0030)
[2024-11-07 18:08:45,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3080192. Throughput: 0: 870.6. Samples: 769810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:08:45,072][00281] Avg episode reward: [(0, '7.894')]
[2024-11-07 18:08:50,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3092480. Throughput: 0: 846.2. Samples: 773852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:08:50,072][00281] Avg episode reward: [(0, '7.247')]
[2024-11-07 18:08:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 3108864. Throughput: 0: 827.6. Samples: 776114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:08:55,074][00281] Avg episode reward: [(0, '7.653')]
[2024-11-07 18:08:55,357][03016] Updated weights for policy 0, policy_version 760 (0.0019)
[2024-11-07 18:09:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 3129344. Throughput: 0: 843.1. Samples: 781888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:09:00,069][00281] Avg episode reward: [(0, '7.716')]
[2024-11-07 18:09:05,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3141632. Throughput: 0: 850.4. Samples: 786624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:09:05,075][00281] Avg episode reward: [(0, '9.175')]
[2024-11-07 18:09:05,077][03002] Saving new best policy, reward=9.175!
[2024-11-07 18:09:08,743][03016] Updated weights for policy 0, policy_version 770 (0.0026)
[2024-11-07 18:09:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3158016. Throughput: 0: 822.6. Samples: 788264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:09:10,069][00281] Avg episode reward: [(0, '9.159')]
[2024-11-07 18:09:15,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 3178496. Throughput: 0: 823.1. Samples: 794108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:09:15,073][00281] Avg episode reward: [(0, '10.285')]
[2024-11-07 18:09:15,079][03002] Saving new best policy, reward=10.285!
[2024-11-07 18:09:18,678][03016] Updated weights for policy 0, policy_version 780 (0.0025)
[2024-11-07 18:09:20,072][00281] Fps is (10 sec: 3684.6, 60 sec: 3413.1, 300 sec: 3304.5). Total num frames: 3194880. Throughput: 0: 869.7. Samples: 799732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:09:20,074][00281] Avg episode reward: [(0, '7.964')]
[2024-11-07 18:09:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3207168. Throughput: 0: 840.7. Samples: 801426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:09:25,073][00281] Avg episode reward: [(0, '8.455')]
[2024-11-07 18:09:30,067][00281] Fps is (10 sec: 3278.4, 60 sec: 3345.3, 300 sec: 3304.6). Total num frames: 3227648. Throughput: 0: 813.6. Samples: 806424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:09:30,074][00281] Avg episode reward: [(0, '10.130')]
[2024-11-07 18:09:31,559][03016] Updated weights for policy 0, policy_version 790 (0.0046)
[2024-11-07 18:09:35,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 3248128. Throughput: 0: 858.5. Samples: 812486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:09:35,075][00281] Avg episode reward: [(0, '9.090')]
[2024-11-07 18:09:40,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3260416. Throughput: 0: 855.1. Samples: 814592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:09:40,071][00281] Avg episode reward: [(0, '8.139')]
[2024-11-07 18:09:44,518][03016] Updated weights for policy 0, policy_version 800 (0.0030)
[2024-11-07 18:09:45,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3276800. Throughput: 0: 820.0. Samples: 818788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:09:45,069][00281] Avg episode reward: [(0, '8.765')]
[2024-11-07 18:09:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3297280. Throughput: 0: 848.3. Samples: 824796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:09:50,069][00281] Avg episode reward: [(0, '8.068')]
[2024-11-07 18:09:55,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3309568. Throughput: 0: 872.5. Samples: 827526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:09:55,072][00281] Avg episode reward: [(0, '8.376')]
[2024-11-07 18:09:56,956][03016] Updated weights for policy 0, policy_version 810 (0.0014)
[2024-11-07 18:10:00,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 3325952. Throughput: 0: 817.8. Samples: 830910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:10:00,075][00281] Avg episode reward: [(0, '7.307')]
[2024-11-07 18:10:05,067][00281] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3346432. Throughput: 0: 828.0. Samples: 836990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:10:05,073][00281] Avg episode reward: [(0, '8.267')]
[2024-11-07 18:10:07,747][03016] Updated weights for policy 0, policy_version 820 (0.0036)
[2024-11-07 18:10:10,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3362816. Throughput: 0: 859.6. Samples: 840106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:10:10,072][00281] Avg episode reward: [(0, '8.461')]
[2024-11-07 18:10:15,068][00281] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 3375104. Throughput: 0: 842.5. Samples: 844336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:10:15,075][00281] Avg episode reward: [(0, '9.205')]
[2024-11-07 18:10:20,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.3, 300 sec: 3318.5). Total num frames: 3395584. Throughput: 0: 822.4. Samples: 849494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:10:20,072][00281] Avg episode reward: [(0, '8.223')]
[2024-11-07 18:10:20,084][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth...
[2024-11-07 18:10:20,233][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth
[2024-11-07 18:10:20,619][03016] Updated weights for policy 0, policy_version 830 (0.0039)
[2024-11-07 18:10:25,067][00281] Fps is (10 sec: 4096.4, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 3416064. Throughput: 0: 842.6. Samples: 852510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:10:25,068][00281] Avg episode reward: [(0, '9.189')]
[2024-11-07 18:10:30,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3428352. Throughput: 0: 859.2. Samples: 857452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:10:30,071][00281] Avg episode reward: [(0, '8.753')]
[2024-11-07 18:10:33,531][03016] Updated weights for policy 0, policy_version 840 (0.0015)
[2024-11-07 18:10:35,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3444736. Throughput: 0: 820.7. Samples: 861728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:10:35,071][00281] Avg episode reward: [(0, '8.998')]
[2024-11-07 18:10:40,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3465216. Throughput: 0: 828.7. Samples: 864816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:10:40,069][00281] Avg episode reward: [(0, '8.787')]
[2024-11-07 18:10:43,874][03016] Updated weights for policy 0, policy_version 850 (0.0023)
[2024-11-07 18:10:45,068][00281] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3332.3). Total num frames: 3481600. Throughput: 0: 886.3. Samples: 870794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:10:45,074][00281] Avg episode reward: [(0, '10.212')]
[2024-11-07 18:10:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3493888. Throughput: 0: 831.1. Samples: 874390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:10:50,069][00281] Avg episode reward: [(0, '9.444')]
[2024-11-07 18:10:55,067][00281] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3514368. Throughput: 0: 827.6. Samples: 877350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:10:55,073][00281] Avg episode reward: [(0, '8.586')]
[2024-11-07 18:10:56,454][03016] Updated weights for policy 0, policy_version 860 (0.0017)
[2024-11-07 18:11:00,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 3534848. Throughput: 0: 867.3. Samples: 883362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:11:00,073][00281] Avg episode reward: [(0, '8.344')]
[2024-11-07 18:11:05,067][00281] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 3547136. Throughput: 0: 845.2. Samples: 887528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:11:05,075][00281] Avg episode reward: [(0, '9.910')]
[2024-11-07 18:11:09,136][03016] Updated weights for policy 0, policy_version 870 (0.0024)
[2024-11-07 18:11:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3563520. Throughput: 0: 830.8. Samples: 889898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:11:10,075][00281] Avg episode reward: [(0, '9.638')]
[2024-11-07 18:11:15,067][00281] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3346.2). Total num frames: 3588096. Throughput: 0: 859.5. Samples: 896128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:11:15,069][00281] Avg episode reward: [(0, '9.854')]
[2024-11-07 18:11:20,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3600384. Throughput: 0: 873.7. Samples: 901046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:11:20,074][00281] Avg episode reward: [(0, '10.538')]
[2024-11-07 18:11:20,092][03002] Saving new best policy, reward=10.538!
[2024-11-07 18:11:20,848][03016] Updated weights for policy 0, policy_version 880 (0.0036)
[2024-11-07 18:11:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3616768. Throughput: 0: 842.5. Samples: 902728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:11:25,069][00281] Avg episode reward: [(0, '9.345')]
[2024-11-07 18:11:30,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 3637248. Throughput: 0: 842.1. Samples: 908688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:11:30,073][00281] Avg episode reward: [(0, '10.892')]
[2024-11-07 18:11:30,083][03002] Saving new best policy, reward=10.892!
[2024-11-07 18:11:31,713][03016] Updated weights for policy 0, policy_version 890 (0.0029)
[2024-11-07 18:11:35,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 3653632. Throughput: 0: 886.0. Samples: 914258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:11:35,070][00281] Avg episode reward: [(0, '9.840')]
[2024-11-07 18:11:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3665920. Throughput: 0: 859.5. Samples: 916026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:11:40,073][00281] Avg episode reward: [(0, '9.183')]
[2024-11-07 18:11:44,326][03016] Updated weights for policy 0, policy_version 900 (0.0053)
[2024-11-07 18:11:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3346.2). Total num frames: 3686400. Throughput: 0: 843.6. Samples: 921326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:11:45,072][00281] Avg episode reward: [(0, '10.487')]
[2024-11-07 18:11:50,074][00281] Fps is (10 sec: 4093.0, 60 sec: 3549.4, 300 sec: 3373.9). Total num frames: 3706880. Throughput: 0: 891.4. Samples: 927646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:11:50,076][00281] Avg episode reward: [(0, '11.213')]
[2024-11-07 18:11:50,093][03002] Saving new best policy, reward=11.213!
[2024-11-07 18:11:55,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 3719168. Throughput: 0: 882.2. Samples: 929596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:11:55,069][00281] Avg episode reward: [(0, '10.129')]
[2024-11-07 18:11:57,088][03016] Updated weights for policy 0, policy_version 910 (0.0028)
[2024-11-07 18:12:00,067][00281] Fps is (10 sec: 2869.3, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3735552. Throughput: 0: 839.8. Samples: 933920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:12:00,069][00281] Avg episode reward: [(0, '10.187')]
[2024-11-07 18:12:05,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3760128. Throughput: 0: 869.0. Samples: 940150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:12:05,069][00281] Avg episode reward: [(0, '10.387')]
[2024-11-07 18:12:06,986][03016] Updated weights for policy 0, policy_version 920 (0.0031)
[2024-11-07 18:12:10,068][00281] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3346.2). Total num frames: 3772416. Throughput: 0: 895.7. Samples: 943036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:12:10,071][00281] Avg episode reward: [(0, '9.954')]
[2024-11-07 18:12:15,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3788800. Throughput: 0: 842.4. Samples: 946594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:12:15,070][00281] Avg episode reward: [(0, '9.879')]
[2024-11-07 18:12:19,744][03016] Updated weights for policy 0, policy_version 930 (0.0037)
[2024-11-07 18:12:20,067][00281] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 3809280. Throughput: 0: 855.5. Samples: 952756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:12:20,073][00281] Avg episode reward: [(0, '9.862')]
[2024-11-07 18:12:20,085][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000930_3809280.pth...
[2024-11-07 18:12:20,210][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth
[2024-11-07 18:12:25,069][00281] Fps is (10 sec: 3685.7, 60 sec: 3481.5, 300 sec: 3360.1). Total num frames: 3825664. Throughput: 0: 883.4. Samples: 955782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:12:25,071][00281] Avg episode reward: [(0, '10.307')]
[2024-11-07 18:12:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3837952. Throughput: 0: 857.7. Samples: 959924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-11-07 18:12:30,073][00281] Avg episode reward: [(0, '10.182')]
[2024-11-07 18:12:32,544][03016] Updated weights for policy 0, policy_version 940 (0.0028)
[2024-11-07 18:12:35,067][00281] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3858432. Throughput: 0: 834.7. Samples: 965202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:12:35,075][00281] Avg episode reward: [(0, '10.538')]
[2024-11-07 18:12:40,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3878912. Throughput: 0: 860.4. Samples: 968316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:12:40,073][00281] Avg episode reward: [(0, '10.507')]
[2024-11-07 18:12:43,300][03016] Updated weights for policy 0, policy_version 950 (0.0028)
[2024-11-07 18:12:45,068][00281] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3401.7). Total num frames: 3895296. Throughput: 0: 879.3. Samples: 973488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-11-07 18:12:45,077][00281] Avg episode reward: [(0, '10.706')]
[2024-11-07 18:12:50,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3413.8, 300 sec: 3387.9). Total num frames: 3911680. Throughput: 0: 842.3. Samples: 978054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:12:50,069][00281] Avg episode reward: [(0, '12.098')]
[2024-11-07 18:12:50,077][03002] Saving new best policy, reward=12.098!
[2024-11-07 18:12:54,966][03016] Updated weights for policy 0, policy_version 960 (0.0036)
[2024-11-07 18:12:55,067][00281] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 3932160. Throughput: 0: 847.9. Samples: 981188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:12:55,069][00281] Avg episode reward: [(0, '9.185')]
[2024-11-07 18:13:00,068][00281] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3401.7). Total num frames: 3944448. Throughput: 0: 890.1. Samples: 986652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-11-07 18:13:00,072][00281] Avg episode reward: [(0, '9.771')]
[2024-11-07 18:13:05,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 3956736. Throughput: 0: 832.0. Samples: 990194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-11-07 18:13:05,068][00281] Avg episode reward: [(0, '9.726')]
[2024-11-07 18:13:08,066][03016] Updated weights for policy 0, policy_version 970 (0.0029)
[2024-11-07 18:13:10,067][00281] Fps is (10 sec: 3687.0, 60 sec: 3481.7, 300 sec: 3415.6). Total num frames: 3981312. Throughput: 0: 835.8. Samples: 993392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-11-07 18:13:10,074][00281] Avg episode reward: [(0, '10.562')]
[2024-11-07 18:13:15,069][00281] Fps is (10 sec: 4095.1, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 3997696. Throughput: 0: 880.8. Samples: 999564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-11-07 18:13:15,072][00281] Avg episode reward: [(0, '10.136')]
[2024-11-07 18:13:17,068][03002] Stopping Batcher_0...
[2024-11-07 18:13:17,069][03002] Loop batcher_evt_loop terminating...
[2024-11-07 18:13:17,069][00281] Component Batcher_0 stopped!
[2024-11-07 18:13:17,088][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 18:13:17,166][03016] Weights refcount: 2 0
[2024-11-07 18:13:17,180][03016] Stopping InferenceWorker_p0-w0...
[2024-11-07 18:13:17,181][03016] Loop inference_proc0-0_evt_loop terminating...
[2024-11-07 18:13:17,180][00281] Component InferenceWorker_p0-w0 stopped!
[2024-11-07 18:13:17,284][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth
[2024-11-07 18:13:17,305][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 18:13:17,555][00281] Component LearnerWorker_p0 stopped!
[2024-11-07 18:13:17,561][03002] Stopping LearnerWorker_p0...
[2024-11-07 18:13:17,561][03002] Loop learner_proc0_evt_loop terminating...
[2024-11-07 18:13:17,750][00281] Component RolloutWorker_w6 stopped!
[2024-11-07 18:13:17,757][03023] Stopping RolloutWorker_w6...
[2024-11-07 18:13:17,758][03023] Loop rollout_proc6_evt_loop terminating...
[2024-11-07 18:13:17,766][03017] Stopping RolloutWorker_w1...
[2024-11-07 18:13:17,768][03017] Loop rollout_proc1_evt_loop terminating...
[2024-11-07 18:13:17,766][00281] Component RolloutWorker_w1 stopped!
[2024-11-07 18:13:17,777][03021] Stopping RolloutWorker_w5...
[2024-11-07 18:13:17,777][03021] Loop rollout_proc5_evt_loop terminating...
[2024-11-07 18:13:17,777][00281] Component RolloutWorker_w5 stopped!
[2024-11-07 18:13:17,791][03020] Stopping RolloutWorker_w4...
[2024-11-07 18:13:17,792][03022] Stopping RolloutWorker_w7...
[2024-11-07 18:13:17,792][03020] Loop rollout_proc4_evt_loop terminating...
[2024-11-07 18:13:17,792][03022] Loop rollout_proc7_evt_loop terminating...
[2024-11-07 18:13:17,793][00281] Component RolloutWorker_w4 stopped!
[2024-11-07 18:13:17,801][00281] Component RolloutWorker_w7 stopped!
[2024-11-07 18:13:17,822][03019] Stopping RolloutWorker_w3...
[2024-11-07 18:13:17,821][00281] Component RolloutWorker_w3 stopped!
[2024-11-07 18:13:17,831][03018] Stopping RolloutWorker_w2...
[2024-11-07 18:13:17,832][03018] Loop rollout_proc2_evt_loop terminating...
[2024-11-07 18:13:17,825][03019] Loop rollout_proc3_evt_loop terminating...
[2024-11-07 18:13:17,834][00281] Component RolloutWorker_w2 stopped!
[2024-11-07 18:13:17,847][03015] Stopping RolloutWorker_w0...
[2024-11-07 18:13:17,852][03015] Loop rollout_proc0_evt_loop terminating...
[2024-11-07 18:13:17,847][00281] Component RolloutWorker_w0 stopped!
[2024-11-07 18:13:17,854][00281] Waiting for process learner_proc0 to stop...
[2024-11-07 18:13:19,935][00281] Waiting for process inference_proc0-0 to join...
[2024-11-07 18:13:19,943][00281] Waiting for process rollout_proc0 to join...
[2024-11-07 18:13:21,975][00281] Waiting for process rollout_proc1 to join...
[2024-11-07 18:13:21,979][00281] Waiting for process rollout_proc2 to join...
[2024-11-07 18:13:21,985][00281] Waiting for process rollout_proc3 to join...
[2024-11-07 18:13:21,987][00281] Waiting for process rollout_proc4 to join...
[2024-11-07 18:13:21,991][00281] Waiting for process rollout_proc5 to join...
[2024-11-07 18:13:21,995][00281] Waiting for process rollout_proc6 to join...
[2024-11-07 18:13:21,998][00281] Waiting for process rollout_proc7 to join...
[2024-11-07 18:13:22,003][00281] Batcher 0 profile tree view:
batching: 28.1903, releasing_batches: 0.0297
[2024-11-07 18:13:22,005][00281] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 244.7297
update_model: 9.6465
weight_update: 0.0032
one_step: 0.0153
handle_policy_step: 930.3081
deserialize: 16.9909, stack: 3.3934, obs_to_device_normalize: 140.6297, forward: 619.1030, send_messages: 33.0876
prepare_outputs: 85.2778
to_cpu: 51.9131
[2024-11-07 18:13:22,007][00281] Learner 0 profile tree view:
misc: 0.0052, prepare_batch: 14.1065
train: 84.2131
epoch_init: 0.0146, minibatch_init: 0.0065, losses_postprocess: 0.7474, kl_divergence: 1.6880, after_optimizer: 35.0201
calculate_losses: 32.2524
losses_init: 0.0047, forward_head: 1.3498, bptt_initial: 21.1490, tail: 2.1789, advantages_returns: 0.3523, losses: 4.8248
bptt: 2.0305
bptt_forward_core: 1.9211
update: 13.7950
clip: 0.8865
[2024-11-07 18:13:22,009][00281] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3816, enqueue_policy_requests: 55.7851, env_step: 946.9003, overhead: 14.5718, complete_rollouts: 7.4866
save_policy_outputs: 21.4157
split_output_tensors: 8.3712
[2024-11-07 18:13:22,010][00281] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3854, enqueue_policy_requests: 59.3264, env_step: 949.8981, overhead: 15.1676, complete_rollouts: 7.5613
save_policy_outputs: 21.7700
split_output_tensors: 9.0049
[2024-11-07 18:13:22,012][00281] Loop Runner_EvtLoop terminating...
[2024-11-07 18:13:22,013][00281] Runner profile tree view:
main_loop: 1259.6152
[2024-11-07 18:13:22,014][00281] Collected {0: 4005888}, FPS: 3180.2
[2024-11-07 18:13:22,054][00281] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-07 18:13:22,056][00281] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-07 18:13:22,058][00281] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-07 18:13:22,059][00281] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-07 18:13:22,061][00281] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 18:13:22,063][00281] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-07 18:13:22,064][00281] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 18:13:22,065][00281] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-07 18:13:22,068][00281] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-11-07 18:13:22,069][00281] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-11-07 18:13:22,070][00281] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-07 18:13:22,073][00281] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-07 18:13:22,074][00281] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-07 18:13:22,076][00281] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-07 18:13:22,077][00281] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-07 18:13:22,112][00281] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-11-07 18:13:22,117][00281] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 18:13:22,120][00281] RunningMeanStd input shape: (1,)
[2024-11-07 18:13:22,138][00281] ConvEncoder: input_channels=3
[2024-11-07 18:13:22,263][00281] Conv encoder output size: 512
[2024-11-07 18:13:22,264][00281] Policy head output size: 512
[2024-11-07 18:13:22,436][00281] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 18:13:23,262][00281] Num frames 100...
[2024-11-07 18:13:23,361][00281] Avg episode rewards: #0: 5.882, true rewards: #0: 5.882
[2024-11-07 18:13:23,363][00281] Avg episode reward: 5.882, avg true_objective: 5.882
[2024-11-07 18:13:23,468][00281] Num frames 200...
[2024-11-07 18:13:23,634][00281] Avg episode rewards: #0: 6.787, true rewards: #0: 6.787
[2024-11-07 18:13:23,636][00281] Avg episode reward: 6.787, avg true_objective: 6.787
[2024-11-07 18:13:23,680][00281] Num frames 300...
[2024-11-07 18:13:23,822][00281] Num frames 400...
[2024-11-07 18:13:23,922][00281] Avg episode rewards: #0: 7.140, true rewards: #0: 7.140
[2024-11-07 18:13:23,923][00281] Avg episode reward: 7.140, avg true_objective: 7.140
[2024-11-07 18:13:24,023][00281] Num frames 500...
[2024-11-07 18:13:24,111][00281] Avg episode rewards: #0: 6.399, true rewards: #0: 6.399
[2024-11-07 18:13:24,113][00281] Avg episode reward: 6.399, avg true_objective: 6.399
[2024-11-07 18:13:24,226][00281] Num frames 600...
[2024-11-07 18:13:24,298][00281] Avg episode rewards: #0: 5.915, true rewards: #0: 5.915
[2024-11-07 18:13:24,299][00281] Avg episode reward: 5.915, avg true_objective: 5.915
[2024-11-07 18:13:24,436][00281] Num frames 700...
[2024-11-07 18:13:24,589][00281] Num frames 800...
[2024-11-07 18:13:24,665][00281] Avg episode rewards: #0: 8.723, true rewards: #0: 8.723
[2024-11-07 18:13:24,667][00281] Avg episode reward: 8.723, avg true_objective: 8.723
[2024-11-07 18:13:24,787][00281] Num frames 900...
[2024-11-07 18:13:24,929][00281] Avg episode rewards: #0: 8.594, true rewards: #0: 8.594
[2024-11-07 18:13:24,930][00281] Avg episode reward: 8.594, avg true_objective: 8.594
[2024-11-07 18:13:24,989][00281] Num frames 1000...
[2024-11-07 18:13:25,163][00281] Avg episode rewards: #0: 8.454, true rewards: #0: 8.454
[2024-11-07 18:13:25,164][00281] Avg episode reward: 8.454, avg true_objective: 8.454
[2024-11-07 18:13:25,199][00281] Num frames 1100...
[2024-11-07 18:13:25,327][00281] Avg episode rewards: #0: 7.902, true rewards: #0: 7.902
[2024-11-07 18:13:25,328][00281] Avg episode reward: 7.902, avg true_objective: 7.902
[2024-11-07 18:13:25,401][00281] Num frames 1200...
[2024-11-07 18:13:25,506][00281] Avg episode rewards: #0: 7.485, true rewards: #0: 7.485
[2024-11-07 18:13:25,507][00281] Avg episode reward: 7.485, avg true_objective: 7.485
[2024-11-07 18:13:32,749][00281] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-11-07 18:13:32,878][00281] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-11-07 18:13:32,883][00281] Overriding arg 'num_workers' with value 1 passed from command line
[2024-11-07 18:13:32,885][00281] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-11-07 18:13:32,887][00281] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-11-07 18:13:32,888][00281] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-11-07 18:13:32,894][00281] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-11-07 18:13:32,895][00281] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-11-07 18:13:32,897][00281] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-11-07 18:13:32,898][00281] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-11-07 18:13:32,899][00281] Adding new argument 'hf_repository'='drap/rl_course_vizdoom' that is not in the saved config file!
[2024-11-07 18:13:32,900][00281] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-11-07 18:13:32,905][00281] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-11-07 18:13:32,906][00281] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-11-07 18:13:32,907][00281] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-11-07 18:13:32,908][00281] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-11-07 18:13:32,950][00281] RunningMeanStd input shape: (3, 72, 128)
[2024-11-07 18:13:32,953][00281] RunningMeanStd input shape: (1,)
[2024-11-07 18:13:32,970][00281] ConvEncoder: input_channels=3
[2024-11-07 18:13:33,027][00281] Conv encoder output size: 512
[2024-11-07 18:13:33,029][00281] Policy head output size: 512
[2024-11-07 18:13:33,057][00281] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-11-07 18:13:33,637][00281] Num frames 100...
[2024-11-07 18:13:33,783][00281] Num frames 200...
[2024-11-07 18:13:33,843][00281] Avg episode rewards: #0: 22.802, true rewards: #0: 22.802
[2024-11-07 18:13:33,844][00281] Avg episode reward: 22.802, avg true_objective: 22.802
[2024-11-07 18:13:34,021][00281] Avg episode rewards: #0: 12.983, true rewards: #0: 12.983
[2024-11-07 18:13:34,024][00281] Avg episode reward: 12.983, avg true_objective: 12.983
[2024-11-07 18:13:34,052][00281] Num frames 300...
[2024-11-07 18:13:34,228][00281] Num frames 400...
[2024-11-07 18:13:34,350][00281] Avg episode rewards: #0: 11.833, true rewards: #0: 11.833
[2024-11-07 18:13:34,352][00281] Avg episode reward: 11.833, avg true_objective: 11.833
[2024-11-07 18:13:34,443][00281] Num frames 500...
[2024-11-07 18:13:34,591][00281] Avg episode rewards: #0: 10.469, true rewards: #0: 10.469
[2024-11-07 18:13:34,595][00281] Avg episode reward: 10.469, avg true_objective: 10.469
[2024-11-07 18:13:34,644][00281] Num frames 600...
[2024-11-07 18:13:34,790][00281] Avg episode rewards: #0: 9.259, true rewards: #0: 9.259
[2024-11-07 18:13:34,791][00281] Avg episode reward: 9.259, avg true_objective: 9.259
[2024-11-07 18:13:34,843][00281] Num frames 700...
[2024-11-07 18:13:35,032][00281] Avg episode rewards: #0: 8.889, true rewards: #0: 8.889
[2024-11-07 18:13:35,033][00281] Avg episode reward: 8.889, avg true_objective: 8.889
[2024-11-07 18:13:35,056][00281] Num frames 800...
[2024-11-07 18:13:35,208][00281] Num frames 900...
[2024-11-07 18:13:35,334][00281] Avg episode rewards: #0: 8.872, true rewards: #0: 8.872
[2024-11-07 18:13:35,337][00281] Avg episode reward: 8.872, avg true_objective: 8.872
[2024-11-07 18:13:35,408][00281] Num frames 1000...
[2024-11-07 18:13:35,553][00281] Num frames 1100...
[2024-11-07 18:13:35,677][00281] Avg episode rewards: #0: 10.609, true rewards: #0: 10.609
[2024-11-07 18:13:35,678][00281] Avg episode reward: 10.609, avg true_objective: 10.609
[2024-11-07 18:13:35,763][00281] Num frames 1200...
[2024-11-07 18:13:35,907][00281] Num frames 1300...
[2024-11-07 18:13:36,033][00281] Avg episode rewards: #0: 11.967, true rewards: #0: 11.967
[2024-11-07 18:13:36,036][00281] Avg episode reward: 11.967, avg true_objective: 11.967
[2024-11-07 18:13:36,143][00281] Num frames 1400...
[2024-11-07 18:13:36,289][00281] Num frames 1500...
[2024-11-07 18:13:36,398][00281] Avg episode rewards: #0: 13.046, true rewards: #0: 13.046
[2024-11-07 18:13:36,401][00281] Avg episode reward: 13.046, avg true_objective: 13.046
[2024-11-07 18:13:43,874][00281] Replay video saved to /content/train_dir/default_experiment/replay.mp4!