| [2024-11-07 17:51:58,286][00281] Saving configuration to /content/train_dir/default_experiment/config.json... |
| [2024-11-07 17:51:58,289][00281] Rollout worker 0 uses device cpu |
| [2024-11-07 17:51:58,290][00281] Rollout worker 1 uses device cpu |
| [2024-11-07 17:51:58,292][00281] Rollout worker 2 uses device cpu |
| [2024-11-07 17:51:58,293][00281] Rollout worker 3 uses device cpu |
| [2024-11-07 17:51:58,294][00281] Rollout worker 4 uses device cpu |
| [2024-11-07 17:51:58,295][00281] Rollout worker 5 uses device cpu |
| [2024-11-07 17:51:58,296][00281] Rollout worker 6 uses device cpu |
| [2024-11-07 17:51:58,297][00281] Rollout worker 7 uses device cpu |
| [2024-11-07 17:51:58,456][00281] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
| [2024-11-07 17:51:58,458][00281] InferenceWorker_p0-w0: min num requests: 2 |
| [2024-11-07 17:51:58,492][00281] Starting all processes... |
| [2024-11-07 17:51:58,495][00281] Starting process learner_proc0 |
| [2024-11-07 17:51:58,540][00281] Starting all processes... |
| [2024-11-07 17:51:58,550][00281] Starting process inference_proc0-0 |
| [2024-11-07 17:51:58,550][00281] Starting process rollout_proc0 |
| [2024-11-07 17:51:58,552][00281] Starting process rollout_proc1 |
| [2024-11-07 17:51:58,553][00281] Starting process rollout_proc2 |
| [2024-11-07 17:51:58,553][00281] Starting process rollout_proc3 |
| [2024-11-07 17:51:58,553][00281] Starting process rollout_proc4 |
| [2024-11-07 17:51:58,553][00281] Starting process rollout_proc5 |
| [2024-11-07 17:51:58,553][00281] Starting process rollout_proc6 |
| [2024-11-07 17:51:58,553][00281] Starting process rollout_proc7 |
| [2024-11-07 17:52:06,192][00281] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 281], exiting... |
| [2024-11-07 17:52:06,198][00281] Runner profile tree view: |
| main_loop: 7.7048 |
| [2024-11-07 17:52:06,202][00281] Collected {}, FPS: 0.0 |
| [2024-11-07 17:52:19,998][00281] Environment doom_basic already registered, overwriting... |
| [2024-11-07 17:52:20,000][00281] Environment doom_two_colors_easy already registered, overwriting... |
| [2024-11-07 17:52:20,002][00281] Environment doom_two_colors_hard already registered, overwriting... |
| [2024-11-07 17:52:20,003][00281] Environment doom_dm already registered, overwriting... |
| [2024-11-07 17:52:20,005][00281] Environment doom_dwango5 already registered, overwriting... |
| [2024-11-07 17:52:20,006][00281] Environment doom_my_way_home_flat_actions already registered, overwriting... |
| [2024-11-07 17:52:20,008][00281] Environment doom_defend_the_center_flat_actions already registered, overwriting... |
| [2024-11-07 17:52:20,009][00281] Environment doom_my_way_home already registered, overwriting... |
| [2024-11-07 17:52:20,010][00281] Environment doom_deadly_corridor already registered, overwriting... |
| [2024-11-07 17:52:20,012][00281] Environment doom_defend_the_center already registered, overwriting... |
| [2024-11-07 17:52:20,013][00281] Environment doom_defend_the_line already registered, overwriting... |
| [2024-11-07 17:52:20,014][00281] Environment doom_health_gathering already registered, overwriting... |
| [2024-11-07 17:52:20,015][00281] Environment doom_health_gathering_supreme already registered, overwriting... |
| [2024-11-07 17:52:20,017][00281] Environment doom_battle already registered, overwriting... |
| [2024-11-07 17:52:20,018][00281] Environment doom_battle2 already registered, overwriting... |
| [2024-11-07 17:52:20,019][00281] Environment doom_duel_bots already registered, overwriting... |
| [2024-11-07 17:52:20,021][00281] Environment doom_deathmatch_bots already registered, overwriting... |
| [2024-11-07 17:52:20,022][00281] Environment doom_duel already registered, overwriting... |
| [2024-11-07 17:52:20,023][00281] Environment doom_deathmatch_full already registered, overwriting... |
| [2024-11-07 17:52:20,024][00281] Environment doom_benchmark already registered, overwriting... |
| [2024-11-07 17:52:20,026][00281] register_encoder_factory: <function make_vizdoom_encoder at 0x7cccac3fe290> |
| [2024-11-07 17:52:20,052][00281] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
| [2024-11-07 17:52:20,053][00281] Overriding arg 'env' with value 'doom_deadly_corridor' passed from command line |
| [2024-11-07 17:52:20,059][00281] Experiment dir /content/train_dir/default_experiment already exists! |
| [2024-11-07 17:52:20,060][00281] Resuming existing experiment from /content/train_dir/default_experiment... |
| [2024-11-07 17:52:20,062][00281] Weights and Biases integration disabled |
| [2024-11-07 17:52:20,066][00281] Environment var CUDA_VISIBLE_DEVICES is 0 |
|
|
| [2024-11-07 17:52:22,251][00281] Starting experiment with the following configuration: |
| help=False |
| algo=APPO |
| env=doom_deadly_corridor |
| experiment=default_experiment |
| train_dir=/content/train_dir |
| restart_behavior=resume |
| device=gpu |
| seed=None |
| num_policies=1 |
| async_rl=True |
| serial_mode=False |
| batched_sampling=False |
| num_batches_to_accumulate=2 |
| worker_num_splits=2 |
| policy_workers_per_policy=1 |
| max_policy_lag=1000 |
| num_workers=8 |
| num_envs_per_worker=4 |
| batch_size=1024 |
| num_batches_per_epoch=1 |
| num_epochs=1 |
| rollout=32 |
| recurrence=32 |
| shuffle_minibatches=False |
| gamma=0.99 |
| reward_scale=1.0 |
| reward_clip=1000.0 |
| value_bootstrap=False |
| normalize_returns=True |
| exploration_loss_coeff=0.001 |
| value_loss_coeff=0.5 |
| kl_loss_coeff=0.0 |
| exploration_loss=symmetric_kl |
| gae_lambda=0.95 |
| ppo_clip_ratio=0.1 |
| ppo_clip_value=0.2 |
| with_vtrace=False |
| vtrace_rho=1.0 |
| vtrace_c=1.0 |
| optimizer=adam |
| adam_eps=1e-06 |
| adam_beta1=0.9 |
| adam_beta2=0.999 |
| max_grad_norm=4.0 |
| learning_rate=0.0001 |
| lr_schedule=constant |
| lr_schedule_kl_threshold=0.008 |
| lr_adaptive_min=1e-06 |
| lr_adaptive_max=0.01 |
| obs_subtract_mean=0.0 |
| obs_scale=255.0 |
| normalize_input=True |
| normalize_input_keys=None |
| decorrelate_experience_max_seconds=0 |
| decorrelate_envs_on_one_worker=True |
| actor_worker_gpus=[] |
| set_workers_cpu_affinity=True |
| force_envs_single_thread=False |
| default_niceness=0 |
| log_to_file=True |
| experiment_summaries_interval=10 |
| flush_summaries_interval=30 |
| stats_avg=100 |
| summaries_use_frameskip=True |
| heartbeat_interval=20 |
| heartbeat_reporting_interval=600 |
| train_for_env_steps=4000000 |
| train_for_seconds=10000000000 |
| save_every_sec=120 |
| keep_checkpoints=2 |
| load_checkpoint_kind=latest |
| save_milestones_sec=-1 |
| save_best_every_sec=5 |
| save_best_metric=reward |
| save_best_after=100000 |
| benchmark=False |
| encoder_mlp_layers=[512, 512] |
| encoder_conv_architecture=convnet_simple |
| encoder_conv_mlp_layers=[512] |
| use_rnn=True |
| rnn_size=512 |
| rnn_type=gru |
| rnn_num_layers=1 |
| decoder_mlp_layers=[] |
| nonlinearity=elu |
| policy_initialization=orthogonal |
| policy_init_gain=1.0 |
| actor_critic_share_weights=True |
| adaptive_stddev=True |
| continuous_tanh_scale=0.0 |
| initial_stddev=1.0 |
| use_env_info_cache=False |
| env_gpu_actions=False |
| env_gpu_observations=True |
| env_frameskip=4 |
| env_framestack=1 |
| pixel_format=CHW |
| use_record_episode_statistics=False |
| with_wandb=False |
| wandb_user=None |
| wandb_project=sample_factory |
| wandb_group=None |
| wandb_job_type=SF |
| wandb_tags=[] |
| with_pbt=False |
| pbt_mix_policies_in_one_env=True |
| pbt_period_env_steps=5000000 |
| pbt_start_mutation=20000000 |
| pbt_replace_fraction=0.3 |
| pbt_mutation_rate=0.15 |
| pbt_replace_reward_gap=0.1 |
| pbt_replace_reward_gap_absolute=1e-06 |
| pbt_optimize_gamma=False |
| pbt_target_objective=true_objective |
| pbt_perturb_min=1.1 |
| pbt_perturb_max=1.5 |
| num_agents=-1 |
| num_humans=0 |
| num_bots=-1 |
| start_bot_difficulty=None |
| timelimit=None |
| res_w=128 |
| res_h=72 |
| wide_aspect_ratio=False |
| eval_env_frameskip=1 |
| fps=35 |
| command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 |
| cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} |
| git_hash=unknown |
| git_repo_name=not a git repository |
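
The configuration dump above records the command line that originally created this experiment (`command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000`); on this particular run the env was overridden to `doom_deadly_corridor` from the CLI. Below is a minimal sketch of how such a run is typically launched through Sample Factory's VizDoom example; it assumes the `sf_examples.vizdoom.train_vizdoom` entry point from the sample-factory repository and is not the exact code that produced this log.

```python
# Sketch only: assumes sample-factory and its sf_examples.vizdoom package are installed.
import sys
from sf_examples.vizdoom.train_vizdoom import main

# Reproduce the command line recorded in the experiment configuration above.
sys.argv = [
    "train_vizdoom",
    "--env=doom_health_gathering_supreme",
    "--num_workers=8",
    "--num_envs_per_worker=4",
    "--train_for_env_steps=4000000",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
]
main()  # registers the Doom envs and encoder factory, then starts the APPO run
```
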
| [2024-11-07 17:52:22,253][00281] Saving configuration to /content/train_dir/default_experiment/config.json... |
| [2024-11-07 17:52:22,259][00281] Rollout worker 0 uses device cpu |
| [2024-11-07 17:52:22,260][00281] Rollout worker 1 uses device cpu |
| [2024-11-07 17:52:22,262][00281] Rollout worker 2 uses device cpu |
| [2024-11-07 17:52:22,263][00281] Rollout worker 3 uses device cpu |
| [2024-11-07 17:52:22,265][00281] Rollout worker 4 uses device cpu |
| [2024-11-07 17:52:22,266][00281] Rollout worker 5 uses device cpu |
| [2024-11-07 17:52:22,267][00281] Rollout worker 6 uses device cpu |
| [2024-11-07 17:52:22,268][00281] Rollout worker 7 uses device cpu |
| [2024-11-07 17:52:22,365][00281] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
| [2024-11-07 17:52:22,367][00281] InferenceWorker_p0-w0: min num requests: 2 |
| [2024-11-07 17:52:22,398][00281] Starting all processes... |
| [2024-11-07 17:52:22,400][00281] Starting process learner_proc0 |
| [2024-11-07 17:52:22,448][00281] Starting all processes... |
| [2024-11-07 17:52:22,456][00281] Starting process inference_proc0-0 |
| [2024-11-07 17:52:22,457][00281] Starting process rollout_proc0 |
| [2024-11-07 17:52:22,457][00281] Starting process rollout_proc1 |
| [2024-11-07 17:52:22,457][00281] Starting process rollout_proc2 |
| [2024-11-07 17:52:22,458][00281] Starting process rollout_proc3 |
| [2024-11-07 17:52:22,458][00281] Starting process rollout_proc4 |
| [2024-11-07 17:52:22,458][00281] Starting process rollout_proc5 |
| [2024-11-07 17:52:22,458][00281] Starting process rollout_proc6 |
| [2024-11-07 17:52:22,458][00281] Starting process rollout_proc7 |
| [2024-11-07 17:52:40,694][03002] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
| [2024-11-07 17:52:40,704][03002] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 |
| [2024-11-07 17:52:40,832][03002] Num visible devices: 1 |
| [2024-11-07 17:52:40,888][03002] Starting seed is not provided |
| [2024-11-07 17:52:40,889][03002] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
| [2024-11-07 17:52:40,889][03002] Initializing actor-critic model on device cuda:0 |
| [2024-11-07 17:52:40,890][03002] RunningMeanStd input shape: (3, 72, 128) |
| [2024-11-07 17:52:40,893][03002] RunningMeanStd input shape: (1,) |
| [2024-11-07 17:52:41,089][03002] ConvEncoder: input_channels=3 |
| [2024-11-07 17:52:41,213][03015] Worker 0 uses CPU cores [0] |
| [2024-11-07 17:52:41,274][03018] Worker 2 uses CPU cores [0] |
| [2024-11-07 17:52:41,656][03019] Worker 3 uses CPU cores [1] |
| [2024-11-07 17:52:41,677][03017] Worker 1 uses CPU cores [1] |
| [2024-11-07 17:52:41,760][03016] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
| [2024-11-07 17:52:41,762][03016] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 |
| [2024-11-07 17:52:41,773][03020] Worker 4 uses CPU cores [0] |
| [2024-11-07 17:52:41,775][03023] Worker 6 uses CPU cores [0] |
| [2024-11-07 17:52:41,814][03016] Num visible devices: 1 |
| [2024-11-07 17:52:41,860][03021] Worker 5 uses CPU cores [1] |
| [2024-11-07 17:52:42,007][03002] Conv encoder output size: 512 |
| [2024-11-07 17:52:42,008][03002] Policy head output size: 512 |
| [2024-11-07 17:52:42,035][03022] Worker 7 uses CPU cores [1] |
| [2024-11-07 17:52:42,085][03002] Created Actor Critic model with architecture: |
| [2024-11-07 17:52:42,085][03002] ActorCriticSharedWeights( |
| (obs_normalizer): ObservationNormalizer( |
| (running_mean_std): RunningMeanStdDictInPlace( |
| (running_mean_std): ModuleDict( |
| (obs): RunningMeanStdInPlace() |
| ) |
| ) |
| ) |
| (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) |
| (encoder): VizdoomEncoder( |
| (basic_encoder): ConvEncoder( |
| (enc): RecursiveScriptModule( |
| original_name=ConvEncoderImpl |
| (conv_head): RecursiveScriptModule( |
| original_name=Sequential |
| (0): RecursiveScriptModule(original_name=Conv2d) |
| (1): RecursiveScriptModule(original_name=ELU) |
| (2): RecursiveScriptModule(original_name=Conv2d) |
| (3): RecursiveScriptModule(original_name=ELU) |
| (4): RecursiveScriptModule(original_name=Conv2d) |
| (5): RecursiveScriptModule(original_name=ELU) |
| ) |
| (mlp_layers): RecursiveScriptModule( |
| original_name=Sequential |
| (0): RecursiveScriptModule(original_name=Linear) |
| (1): RecursiveScriptModule(original_name=ELU) |
| ) |
| ) |
| ) |
| ) |
| (core): ModelCoreRNN( |
| (core): GRU(512, 512) |
| ) |
| (decoder): MlpDecoder( |
| (mlp): Identity() |
| ) |
| (critic_linear): Linear(in_features=512, out_features=1, bias=True) |
| (action_parameterization): ActionParameterizationDefault( |
| (distribution_linear): Linear(in_features=512, out_features=11, bias=True) |
| ) |
| ) |
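
To make the printed architecture concrete, here is a simplified PyTorch re-statement of the same model. It is an illustrative sketch, not Sample Factory's implementation: the 512-dim encoder output, the GRU(512, 512) core, the value head, and the 11 action logits are taken from the printout above, while the exact `convnet_simple` kernel sizes and strides are an assumption (they are not shown in the log).

```python
import torch
from torch import nn

class SketchActorCritic(nn.Module):
    def __init__(self, num_action_logits: int = 11):
        super().__init__()
        # convnet_simple-style head (assumed filter sizes): 3 conv layers with ELU
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        # For a (3, 72, 128) observation the conv output is 128 * 3 * 6 = 2304 features,
        # mapped to the 512-dim "Conv encoder output size" reported in the log.
        self.mlp = nn.Sequential(nn.Linear(2304, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                                    # recurrent core
        self.critic_linear = nn.Linear(512, 1)                          # value head
        self.distribution_linear = nn.Linear(512, num_action_logits)    # action logits

    def forward(self, obs: torch.Tensor, rnn_state: torch.Tensor):
        x = self.mlp(self.conv_head(obs)).unsqueeze(0)   # (1, batch, 512) for the GRU
        core_out, new_state = self.core(x, rnn_state)
        core_out = core_out.squeeze(0)
        return self.distribution_linear(core_out), self.critic_linear(core_out), new_state

# Shape check against the log: (3, 72, 128) observations, 512-dim features.
model = SketchActorCritic()
logits, value, _ = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 11]) torch.Size([4, 1])
```
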
| [2024-11-07 17:52:42,357][00281] Heartbeat connected on Batcher_0 |
| [2024-11-07 17:52:42,366][00281] Heartbeat connected on InferenceWorker_p0-w0 |
| [2024-11-07 17:52:42,374][00281] Heartbeat connected on RolloutWorker_w0 |
| [2024-11-07 17:52:42,378][00281] Heartbeat connected on RolloutWorker_w1 |
| [2024-11-07 17:52:42,381][00281] Heartbeat connected on RolloutWorker_w2 |
| [2024-11-07 17:52:42,386][00281] Heartbeat connected on RolloutWorker_w3 |
| [2024-11-07 17:52:42,388][00281] Heartbeat connected on RolloutWorker_w4 |
| [2024-11-07 17:52:42,392][00281] Heartbeat connected on RolloutWorker_w5 |
| [2024-11-07 17:52:42,396][00281] Heartbeat connected on RolloutWorker_w6 |
| [2024-11-07 17:52:42,398][00281] Heartbeat connected on RolloutWorker_w7 |
| [2024-11-07 17:52:42,436][03002] Using optimizer <class 'torch.optim.adam.Adam'> |
| [2024-11-07 17:52:46,252][03002] No checkpoints found |
| [2024-11-07 17:52:46,252][03002] Did not load from checkpoint, starting from scratch! |
| [2024-11-07 17:52:46,253][03002] Initialized policy 0 weights for model version 0 |
| [2024-11-07 17:52:46,257][03002] LearnerWorker_p0 finished initialization! |
| [2024-11-07 17:52:46,258][00281] Heartbeat connected on LearnerWorker_p0 |
| [2024-11-07 17:52:46,260][03002] Using GPUs [0] for process 0 (actually maps to GPUs [0]) |
| [2024-11-07 17:52:46,357][03016] RunningMeanStd input shape: (3, 72, 128) |
| [2024-11-07 17:52:46,358][03016] RunningMeanStd input shape: (1,) |
| [2024-11-07 17:52:46,370][03016] ConvEncoder: input_channels=3 |
| [2024-11-07 17:52:46,474][03016] Conv encoder output size: 512 |
| [2024-11-07 17:52:46,474][03016] Policy head output size: 512 |
| [2024-11-07 17:52:46,527][00281] Inference worker 0-0 is ready! |
| [2024-11-07 17:52:46,529][00281] All inference workers are ready! Signal rollout workers to start! |
| [2024-11-07 17:52:46,737][03021] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,738][03019] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,741][03022] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,735][03017] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,756][03020] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,757][03018] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,758][03023] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 17:52:46,759][03015] Doom resolution: 160x120, resize resolution: (128, 72) |
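
Each rollout worker reports the native Doom frame size (160x120) and the resize target (128, 72); combined with `pixel_format=CHW` from the config, this is where the `(3, 72, 128)` RunningMeanStd input shape reported by the learner and inference worker comes from. A tiny illustrative snippet (hypothetical, not Sample Factory code):

```python
import numpy as np

native = np.zeros((120, 160, 3), dtype=np.uint8)   # Doom frame as H x W x C (120 x 160)
resized = np.zeros((72, 128, 3), dtype=np.uint8)   # after resizing to res_h=72, res_w=128
obs = np.transpose(resized, (2, 0, 1))             # pixel_format=CHW -> (3, 72, 128)
print(obs.shape)
```
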
| [2024-11-07 17:52:47,739][03023] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:47,740][03018] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:48,114][03018] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:48,415][03017] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:48,419][03021] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:48,421][03019] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:48,425][03022] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:49,206][03015] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:49,222][03023] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:49,727][03023] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:49,853][03017] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:49,855][03022] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:49,857][03021] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:49,868][03019] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:50,067][00281] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) |
| [2024-11-07 17:52:51,078][03022] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:51,080][03017] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:51,336][03015] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:51,441][03018] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:51,454][03020] Decorrelating experience for 0 frames... |
| [2024-11-07 17:52:51,479][03023] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:52,290][03020] Decorrelating experience for 32 frames... |
| [2024-11-07 17:52:52,638][03021] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:52,785][03017] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:52,815][03022] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:53,501][03019] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:54,750][03015] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:54,902][03020] Decorrelating experience for 64 frames... |
| [2024-11-07 17:52:55,067][00281] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.2. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) |
| [2024-11-07 17:52:55,072][00281] Avg episode reward: [(0, '-1.014')] |
| [2024-11-07 17:52:56,068][03018] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:56,553][03019] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:57,774][03015] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:57,948][03020] Decorrelating experience for 96 frames... |
| [2024-11-07 17:52:59,777][03021] Decorrelating experience for 96 frames... |
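
The "Decorrelating experience" messages above show each worker stepping its four environments by different amounts (0, 32, 64, 96 frames) before regular collection begins, so the envs on one worker are not in lockstep. The offsets are consistent with multiples of the rollout length; the snippet below is a hypothetical illustration of that pattern, not Sample Factory's actual decorrelation code.

```python
rollout = 32              # from the config above
num_envs_per_worker = 4   # from the config above

# One plausible offset schedule matching the log: env i warms up for i * rollout frames.
offsets = [env_idx * rollout for env_idx in range(num_envs_per_worker)]
print(offsets)  # [0, 32, 64, 96]
```
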
| [2024-11-07 17:53:00,067][00281] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 127.2. Samples: 1272. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) |
| [2024-11-07 17:53:00,069][00281] Avg episode reward: [(0, '-0.723')] |
| [2024-11-07 17:53:02,021][03002] Signal inference workers to stop experience collection... |
| [2024-11-07 17:53:02,059][03016] InferenceWorker_p0-w0: stopping experience collection |
| [2024-11-07 17:53:03,980][03002] Signal inference workers to resume experience collection... |
| [2024-11-07 17:53:03,981][03016] InferenceWorker_p0-w0: resuming experience collection |
| [2024-11-07 17:53:05,067][00281] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 218.1. Samples: 3272. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) |
| [2024-11-07 17:53:05,073][00281] Avg episode reward: [(0, '-0.670')] |
| [2024-11-07 17:53:10,067][00281] Fps is (10 sec: 3276.7, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 314.1. Samples: 6282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) |
| [2024-11-07 17:53:10,069][00281] Avg episode reward: [(0, '-0.334')] |
| [2024-11-07 17:53:13,719][03016] Updated weights for policy 0, policy_version 10 (0.0148) |
| [2024-11-07 17:53:15,067][00281] Fps is (10 sec: 2867.3, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 417.4. Samples: 10436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) |
| [2024-11-07 17:53:15,071][00281] Avg episode reward: [(0, '-0.157')] |
| [2024-11-07 17:53:20,067][00281] Fps is (10 sec: 2457.6, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 501.1. Samples: 15034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:53:20,075][00281] Avg episode reward: [(0, '0.110')] |
| [2024-11-07 17:53:25,067][00281] Fps is (10 sec: 3686.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 520.5. Samples: 18216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) |
| [2024-11-07 17:53:25,075][00281] Avg episode reward: [(0, '0.458')] |
| [2024-11-07 17:53:25,109][03016] Updated weights for policy 0, policy_version 20 (0.0021) |
| [2024-11-07 17:53:30,068][00281] Fps is (10 sec: 3685.8, 60 sec: 2355.1, 300 sec: 2355.1). Total num frames: 94208. Throughput: 0: 597.5. Samples: 23900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:53:30,076][00281] Avg episode reward: [(0, '0.695')] |
| [2024-11-07 17:53:35,067][00281] Fps is (10 sec: 2867.1, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 610.9. Samples: 27490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:53:35,069][00281] Avg episode reward: [(0, '0.869')] |
| [2024-11-07 17:53:35,075][03002] Saving new best policy, reward=0.869! |
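
This first "new best policy" save appears only once the total frame count has passed `save_best_after=100000` (106,496 frames here); after that, a new best checkpoint is written whenever the tracked metric (`save_best_metric=reward`) improves, checked every `save_best_every_sec=5` seconds. A hypothetical sketch of that gating condition (not the library's actual code):

```python
def should_save_best(env_steps: int, avg_reward: float, best_reward: float,
                     save_best_after: int = 100_000) -> bool:
    # Only consider "best" checkpoints after enough env steps, and only on improvement.
    return env_steps >= save_best_after and avg_reward > best_reward

print(should_save_best(env_steps=106496, avg_reward=0.869, best_reward=float("-inf")))  # True
```
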
| [2024-11-07 17:53:38,873][03016] Updated weights for policy 0, policy_version 30 (0.0035) |
| [2024-11-07 17:53:40,067][00281] Fps is (10 sec: 3277.4, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 672.6. Samples: 30284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:53:40,071][00281] Avg episode reward: [(0, '1.651')] |
| [2024-11-07 17:53:40,087][03002] Saving new best policy, reward=1.651! |
| [2024-11-07 17:53:45,067][00281] Fps is (10 sec: 3686.5, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 768.6. Samples: 35858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:53:45,072][00281] Avg episode reward: [(0, '2.141')] |
| [2024-11-07 17:53:45,079][03002] Saving new best policy, reward=2.141! |
| [2024-11-07 17:53:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 803.2. Samples: 39416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:53:50,071][00281] Avg episode reward: [(0, '2.351')] |
| [2024-11-07 17:53:50,087][03002] Saving new best policy, reward=2.351! |
| [2024-11-07 17:53:52,419][03016] Updated weights for policy 0, policy_version 40 (0.0021) |
| [2024-11-07 17:53:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2646.7). Total num frames: 172032. Throughput: 0: 786.4. Samples: 41672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:53:55,074][00281] Avg episode reward: [(0, '2.798')] |
| [2024-11-07 17:53:55,077][03002] Saving new best policy, reward=2.798! |
| [2024-11-07 17:54:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2750.2). Total num frames: 192512. Throughput: 0: 824.1. Samples: 47520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:54:00,068][00281] Avg episode reward: [(0, '2.929')] |
| [2024-11-07 17:54:00,076][03002] Saving new best policy, reward=2.929! |
| [2024-11-07 17:54:04,288][03016] Updated weights for policy 0, policy_version 50 (0.0014) |
| [2024-11-07 17:54:05,069][00281] Fps is (10 sec: 3276.0, 60 sec: 3208.4, 300 sec: 2730.6). Total num frames: 204800. Throughput: 0: 818.7. Samples: 51878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:54:05,073][00281] Avg episode reward: [(0, '2.944')] |
| [2024-11-07 17:54:05,077][03002] Saving new best policy, reward=2.944! |
| [2024-11-07 17:54:10,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2713.6). Total num frames: 217088. Throughput: 0: 785.7. Samples: 53574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:54:10,069][00281] Avg episode reward: [(0, '3.147')] |
| [2024-11-07 17:54:10,108][03002] Saving new best policy, reward=3.147! |
| [2024-11-07 17:54:15,067][00281] Fps is (10 sec: 3277.5, 60 sec: 3276.8, 300 sec: 2794.9). Total num frames: 237568. Throughput: 0: 779.7. Samples: 58984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:54:15,072][00281] Avg episode reward: [(0, '3.356')] |
| [2024-11-07 17:54:15,075][03002] Saving new best policy, reward=3.356! |
| [2024-11-07 17:54:16,758][03016] Updated weights for policy 0, policy_version 60 (0.0028) |
| [2024-11-07 17:54:20,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2821.7). Total num frames: 253952. Throughput: 0: 812.8. Samples: 64068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:54:20,073][00281] Avg episode reward: [(0, '3.423')] |
| [2024-11-07 17:54:20,089][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000062_253952.pth... |
| [2024-11-07 17:54:20,294][03002] Saving new best policy, reward=3.423! |
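
Regular checkpoints are named `checkpoint_<policy_version>_<env_steps>.pth`. In this run each policy version advances the env step counter by 4096 frames, which is consistent with `batch_size=1024` combined with `env_frameskip=4`. A quick consistency check (hypothetical helper, not part of Sample Factory):

```python
def expected_env_steps(policy_version: int, batch_size: int = 1024, frameskip: int = 4) -> int:
    # env steps per policy version = batch_size * frameskip = 4096 in this configuration
    return policy_version * batch_size * frameskip

print(expected_env_steps(62))  # 253952, matching checkpoint_000000062_253952.pth above
```
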
| [2024-11-07 17:54:25,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2759.4). Total num frames: 262144. Throughput: 0: 783.1. Samples: 65522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 17:54:25,072][00281] Avg episode reward: [(0, '3.323')] |
| [2024-11-07 17:54:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 2826.2). Total num frames: 282624. Throughput: 0: 767.0. Samples: 70374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 17:54:30,070][00281] Avg episode reward: [(0, '3.422')] |
| [2024-11-07 17:54:30,613][03016] Updated weights for policy 0, policy_version 70 (0.0037) |
| [2024-11-07 17:54:35,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3276.8, 300 sec: 2886.7). Total num frames: 303104. Throughput: 0: 814.8. Samples: 76084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 17:54:35,070][00281] Avg episode reward: [(0, '3.434')] |
| [2024-11-07 17:54:35,074][03002] Saving new best policy, reward=3.434! |
| [2024-11-07 17:54:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2830.0). Total num frames: 311296. Throughput: 0: 803.2. Samples: 77816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 17:54:40,073][00281] Avg episode reward: [(0, '3.544')] |
| [2024-11-07 17:54:40,085][03002] Saving new best policy, reward=3.544! |
| [2024-11-07 17:54:44,257][03016] Updated weights for policy 0, policy_version 80 (0.0040) |
| [2024-11-07 17:54:45,067][00281] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 2849.4). Total num frames: 327680. Throughput: 0: 764.0. Samples: 81902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
| [2024-11-07 17:54:45,071][00281] Avg episode reward: [(0, '3.574')] |
| [2024-11-07 17:54:45,075][03002] Saving new best policy, reward=3.574! |
| [2024-11-07 17:54:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2901.3). Total num frames: 348160. Throughput: 0: 800.2. Samples: 87886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:54:50,071][00281] Avg episode reward: [(0, '3.639')] |
| [2024-11-07 17:54:50,081][03002] Saving new best policy, reward=3.639! |
| [2024-11-07 17:54:55,072][00281] Fps is (10 sec: 3684.5, 60 sec: 3208.3, 300 sec: 2916.2). Total num frames: 364544. Throughput: 0: 819.2. Samples: 90444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 17:54:55,079][00281] Avg episode reward: [(0, '3.667')] |
| [2024-11-07 17:54:55,084][03002] Saving new best policy, reward=3.667! |
| [2024-11-07 17:54:56,637][03016] Updated weights for policy 0, policy_version 90 (0.0024) |
| [2024-11-07 17:55:00,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2867.2). Total num frames: 372736. Throughput: 0: 775.5. Samples: 93882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 17:55:00,072][00281] Avg episode reward: [(0, '3.784')] |
| [2024-11-07 17:55:00,158][03002] Saving new best policy, reward=3.784! |
| [2024-11-07 17:55:05,067][00281] Fps is (10 sec: 2049.1, 60 sec: 3003.8, 300 sec: 2852.0). Total num frames: 385024. Throughput: 0: 734.7. Samples: 97128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:55:05,069][00281] Avg episode reward: [(0, '3.686')] |
| [2024-11-07 17:55:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2867.2). Total num frames: 401408. Throughput: 0: 759.6. Samples: 99704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:10,070][00281] Avg episode reward: [(0, '3.966')] |
| [2024-11-07 17:55:10,082][03002] Saving new best policy, reward=3.966! |
| [2024-11-07 17:55:12,587][03016] Updated weights for policy 0, policy_version 100 (0.0041) |
| [2024-11-07 17:55:15,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2853.1). Total num frames: 413696. Throughput: 0: 734.1. Samples: 103410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 17:55:15,069][00281] Avg episode reward: [(0, '3.618')] |
| [2024-11-07 17:55:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2867.2). Total num frames: 430080. Throughput: 0: 719.0. Samples: 108440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:20,073][00281] Avg episode reward: [(0, '4.124')] |
| [2024-11-07 17:55:20,089][03002] Saving new best policy, reward=4.124! |
| [2024-11-07 17:55:24,449][03016] Updated weights for policy 0, policy_version 110 (0.0020) |
| [2024-11-07 17:55:25,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2906.8). Total num frames: 450560. Throughput: 0: 742.1. Samples: 111210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:25,071][00281] Avg episode reward: [(0, '4.115')] |
| [2024-11-07 17:55:30,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3003.7, 300 sec: 2892.8). Total num frames: 462848. Throughput: 0: 753.1. Samples: 115790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:30,069][00281] Avg episode reward: [(0, '3.860')] |
| [2024-11-07 17:55:35,067][00281] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2904.4). Total num frames: 479232. Throughput: 0: 715.2. Samples: 120072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:35,069][00281] Avg episode reward: [(0, '3.682')] |
| [2024-11-07 17:55:37,807][03016] Updated weights for policy 0, policy_version 120 (0.0028) |
| [2024-11-07 17:55:40,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2939.5). Total num frames: 499712. Throughput: 0: 724.3. Samples: 123032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:40,074][00281] Avg episode reward: [(0, '3.811')] |
| [2024-11-07 17:55:45,068][00281] Fps is (10 sec: 3276.2, 60 sec: 3071.9, 300 sec: 2925.7). Total num frames: 512000. Throughput: 0: 768.6. Samples: 128472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:45,071][00281] Avg episode reward: [(0, '3.994')] |
| [2024-11-07 17:55:50,067][00281] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2912.7). Total num frames: 524288. Throughput: 0: 774.8. Samples: 131994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:55:50,074][00281] Avg episode reward: [(0, '4.055')] |
| [2024-11-07 17:55:51,281][03016] Updated weights for policy 0, policy_version 130 (0.0019) |
| [2024-11-07 17:55:55,067][00281] Fps is (10 sec: 3277.4, 60 sec: 3004.0, 300 sec: 2944.7). Total num frames: 544768. Throughput: 0: 784.4. Samples: 135000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 17:55:55,069][00281] Avg episode reward: [(0, '4.020')] |
| [2024-11-07 17:56:00,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2975.0). Total num frames: 565248. Throughput: 0: 833.3. Samples: 140908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:00,072][00281] Avg episode reward: [(0, '3.830')] |
| [2024-11-07 17:56:03,132][03016] Updated weights for policy 0, policy_version 140 (0.0025) |
| [2024-11-07 17:56:05,075][00281] Fps is (10 sec: 3274.1, 60 sec: 3208.1, 300 sec: 2961.6). Total num frames: 577536. Throughput: 0: 805.8. Samples: 144706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:56:05,080][00281] Avg episode reward: [(0, '3.341')] |
| [2024-11-07 17:56:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2969.6). Total num frames: 593920. Throughput: 0: 794.4. Samples: 146956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:56:10,074][00281] Avg episode reward: [(0, '3.895')] |
| [2024-11-07 17:56:14,944][03016] Updated weights for policy 0, policy_version 150 (0.0028) |
| [2024-11-07 17:56:15,067][00281] Fps is (10 sec: 3689.4, 60 sec: 3345.1, 300 sec: 2997.1). Total num frames: 614400. Throughput: 0: 824.1. Samples: 152876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:15,074][00281] Avg episode reward: [(0, '3.932')] |
| [2024-11-07 17:56:20,075][00281] Fps is (10 sec: 3274.0, 60 sec: 3276.3, 300 sec: 2984.1). Total num frames: 626688. Throughput: 0: 828.3. Samples: 157352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:56:20,082][00281] Avg episode reward: [(0, '3.808')] |
| [2024-11-07 17:56:20,092][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth... |
| [2024-11-07 17:56:25,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 2972.0). Total num frames: 638976. Throughput: 0: 798.4. Samples: 158960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:25,074][00281] Avg episode reward: [(0, '3.827')] |
| [2024-11-07 17:56:28,261][03016] Updated weights for policy 0, policy_version 160 (0.0033) |
| [2024-11-07 17:56:30,067][00281] Fps is (10 sec: 3279.6, 60 sec: 3276.8, 300 sec: 2997.5). Total num frames: 659456. Throughput: 0: 802.8. Samples: 164598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:30,068][00281] Avg episode reward: [(0, '4.118')] |
| [2024-11-07 17:56:35,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3003.7). Total num frames: 675840. Throughput: 0: 845.8. Samples: 170054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:35,073][00281] Avg episode reward: [(0, '4.413')] |
| [2024-11-07 17:56:35,078][03002] Saving new best policy, reward=4.413! |
| [2024-11-07 17:56:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2991.9). Total num frames: 688128. Throughput: 0: 814.7. Samples: 171662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:40,069][00281] Avg episode reward: [(0, '4.091')] |
| [2024-11-07 17:56:41,734][03016] Updated weights for policy 0, policy_version 170 (0.0032) |
| [2024-11-07 17:56:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3015.4). Total num frames: 708608. Throughput: 0: 792.0. Samples: 176548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:56:45,075][00281] Avg episode reward: [(0, '4.218')] |
| [2024-11-07 17:56:50,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3037.9). Total num frames: 729088. Throughput: 0: 836.0. Samples: 182318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:56:50,069][00281] Avg episode reward: [(0, '3.906')] |
| [2024-11-07 17:56:53,350][03016] Updated weights for policy 0, policy_version 180 (0.0031) |
| [2024-11-07 17:56:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3009.3). Total num frames: 737280. Throughput: 0: 831.1. Samples: 184354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:56:55,077][00281] Avg episode reward: [(0, '4.618')] |
| [2024-11-07 17:56:55,085][03002] Saving new best policy, reward=4.618! |
| [2024-11-07 17:57:00,067][00281] Fps is (10 sec: 2457.5, 60 sec: 3140.3, 300 sec: 3014.7). Total num frames: 753664. Throughput: 0: 787.1. Samples: 188296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:57:00,070][00281] Avg episode reward: [(0, '4.551')] |
| [2024-11-07 17:57:05,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3277.2, 300 sec: 3035.9). Total num frames: 774144. Throughput: 0: 818.7. Samples: 194188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:57:05,072][00281] Avg episode reward: [(0, '4.727')] |
| [2024-11-07 17:57:05,077][03002] Saving new best policy, reward=4.727! |
| [2024-11-07 17:57:05,602][03016] Updated weights for policy 0, policy_version 190 (0.0014) |
| [2024-11-07 17:57:10,067][00281] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3040.5). Total num frames: 790528. Throughput: 0: 843.2. Samples: 196904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:57:10,069][00281] Avg episode reward: [(0, '5.531')] |
| [2024-11-07 17:57:10,085][03002] Saving new best policy, reward=5.531! |
| [2024-11-07 17:57:15,070][00281] Fps is (10 sec: 2866.3, 60 sec: 3140.1, 300 sec: 3029.5). Total num frames: 802816. Throughput: 0: 792.3. Samples: 200256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:57:15,078][00281] Avg episode reward: [(0, '4.646')] |
| [2024-11-07 17:57:18,877][03016] Updated weights for policy 0, policy_version 200 (0.0024) |
| [2024-11-07 17:57:20,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3277.3, 300 sec: 3049.2). Total num frames: 823296. Throughput: 0: 799.5. Samples: 206032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:57:20,069][00281] Avg episode reward: [(0, '4.570')] |
| [2024-11-07 17:57:25,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3345.1, 300 sec: 3053.4). Total num frames: 839680. Throughput: 0: 831.0. Samples: 209058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:57:25,075][00281] Avg episode reward: [(0, '4.460')] |
| [2024-11-07 17:57:30,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3042.7). Total num frames: 851968. Throughput: 0: 813.6. Samples: 213162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 17:57:30,071][00281] Avg episode reward: [(0, '4.797')] |
| [2024-11-07 17:57:31,955][03016] Updated weights for policy 0, policy_version 210 (0.0025) |
| [2024-11-07 17:57:35,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3061.2). Total num frames: 872448. Throughput: 0: 798.0. Samples: 218226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:57:35,069][00281] Avg episode reward: [(0, '4.835')] |
| [2024-11-07 17:57:40,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3079.1). Total num frames: 892928. Throughput: 0: 820.5. Samples: 221278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:57:40,073][00281] Avg episode reward: [(0, '4.581')] |
| [2024-11-07 17:57:42,943][03016] Updated weights for policy 0, policy_version 220 (0.0016) |
| [2024-11-07 17:57:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3068.5). Total num frames: 905216. Throughput: 0: 841.2. Samples: 226150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:57:45,068][00281] Avg episode reward: [(0, '4.272')] |
| [2024-11-07 17:57:50,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 917504. Throughput: 0: 798.3. Samples: 230110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:57:50,070][00281] Avg episode reward: [(0, '5.067')] |
| [2024-11-07 17:57:55,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3179.6). Total num frames: 937984. Throughput: 0: 803.3. Samples: 233054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:57:55,070][00281] Avg episode reward: [(0, '4.756')] |
| [2024-11-07 17:57:55,649][03016] Updated weights for policy 0, policy_version 230 (0.0015) |
| [2024-11-07 17:58:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 954368. Throughput: 0: 854.9. Samples: 238724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:58:00,073][00281] Avg episode reward: [(0, '5.091')] |
| [2024-11-07 17:58:05,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 966656. Throughput: 0: 803.2. Samples: 242176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 17:58:05,075][00281] Avg episode reward: [(0, '5.093')] |
| [2024-11-07 17:58:08,693][03016] Updated weights for policy 0, policy_version 240 (0.0018) |
| [2024-11-07 17:58:10,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 987136. Throughput: 0: 800.5. Samples: 245080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 17:58:10,074][00281] Avg episode reward: [(0, '5.170')] |
| [2024-11-07 17:58:15,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3413.5, 300 sec: 3221.3). Total num frames: 1007616. Throughput: 0: 841.7. Samples: 251038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:58:15,068][00281] Avg episode reward: [(0, '5.191')] |
| [2024-11-07 17:58:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1015808. Throughput: 0: 815.4. Samples: 254920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:58:20,072][00281] Avg episode reward: [(0, '5.476')] |
| [2024-11-07 17:58:20,089][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000248_1015808.pth... |
| [2024-11-07 17:58:20,262][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000062_253952.pth |
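
With `keep_checkpoints=2`, writing a new regular checkpoint triggers removal of the oldest one, as seen here (best-policy checkpoints are tracked separately). A hypothetical sketch of that rotation, not the library's implementation:

```python
import os

def rotate_checkpoints(checkpoint_dir: str, keep: int = 2) -> None:
    # Zero-padded names sort chronologically, so lexicographic order is oldest-first.
    ckpts = sorted(
        f for f in os.listdir(checkpoint_dir)
        if f.startswith("checkpoint_") and f.endswith(".pth")
    )
    for old in ckpts[:-keep]:  # delete everything except the newest `keep` checkpoints
        os.remove(os.path.join(checkpoint_dir, old))
```
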
| [2024-11-07 17:58:22,233][03016] Updated weights for policy 0, policy_version 250 (0.0030) |
| [2024-11-07 17:58:25,067][00281] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1032192. Throughput: 0: 791.1. Samples: 256878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:58:25,073][00281] Avg episode reward: [(0, '5.026')] |
| [2024-11-07 17:58:30,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1052672. Throughput: 0: 817.2. Samples: 262924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 17:58:30,073][00281] Avg episode reward: [(0, '5.508')] |
| [2024-11-07 17:58:32,841][03016] Updated weights for policy 0, policy_version 260 (0.0025) |
| [2024-11-07 17:58:35,067][00281] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 1069056. Throughput: 0: 835.4. Samples: 267702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:58:35,070][00281] Avg episode reward: [(0, '4.455')] |
| [2024-11-07 17:58:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 1081344. Throughput: 0: 806.0. Samples: 269326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:58:40,070][00281] Avg episode reward: [(0, '4.395')] |
| [2024-11-07 17:58:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1101824. Throughput: 0: 801.9. Samples: 274810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:58:45,069][00281] Avg episode reward: [(0, '4.989')] |
| [2024-11-07 17:58:45,787][03016] Updated weights for policy 0, policy_version 270 (0.0024) |
| [2024-11-07 17:58:50,073][00281] Fps is (10 sec: 3684.1, 60 sec: 3344.7, 300 sec: 3207.3). Total num frames: 1118208. Throughput: 0: 848.7. Samples: 280374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:58:50,077][00281] Avg episode reward: [(0, '5.600')] |
| [2024-11-07 17:58:50,102][03002] Saving new best policy, reward=5.600! |
| [2024-11-07 17:58:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1130496. Throughput: 0: 822.0. Samples: 282072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:58:55,069][00281] Avg episode reward: [(0, '5.806')] |
| [2024-11-07 17:58:55,073][03002] Saving new best policy, reward=5.806! |
| [2024-11-07 17:58:59,074][03016] Updated weights for policy 0, policy_version 280 (0.0022) |
| [2024-11-07 17:59:00,070][00281] Fps is (10 sec: 2868.1, 60 sec: 3208.4, 300 sec: 3193.5). Total num frames: 1146880. Throughput: 0: 796.0. Samples: 286862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:59:00,079][00281] Avg episode reward: [(0, '5.671')] |
| [2024-11-07 17:59:05,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1167360. Throughput: 0: 844.8. Samples: 292936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:59:05,068][00281] Avg episode reward: [(0, '5.263')] |
| [2024-11-07 17:59:10,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1183744. Throughput: 0: 848.7. Samples: 295070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:59:10,068][00281] Avg episode reward: [(0, '5.343')] |
| [2024-11-07 17:59:11,602][03016] Updated weights for policy 0, policy_version 290 (0.0037) |
| [2024-11-07 17:59:15,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1200128. Throughput: 0: 804.9. Samples: 299146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:59:15,072][00281] Avg episode reward: [(0, '6.132')] |
| [2024-11-07 17:59:15,075][03002] Saving new best policy, reward=6.132! |
| [2024-11-07 17:59:20,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 1216512. Throughput: 0: 829.6. Samples: 305036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 17:59:20,072][00281] Avg episode reward: [(0, '5.342')] |
| [2024-11-07 17:59:22,111][03016] Updated weights for policy 0, policy_version 300 (0.0023) |
| [2024-11-07 17:59:25,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1232896. Throughput: 0: 859.0. Samples: 307980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 17:59:25,075][00281] Avg episode reward: [(0, '5.704')] |
| [2024-11-07 17:59:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1245184. Throughput: 0: 814.8. Samples: 311474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:59:30,073][00281] Avg episode reward: [(0, '5.726')] |
| [2024-11-07 17:59:35,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1265664. Throughput: 0: 813.0. Samples: 316952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 17:59:35,069][00281] Avg episode reward: [(0, '5.294')] |
| [2024-11-07 17:59:35,577][03016] Updated weights for policy 0, policy_version 310 (0.0022) |
| [2024-11-07 17:59:40,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 1286144. Throughput: 0: 842.2. Samples: 319970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:59:40,071][00281] Avg episode reward: [(0, '5.036')] |
| [2024-11-07 17:59:45,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1294336. Throughput: 0: 827.8. Samples: 324110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:59:45,072][00281] Avg episode reward: [(0, '5.421')] |
| [2024-11-07 17:59:48,834][03016] Updated weights for policy 0, policy_version 320 (0.0022) |
| [2024-11-07 17:59:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3277.1, 300 sec: 3221.3). Total num frames: 1314816. Throughput: 0: 798.4. Samples: 328864. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 17:59:50,073][00281] Avg episode reward: [(0, '4.972')] |
| [2024-11-07 17:59:55,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 1335296. Throughput: 0: 818.4. Samples: 331896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 17:59:55,069][00281] Avg episode reward: [(0, '5.902')] |
| [2024-11-07 18:00:00,067][00281] Fps is (10 sec: 3276.7, 60 sec: 3345.2, 300 sec: 3262.9). Total num frames: 1347584. Throughput: 0: 839.2. Samples: 336912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:00,069][00281] Avg episode reward: [(0, '5.596')] |
| [2024-11-07 18:00:01,260][03016] Updated weights for policy 0, policy_version 330 (0.0021) |
| [2024-11-07 18:00:05,067][00281] Fps is (10 sec: 2867.0, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1363968. Throughput: 0: 800.7. Samples: 341066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:05,070][00281] Avg episode reward: [(0, '5.846')] |
| [2024-11-07 18:00:10,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1384448. Throughput: 0: 803.5. Samples: 344138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:00:10,069][00281] Avg episode reward: [(0, '6.473')] |
| [2024-11-07 18:00:10,077][03002] Saving new best policy, reward=6.473! |
| [2024-11-07 18:00:12,262][03016] Updated weights for policy 0, policy_version 340 (0.0014) |
| [2024-11-07 18:00:15,067][00281] Fps is (10 sec: 3686.7, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1400832. Throughput: 0: 853.1. Samples: 349862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:15,069][00281] Avg episode reward: [(0, '5.302')] |
| [2024-11-07 18:00:20,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1409024. Throughput: 0: 809.5. Samples: 353378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:20,069][00281] Avg episode reward: [(0, '6.516')] |
| [2024-11-07 18:00:20,148][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_1413120.pth... |
| [2024-11-07 18:00:20,288][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_626688.pth |
| [2024-11-07 18:00:20,303][03002] Saving new best policy, reward=6.516! |
| [2024-11-07 18:00:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1429504. Throughput: 0: 798.6. Samples: 355908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:25,076][00281] Avg episode reward: [(0, '6.422')] |
| [2024-11-07 18:00:25,549][03016] Updated weights for policy 0, policy_version 350 (0.0027) |
| [2024-11-07 18:00:30,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1449984. Throughput: 0: 842.4. Samples: 362018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:30,069][00281] Avg episode reward: [(0, '6.034')] |
| [2024-11-07 18:00:35,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1462272. Throughput: 0: 833.6. Samples: 366376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:00:35,072][00281] Avg episode reward: [(0, '5.829')] |
| [2024-11-07 18:00:38,566][03016] Updated weights for policy 0, policy_version 360 (0.0050) |
| [2024-11-07 18:00:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 1478656. Throughput: 0: 808.4. Samples: 368276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:00:40,072][00281] Avg episode reward: [(0, '6.529')] |
| [2024-11-07 18:00:40,084][03002] Saving new best policy, reward=6.529! |
| [2024-11-07 18:00:45,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1499136. Throughput: 0: 828.3. Samples: 374186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:00:45,074][00281] Avg episode reward: [(0, '6.295')] |
| [2024-11-07 18:00:49,816][03016] Updated weights for policy 0, policy_version 370 (0.0027) |
| [2024-11-07 18:00:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1515520. Throughput: 0: 847.1. Samples: 379186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:00:50,073][00281] Avg episode reward: [(0, '6.399')] |
| [2024-11-07 18:00:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1527808. Throughput: 0: 816.7. Samples: 380890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 18:00:55,074][00281] Avg episode reward: [(0, '6.051')] |
| [2024-11-07 18:01:00,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.8). Total num frames: 1548288. Throughput: 0: 807.6. Samples: 386206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:01:00,072][00281] Avg episode reward: [(0, '6.330')] |
| [2024-11-07 18:01:02,055][03016] Updated weights for policy 0, policy_version 380 (0.0027) |
| [2024-11-07 18:01:05,070][00281] Fps is (10 sec: 3685.2, 60 sec: 3344.9, 300 sec: 3290.7). Total num frames: 1564672. Throughput: 0: 859.9. Samples: 392074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:01:05,072][00281] Avg episode reward: [(0, '6.786')] |
| [2024-11-07 18:01:05,080][03002] Saving new best policy, reward=6.786! |
| [2024-11-07 18:01:10,070][00281] Fps is (10 sec: 2866.4, 60 sec: 3208.4, 300 sec: 3262.9). Total num frames: 1576960. Throughput: 0: 839.2. Samples: 393676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:01:10,072][00281] Avg episode reward: [(0, '6.082')] |
| [2024-11-07 18:01:15,067][00281] Fps is (10 sec: 2868.0, 60 sec: 3208.5, 300 sec: 3276.9). Total num frames: 1593344. Throughput: 0: 802.9. Samples: 398150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:01:15,072][00281] Avg episode reward: [(0, '7.565')] |
| [2024-11-07 18:01:15,077][03002] Saving new best policy, reward=7.565! |
| [2024-11-07 18:01:15,369][03016] Updated weights for policy 0, policy_version 390 (0.0027) |
| [2024-11-07 18:01:20,067][00281] Fps is (10 sec: 3687.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1613824. Throughput: 0: 836.1. Samples: 404002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:01:20,074][00281] Avg episode reward: [(0, '5.206')] |
| [2024-11-07 18:01:25,069][00281] Fps is (10 sec: 3276.3, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1626112. Throughput: 0: 840.8. Samples: 406114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:01:25,074][00281] Avg episode reward: [(0, '6.255')] |
| [2024-11-07 18:01:30,070][00281] Fps is (10 sec: 2047.3, 60 sec: 3071.8, 300 sec: 3249.0). Total num frames: 1634304. Throughput: 0: 773.0. Samples: 408974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 18:01:30,075][00281] Avg episode reward: [(0, '6.501')] |
| [2024-11-07 18:01:31,002][03016] Updated weights for policy 0, policy_version 400 (0.0055) |
| [2024-11-07 18:01:35,067][00281] Fps is (10 sec: 2458.0, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 1650688. Throughput: 0: 757.9. Samples: 413290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:01:35,074][00281] Avg episode reward: [(0, '6.530')] |
| [2024-11-07 18:01:40,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1671168. Throughput: 0: 786.5. Samples: 416284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:01:40,073][00281] Avg episode reward: [(0, '6.444')] |
| [2024-11-07 18:01:41,782][03016] Updated weights for policy 0, policy_version 410 (0.0014) |
| [2024-11-07 18:01:45,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3235.1). Total num frames: 1683456. Throughput: 0: 781.2. Samples: 421362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:01:45,073][00281] Avg episode reward: [(0, '6.499')] |
| [2024-11-07 18:01:50,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3072.0, 300 sec: 3262.9). Total num frames: 1699840. Throughput: 0: 739.6. Samples: 425354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:01:50,069][00281] Avg episode reward: [(0, '6.315')] |
| [2024-11-07 18:01:54,943][03016] Updated weights for policy 0, policy_version 420 (0.0022) |
| [2024-11-07 18:01:55,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 1720320. Throughput: 0: 771.2. Samples: 428380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:01:55,074][00281] Avg episode reward: [(0, '5.734')] |
| [2024-11-07 18:02:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3140.2, 300 sec: 3262.9). Total num frames: 1736704. Throughput: 0: 800.4. Samples: 434168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:02:00,075][00281] Avg episode reward: [(0, '6.634')] |
| [2024-11-07 18:02:05,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.2, 300 sec: 3249.0). Total num frames: 1748992. Throughput: 0: 748.9. Samples: 437704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:02:05,069][00281] Avg episode reward: [(0, '5.629')] |
| [2024-11-07 18:02:07,812][03016] Updated weights for policy 0, policy_version 430 (0.0038) |
| [2024-11-07 18:02:10,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3208.7, 300 sec: 3276.8). Total num frames: 1769472. Throughput: 0: 767.5. Samples: 440652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:02:10,074][00281] Avg episode reward: [(0, '5.546')] |
| [2024-11-07 18:02:15,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1789952. Throughput: 0: 842.5. Samples: 446886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:02:15,073][00281] Avg episode reward: [(0, '6.760')] |
| [2024-11-07 18:02:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 1798144. Throughput: 0: 835.7. Samples: 450898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:02:20,070][00281] Avg episode reward: [(0, '6.148')] |
| [2024-11-07 18:02:20,091][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth... |
| [2024-11-07 18:02:20,103][03016] Updated weights for policy 0, policy_version 440 (0.0051) |
| [2024-11-07 18:02:20,299][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000248_1015808.pth |
| [2024-11-07 18:02:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3276.8). Total num frames: 1818624. Throughput: 0: 815.4. Samples: 452976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:02:25,069][00281] Avg episode reward: [(0, '5.750')] |
| [2024-11-07 18:02:30,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3276.8). Total num frames: 1839104. Throughput: 0: 838.4. Samples: 459088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:02:30,069][00281] Avg episode reward: [(0, '6.709')] |
| [2024-11-07 18:02:31,011][03016] Updated weights for policy 0, policy_version 450 (0.0023) |
| [2024-11-07 18:02:35,068][00281] Fps is (10 sec: 3276.5, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 1851392. Throughput: 0: 858.0. Samples: 463966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:02:35,070][00281] Avg episode reward: [(0, '6.694')] |
| [2024-11-07 18:02:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1867776. Throughput: 0: 829.7. Samples: 465716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:02:40,073][00281] Avg episode reward: [(0, '6.557')] |
| [2024-11-07 18:02:43,662][03016] Updated weights for policy 0, policy_version 460 (0.0022) |
| [2024-11-07 18:02:45,067][00281] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1888256. Throughput: 0: 829.0. Samples: 471474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:02:45,076][00281] Avg episode reward: [(0, '6.435')] |
| [2024-11-07 18:02:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 1904640. Throughput: 0: 878.0. Samples: 477214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:02:50,073][00281] Avg episode reward: [(0, '6.625')] |
| [2024-11-07 18:02:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1916928. Throughput: 0: 850.3. Samples: 478914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:02:55,072][00281] Avg episode reward: [(0, '6.462')] |
| [2024-11-07 18:02:56,803][03016] Updated weights for policy 0, policy_version 470 (0.0031) |
| [2024-11-07 18:03:00,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1937408. Throughput: 0: 820.0. Samples: 483784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:03:00,073][00281] Avg episode reward: [(0, '7.865')] |
| [2024-11-07 18:03:00,084][03002] Saving new best policy, reward=7.865! |
| [2024-11-07 18:03:05,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 1957888. Throughput: 0: 864.4. Samples: 489796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:03:05,074][00281] Avg episode reward: [(0, '6.252')] |
| [2024-11-07 18:03:07,493][03016] Updated weights for policy 0, policy_version 480 (0.0035) |
| [2024-11-07 18:03:10,068][00281] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 1970176. Throughput: 0: 867.6. Samples: 492018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:03:10,071][00281] Avg episode reward: [(0, '6.927')] |
| [2024-11-07 18:03:15,068][00281] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 1986560. Throughput: 0: 821.4. Samples: 496054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 18:03:15,073][00281] Avg episode reward: [(0, '6.090')] |
| [2024-11-07 18:03:19,704][03016] Updated weights for policy 0, policy_version 490 (0.0016) |
| [2024-11-07 18:03:20,067][00281] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 2007040. Throughput: 0: 848.9. Samples: 502168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:03:20,069][00281] Avg episode reward: [(0, '6.491')] |
| [2024-11-07 18:03:25,067][00281] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2023424. Throughput: 0: 878.3. Samples: 505240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:03:25,070][00281] Avg episode reward: [(0, '6.935')] |
| [2024-11-07 18:03:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2035712. Throughput: 0: 830.1. Samples: 508828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:03:30,076][00281] Avg episode reward: [(0, '5.571')] |
| [2024-11-07 18:03:32,386][03016] Updated weights for policy 0, policy_version 500 (0.0026) |
| [2024-11-07 18:03:35,067][00281] Fps is (10 sec: 3277.0, 60 sec: 3413.4, 300 sec: 3304.6). Total num frames: 2056192. Throughput: 0: 832.1. Samples: 514660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:03:35,074][00281] Avg episode reward: [(0, '5.801')] |
| [2024-11-07 18:03:40,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 2076672. Throughput: 0: 863.3. Samples: 517762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:03:40,073][00281] Avg episode reward: [(0, '6.208')] |
| [2024-11-07 18:03:45,021][03016] Updated weights for policy 0, policy_version 510 (0.0017) |
| [2024-11-07 18:03:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.8). Total num frames: 2088960. Throughput: 0: 849.5. Samples: 522010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:03:45,075][00281] Avg episode reward: [(0, '5.659')] |
| [2024-11-07 18:03:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2105344. Throughput: 0: 827.9. Samples: 527050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
| [2024-11-07 18:03:50,069][00281] Avg episode reward: [(0, '5.965')] |
| [2024-11-07 18:03:55,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 2125824. Throughput: 0: 845.1. Samples: 530044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:03:55,069][00281] Avg episode reward: [(0, '6.629')] |
| [2024-11-07 18:03:55,495][03016] Updated weights for policy 0, policy_version 520 (0.0030) |
| [2024-11-07 18:04:00,070][00281] Fps is (10 sec: 3275.7, 60 sec: 3344.9, 300 sec: 3290.6). Total num frames: 2138112. Throughput: 0: 870.2. Samples: 535214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:04:00,074][00281] Avg episode reward: [(0, '7.086')] |
| [2024-11-07 18:04:05,068][00281] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 2154496. Throughput: 0: 829.2. Samples: 539484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:04:05,073][00281] Avg episode reward: [(0, '6.734')] |
| [2024-11-07 18:04:08,428][03016] Updated weights for policy 0, policy_version 530 (0.0033) |
| [2024-11-07 18:04:10,067][00281] Fps is (10 sec: 3687.6, 60 sec: 3413.4, 300 sec: 3304.6). Total num frames: 2174976. Throughput: 0: 829.7. Samples: 542574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) |
| [2024-11-07 18:04:10,071][00281] Avg episode reward: [(0, '8.119')] |
| [2024-11-07 18:04:10,079][03002] Saving new best policy, reward=8.119! |
| [2024-11-07 18:04:15,070][00281] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3304.5). Total num frames: 2191360. Throughput: 0: 874.7. Samples: 548192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:04:15,072][00281] Avg episode reward: [(0, '6.317')] |
| [2024-11-07 18:04:20,068][00281] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 2203648. Throughput: 0: 818.6. Samples: 551496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:04:20,071][00281] Avg episode reward: [(0, '7.052')] |
| [2024-11-07 18:04:20,080][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000538_2203648.pth... |
| [2024-11-07 18:04:20,227][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_1413120.pth |
| [2024-11-07 18:04:21,856][03016] Updated weights for policy 0, policy_version 540 (0.0037) |
| [2024-11-07 18:04:25,067][00281] Fps is (10 sec: 2868.1, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 2220032. Throughput: 0: 809.4. Samples: 554184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:04:25,073][00281] Avg episode reward: [(0, '5.885')] |
| [2024-11-07 18:04:30,067][00281] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 2240512. Throughput: 0: 840.8. Samples: 559848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:04:30,071][00281] Avg episode reward: [(0, '6.998')] |
| [2024-11-07 18:04:34,012][03016] Updated weights for policy 0, policy_version 550 (0.0020) |
| [2024-11-07 18:04:35,069][00281] Fps is (10 sec: 3276.1, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 2252800. Throughput: 0: 821.2. Samples: 564006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:04:35,073][00281] Avg episode reward: [(0, '6.398')] |
| [2024-11-07 18:04:40,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2269184. Throughput: 0: 799.5. Samples: 566020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 18:04:40,074][00281] Avg episode reward: [(0, '5.966')] |
| [2024-11-07 18:04:45,067][00281] Fps is (10 sec: 3687.1, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 2289664. Throughput: 0: 810.4. Samples: 571680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:04:45,074][00281] Avg episode reward: [(0, '7.404')] |
| [2024-11-07 18:04:45,816][03016] Updated weights for policy 0, policy_version 560 (0.0019) |
| [2024-11-07 18:04:50,068][00281] Fps is (10 sec: 3276.6, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 2301952. Throughput: 0: 827.2. Samples: 576708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:04:50,074][00281] Avg episode reward: [(0, '6.343')] |
| [2024-11-07 18:04:55,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2318336. Throughput: 0: 797.2. Samples: 578446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:04:55,071][00281] Avg episode reward: [(0, '6.418')] |
| [2024-11-07 18:04:58,775][03016] Updated weights for policy 0, policy_version 570 (0.0018) |
| [2024-11-07 18:05:00,067][00281] Fps is (10 sec: 3686.8, 60 sec: 3345.2, 300 sec: 3304.6). Total num frames: 2338816. Throughput: 0: 795.7. Samples: 583998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:00,074][00281] Avg episode reward: [(0, '6.027')] |
| [2024-11-07 18:05:05,067][00281] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2355200. Throughput: 0: 849.4. Samples: 589718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:05,071][00281] Avg episode reward: [(0, '6.631')] |
| [2024-11-07 18:05:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2367488. Throughput: 0: 826.7. Samples: 591384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:10,068][00281] Avg episode reward: [(0, '6.380')] |
| [2024-11-07 18:05:12,124][03016] Updated weights for policy 0, policy_version 580 (0.0033) |
| [2024-11-07 18:05:15,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3208.7, 300 sec: 3304.6). Total num frames: 2383872. Throughput: 0: 802.0. Samples: 595936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:15,074][00281] Avg episode reward: [(0, '5.926')] |
| [2024-11-07 18:05:20,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2404352. Throughput: 0: 842.0. Samples: 601894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:05:20,073][00281] Avg episode reward: [(0, '5.925')] |
| [2024-11-07 18:05:23,956][03016] Updated weights for policy 0, policy_version 590 (0.0030) |
| [2024-11-07 18:05:25,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2416640. Throughput: 0: 843.2. Samples: 603964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:25,072][00281] Avg episode reward: [(0, '5.737')] |
| [2024-11-07 18:05:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2433024. Throughput: 0: 801.5. Samples: 607746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:30,072][00281] Avg episode reward: [(0, '6.238')] |
| [2024-11-07 18:05:35,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3304.6). Total num frames: 2453504. Throughput: 0: 823.0. Samples: 613744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:35,075][00281] Avg episode reward: [(0, '5.967')] |
| [2024-11-07 18:05:35,895][03016] Updated weights for policy 0, policy_version 600 (0.0037) |
| [2024-11-07 18:05:40,069][00281] Fps is (10 sec: 3685.5, 60 sec: 3345.0, 300 sec: 3290.7). Total num frames: 2469888. Throughput: 0: 850.0. Samples: 616696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:05:40,071][00281] Avg episode reward: [(0, '7.711')] |
| [2024-11-07 18:05:45,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2482176. Throughput: 0: 806.0. Samples: 620268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:05:45,073][00281] Avg episode reward: [(0, '6.727')] |
| [2024-11-07 18:05:49,055][03016] Updated weights for policy 0, policy_version 610 (0.0025) |
| [2024-11-07 18:05:50,067][00281] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 2498560. Throughput: 0: 799.4. Samples: 625690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:05:50,069][00281] Avg episode reward: [(0, '7.309')] |
| [2024-11-07 18:05:55,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2519040. Throughput: 0: 828.2. Samples: 628652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:05:55,069][00281] Avg episode reward: [(0, '7.492')] |
| [2024-11-07 18:06:00,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 2531328. Throughput: 0: 829.7. Samples: 633272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:06:00,071][00281] Avg episode reward: [(0, '7.174')] |
| [2024-11-07 18:06:02,106][03016] Updated weights for policy 0, policy_version 620 (0.0032) |
| [2024-11-07 18:06:05,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2547712. Throughput: 0: 802.3. Samples: 637998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:06:05,073][00281] Avg episode reward: [(0, '8.016')] |
| [2024-11-07 18:06:10,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 2572288. Throughput: 0: 825.9. Samples: 641128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:06:10,072][00281] Avg episode reward: [(0, '6.513')] |
| [2024-11-07 18:06:12,351][03016] Updated weights for policy 0, policy_version 630 (0.0026) |
| [2024-11-07 18:06:15,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2584576. Throughput: 0: 859.3. Samples: 646414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:06:15,073][00281] Avg episode reward: [(0, '6.580')] |
| [2024-11-07 18:06:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 2600960. Throughput: 0: 815.9. Samples: 650460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:06:20,074][00281] Avg episode reward: [(0, '6.416')] |
| [2024-11-07 18:06:20,090][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth... |
| [2024-11-07 18:06:20,215][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth |
| [2024-11-07 18:06:24,993][03016] Updated weights for policy 0, policy_version 640 (0.0031) |
| [2024-11-07 18:06:25,067][00281] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3346.3). Total num frames: 2621440. Throughput: 0: 814.7. Samples: 653358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) |
| [2024-11-07 18:06:25,075][00281] Avg episode reward: [(0, '8.118')] |
| [2024-11-07 18:06:30,068][00281] Fps is (10 sec: 3686.1, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 2637824. Throughput: 0: 870.0. Samples: 659418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:06:30,070][00281] Avg episode reward: [(0, '6.139')] |
| [2024-11-07 18:06:35,067][00281] Fps is (10 sec: 2457.7, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2646016. Throughput: 0: 824.4. Samples: 662790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:06:35,073][00281] Avg episode reward: [(0, '6.812')] |
| [2024-11-07 18:06:38,495][03016] Updated weights for policy 0, policy_version 650 (0.0047) |
| [2024-11-07 18:06:40,067][00281] Fps is (10 sec: 2867.4, 60 sec: 3276.9, 300 sec: 3332.3). Total num frames: 2666496. Throughput: 0: 813.3. Samples: 665250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:06:40,074][00281] Avg episode reward: [(0, '6.825')] |
| [2024-11-07 18:06:45,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 2686976. Throughput: 0: 844.2. Samples: 671260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:06:45,069][00281] Avg episode reward: [(0, '6.721')] |
| [2024-11-07 18:06:50,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 2699264. Throughput: 0: 833.4. Samples: 675500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:06:50,070][00281] Avg episode reward: [(0, '6.722')] |
| [2024-11-07 18:06:51,185][03016] Updated weights for policy 0, policy_version 660 (0.0030) |
| [2024-11-07 18:06:55,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2711552. Throughput: 0: 800.8. Samples: 677162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:06:55,073][00281] Avg episode reward: [(0, '6.769')] |
| [2024-11-07 18:07:00,069][00281] Fps is (10 sec: 3276.2, 60 sec: 3345.0, 300 sec: 3332.3). Total num frames: 2732032. Throughput: 0: 813.9. Samples: 683040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:07:00,071][00281] Avg episode reward: [(0, '6.534')] |
| [2024-11-07 18:07:02,146][03016] Updated weights for policy 0, policy_version 670 (0.0014) |
| [2024-11-07 18:07:05,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 2752512. Throughput: 0: 845.2. Samples: 688492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:07:05,069][00281] Avg episode reward: [(0, '7.067')] |
| [2024-11-07 18:07:10,067][00281] Fps is (10 sec: 2867.7, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 2760704. Throughput: 0: 818.4. Samples: 690184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:07:10,071][00281] Avg episode reward: [(0, '7.947')] |
| [2024-11-07 18:07:15,029][03016] Updated weights for policy 0, policy_version 680 (0.0030) |
| [2024-11-07 18:07:15,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 2785280. Throughput: 0: 798.3. Samples: 695340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:07:15,074][00281] Avg episode reward: [(0, '6.146')] |
| [2024-11-07 18:07:20,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 2801664. Throughput: 0: 859.8. Samples: 701482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:07:20,069][00281] Avg episode reward: [(0, '6.033')] |
| [2024-11-07 18:07:25,067][00281] Fps is (10 sec: 2867.0, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2813952. Throughput: 0: 846.4. Samples: 703338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:07:25,072][00281] Avg episode reward: [(0, '7.557')] |
| [2024-11-07 18:07:27,978][03016] Updated weights for policy 0, policy_version 690 (0.0027) |
| [2024-11-07 18:07:30,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2834432. Throughput: 0: 810.9. Samples: 707750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:07:30,074][00281] Avg episode reward: [(0, '8.034')] |
| [2024-11-07 18:07:35,067][00281] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 2854912. Throughput: 0: 851.9. Samples: 713834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:07:35,069][00281] Avg episode reward: [(0, '7.520')] |
| [2024-11-07 18:07:38,572][03016] Updated weights for policy 0, policy_version 700 (0.0030) |
| [2024-11-07 18:07:40,067][00281] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 3318.4). Total num frames: 2867200. Throughput: 0: 878.3. Samples: 716686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 18:07:40,070][00281] Avg episode reward: [(0, '7.541')] |
| [2024-11-07 18:07:45,068][00281] Fps is (10 sec: 2457.3, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2879488. Throughput: 0: 822.2. Samples: 720038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:07:45,076][00281] Avg episode reward: [(0, '8.198')] |
| [2024-11-07 18:07:45,078][03002] Saving new best policy, reward=8.198! |
| [2024-11-07 18:07:50,067][00281] Fps is (10 sec: 2457.8, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2891776. Throughput: 0: 778.1. Samples: 723506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:07:50,073][00281] Avg episode reward: [(0, '7.468')] |
| [2024-11-07 18:07:53,789][03016] Updated weights for policy 0, policy_version 710 (0.0025) |
| [2024-11-07 18:07:55,067][00281] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2912256. Throughput: 0: 805.4. Samples: 726428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:07:55,073][00281] Avg episode reward: [(0, '7.355')] |
| [2024-11-07 18:08:00,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3140.4, 300 sec: 3262.9). Total num frames: 2920448. Throughput: 0: 784.3. Samples: 730632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:08:00,075][00281] Avg episode reward: [(0, '8.277')] |
| [2024-11-07 18:08:00,090][03002] Saving new best policy, reward=8.277! |
| [2024-11-07 18:08:05,067][00281] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 2940928. Throughput: 0: 762.0. Samples: 735774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:08:05,074][00281] Avg episode reward: [(0, '7.450')] |
| [2024-11-07 18:08:06,611][03016] Updated weights for policy 0, policy_version 720 (0.0035) |
| [2024-11-07 18:08:10,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 2961408. Throughput: 0: 789.6. Samples: 738870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:08:10,072][00281] Avg episode reward: [(0, '7.903')] |
| [2024-11-07 18:08:15,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2973696. Throughput: 0: 804.4. Samples: 743950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:08:15,069][00281] Avg episode reward: [(0, '7.661')] |
| [2024-11-07 18:08:19,203][03016] Updated weights for policy 0, policy_version 730 (0.0035) |
| [2024-11-07 18:08:20,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2990080. Throughput: 0: 767.2. Samples: 748356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:08:20,069][00281] Avg episode reward: [(0, '7.802')] |
| [2024-11-07 18:08:20,080][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth... |
| [2024-11-07 18:08:20,231][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000538_2203648.pth |
| [2024-11-07 18:08:25,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3010560. Throughput: 0: 768.1. Samples: 751248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:08:25,074][00281] Avg episode reward: [(0, '8.856')] |
| [2024-11-07 18:08:25,076][03002] Saving new best policy, reward=8.856! |
| [2024-11-07 18:08:30,070][00281] Fps is (10 sec: 3685.1, 60 sec: 3208.3, 300 sec: 3290.6). Total num frames: 3026944. Throughput: 0: 822.9. Samples: 757072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:08:30,078][00281] Avg episode reward: [(0, '8.110')] |
| [2024-11-07 18:08:30,643][03016] Updated weights for policy 0, policy_version 740 (0.0027) |
| [2024-11-07 18:08:35,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3262.9). Total num frames: 3039232. Throughput: 0: 824.1. Samples: 760592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:08:35,069][00281] Avg episode reward: [(0, '9.144')] |
| [2024-11-07 18:08:35,076][03002] Saving new best policy, reward=9.144! |
| [2024-11-07 18:08:40,067][00281] Fps is (10 sec: 3277.9, 60 sec: 3208.6, 300 sec: 3290.7). Total num frames: 3059712. Throughput: 0: 825.9. Samples: 763594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:08:40,073][00281] Avg episode reward: [(0, '7.777')] |
| [2024-11-07 18:08:42,264][03016] Updated weights for policy 0, policy_version 750 (0.0030) |
| [2024-11-07 18:08:45,067][00281] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3080192. Throughput: 0: 870.6. Samples: 769810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:08:45,072][00281] Avg episode reward: [(0, '7.894')] |
| [2024-11-07 18:08:50,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3092480. Throughput: 0: 846.2. Samples: 773852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:08:50,072][00281] Avg episode reward: [(0, '7.247')] |
| [2024-11-07 18:08:55,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 3108864. Throughput: 0: 827.6. Samples: 776114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:08:55,074][00281] Avg episode reward: [(0, '7.653')] |
| [2024-11-07 18:08:55,357][03016] Updated weights for policy 0, policy_version 760 (0.0019) |
| [2024-11-07 18:09:00,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3304.6). Total num frames: 3129344. Throughput: 0: 843.1. Samples: 781888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:09:00,069][00281] Avg episode reward: [(0, '7.716')] |
| [2024-11-07 18:09:05,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3141632. Throughput: 0: 850.4. Samples: 786624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:09:05,075][00281] Avg episode reward: [(0, '9.175')] |
| [2024-11-07 18:09:05,077][03002] Saving new best policy, reward=9.175! |
| [2024-11-07 18:09:08,743][03016] Updated weights for policy 0, policy_version 770 (0.0026) |
| [2024-11-07 18:09:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3158016. Throughput: 0: 822.6. Samples: 788264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:09:10,069][00281] Avg episode reward: [(0, '9.159')] |
| [2024-11-07 18:09:15,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 3178496. Throughput: 0: 823.1. Samples: 794108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:09:15,073][00281] Avg episode reward: [(0, '10.285')] |
| [2024-11-07 18:09:15,079][03002] Saving new best policy, reward=10.285! |
| [2024-11-07 18:09:18,678][03016] Updated weights for policy 0, policy_version 780 (0.0025) |
| [2024-11-07 18:09:20,072][00281] Fps is (10 sec: 3684.6, 60 sec: 3413.1, 300 sec: 3304.5). Total num frames: 3194880. Throughput: 0: 869.7. Samples: 799732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:09:20,074][00281] Avg episode reward: [(0, '7.964')] |
| [2024-11-07 18:09:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3207168. Throughput: 0: 840.7. Samples: 801426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:09:25,073][00281] Avg episode reward: [(0, '8.455')] |
| [2024-11-07 18:09:30,067][00281] Fps is (10 sec: 3278.4, 60 sec: 3345.3, 300 sec: 3304.6). Total num frames: 3227648. Throughput: 0: 813.6. Samples: 806424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:09:30,074][00281] Avg episode reward: [(0, '10.130')] |
| [2024-11-07 18:09:31,559][03016] Updated weights for policy 0, policy_version 790 (0.0046) |
| [2024-11-07 18:09:35,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3318.5). Total num frames: 3248128. Throughput: 0: 858.5. Samples: 812486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:09:35,075][00281] Avg episode reward: [(0, '9.090')] |
| [2024-11-07 18:09:40,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3260416. Throughput: 0: 855.1. Samples: 814592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:09:40,071][00281] Avg episode reward: [(0, '8.139')] |
| [2024-11-07 18:09:44,518][03016] Updated weights for policy 0, policy_version 800 (0.0030) |
| [2024-11-07 18:09:45,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3276800. Throughput: 0: 820.0. Samples: 818788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:09:45,069][00281] Avg episode reward: [(0, '8.765')] |
| [2024-11-07 18:09:50,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3297280. Throughput: 0: 848.3. Samples: 824796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:09:50,069][00281] Avg episode reward: [(0, '8.068')] |
| [2024-11-07 18:09:55,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3309568. Throughput: 0: 872.5. Samples: 827526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:09:55,072][00281] Avg episode reward: [(0, '8.376')] |
| [2024-11-07 18:09:56,956][03016] Updated weights for policy 0, policy_version 810 (0.0014) |
| [2024-11-07 18:10:00,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 3325952. Throughput: 0: 817.8. Samples: 830910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:10:00,075][00281] Avg episode reward: [(0, '7.307')] |
| [2024-11-07 18:10:05,067][00281] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3346432. Throughput: 0: 828.0. Samples: 836990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:10:05,073][00281] Avg episode reward: [(0, '8.267')] |
| [2024-11-07 18:10:07,747][03016] Updated weights for policy 0, policy_version 820 (0.0036) |
| [2024-11-07 18:10:10,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3362816. Throughput: 0: 859.6. Samples: 840106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:10:10,072][00281] Avg episode reward: [(0, '8.461')] |
| [2024-11-07 18:10:15,068][00281] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 3375104. Throughput: 0: 842.5. Samples: 844336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:10:15,075][00281] Avg episode reward: [(0, '9.205')] |
| [2024-11-07 18:10:20,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.3, 300 sec: 3318.5). Total num frames: 3395584. Throughput: 0: 822.4. Samples: 849494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:10:20,072][00281] Avg episode reward: [(0, '8.223')] |
| [2024-11-07 18:10:20,084][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth... |
| [2024-11-07 18:10:20,233][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth |
| [2024-11-07 18:10:20,619][03016] Updated weights for policy 0, policy_version 830 (0.0039) |
| [2024-11-07 18:10:25,067][00281] Fps is (10 sec: 4096.4, 60 sec: 3481.6, 300 sec: 3332.3). Total num frames: 3416064. Throughput: 0: 842.6. Samples: 852510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:10:25,068][00281] Avg episode reward: [(0, '9.189')] |
| [2024-11-07 18:10:30,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 3428352. Throughput: 0: 859.2. Samples: 857452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:10:30,071][00281] Avg episode reward: [(0, '8.753')] |
| [2024-11-07 18:10:33,531][03016] Updated weights for policy 0, policy_version 840 (0.0015) |
| [2024-11-07 18:10:35,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3444736. Throughput: 0: 820.7. Samples: 861728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:10:35,071][00281] Avg episode reward: [(0, '8.998')] |
| [2024-11-07 18:10:40,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3465216. Throughput: 0: 828.7. Samples: 864816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:10:40,069][00281] Avg episode reward: [(0, '8.787')] |
| [2024-11-07 18:10:43,874][03016] Updated weights for policy 0, policy_version 850 (0.0023) |
| [2024-11-07 18:10:45,068][00281] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3332.3). Total num frames: 3481600. Throughput: 0: 886.3. Samples: 870794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:10:45,074][00281] Avg episode reward: [(0, '10.212')] |
| [2024-11-07 18:10:50,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 3493888. Throughput: 0: 831.1. Samples: 874390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:10:50,069][00281] Avg episode reward: [(0, '9.444')] |
| [2024-11-07 18:10:55,067][00281] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3514368. Throughput: 0: 827.6. Samples: 877350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:10:55,073][00281] Avg episode reward: [(0, '8.586')] |
| [2024-11-07 18:10:56,454][03016] Updated weights for policy 0, policy_version 860 (0.0017) |
| [2024-11-07 18:11:00,067][00281] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 3534848. Throughput: 0: 867.3. Samples: 883362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:11:00,073][00281] Avg episode reward: [(0, '8.344')] |
| [2024-11-07 18:11:05,067][00281] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 3304.6). Total num frames: 3547136. Throughput: 0: 845.2. Samples: 887528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:11:05,075][00281] Avg episode reward: [(0, '9.910')] |
| [2024-11-07 18:11:09,136][03016] Updated weights for policy 0, policy_version 870 (0.0024) |
| [2024-11-07 18:11:10,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3563520. Throughput: 0: 830.8. Samples: 889898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:11:10,075][00281] Avg episode reward: [(0, '9.638')] |
| [2024-11-07 18:11:15,067][00281] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3346.2). Total num frames: 3588096. Throughput: 0: 859.5. Samples: 896128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:11:15,069][00281] Avg episode reward: [(0, '9.854')] |
| [2024-11-07 18:11:20,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3600384. Throughput: 0: 873.7. Samples: 901046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:11:20,074][00281] Avg episode reward: [(0, '10.538')] |
| [2024-11-07 18:11:20,092][03002] Saving new best policy, reward=10.538! |
| [2024-11-07 18:11:20,848][03016] Updated weights for policy 0, policy_version 880 (0.0036) |
| [2024-11-07 18:11:25,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3616768. Throughput: 0: 842.5. Samples: 902728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:11:25,069][00281] Avg episode reward: [(0, '9.345')] |
| [2024-11-07 18:11:30,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 3637248. Throughput: 0: 842.1. Samples: 908688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:11:30,073][00281] Avg episode reward: [(0, '10.892')] |
| [2024-11-07 18:11:30,083][03002] Saving new best policy, reward=10.892! |
| [2024-11-07 18:11:31,713][03016] Updated weights for policy 0, policy_version 890 (0.0029) |
| [2024-11-07 18:11:35,067][00281] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 3653632. Throughput: 0: 886.0. Samples: 914258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:11:35,070][00281] Avg episode reward: [(0, '9.840')] |
| [2024-11-07 18:11:40,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3665920. Throughput: 0: 859.5. Samples: 916026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:11:40,073][00281] Avg episode reward: [(0, '9.183')] |
| [2024-11-07 18:11:44,326][03016] Updated weights for policy 0, policy_version 900 (0.0053) |
| [2024-11-07 18:11:45,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3346.2). Total num frames: 3686400. Throughput: 0: 843.6. Samples: 921326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:11:45,072][00281] Avg episode reward: [(0, '10.487')] |
| [2024-11-07 18:11:50,074][00281] Fps is (10 sec: 4093.0, 60 sec: 3549.4, 300 sec: 3373.9). Total num frames: 3706880. Throughput: 0: 891.4. Samples: 927646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:11:50,076][00281] Avg episode reward: [(0, '11.213')] |
| [2024-11-07 18:11:50,093][03002] Saving new best policy, reward=11.213! |
| [2024-11-07 18:11:55,067][00281] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 3719168. Throughput: 0: 882.2. Samples: 929596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:11:55,069][00281] Avg episode reward: [(0, '10.129')] |
| [2024-11-07 18:11:57,088][03016] Updated weights for policy 0, policy_version 910 (0.0028) |
| [2024-11-07 18:12:00,067][00281] Fps is (10 sec: 2869.3, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3735552. Throughput: 0: 839.8. Samples: 933920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:12:00,069][00281] Avg episode reward: [(0, '10.187')] |
| [2024-11-07 18:12:05,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3760128. Throughput: 0: 869.0. Samples: 940150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:12:05,069][00281] Avg episode reward: [(0, '10.387')] |
| [2024-11-07 18:12:06,986][03016] Updated weights for policy 0, policy_version 920 (0.0031) |
| [2024-11-07 18:12:10,068][00281] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3346.2). Total num frames: 3772416. Throughput: 0: 895.7. Samples: 943036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:12:10,071][00281] Avg episode reward: [(0, '9.954')] |
| [2024-11-07 18:12:15,067][00281] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 3788800. Throughput: 0: 842.4. Samples: 946594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:12:15,070][00281] Avg episode reward: [(0, '9.879')] |
| [2024-11-07 18:12:19,744][03016] Updated weights for policy 0, policy_version 930 (0.0037) |
| [2024-11-07 18:12:20,067][00281] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 3809280. Throughput: 0: 855.5. Samples: 952756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:12:20,073][00281] Avg episode reward: [(0, '9.862')] |
| [2024-11-07 18:12:20,085][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000930_3809280.pth... |
| [2024-11-07 18:12:20,210][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth |
| [2024-11-07 18:12:25,069][00281] Fps is (10 sec: 3685.7, 60 sec: 3481.5, 300 sec: 3360.1). Total num frames: 3825664. Throughput: 0: 883.4. Samples: 955782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:12:25,071][00281] Avg episode reward: [(0, '10.307')] |
| [2024-11-07 18:12:30,067][00281] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3837952. Throughput: 0: 857.7. Samples: 959924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) |
| [2024-11-07 18:12:30,073][00281] Avg episode reward: [(0, '10.182')] |
| [2024-11-07 18:12:32,544][03016] Updated weights for policy 0, policy_version 940 (0.0028) |
| [2024-11-07 18:12:35,067][00281] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3858432. Throughput: 0: 834.7. Samples: 965202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:12:35,075][00281] Avg episode reward: [(0, '10.538')] |
| [2024-11-07 18:12:40,067][00281] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 3878912. Throughput: 0: 860.4. Samples: 968316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:12:40,073][00281] Avg episode reward: [(0, '10.507')] |
| [2024-11-07 18:12:43,300][03016] Updated weights for policy 0, policy_version 950 (0.0028) |
| [2024-11-07 18:12:45,068][00281] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3401.7). Total num frames: 3895296. Throughput: 0: 879.3. Samples: 973488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) |
| [2024-11-07 18:12:45,077][00281] Avg episode reward: [(0, '10.706')] |
| [2024-11-07 18:12:50,067][00281] Fps is (10 sec: 3276.9, 60 sec: 3413.8, 300 sec: 3387.9). Total num frames: 3911680. Throughput: 0: 842.3. Samples: 978054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:12:50,069][00281] Avg episode reward: [(0, '12.098')] |
| [2024-11-07 18:12:50,077][03002] Saving new best policy, reward=12.098! |
| [2024-11-07 18:12:54,966][03016] Updated weights for policy 0, policy_version 960 (0.0036) |
| [2024-11-07 18:12:55,067][00281] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 3932160. Throughput: 0: 847.9. Samples: 981188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:12:55,069][00281] Avg episode reward: [(0, '9.185')] |
| [2024-11-07 18:13:00,068][00281] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3401.7). Total num frames: 3944448. Throughput: 0: 890.1. Samples: 986652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) |
| [2024-11-07 18:13:00,072][00281] Avg episode reward: [(0, '9.771')] |
| [2024-11-07 18:13:05,067][00281] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 3956736. Throughput: 0: 832.0. Samples: 990194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) |
| [2024-11-07 18:13:05,068][00281] Avg episode reward: [(0, '9.726')] |
| [2024-11-07 18:13:08,066][03016] Updated weights for policy 0, policy_version 970 (0.0029) |
| [2024-11-07 18:13:10,067][00281] Fps is (10 sec: 3687.0, 60 sec: 3481.7, 300 sec: 3415.6). Total num frames: 3981312. Throughput: 0: 835.8. Samples: 993392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) |
| [2024-11-07 18:13:10,074][00281] Avg episode reward: [(0, '10.562')] |
| [2024-11-07 18:13:15,069][00281] Fps is (10 sec: 4095.1, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 3997696. Throughput: 0: 880.8. Samples: 999564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) |
| [2024-11-07 18:13:15,072][00281] Avg episode reward: [(0, '10.136')] |
| [2024-11-07 18:13:17,068][03002] Stopping Batcher_0... |
| [2024-11-07 18:13:17,069][03002] Loop batcher_evt_loop terminating... |
| [2024-11-07 18:13:17,069][00281] Component Batcher_0 stopped! |
| [2024-11-07 18:13:17,088][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
| [2024-11-07 18:13:17,166][03016] Weights refcount: 2 0 |
| [2024-11-07 18:13:17,180][03016] Stopping InferenceWorker_p0-w0... |
| [2024-11-07 18:13:17,181][03016] Loop inference_proc0-0_evt_loop terminating... |
| [2024-11-07 18:13:17,180][00281] Component InferenceWorker_p0-w0 stopped! |
| [2024-11-07 18:13:17,284][03002] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000829_3395584.pth |
| [2024-11-07 18:13:17,305][03002] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
| [2024-11-07 18:13:17,555][00281] Component LearnerWorker_p0 stopped! |
| [2024-11-07 18:13:17,561][03002] Stopping LearnerWorker_p0... |
| [2024-11-07 18:13:17,561][03002] Loop learner_proc0_evt_loop terminating... |
| [2024-11-07 18:13:17,750][00281] Component RolloutWorker_w6 stopped! |
| [2024-11-07 18:13:17,757][03023] Stopping RolloutWorker_w6... |
| [2024-11-07 18:13:17,758][03023] Loop rollout_proc6_evt_loop terminating... |
| [2024-11-07 18:13:17,766][03017] Stopping RolloutWorker_w1... |
| [2024-11-07 18:13:17,768][03017] Loop rollout_proc1_evt_loop terminating... |
| [2024-11-07 18:13:17,766][00281] Component RolloutWorker_w1 stopped! |
| [2024-11-07 18:13:17,777][03021] Stopping RolloutWorker_w5... |
| [2024-11-07 18:13:17,777][03021] Loop rollout_proc5_evt_loop terminating... |
| [2024-11-07 18:13:17,777][00281] Component RolloutWorker_w5 stopped! |
| [2024-11-07 18:13:17,791][03020] Stopping RolloutWorker_w4... |
| [2024-11-07 18:13:17,792][03022] Stopping RolloutWorker_w7... |
| [2024-11-07 18:13:17,792][03020] Loop rollout_proc4_evt_loop terminating... |
| [2024-11-07 18:13:17,792][03022] Loop rollout_proc7_evt_loop terminating... |
| [2024-11-07 18:13:17,793][00281] Component RolloutWorker_w4 stopped! |
| [2024-11-07 18:13:17,801][00281] Component RolloutWorker_w7 stopped! |
| [2024-11-07 18:13:17,822][03019] Stopping RolloutWorker_w3... |
| [2024-11-07 18:13:17,821][00281] Component RolloutWorker_w3 stopped! |
| [2024-11-07 18:13:17,831][03018] Stopping RolloutWorker_w2... |
| [2024-11-07 18:13:17,832][03018] Loop rollout_proc2_evt_loop terminating... |
| [2024-11-07 18:13:17,825][03019] Loop rollout_proc3_evt_loop terminating... |
| [2024-11-07 18:13:17,834][00281] Component RolloutWorker_w2 stopped! |
| [2024-11-07 18:13:17,847][03015] Stopping RolloutWorker_w0... |
| [2024-11-07 18:13:17,852][03015] Loop rollout_proc0_evt_loop terminating... |
| [2024-11-07 18:13:17,847][00281] Component RolloutWorker_w0 stopped! |
| [2024-11-07 18:13:17,854][00281] Waiting for process learner_proc0 to stop... |
| [2024-11-07 18:13:19,935][00281] Waiting for process inference_proc0-0 to join... |
| [2024-11-07 18:13:19,943][00281] Waiting for process rollout_proc0 to join... |
| [2024-11-07 18:13:21,975][00281] Waiting for process rollout_proc1 to join... |
| [2024-11-07 18:13:21,979][00281] Waiting for process rollout_proc2 to join... |
| [2024-11-07 18:13:21,985][00281] Waiting for process rollout_proc3 to join... |
| [2024-11-07 18:13:21,987][00281] Waiting for process rollout_proc4 to join... |
| [2024-11-07 18:13:21,991][00281] Waiting for process rollout_proc5 to join... |
| [2024-11-07 18:13:21,995][00281] Waiting for process rollout_proc6 to join... |
| [2024-11-07 18:13:21,998][00281] Waiting for process rollout_proc7 to join... |
| [2024-11-07 18:13:22,003][00281] Batcher 0 profile tree view: |
| batching: 28.1903, releasing_batches: 0.0297 |
| [2024-11-07 18:13:22,005][00281] InferenceWorker_p0-w0 profile tree view: |
| wait_policy: 0.0000 |
| wait_policy_total: 244.7297 |
| update_model: 9.6465 |
| weight_update: 0.0032 |
| one_step: 0.0153 |
| handle_policy_step: 930.3081 |
| deserialize: 16.9909, stack: 3.3934, obs_to_device_normalize: 140.6297, forward: 619.1030, send_messages: 33.0876 |
| prepare_outputs: 85.2778 |
| to_cpu: 51.9131 |
| [2024-11-07 18:13:22,007][00281] Learner 0 profile tree view: |
| misc: 0.0052, prepare_batch: 14.1065 |
| train: 84.2131 |
| epoch_init: 0.0146, minibatch_init: 0.0065, losses_postprocess: 0.7474, kl_divergence: 1.6880, after_optimizer: 35.0201 |
| calculate_losses: 32.2524 |
| losses_init: 0.0047, forward_head: 1.3498, bptt_initial: 21.1490, tail: 2.1789, advantages_returns: 0.3523, losses: 4.8248 |
| bptt: 2.0305 |
| bptt_forward_core: 1.9211 |
| update: 13.7950 |
| clip: 0.8865 |
| [2024-11-07 18:13:22,009][00281] RolloutWorker_w0 profile tree view: |
| wait_for_trajectories: 0.3816, enqueue_policy_requests: 55.7851, env_step: 946.9003, overhead: 14.5718, complete_rollouts: 7.4866 |
| save_policy_outputs: 21.4157 |
| split_output_tensors: 8.3712 |
| [2024-11-07 18:13:22,010][00281] RolloutWorker_w7 profile tree view: |
| wait_for_trajectories: 0.3854, enqueue_policy_requests: 59.3264, env_step: 949.8981, overhead: 15.1676, complete_rollouts: 7.5613 |
| save_policy_outputs: 21.7700 |
| split_output_tensors: 9.0049 |
| [2024-11-07 18:13:22,012][00281] Loop Runner_EvtLoop terminating... |
| [2024-11-07 18:13:22,013][00281] Runner profile tree view: |
| main_loop: 1259.6152 |
| [2024-11-07 18:13:22,014][00281] Collected {0: 4005888}, FPS: 3180.2 |
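
For context, the final throughput line is consistent with the runner profile directly above it: the run collected 4,005,888 environment frames over a 1259.6152 s main loop, and 4005888 / 1259.6152 ≈ 3180.2, which matches the reported FPS.
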
| [2024-11-07 18:13:22,054][00281] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
| [2024-11-07 18:13:22,056][00281] Overriding arg 'num_workers' with value 1 passed from command line |
| [2024-11-07 18:13:22,058][00281] Adding new argument 'no_render'=True that is not in the saved config file! |
| [2024-11-07 18:13:22,059][00281] Adding new argument 'save_video'=True that is not in the saved config file! |
| [2024-11-07 18:13:22,061][00281] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
| [2024-11-07 18:13:22,063][00281] Adding new argument 'video_name'=None that is not in the saved config file! |
| [2024-11-07 18:13:22,064][00281] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! |
| [2024-11-07 18:13:22,065][00281] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
| [2024-11-07 18:13:22,068][00281] Adding new argument 'push_to_hub'=False that is not in the saved config file! |
| [2024-11-07 18:13:22,069][00281] Adding new argument 'hf_repository'=None that is not in the saved config file! |
| [2024-11-07 18:13:22,070][00281] Adding new argument 'policy_index'=0 that is not in the saved config file! |
| [2024-11-07 18:13:22,073][00281] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
| [2024-11-07 18:13:22,074][00281] Adding new argument 'train_script'=None that is not in the saved config file! |
| [2024-11-07 18:13:22,076][00281] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
| [2024-11-07 18:13:22,077][00281] Using frameskip 1 and render_action_repeat=4 for evaluation |
| [2024-11-07 18:13:22,112][00281] Doom resolution: 160x120, resize resolution: (128, 72) |
| [2024-11-07 18:13:22,117][00281] RunningMeanStd input shape: (3, 72, 128) |
| [2024-11-07 18:13:22,120][00281] RunningMeanStd input shape: (1,) |
| [2024-11-07 18:13:22,138][00281] ConvEncoder: input_channels=3 |
| [2024-11-07 18:13:22,263][00281] Conv encoder output size: 512 |
| [2024-11-07 18:13:22,264][00281] Policy head output size: 512 |
| [2024-11-07 18:13:22,436][00281] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
| [2024-11-07 18:13:23,262][00281] Num frames 100... |
| [2024-11-07 18:13:23,361][00281] Avg episode rewards: #0: 5.882, true rewards: #0: 5.882 |
| [2024-11-07 18:13:23,363][00281] Avg episode reward: 5.882, avg true_objective: 5.882 |
| [2024-11-07 18:13:23,468][00281] Num frames 200... |
| [2024-11-07 18:13:23,634][00281] Avg episode rewards: #0: 6.787, true rewards: #0: 6.787 |
| [2024-11-07 18:13:23,636][00281] Avg episode reward: 6.787, avg true_objective: 6.787 |
| [2024-11-07 18:13:23,680][00281] Num frames 300... |
| [2024-11-07 18:13:23,822][00281] Num frames 400... |
| [2024-11-07 18:13:23,922][00281] Avg episode rewards: #0: 7.140, true rewards: #0: 7.140 |
| [2024-11-07 18:13:23,923][00281] Avg episode reward: 7.140, avg true_objective: 7.140 |
| [2024-11-07 18:13:24,023][00281] Num frames 500... |
| [2024-11-07 18:13:24,111][00281] Avg episode rewards: #0: 6.399, true rewards: #0: 6.399 |
| [2024-11-07 18:13:24,113][00281] Avg episode reward: 6.399, avg true_objective: 6.399 |
| [2024-11-07 18:13:24,226][00281] Num frames 600... |
| [2024-11-07 18:13:24,298][00281] Avg episode rewards: #0: 5.915, true rewards: #0: 5.915 |
| [2024-11-07 18:13:24,299][00281] Avg episode reward: 5.915, avg true_objective: 5.915 |
| [2024-11-07 18:13:24,436][00281] Num frames 700... |
| [2024-11-07 18:13:24,589][00281] Num frames 800... |
| [2024-11-07 18:13:24,665][00281] Avg episode rewards: #0: 8.723, true rewards: #0: 8.723 |
| [2024-11-07 18:13:24,667][00281] Avg episode reward: 8.723, avg true_objective: 8.723 |
| [2024-11-07 18:13:24,787][00281] Num frames 900... |
| [2024-11-07 18:13:24,929][00281] Avg episode rewards: #0: 8.594, true rewards: #0: 8.594 |
| [2024-11-07 18:13:24,930][00281] Avg episode reward: 8.594, avg true_objective: 8.594 |
| [2024-11-07 18:13:24,989][00281] Num frames 1000... |
| [2024-11-07 18:13:25,163][00281] Avg episode rewards: #0: 8.454, true rewards: #0: 8.454 |
| [2024-11-07 18:13:25,164][00281] Avg episode reward: 8.454, avg true_objective: 8.454 |
| [2024-11-07 18:13:25,199][00281] Num frames 1100... |
| [2024-11-07 18:13:25,327][00281] Avg episode rewards: #0: 7.902, true rewards: #0: 7.902 |
| [2024-11-07 18:13:25,328][00281] Avg episode reward: 7.902, avg true_objective: 7.902 |
| [2024-11-07 18:13:25,401][00281] Num frames 1200... |
| [2024-11-07 18:13:25,506][00281] Avg episode rewards: #0: 7.485, true rewards: #0: 7.485 |
| [2024-11-07 18:13:25,507][00281] Avg episode reward: 7.485, avg true_objective: 7.485 |
| [2024-11-07 18:13:32,749][00281] Replay video saved to /content/train_dir/default_experiment/replay.mp4! |
| [2024-11-07 18:13:32,878][00281] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json |
| [2024-11-07 18:13:32,883][00281] Overriding arg 'num_workers' with value 1 passed from command line |
| [2024-11-07 18:13:32,885][00281] Adding new argument 'no_render'=True that is not in the saved config file! |
| [2024-11-07 18:13:32,887][00281] Adding new argument 'save_video'=True that is not in the saved config file! |
| [2024-11-07 18:13:32,888][00281] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! |
| [2024-11-07 18:13:32,894][00281] Adding new argument 'video_name'=None that is not in the saved config file! |
| [2024-11-07 18:13:32,895][00281] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! |
| [2024-11-07 18:13:32,897][00281] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! |
| [2024-11-07 18:13:32,898][00281] Adding new argument 'push_to_hub'=True that is not in the saved config file! |
| [2024-11-07 18:13:32,899][00281] Adding new argument 'hf_repository'='drap/rl_course_vizdoom' that is not in the saved config file! |
| [2024-11-07 18:13:32,900][00281] Adding new argument 'policy_index'=0 that is not in the saved config file! |
| [2024-11-07 18:13:32,905][00281] Adding new argument 'eval_deterministic'=False that is not in the saved config file! |
| [2024-11-07 18:13:32,906][00281] Adding new argument 'train_script'=None that is not in the saved config file! |
| [2024-11-07 18:13:32,907][00281] Adding new argument 'enjoy_script'=None that is not in the saved config file! |
| [2024-11-07 18:13:32,908][00281] Using frameskip 1 and render_action_repeat=4 for evaluation |
| [2024-11-07 18:13:32,950][00281] RunningMeanStd input shape: (3, 72, 128) |
| [2024-11-07 18:13:32,953][00281] RunningMeanStd input shape: (1,) |
| [2024-11-07 18:13:32,970][00281] ConvEncoder: input_channels=3 |
| [2024-11-07 18:13:33,027][00281] Conv encoder output size: 512 |
| [2024-11-07 18:13:33,029][00281] Policy head output size: 512 |
| [2024-11-07 18:13:33,057][00281] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... |
| [2024-11-07 18:13:33,637][00281] Num frames 100... |
| [2024-11-07 18:13:33,783][00281] Num frames 200... |
| [2024-11-07 18:13:33,843][00281] Avg episode rewards: #0: 22.802, true rewards: #0: 22.802 |
| [2024-11-07 18:13:33,844][00281] Avg episode reward: 22.802, avg true_objective: 22.802 |
| [2024-11-07 18:13:34,021][00281] Avg episode rewards: #0: 12.983, true rewards: #0: 12.983 |
| [2024-11-07 18:13:34,024][00281] Avg episode reward: 12.983, avg true_objective: 12.983 |
| [2024-11-07 18:13:34,052][00281] Num frames 300... |
| [2024-11-07 18:13:34,228][00281] Num frames 400... |
| [2024-11-07 18:13:34,350][00281] Avg episode rewards: #0: 11.833, true rewards: #0: 11.833 |
| [2024-11-07 18:13:34,352][00281] Avg episode reward: 11.833, avg true_objective: 11.833 |
| [2024-11-07 18:13:34,443][00281] Num frames 500... |
| [2024-11-07 18:13:34,591][00281] Avg episode rewards: #0: 10.469, true rewards: #0: 10.469 |
| [2024-11-07 18:13:34,595][00281] Avg episode reward: 10.469, avg true_objective: 10.469 |
| [2024-11-07 18:13:34,644][00281] Num frames 600... |
| [2024-11-07 18:13:34,790][00281] Avg episode rewards: #0: 9.259, true rewards: #0: 9.259 |
| [2024-11-07 18:13:34,791][00281] Avg episode reward: 9.259, avg true_objective: 9.259 |
| [2024-11-07 18:13:34,843][00281] Num frames 700... |
| [2024-11-07 18:13:35,032][00281] Avg episode rewards: #0: 8.889, true rewards: #0: 8.889 |
| [2024-11-07 18:13:35,033][00281] Avg episode reward: 8.889, avg true_objective: 8.889 |
| [2024-11-07 18:13:35,056][00281] Num frames 800... |
| [2024-11-07 18:13:35,208][00281] Num frames 900... |
| [2024-11-07 18:13:35,334][00281] Avg episode rewards: #0: 8.872, true rewards: #0: 8.872 |
| [2024-11-07 18:13:35,337][00281] Avg episode reward: 8.872, avg true_objective: 8.872 |
| [2024-11-07 18:13:35,408][00281] Num frames 1000... |
| [2024-11-07 18:13:35,553][00281] Num frames 1100... |
| [2024-11-07 18:13:35,677][00281] Avg episode rewards: #0: 10.609, true rewards: #0: 10.609 |
| [2024-11-07 18:13:35,678][00281] Avg episode reward: 10.609, avg true_objective: 10.609 |
| [2024-11-07 18:13:35,763][00281] Num frames 1200... |
| [2024-11-07 18:13:35,907][00281] Num frames 1300... |
| [2024-11-07 18:13:36,033][00281] Avg episode rewards: #0: 11.967, true rewards: #0: 11.967 |
| [2024-11-07 18:13:36,036][00281] Avg episode reward: 11.967, avg true_objective: 11.967 |
| [2024-11-07 18:13:36,143][00281] Num frames 1400... |
| [2024-11-07 18:13:36,289][00281] Num frames 1500... |
| [2024-11-07 18:13:36,398][00281] Avg episode rewards: #0: 13.046, true rewards: #0: 13.046 |
| [2024-11-07 18:13:36,401][00281] Avg episode reward: 13.046, avg true_objective: 13.046 |
| [2024-11-07 18:13:43,874][00281] Replay video saved to /content/train_dir/default_experiment/replay.mp4! |
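
The two evaluation passes above load the final checkpoint `checkpoint_000000978_4005888.pth` and roll out 10 episodes each; the second pass additionally sets `push_to_hub=True` with `hf_repository='drap/rl_course_vizdoom'` and uploads the resulting replay. Below is a minimal sketch of how such a run is typically launched through Sample Factory's `enjoy` entry point. The `sf_examples` module paths, the helper structure, and the `doom_health_gathering_supreme` environment name are assumptions (this log section does not name them); only the command-line flags mirror the arguments logged at 18:13:32.

```python
# Minimal sketch, not taken from this log: the sf_examples module paths and the
# environment name are assumptions; the flags mirror the logged evaluation arguments.
import functools

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy
from sample_factory.envs.env_utils import register_env
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec


def register_vizdoom_envs():
    # Register every doom_* environment with Sample Factory's env registry.
    for env_spec in DOOM_ENVS:
        register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))


def parse_vizdoom_cfg(argv, evaluation=True):
    # Build the config the same way the training script does, then parse CLI-style argv.
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    return parse_full_cfg(parser, argv)


register_vizdoom_envs()
cfg = parse_vizdoom_cfg([
    "--env=doom_health_gathering_supreme",  # assumed env; not named in this log section
    "--train_dir=/content/train_dir",       # matches the checkpoint paths in the log
    "--experiment=default_experiment",
    "--num_workers=1", "--no_render", "--save_video",
    "--max_num_episodes=10", "--max_num_frames=100000",
    "--push_to_hub", "--hf_repository=drap/rl_course_vizdoom",
])
status = enjoy(cfg)  # rolls out the episodes, writes replay.mp4, uploads to the Hub
```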