Commit ·
3824c40
1
Parent(s): 93f425e
Upload . with huggingface_hub
Browse files- .gitattributes +1 -0
- .summary/0/events.out.tfevents.1677101945.355362e7601a +3 -0
- README.md +56 -0
- checkpoint_p0/best_000002275_9318400_reward_17.231.pth +3 -0
- checkpoint_p0/checkpoint_000002372_9715712.pth +3 -0
- checkpoint_p0/checkpoint_000002443_10006528.pth +3 -0
- config.json +143 -0
- replay.mp4 +3 -0
- sf_log.txt +487 -0
.gitattributes
CHANGED
|
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 32 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 33 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 32 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 33 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
replay.mp4 filter=lfs diff=lfs merge=lfs -text
|
.summary/0/events.out.tfevents.1677101945.355362e7601a
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:7b1da7f33290987b826292b695f615948309a3a82fb5c743e63b902c5bd7ba02
|
| 3 |
+
size 2302815
|
README.md
ADDED
|
@@ -0,0 +1,56 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
library_name: sample-factory
|
| 3 |
+
tags:
|
| 4 |
+
- deep-reinforcement-learning
|
| 5 |
+
- reinforcement-learning
|
| 6 |
+
- sample-factory
|
| 7 |
+
model-index:
|
| 8 |
+
- name: APPO
|
| 9 |
+
results:
|
| 10 |
+
- task:
|
| 11 |
+
type: reinforcement-learning
|
| 12 |
+
name: reinforcement-learning
|
| 13 |
+
dataset:
|
| 14 |
+
name: doom_deadly_corridor
|
| 15 |
+
type: doom_deadly_corridor
|
| 16 |
+
metrics:
|
| 17 |
+
- type: mean_reward
|
| 18 |
+
value: 10.42 +/- 8.36
|
| 19 |
+
name: mean_reward
|
| 20 |
+
verified: false
|
| 21 |
+
---
|
| 22 |
+
|
| 23 |
+
A(n) **APPO** model trained on the **doom_deadly_corridor** environment.
|
| 24 |
+
|
| 25 |
+
This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory.
|
| 26 |
+
Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
## Downloading the model
|
| 30 |
+
|
| 31 |
+
After installing Sample-Factory, download the model with:
|
| 32 |
+
```
|
| 33 |
+
python -m sample_factory.huggingface.load_from_hub -r RamonAnkersmit/rl_course_doom_deadly_corridor
|
| 34 |
+
```
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
## Using the model
|
| 38 |
+
|
| 39 |
+
To run the model after download, use the `enjoy` script corresponding to this environment:
|
| 40 |
+
```
|
| 41 |
+
python -m <path.to.enjoy.module> --algo=APPO --env=doom_deadly_corridor --train_dir=./train_dir --experiment=rl_course_doom_deadly_corridor
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
You can also upload models to the Hugging Face Hub using the same script with the `--push_to_hub` flag.
|
| 46 |
+
See https://www.samplefactory.dev/10-huggingface/huggingface/ for more details
|
| 47 |
+
|
| 48 |
+
## Training with this model
|
| 49 |
+
|
| 50 |
+
To continue training with this model, use the `train` script corresponding to this environment:
|
| 51 |
+
```
|
| 52 |
+
python -m <path.to.train.module> --algo=APPO --env=doom_deadly_corridor --train_dir=./train_dir --experiment=rl_course_doom_deadly_corridor --restart_behavior=resume --train_for_env_steps=10000000000
|
| 53 |
+
```
|
| 54 |
+
|
| 55 |
+
Note, you may have to adjust `--train_for_env_steps` to a suitably high number as the experiment will resume at the number of steps it concluded at.
|
| 56 |
+
|
checkpoint_p0/best_000002275_9318400_reward_17.231.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d1d0346b43af0479ecef1f1dd45a431e8162972561fe0a5b0e2fc2e5c5b56a76
|
| 3 |
+
size 34965478
|
checkpoint_p0/checkpoint_000002372_9715712.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:26e3561611732efd9a991633aad7d40c33142b682a8292f355e3b3f84218ea48
|
| 3 |
+
size 34965892
|
checkpoint_p0/checkpoint_000002443_10006528.pth
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0dea399edf996b131058ec439aceb24067a3f8da11e27099113f9215a0e02fa5
|
| 3 |
+
size 34965892
|
config.json
ADDED
|
@@ -0,0 +1,143 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"help": false,
|
| 3 |
+
"algo": "APPO",
|
| 4 |
+
"env": "doom_deadly_corridor",
|
| 5 |
+
"experiment": "doom_deadly_corridor",
|
| 6 |
+
"train_dir": "/content/train_dir",
|
| 7 |
+
"restart_behavior": "resume",
|
| 8 |
+
"device": "gpu",
|
| 9 |
+
"seed": null,
|
| 10 |
+
"num_policies": 1,
|
| 11 |
+
"async_rl": true,
|
| 12 |
+
"serial_mode": false,
|
| 13 |
+
"batched_sampling": false,
|
| 14 |
+
"num_batches_to_accumulate": 2,
|
| 15 |
+
"worker_num_splits": 2,
|
| 16 |
+
"policy_workers_per_policy": 1,
|
| 17 |
+
"max_policy_lag": 1000,
|
| 18 |
+
"num_workers": 8,
|
| 19 |
+
"num_envs_per_worker": 4,
|
| 20 |
+
"batch_size": 1024,
|
| 21 |
+
"num_batches_per_epoch": 1,
|
| 22 |
+
"num_epochs": 1,
|
| 23 |
+
"rollout": 32,
|
| 24 |
+
"recurrence": 32,
|
| 25 |
+
"shuffle_minibatches": false,
|
| 26 |
+
"gamma": 0.99,
|
| 27 |
+
"reward_scale": 1.0,
|
| 28 |
+
"reward_clip": 1000.0,
|
| 29 |
+
"value_bootstrap": false,
|
| 30 |
+
"normalize_returns": true,
|
| 31 |
+
"exploration_loss_coeff": 0.001,
|
| 32 |
+
"value_loss_coeff": 0.5,
|
| 33 |
+
"kl_loss_coeff": 0.0,
|
| 34 |
+
"exploration_loss": "symmetric_kl",
|
| 35 |
+
"gae_lambda": 0.95,
|
| 36 |
+
"ppo_clip_ratio": 0.1,
|
| 37 |
+
"ppo_clip_value": 0.2,
|
| 38 |
+
"with_vtrace": false,
|
| 39 |
+
"vtrace_rho": 1.0,
|
| 40 |
+
"vtrace_c": 1.0,
|
| 41 |
+
"optimizer": "adam",
|
| 42 |
+
"adam_eps": 1e-06,
|
| 43 |
+
"adam_beta1": 0.9,
|
| 44 |
+
"adam_beta2": 0.999,
|
| 45 |
+
"max_grad_norm": 4.0,
|
| 46 |
+
"learning_rate": 0.0001,
|
| 47 |
+
"lr_schedule": "constant",
|
| 48 |
+
"lr_schedule_kl_threshold": 0.008,
|
| 49 |
+
"lr_adaptive_min": 1e-06,
|
| 50 |
+
"lr_adaptive_max": 0.01,
|
| 51 |
+
"obs_subtract_mean": 0.0,
|
| 52 |
+
"obs_scale": 255.0,
|
| 53 |
+
"normalize_input": true,
|
| 54 |
+
"normalize_input_keys": null,
|
| 55 |
+
"decorrelate_experience_max_seconds": 0,
|
| 56 |
+
"decorrelate_envs_on_one_worker": true,
|
| 57 |
+
"actor_worker_gpus": [],
|
| 58 |
+
"set_workers_cpu_affinity": true,
|
| 59 |
+
"force_envs_single_thread": false,
|
| 60 |
+
"default_niceness": 0,
|
| 61 |
+
"log_to_file": true,
|
| 62 |
+
"experiment_summaries_interval": 10,
|
| 63 |
+
"flush_summaries_interval": 30,
|
| 64 |
+
"stats_avg": 100,
|
| 65 |
+
"summaries_use_frameskip": true,
|
| 66 |
+
"heartbeat_interval": 20,
|
| 67 |
+
"heartbeat_reporting_interval": 600,
|
| 68 |
+
"train_for_env_steps": 10000000,
|
| 69 |
+
"train_for_seconds": 10000000000,
|
| 70 |
+
"save_every_sec": 120,
|
| 71 |
+
"keep_checkpoints": 2,
|
| 72 |
+
"load_checkpoint_kind": "latest",
|
| 73 |
+
"save_milestones_sec": -1,
|
| 74 |
+
"save_best_every_sec": 5,
|
| 75 |
+
"save_best_metric": "reward",
|
| 76 |
+
"save_best_after": 100000,
|
| 77 |
+
"benchmark": false,
|
| 78 |
+
"encoder_mlp_layers": [
|
| 79 |
+
512,
|
| 80 |
+
512
|
| 81 |
+
],
|
| 82 |
+
"encoder_conv_architecture": "convnet_simple",
|
| 83 |
+
"encoder_conv_mlp_layers": [
|
| 84 |
+
512
|
| 85 |
+
],
|
| 86 |
+
"use_rnn": true,
|
| 87 |
+
"rnn_size": 512,
|
| 88 |
+
"rnn_type": "gru",
|
| 89 |
+
"rnn_num_layers": 1,
|
| 90 |
+
"decoder_mlp_layers": [],
|
| 91 |
+
"nonlinearity": "elu",
|
| 92 |
+
"policy_initialization": "orthogonal",
|
| 93 |
+
"policy_init_gain": 1.0,
|
| 94 |
+
"actor_critic_share_weights": true,
|
| 95 |
+
"adaptive_stddev": true,
|
| 96 |
+
"continuous_tanh_scale": 0.0,
|
| 97 |
+
"initial_stddev": 1.0,
|
| 98 |
+
"use_env_info_cache": false,
|
| 99 |
+
"env_gpu_actions": false,
|
| 100 |
+
"env_gpu_observations": true,
|
| 101 |
+
"env_frameskip": 4,
|
| 102 |
+
"env_framestack": 1,
|
| 103 |
+
"pixel_format": "CHW",
|
| 104 |
+
"use_record_episode_statistics": false,
|
| 105 |
+
"with_wandb": false,
|
| 106 |
+
"wandb_user": null,
|
| 107 |
+
"wandb_project": "sample_factory",
|
| 108 |
+
"wandb_group": null,
|
| 109 |
+
"wandb_job_type": "SF",
|
| 110 |
+
"wandb_tags": [],
|
| 111 |
+
"with_pbt": false,
|
| 112 |
+
"pbt_mix_policies_in_one_env": true,
|
| 113 |
+
"pbt_period_env_steps": 5000000,
|
| 114 |
+
"pbt_start_mutation": 20000000,
|
| 115 |
+
"pbt_replace_fraction": 0.3,
|
| 116 |
+
"pbt_mutation_rate": 0.15,
|
| 117 |
+
"pbt_replace_reward_gap": 0.1,
|
| 118 |
+
"pbt_replace_reward_gap_absolute": 1e-06,
|
| 119 |
+
"pbt_optimize_gamma": false,
|
| 120 |
+
"pbt_target_objective": "true_objective",
|
| 121 |
+
"pbt_perturb_min": 1.1,
|
| 122 |
+
"pbt_perturb_max": 1.5,
|
| 123 |
+
"num_agents": -1,
|
| 124 |
+
"num_humans": 0,
|
| 125 |
+
"num_bots": -1,
|
| 126 |
+
"start_bot_difficulty": null,
|
| 127 |
+
"timelimit": null,
|
| 128 |
+
"res_w": 128,
|
| 129 |
+
"res_h": 72,
|
| 130 |
+
"wide_aspect_ratio": false,
|
| 131 |
+
"eval_env_frameskip": 1,
|
| 132 |
+
"fps": 35,
|
| 133 |
+
"command_line": "--env=doom_deadly_corridor --experiment=doom_deadly_corridor --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000",
|
| 134 |
+
"cli_args": {
|
| 135 |
+
"env": "doom_deadly_corridor",
|
| 136 |
+
"experiment": "doom_deadly_corridor",
|
| 137 |
+
"num_workers": 8,
|
| 138 |
+
"num_envs_per_worker": 4,
|
| 139 |
+
"train_for_env_steps": 10000000
|
| 140 |
+
},
|
| 141 |
+
"git_hash": "unknown",
|
| 142 |
+
"git_repo_name": "not a git repository"
|
| 143 |
+
}
|
replay.mp4
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:12335cd0d24082e9d0eb7544fb2f9dcf472d9227bd7a464517d331d46654e4c5
|
| 3 |
+
size 1703429
|
sf_log.txt
ADDED
|
@@ -0,0 +1,487 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[2023-02-22 21:39:18,790][44343] Using GPUs [0] for process 0 (actually maps to GPUs [0])
|
| 2 |
+
[2023-02-22 21:39:18,795][44343] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
|
| 3 |
+
[2023-02-22 21:39:18,877][44343] Num visible devices: 1
|
| 4 |
+
[2023-02-22 21:39:18,911][44343] Starting seed is not provided
|
| 5 |
+
[2023-02-22 21:39:18,911][44343] Using GPUs [0] for process 0 (actually maps to GPUs [0])
|
| 6 |
+
[2023-02-22 21:39:18,911][44343] Initializing actor-critic model on device cuda:0
|
| 7 |
+
[2023-02-22 21:39:18,912][44343] RunningMeanStd input shape: (3, 72, 128)
|
| 8 |
+
[2023-02-22 21:39:18,916][44343] RunningMeanStd input shape: (1,)
|
| 9 |
+
[2023-02-22 21:39:18,966][44343] ConvEncoder: input_channels=3
|
| 10 |
+
[2023-02-22 21:39:19,495][44358] Worker 0 uses CPU cores [0]
|
| 11 |
+
[2023-02-22 21:39:19,655][44357] Using GPUs [0] for process 0 (actually maps to GPUs [0])
|
| 12 |
+
[2023-02-22 21:39:19,661][44357] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
|
| 13 |
+
[2023-02-22 21:39:19,700][44357] Num visible devices: 1
|
| 14 |
+
[2023-02-22 21:39:19,810][44359] Worker 1 uses CPU cores [1]
|
| 15 |
+
[2023-02-22 21:39:19,950][44343] Conv encoder output size: 512
|
| 16 |
+
[2023-02-22 21:39:19,950][44343] Policy head output size: 512
|
| 17 |
+
[2023-02-22 21:39:20,035][44343] Created Actor Critic model with architecture:
|
| 18 |
+
[2023-02-22 21:39:20,035][44343] ActorCriticSharedWeights(
|
| 19 |
+
(obs_normalizer): ObservationNormalizer(
|
| 20 |
+
(running_mean_std): RunningMeanStdDictInPlace(
|
| 21 |
+
(running_mean_std): ModuleDict(
|
| 22 |
+
(obs): RunningMeanStdInPlace()
|
| 23 |
+
)
|
| 24 |
+
)
|
| 25 |
+
)
|
| 26 |
+
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
|
| 27 |
+
(encoder): VizdoomEncoder(
|
| 28 |
+
(basic_encoder): ConvEncoder(
|
| 29 |
+
(enc): RecursiveScriptModule(
|
| 30 |
+
original_name=ConvEncoderImpl
|
| 31 |
+
(conv_head): RecursiveScriptModule(
|
| 32 |
+
original_name=Sequential
|
| 33 |
+
(0): RecursiveScriptModule(original_name=Conv2d)
|
| 34 |
+
(1): RecursiveScriptModule(original_name=ELU)
|
| 35 |
+
(2): RecursiveScriptModule(original_name=Conv2d)
|
| 36 |
+
(3): RecursiveScriptModule(original_name=ELU)
|
| 37 |
+
(4): RecursiveScriptModule(original_name=Conv2d)
|
| 38 |
+
(5): RecursiveScriptModule(original_name=ELU)
|
| 39 |
+
)
|
| 40 |
+
(mlp_layers): RecursiveScriptModule(
|
| 41 |
+
original_name=Sequential
|
| 42 |
+
(0): RecursiveScriptModule(original_name=Linear)
|
| 43 |
+
(1): RecursiveScriptModule(original_name=ELU)
|
| 44 |
+
)
|
| 45 |
+
)
|
| 46 |
+
)
|
| 47 |
+
)
|
| 48 |
+
(core): ModelCoreRNN(
|
| 49 |
+
(core): GRU(512, 512)
|
| 50 |
+
)
|
| 51 |
+
(decoder): MlpDecoder(
|
| 52 |
+
(mlp): Identity()
|
| 53 |
+
)
|
| 54 |
+
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
|
| 55 |
+
(action_parameterization): ActionParameterizationDefault(
|
| 56 |
+
(distribution_linear): Linear(in_features=512, out_features=11, bias=True)
|
| 57 |
+
)
|
| 58 |
+
)
|
| 59 |
+
[2023-02-22 21:39:20,560][44362] Worker 2 uses CPU cores [0]
|
| 60 |
+
[2023-02-22 21:39:20,816][44380] Worker 6 uses CPU cores [0]
|
| 61 |
+
[2023-02-22 21:39:20,864][44365] Worker 3 uses CPU cores [1]
|
| 62 |
+
[2023-02-22 21:39:21,072][44370] Worker 5 uses CPU cores [1]
|
| 63 |
+
[2023-02-22 21:39:21,112][44372] Worker 4 uses CPU cores [0]
|
| 64 |
+
[2023-02-22 21:39:21,139][44374] Worker 7 uses CPU cores [1]
|
| 65 |
+
[2023-02-22 21:39:27,142][44343] Using optimizer <class 'torch.optim.adam.Adam'>
|
| 66 |
+
[2023-02-22 21:39:27,144][44343] No checkpoints found
|
| 67 |
+
[2023-02-22 21:39:27,144][44343] Did not load from checkpoint, starting from scratch!
|
| 68 |
+
[2023-02-22 21:39:27,144][44343] Initialized policy 0 weights for model version 0
|
| 69 |
+
[2023-02-22 21:39:27,147][44343] Using GPUs [0] for process 0 (actually maps to GPUs [0])
|
| 70 |
+
[2023-02-22 21:39:27,154][44343] LearnerWorker_p0 finished initialization!
|
| 71 |
+
[2023-02-22 21:39:27,360][44357] RunningMeanStd input shape: (3, 72, 128)
|
| 72 |
+
[2023-02-22 21:39:27,361][44357] RunningMeanStd input shape: (1,)
|
| 73 |
+
[2023-02-22 21:39:27,373][44357] ConvEncoder: input_channels=3
|
| 74 |
+
[2023-02-22 21:39:27,471][44357] Conv encoder output size: 512
|
| 75 |
+
[2023-02-22 21:39:27,472][44357] Policy head output size: 512
|
| 76 |
+
[2023-02-22 21:39:30,323][44365] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 77 |
+
[2023-02-22 21:39:30,344][44370] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 78 |
+
[2023-02-22 21:39:30,348][44359] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 79 |
+
[2023-02-22 21:39:30,354][44374] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 80 |
+
[2023-02-22 21:39:30,502][44372] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 81 |
+
[2023-02-22 21:39:30,507][44380] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 82 |
+
[2023-02-22 21:39:30,535][44362] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 83 |
+
[2023-02-22 21:39:30,603][44358] Doom resolution: 160x120, resize resolution: (128, 72)
|
| 84 |
+
[2023-02-22 21:39:32,430][44374] Decorrelating experience for 0 frames...
|
| 85 |
+
[2023-02-22 21:39:32,432][44370] Decorrelating experience for 0 frames...
|
| 86 |
+
[2023-02-22 21:39:32,434][44365] Decorrelating experience for 0 frames...
|
| 87 |
+
[2023-02-22 21:39:32,434][44359] Decorrelating experience for 0 frames...
|
| 88 |
+
[2023-02-22 21:39:32,682][44372] Decorrelating experience for 0 frames...
|
| 89 |
+
[2023-02-22 21:39:32,687][44380] Decorrelating experience for 0 frames...
|
| 90 |
+
[2023-02-22 21:39:32,693][44362] Decorrelating experience for 0 frames...
|
| 91 |
+
[2023-02-22 21:39:32,710][44358] Decorrelating experience for 0 frames...
|
| 92 |
+
[2023-02-22 21:39:34,060][44365] Decorrelating experience for 32 frames...
|
| 93 |
+
[2023-02-22 21:39:34,061][44359] Decorrelating experience for 32 frames...
|
| 94 |
+
[2023-02-22 21:39:34,055][44374] Decorrelating experience for 32 frames...
|
| 95 |
+
[2023-02-22 21:39:34,351][44370] Decorrelating experience for 32 frames...
|
| 96 |
+
[2023-02-22 21:39:34,371][44372] Decorrelating experience for 32 frames...
|
| 97 |
+
[2023-02-22 21:39:34,444][44358] Decorrelating experience for 32 frames...
|
| 98 |
+
[2023-02-22 21:39:34,503][44362] Decorrelating experience for 32 frames...
|
| 99 |
+
[2023-02-22 21:39:35,178][44380] Decorrelating experience for 32 frames...
|
| 100 |
+
[2023-02-22 21:39:35,471][44358] Decorrelating experience for 64 frames...
|
| 101 |
+
[2023-02-22 21:39:35,815][44380] Decorrelating experience for 64 frames...
|
| 102 |
+
[2023-02-22 21:39:35,825][44365] Decorrelating experience for 64 frames...
|
| 103 |
+
[2023-02-22 21:39:35,879][44359] Decorrelating experience for 64 frames...
|
| 104 |
+
[2023-02-22 21:39:35,882][44374] Decorrelating experience for 64 frames...
|
| 105 |
+
[2023-02-22 21:39:36,094][44370] Decorrelating experience for 64 frames...
|
| 106 |
+
[2023-02-22 21:39:36,872][44380] Decorrelating experience for 96 frames...
|
| 107 |
+
[2023-02-22 21:39:36,885][44358] Decorrelating experience for 96 frames...
|
| 108 |
+
[2023-02-22 21:39:37,033][44372] Decorrelating experience for 64 frames...
|
| 109 |
+
[2023-02-22 21:39:37,168][44365] Decorrelating experience for 96 frames...
|
| 110 |
+
[2023-02-22 21:39:37,251][44359] Decorrelating experience for 96 frames...
|
| 111 |
+
[2023-02-22 21:39:37,288][44374] Decorrelating experience for 96 frames...
|
| 112 |
+
[2023-02-22 21:39:37,810][44362] Decorrelating experience for 64 frames...
|
| 113 |
+
[2023-02-22 21:39:38,106][44372] Decorrelating experience for 96 frames...
|
| 114 |
+
[2023-02-22 21:39:38,408][44370] Decorrelating experience for 96 frames...
|
| 115 |
+
[2023-02-22 21:39:38,514][44362] Decorrelating experience for 96 frames...
|
| 116 |
+
[2023-02-22 21:39:42,397][44343] Signal inference workers to stop experience collection...
|
| 117 |
+
[2023-02-22 21:39:42,420][44357] InferenceWorker_p0-w0: stopping experience collection
|
| 118 |
+
[2023-02-22 21:39:45,642][44343] Signal inference workers to resume experience collection...
|
| 119 |
+
[2023-02-22 21:39:45,646][44357] InferenceWorker_p0-w0: resuming experience collection
|
| 120 |
+
[2023-02-22 21:39:58,407][44357] Updated weights for policy 0, policy_version 10 (0.0583)
|
| 121 |
+
[2023-02-22 21:40:10,911][44357] Updated weights for policy 0, policy_version 20 (0.0022)
|
| 122 |
+
[2023-02-22 21:40:20,180][44343] Saving new best policy, reward=1.570!
|
| 123 |
+
[2023-02-22 21:40:22,503][44357] Updated weights for policy 0, policy_version 30 (0.0027)
|
| 124 |
+
[2023-02-22 21:40:25,197][44343] Saving new best policy, reward=1.799!
|
| 125 |
+
[2023-02-22 21:40:30,180][44343] Saving new best policy, reward=2.195!
|
| 126 |
+
[2023-02-22 21:40:35,188][44343] Saving new best policy, reward=2.481!
|
| 127 |
+
[2023-02-22 21:40:36,833][44357] Updated weights for policy 0, policy_version 40 (0.0012)
|
| 128 |
+
[2023-02-22 21:40:40,177][44343] Saving new best policy, reward=3.021!
|
| 129 |
+
[2023-02-22 21:40:45,185][44343] Saving new best policy, reward=3.234!
|
| 130 |
+
[2023-02-22 21:40:50,415][44357] Updated weights for policy 0, policy_version 50 (0.0028)
|
| 131 |
+
[2023-02-22 21:41:00,181][44343] Saving new best policy, reward=3.364!
|
| 132 |
+
[2023-02-22 21:41:01,966][44357] Updated weights for policy 0, policy_version 60 (0.0016)
|
| 133 |
+
[2023-02-22 21:41:05,186][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000062_253952.pth...
|
| 134 |
+
[2023-02-22 21:41:16,110][44357] Updated weights for policy 0, policy_version 70 (0.0020)
|
| 135 |
+
[2023-02-22 21:41:20,174][44343] Saving new best policy, reward=3.581!
|
| 136 |
+
[2023-02-22 21:41:28,833][44357] Updated weights for policy 0, policy_version 80 (0.0026)
|
| 137 |
+
[2023-02-22 21:41:30,184][44343] Saving new best policy, reward=3.665!
|
| 138 |
+
[2023-02-22 21:41:40,180][44343] Saving new best policy, reward=3.834!
|
| 139 |
+
[2023-02-22 21:41:40,808][44357] Updated weights for policy 0, policy_version 90 (0.0020)
|
| 140 |
+
[2023-02-22 21:41:45,270][44343] Saving new best policy, reward=4.019!
|
| 141 |
+
[2023-02-22 21:41:55,240][44357] Updated weights for policy 0, policy_version 100 (0.0018)
|
| 142 |
+
[2023-02-22 21:41:55,243][44343] Saving new best policy, reward=4.030!
|
| 143 |
+
[2023-02-22 21:42:00,177][44343] Saving new best policy, reward=4.640!
|
| 144 |
+
[2023-02-22 21:42:08,134][44357] Updated weights for policy 0, policy_version 110 (0.0022)
|
| 145 |
+
[2023-02-22 21:42:19,932][44357] Updated weights for policy 0, policy_version 120 (0.0012)
|
| 146 |
+
[2023-02-22 21:42:33,978][44357] Updated weights for policy 0, policy_version 130 (0.0024)
|
| 147 |
+
[2023-02-22 21:42:45,804][44357] Updated weights for policy 0, policy_version 140 (0.0020)
|
| 148 |
+
[2023-02-22 21:42:58,125][44357] Updated weights for policy 0, policy_version 150 (0.0019)
|
| 149 |
+
[2023-02-22 21:43:05,186][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000155_634880.pth...
|
| 150 |
+
[2023-02-22 21:43:12,422][44357] Updated weights for policy 0, policy_version 160 (0.0020)
|
| 151 |
+
[2023-02-22 21:43:24,033][44357] Updated weights for policy 0, policy_version 170 (0.0018)
|
| 152 |
+
[2023-02-22 21:43:36,766][44357] Updated weights for policy 0, policy_version 180 (0.0028)
|
| 153 |
+
[2023-02-22 21:43:50,179][44343] Saving new best policy, reward=4.947!
|
| 154 |
+
[2023-02-22 21:43:51,259][44357] Updated weights for policy 0, policy_version 190 (0.0016)
|
| 155 |
+
[2023-02-22 21:44:02,262][44357] Updated weights for policy 0, policy_version 200 (0.0014)
|
| 156 |
+
[2023-02-22 21:44:15,599][44357] Updated weights for policy 0, policy_version 210 (0.0013)
|
| 157 |
+
[2023-02-22 21:44:28,836][44357] Updated weights for policy 0, policy_version 220 (0.0013)
|
| 158 |
+
[2023-02-22 21:44:35,184][44343] Saving new best policy, reward=5.529!
|
| 159 |
+
[2023-02-22 21:44:39,706][44357] Updated weights for policy 0, policy_version 230 (0.0015)
|
| 160 |
+
[2023-02-22 21:44:53,759][44357] Updated weights for policy 0, policy_version 240 (0.0017)
|
| 161 |
+
[2023-02-22 21:45:05,188][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000249_1019904.pth...
|
| 162 |
+
[2023-02-22 21:45:05,432][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000062_253952.pth
|
| 163 |
+
[2023-02-22 21:45:06,592][44357] Updated weights for policy 0, policy_version 250 (0.0024)
|
| 164 |
+
[2023-02-22 21:45:18,363][44357] Updated weights for policy 0, policy_version 260 (0.0036)
|
| 165 |
+
[2023-02-22 21:45:32,402][44357] Updated weights for policy 0, policy_version 270 (0.0014)
|
| 166 |
+
[2023-02-22 21:45:44,524][44357] Updated weights for policy 0, policy_version 280 (0.0018)
|
| 167 |
+
[2023-02-22 21:45:55,183][44343] Saving new best policy, reward=5.693!
|
| 168 |
+
[2023-02-22 21:45:56,854][44357] Updated weights for policy 0, policy_version 290 (0.0024)
|
| 169 |
+
[2023-02-22 21:46:05,299][44343] Saving new best policy, reward=5.702!
|
| 170 |
+
[2023-02-22 21:46:10,895][44357] Updated weights for policy 0, policy_version 300 (0.0042)
|
| 171 |
+
[2023-02-22 21:46:22,027][44357] Updated weights for policy 0, policy_version 310 (0.0027)
|
| 172 |
+
[2023-02-22 21:46:30,183][44343] Saving new best policy, reward=6.326!
|
| 173 |
+
[2023-02-22 21:46:34,838][44357] Updated weights for policy 0, policy_version 320 (0.0028)
|
| 174 |
+
[2023-02-22 21:46:48,709][44357] Updated weights for policy 0, policy_version 330 (0.0034)
|
| 175 |
+
[2023-02-22 21:46:55,184][44343] Saving new best policy, reward=6.680!
|
| 176 |
+
[2023-02-22 21:46:59,173][44357] Updated weights for policy 0, policy_version 340 (0.0012)
|
| 177 |
+
[2023-02-22 21:47:05,198][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000343_1404928.pth...
|
| 178 |
+
[2023-02-22 21:47:05,367][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000155_634880.pth
|
| 179 |
+
[2023-02-22 21:47:13,125][44357] Updated weights for policy 0, policy_version 350 (0.0023)
|
| 180 |
+
[2023-02-22 21:47:26,190][44357] Updated weights for policy 0, policy_version 360 (0.0022)
|
| 181 |
+
[2023-02-22 21:47:37,325][44357] Updated weights for policy 0, policy_version 370 (0.0017)
|
| 182 |
+
[2023-02-22 21:47:51,298][44357] Updated weights for policy 0, policy_version 380 (0.0031)
|
| 183 |
+
[2023-02-22 21:48:04,032][44357] Updated weights for policy 0, policy_version 390 (0.0029)
|
| 184 |
+
[2023-02-22 21:48:10,200][44343] Saving new best policy, reward=6.724!
|
| 185 |
+
[2023-02-22 21:48:15,427][44357] Updated weights for policy 0, policy_version 400 (0.0029)
|
| 186 |
+
[2023-02-22 21:48:25,182][44343] Saving new best policy, reward=7.131!
|
| 187 |
+
[2023-02-22 21:48:29,120][44357] Updated weights for policy 0, policy_version 410 (0.0030)
|
| 188 |
+
[2023-02-22 21:48:40,241][44357] Updated weights for policy 0, policy_version 420 (0.0019)
|
| 189 |
+
[2023-02-22 21:48:53,312][44357] Updated weights for policy 0, policy_version 430 (0.0020)
|
| 190 |
+
[2023-02-22 21:49:05,192][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000438_1794048.pth...
|
| 191 |
+
[2023-02-22 21:49:05,454][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000249_1019904.pth
|
| 192 |
+
[2023-02-22 21:49:07,053][44357] Updated weights for policy 0, policy_version 440 (0.0017)
|
| 193 |
+
[2023-02-22 21:49:17,206][44357] Updated weights for policy 0, policy_version 450 (0.0017)
|
| 194 |
+
[2023-02-22 21:49:30,957][44357] Updated weights for policy 0, policy_version 460 (0.0036)
|
| 195 |
+
[2023-02-22 21:49:35,185][44343] Saving new best policy, reward=7.273!
|
| 196 |
+
[2023-02-22 21:49:44,197][44357] Updated weights for policy 0, policy_version 470 (0.0012)
|
| 197 |
+
[2023-02-22 21:49:55,073][44357] Updated weights for policy 0, policy_version 480 (0.0033)
|
| 198 |
+
[2023-02-22 21:50:09,114][44357] Updated weights for policy 0, policy_version 490 (0.0021)
|
| 199 |
+
[2023-02-22 21:50:10,175][44343] Saving new best policy, reward=7.677!
|
| 200 |
+
[2023-02-22 21:50:21,185][44357] Updated weights for policy 0, policy_version 500 (0.0015)
|
| 201 |
+
[2023-02-22 21:50:33,198][44357] Updated weights for policy 0, policy_version 510 (0.0018)
|
| 202 |
+
[2023-02-22 21:50:46,907][44357] Updated weights for policy 0, policy_version 520 (0.0022)
|
| 203 |
+
[2023-02-22 21:50:55,193][44343] Saving new best policy, reward=7.977!
|
| 204 |
+
[2023-02-22 21:50:58,148][44357] Updated weights for policy 0, policy_version 530 (0.0014)
|
| 205 |
+
[2023-02-22 21:51:05,188][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000534_2187264.pth...
|
| 206 |
+
[2023-02-22 21:51:05,403][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000343_1404928.pth
|
| 207 |
+
[2023-02-22 21:51:10,973][44357] Updated weights for policy 0, policy_version 540 (0.0022)
|
| 208 |
+
[2023-02-22 21:51:24,871][44357] Updated weights for policy 0, policy_version 550 (0.0011)
|
| 209 |
+
[2023-02-22 21:51:35,244][44357] Updated weights for policy 0, policy_version 560 (0.0019)
|
| 210 |
+
[2023-02-22 21:51:48,935][44357] Updated weights for policy 0, policy_version 570 (0.0018)
|
| 211 |
+
[2023-02-22 21:52:00,180][44343] Saving new best policy, reward=8.152!
|
| 212 |
+
[2023-02-22 21:52:01,787][44357] Updated weights for policy 0, policy_version 580 (0.0016)
|
| 213 |
+
[2023-02-22 21:52:13,228][44357] Updated weights for policy 0, policy_version 590 (0.0019)
|
| 214 |
+
[2023-02-22 21:52:20,181][44343] Saving new best policy, reward=8.358!
|
| 215 |
+
[2023-02-22 21:52:27,177][44357] Updated weights for policy 0, policy_version 600 (0.0012)
|
| 216 |
+
[2023-02-22 21:52:39,279][44357] Updated weights for policy 0, policy_version 610 (0.0013)
|
| 217 |
+
[2023-02-22 21:52:51,424][44357] Updated weights for policy 0, policy_version 620 (0.0036)
|
| 218 |
+
[2023-02-22 21:53:00,178][44343] Saving new best policy, reward=9.253!
|
| 219 |
+
[2023-02-22 21:53:05,166][44357] Updated weights for policy 0, policy_version 630 (0.0018)
|
| 220 |
+
[2023-02-22 21:53:05,183][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000630_2580480.pth...
|
| 221 |
+
[2023-02-22 21:53:05,353][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000438_1794048.pth
|
| 222 |
+
[2023-02-22 21:53:16,437][44357] Updated weights for policy 0, policy_version 640 (0.0013)
|
| 223 |
+
[2023-02-22 21:53:29,099][44357] Updated weights for policy 0, policy_version 650 (0.0014)
|
| 224 |
+
[2023-02-22 21:53:42,787][44357] Updated weights for policy 0, policy_version 660 (0.0023)
|
| 225 |
+
[2023-02-22 21:53:53,088][44357] Updated weights for policy 0, policy_version 670 (0.0012)
|
| 226 |
+
[2023-02-22 21:54:06,818][44357] Updated weights for policy 0, policy_version 680 (0.0038)
|
| 227 |
+
[2023-02-22 21:54:19,375][44357] Updated weights for policy 0, policy_version 690 (0.0026)
|
| 228 |
+
[2023-02-22 21:54:20,180][44343] Saving new best policy, reward=9.967!
|
| 229 |
+
[2023-02-22 21:54:30,617][44357] Updated weights for policy 0, policy_version 700 (0.0012)
|
| 230 |
+
[2023-02-22 21:54:44,488][44357] Updated weights for policy 0, policy_version 710 (0.0034)
|
| 231 |
+
[2023-02-22 21:54:56,158][44357] Updated weights for policy 0, policy_version 720 (0.0031)
|
| 232 |
+
[2023-02-22 21:55:05,186][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000726_2973696.pth...
|
| 233 |
+
[2023-02-22 21:55:05,392][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000534_2187264.pth
|
| 234 |
+
[2023-02-22 21:55:08,422][44357] Updated weights for policy 0, policy_version 730 (0.0025)
|
| 235 |
+
[2023-02-22 21:55:22,098][44357] Updated weights for policy 0, policy_version 740 (0.0025)
|
| 236 |
+
[2023-02-22 21:55:32,470][44357] Updated weights for policy 0, policy_version 750 (0.0014)
|
| 237 |
+
[2023-02-22 21:55:46,132][44357] Updated weights for policy 0, policy_version 760 (0.0019)
|
| 238 |
+
[2023-02-22 21:55:59,242][44357] Updated weights for policy 0, policy_version 770 (0.0018)
|
| 239 |
+
[2023-02-22 21:56:09,892][44357] Updated weights for policy 0, policy_version 780 (0.0019)
|
| 240 |
+
[2023-02-22 21:56:23,536][44357] Updated weights for policy 0, policy_version 790 (0.0036)
|
| 241 |
+
[2023-02-22 21:56:35,273][44357] Updated weights for policy 0, policy_version 800 (0.0020)
|
| 242 |
+
[2023-02-22 21:56:47,544][44357] Updated weights for policy 0, policy_version 810 (0.0026)
|
| 243 |
+
[2023-02-22 21:57:01,255][44357] Updated weights for policy 0, policy_version 820 (0.0018)
|
| 244 |
+
[2023-02-22 21:57:05,180][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000823_3371008.pth...
|
| 245 |
+
[2023-02-22 21:57:05,331][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000630_2580480.pth
|
| 246 |
+
[2023-02-22 21:57:12,428][44357] Updated weights for policy 0, policy_version 830 (0.0022)
|
| 247 |
+
[2023-02-22 21:57:25,276][44357] Updated weights for policy 0, policy_version 840 (0.0030)
|
| 248 |
+
[2023-02-22 21:57:38,985][44357] Updated weights for policy 0, policy_version 850 (0.0037)
|
| 249 |
+
[2023-02-22 21:57:49,296][44357] Updated weights for policy 0, policy_version 860 (0.0013)
|
| 250 |
+
[2023-02-22 21:58:03,074][44357] Updated weights for policy 0, policy_version 870 (0.0027)
|
| 251 |
+
[2023-02-22 21:58:15,882][44357] Updated weights for policy 0, policy_version 880 (0.0013)
|
| 252 |
+
[2023-02-22 21:58:26,821][44357] Updated weights for policy 0, policy_version 890 (0.0020)
|
| 253 |
+
[2023-02-22 21:58:40,175][44343] Saving new best policy, reward=10.147!
|
| 254 |
+
[2023-02-22 21:58:40,744][44357] Updated weights for policy 0, policy_version 900 (0.0021)
|
| 255 |
+
[2023-02-22 21:58:45,189][44343] Saving new best policy, reward=10.394!
|
| 256 |
+
[2023-02-22 21:58:52,878][44357] Updated weights for policy 0, policy_version 910 (0.0020)
|
| 257 |
+
[2023-02-22 21:59:04,728][44357] Updated weights for policy 0, policy_version 920 (0.0018)
|
| 258 |
+
[2023-02-22 21:59:05,186][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000920_3768320.pth...
|
| 259 |
+
[2023-02-22 21:59:05,369][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000726_2973696.pth
|
| 260 |
+
[2023-02-22 21:59:18,648][44357] Updated weights for policy 0, policy_version 930 (0.0012)
|
| 261 |
+
[2023-02-22 21:59:29,534][44357] Updated weights for policy 0, policy_version 940 (0.0018)
|
| 262 |
+
[2023-02-22 21:59:42,369][44357] Updated weights for policy 0, policy_version 950 (0.0038)
|
| 263 |
+
[2023-02-22 21:59:50,177][44343] Saving new best policy, reward=10.410!
|
| 264 |
+
[2023-02-22 21:59:56,324][44357] Updated weights for policy 0, policy_version 960 (0.0018)
|
| 265 |
+
[2023-02-22 22:00:00,175][44343] Saving new best policy, reward=10.565!
|
| 266 |
+
[2023-02-22 22:00:06,253][44357] Updated weights for policy 0, policy_version 970 (0.0017)
|
| 267 |
+
[2023-02-22 22:00:20,191][44357] Updated weights for policy 0, policy_version 980 (0.0040)
|
| 268 |
+
[2023-02-22 22:00:32,711][44357] Updated weights for policy 0, policy_version 990 (0.0014)
|
| 269 |
+
[2023-02-22 22:00:43,741][44357] Updated weights for policy 0, policy_version 1000 (0.0022)
|
| 270 |
+
[2023-02-22 22:00:57,526][44357] Updated weights for policy 0, policy_version 1010 (0.0012)
|
| 271 |
+
[2023-02-22 22:01:05,180][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001017_4165632.pth...
|
| 272 |
+
[2023-02-22 22:01:05,335][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000823_3371008.pth
|
| 273 |
+
[2023-02-22 22:01:08,715][44357] Updated weights for policy 0, policy_version 1020 (0.0035)
|
| 274 |
+
[2023-02-22 22:01:21,311][44357] Updated weights for policy 0, policy_version 1030 (0.0016)
|
| 275 |
+
[2023-02-22 22:01:25,190][44343] Saving new best policy, reward=10.667!
|
| 276 |
+
[2023-02-22 22:01:30,195][44343] Saving new best policy, reward=11.531!
|
| 277 |
+
[2023-02-22 22:01:34,973][44357] Updated weights for policy 0, policy_version 1040 (0.0034)
|
| 278 |
+
[2023-02-22 22:01:45,187][44343] Saving new best policy, reward=11.983!
|
| 279 |
+
[2023-02-22 22:01:45,549][44357] Updated weights for policy 0, policy_version 1050 (0.0012)
|
| 280 |
+
[2023-02-22 22:01:59,401][44357] Updated weights for policy 0, policy_version 1060 (0.0036)
|
| 281 |
+
[2023-02-22 22:02:12,148][44357] Updated weights for policy 0, policy_version 1070 (0.0015)
|
| 282 |
+
[2023-02-22 22:02:23,117][44357] Updated weights for policy 0, policy_version 1080 (0.0012)
|
| 283 |
+
[2023-02-22 22:02:36,996][44357] Updated weights for policy 0, policy_version 1090 (0.0038)
|
| 284 |
+
[2023-02-22 22:02:48,187][44357] Updated weights for policy 0, policy_version 1100 (0.0023)
|
| 285 |
+
[2023-02-22 22:03:00,451][44357] Updated weights for policy 0, policy_version 1110 (0.0032)
|
| 286 |
+
[2023-02-22 22:03:05,188][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001114_4562944.pth...
|
| 287 |
+
[2023-02-22 22:03:05,334][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000000920_3768320.pth
|
| 288 |
+
[2023-02-22 22:03:14,138][44357] Updated weights for policy 0, policy_version 1120 (0.0019)
|
| 289 |
+
[2023-02-22 22:03:24,170][44357] Updated weights for policy 0, policy_version 1130 (0.0013)
|
| 290 |
+
[2023-02-22 22:03:37,655][44357] Updated weights for policy 0, policy_version 1140 (0.0018)
|
| 291 |
+
[2023-02-22 22:03:50,128][44357] Updated weights for policy 0, policy_version 1150 (0.0012)
|
| 292 |
+
[2023-02-22 22:04:01,346][44357] Updated weights for policy 0, policy_version 1160 (0.0023)
|
| 293 |
+
[2023-02-22 22:04:15,080][44357] Updated weights for policy 0, policy_version 1170 (0.0013)
|
| 294 |
+
[2023-02-22 22:04:26,034][44357] Updated weights for policy 0, policy_version 1180 (0.0028)
|
| 295 |
+
[2023-02-22 22:04:38,850][44357] Updated weights for policy 0, policy_version 1190 (0.0013)
|
| 296 |
+
[2023-02-22 22:04:52,728][44357] Updated weights for policy 0, policy_version 1200 (0.0013)
|
| 297 |
+
[2023-02-22 22:05:02,699][44357] Updated weights for policy 0, policy_version 1210 (0.0028)
|
| 298 |
+
[2023-02-22 22:05:05,188][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001211_4960256.pth...
|
| 299 |
+
[2023-02-22 22:05:05,398][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001017_4165632.pth
|
| 300 |
+
[2023-02-22 22:05:16,469][44357] Updated weights for policy 0, policy_version 1220 (0.0012)
|
| 301 |
+
[2023-02-22 22:05:20,173][44343] Saving new best policy, reward=13.082!
|
| 302 |
+
[2023-02-22 22:05:28,886][44357] Updated weights for policy 0, policy_version 1230 (0.0016)
|
| 303 |
+
[2023-02-22 22:05:39,747][44357] Updated weights for policy 0, policy_version 1240 (0.0026)
|
| 304 |
+
[2023-02-22 22:05:53,590][44357] Updated weights for policy 0, policy_version 1250 (0.0016)
|
| 305 |
+
[2023-02-22 22:06:04,499][44357] Updated weights for policy 0, policy_version 1260 (0.0012)
|
| 306 |
+
[2023-02-22 22:06:10,181][44343] Saving new best policy, reward=13.267!
|
| 307 |
+
[2023-02-22 22:06:17,223][44357] Updated weights for policy 0, policy_version 1270 (0.0021)
|
| 308 |
+
[2023-02-22 22:06:30,868][44357] Updated weights for policy 0, policy_version 1280 (0.0027)
|
| 309 |
+
[2023-02-22 22:06:40,821][44357] Updated weights for policy 0, policy_version 1290 (0.0014)
|
| 310 |
+
[2023-02-22 22:06:54,580][44357] Updated weights for policy 0, policy_version 1300 (0.0029)
|
| 311 |
+
[2023-02-22 22:07:05,302][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001309_5361664.pth...
|
| 312 |
+
[2023-02-22 22:07:05,580][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001114_4562944.pth
|
| 313 |
+
[2023-02-22 22:07:07,009][44357] Updated weights for policy 0, policy_version 1310 (0.0023)
|
| 314 |
+
[2023-02-22 22:07:18,242][44357] Updated weights for policy 0, policy_version 1320 (0.0014)
|
| 315 |
+
[2023-02-22 22:07:31,652][44357] Updated weights for policy 0, policy_version 1330 (0.0013)
|
| 316 |
+
[2023-02-22 22:07:40,183][44343] Saving new best policy, reward=13.692!
|
| 317 |
+
[2023-02-22 22:07:43,091][44357] Updated weights for policy 0, policy_version 1340 (0.0015)
|
| 318 |
+
[2023-02-22 22:07:55,898][44357] Updated weights for policy 0, policy_version 1350 (0.0031)
|
| 319 |
+
[2023-02-22 22:08:09,784][44357] Updated weights for policy 0, policy_version 1360 (0.0027)
|
| 320 |
+
[2023-02-22 22:08:19,962][44357] Updated weights for policy 0, policy_version 1370 (0.0019)
|
| 321 |
+
[2023-02-22 22:08:33,221][44357] Updated weights for policy 0, policy_version 1380 (0.0028)
|
| 322 |
+
[2023-02-22 22:08:45,882][44357] Updated weights for policy 0, policy_version 1390 (0.0023)
|
| 323 |
+
[2023-02-22 22:08:56,933][44357] Updated weights for policy 0, policy_version 1400 (0.0014)
|
| 324 |
+
[2023-02-22 22:09:05,188][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001405_5754880.pth...
|
| 325 |
+
[2023-02-22 22:09:05,382][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001211_4960256.pth
|
| 326 |
+
[2023-02-22 22:09:10,651][44357] Updated weights for policy 0, policy_version 1410 (0.0014)
|
| 327 |
+
[2023-02-22 22:09:22,062][44357] Updated weights for policy 0, policy_version 1420 (0.0012)
|
| 328 |
+
[2023-02-22 22:09:34,396][44357] Updated weights for policy 0, policy_version 1430 (0.0024)
|
| 329 |
+
[2023-02-22 22:09:40,177][44343] Saving new best policy, reward=13.814!
|
| 330 |
+
[2023-02-22 22:09:48,019][44357] Updated weights for policy 0, policy_version 1440 (0.0012)
|
| 331 |
+
[2023-02-22 22:09:50,187][44343] Saving new best policy, reward=14.316!
|
| 332 |
+
[2023-02-22 22:09:58,219][44357] Updated weights for policy 0, policy_version 1450 (0.0024)
|
| 333 |
+
[2023-02-22 22:10:11,801][44357] Updated weights for policy 0, policy_version 1460 (0.0020)
|
| 334 |
+
[2023-02-22 22:10:24,446][44357] Updated weights for policy 0, policy_version 1470 (0.0031)
|
| 335 |
+
[2023-02-22 22:10:35,200][44343] Saving new best policy, reward=14.817!
|
| 336 |
+
[2023-02-22 22:10:35,202][44357] Updated weights for policy 0, policy_version 1480 (0.0017)
|
| 337 |
+
[2023-02-22 22:10:48,866][44357] Updated weights for policy 0, policy_version 1490 (0.0020)
|
| 338 |
+
[2023-02-22 22:11:00,119][44357] Updated weights for policy 0, policy_version 1500 (0.0017)
|
| 339 |
+
[2023-02-22 22:11:05,261][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001503_6156288.pth...
|
| 340 |
+
[2023-02-22 22:11:05,509][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001309_5361664.pth
|
| 341 |
+
[2023-02-22 22:11:10,178][44343] Saving new best policy, reward=14.984!
|
| 342 |
+
[2023-02-22 22:11:12,591][44357] Updated weights for policy 0, policy_version 1510 (0.0019)
|
| 343 |
+
[2023-02-22 22:11:26,261][44357] Updated weights for policy 0, policy_version 1520 (0.0027)
|
| 344 |
+
[2023-02-22 22:11:36,250][44357] Updated weights for policy 0, policy_version 1530 (0.0019)
|
| 345 |
+
[2023-02-22 22:11:49,925][44357] Updated weights for policy 0, policy_version 1540 (0.0016)
|
| 346 |
+
[2023-02-22 22:12:02,863][44357] Updated weights for policy 0, policy_version 1550 (0.0015)
|
| 347 |
+
[2023-02-22 22:12:13,806][44357] Updated weights for policy 0, policy_version 1560 (0.0012)
|
| 348 |
+
[2023-02-22 22:12:28,531][44357] Updated weights for policy 0, policy_version 1570 (0.0021)
|
| 349 |
+
[2023-02-22 22:12:42,652][44357] Updated weights for policy 0, policy_version 1580 (0.0024)
|
| 350 |
+
[2023-02-22 22:12:53,704][44357] Updated weights for policy 0, policy_version 1590 (0.0021)
|
| 351 |
+
[2023-02-22 22:13:05,187][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001597_6541312.pth...
|
| 352 |
+
[2023-02-22 22:13:05,492][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001405_5754880.pth
|
| 353 |
+
[2023-02-22 22:13:08,100][44357] Updated weights for policy 0, policy_version 1600 (0.0018)
|
| 354 |
+
[2023-02-22 22:13:20,543][44357] Updated weights for policy 0, policy_version 1610 (0.0017)
|
| 355 |
+
[2023-02-22 22:13:31,704][44357] Updated weights for policy 0, policy_version 1620 (0.0025)
|
| 356 |
+
[2023-02-22 22:13:45,190][44343] Saving new best policy, reward=15.358!
|
| 357 |
+
[2023-02-22 22:13:45,524][44357] Updated weights for policy 0, policy_version 1630 (0.0019)
|
| 358 |
+
[2023-02-22 22:13:56,449][44357] Updated weights for policy 0, policy_version 1640 (0.0013)
|
| 359 |
+
[2023-02-22 22:14:09,261][44357] Updated weights for policy 0, policy_version 1650 (0.0014)
|
| 360 |
+
[2023-02-22 22:14:22,595][44357] Updated weights for policy 0, policy_version 1660 (0.0027)
|
| 361 |
+
[2023-02-22 22:14:32,964][44357] Updated weights for policy 0, policy_version 1670 (0.0018)
|
| 362 |
+
[2023-02-22 22:14:40,195][44343] Saving new best policy, reward=15.815!
|
| 363 |
+
[2023-02-22 22:14:46,637][44357] Updated weights for policy 0, policy_version 1680 (0.0028)
|
| 364 |
+
[2023-02-22 22:14:58,429][44357] Updated weights for policy 0, policy_version 1690 (0.0027)
|
| 365 |
+
[2023-02-22 22:15:05,195][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001695_6942720.pth...
|
| 366 |
+
[2023-02-22 22:15:05,373][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001503_6156288.pth
|
| 367 |
+
[2023-02-22 22:15:10,380][44357] Updated weights for policy 0, policy_version 1700 (0.0017)
|
| 368 |
+
[2023-02-22 22:15:24,328][44357] Updated weights for policy 0, policy_version 1710 (0.0028)
|
| 369 |
+
[2023-02-22 22:15:34,965][44357] Updated weights for policy 0, policy_version 1720 (0.0031)
|
| 370 |
+
[2023-02-22 22:15:47,961][44357] Updated weights for policy 0, policy_version 1730 (0.0021)
|
| 371 |
+
[2023-02-22 22:16:01,469][44357] Updated weights for policy 0, policy_version 1740 (0.0023)
|
| 372 |
+
[2023-02-22 22:16:12,088][44357] Updated weights for policy 0, policy_version 1750 (0.0018)
|
| 373 |
+
[2023-02-22 22:16:25,789][44357] Updated weights for policy 0, policy_version 1760 (0.0027)
|
| 374 |
+
[2023-02-22 22:16:37,997][44357] Updated weights for policy 0, policy_version 1770 (0.0026)
|
| 375 |
+
[2023-02-22 22:16:49,718][44357] Updated weights for policy 0, policy_version 1780 (0.0020)
|
| 376 |
+
[2023-02-22 22:17:03,608][44357] Updated weights for policy 0, policy_version 1790 (0.0023)
|
| 377 |
+
[2023-02-22 22:17:05,187][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001791_7335936.pth...
|
| 378 |
+
[2023-02-22 22:17:05,337][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001597_6541312.pth
|
| 379 |
+
[2023-02-22 22:17:15,426][44357] Updated weights for policy 0, policy_version 1800 (0.0028)
|
| 380 |
+
[2023-02-22 22:17:28,352][44357] Updated weights for policy 0, policy_version 1810 (0.0015)
|
| 381 |
+
[2023-02-22 22:17:42,564][44357] Updated weights for policy 0, policy_version 1820 (0.0012)
|
| 382 |
+
[2023-02-22 22:17:53,088][44357] Updated weights for policy 0, policy_version 1830 (0.0018)
|
| 383 |
+
[2023-02-22 22:18:06,524][44357] Updated weights for policy 0, policy_version 1840 (0.0026)
|
| 384 |
+
[2023-02-22 22:18:19,831][44357] Updated weights for policy 0, policy_version 1850 (0.0026)
|
| 385 |
+
[2023-02-22 22:18:30,737][44357] Updated weights for policy 0, policy_version 1860 (0.0014)
|
| 386 |
+
[2023-02-22 22:18:44,377][44357] Updated weights for policy 0, policy_version 1870 (0.0040)
|
| 387 |
+
[2023-02-22 22:18:56,344][44357] Updated weights for policy 0, policy_version 1880 (0.0019)
|
| 388 |
+
[2023-02-22 22:19:05,184][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001887_7729152.pth...
|
| 389 |
+
[2023-02-22 22:19:05,374][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001695_6942720.pth
|
| 390 |
+
[2023-02-22 22:19:08,080][44357] Updated weights for policy 0, policy_version 1890 (0.0022)
|
| 391 |
+
[2023-02-22 22:19:21,773][44357] Updated weights for policy 0, policy_version 1900 (0.0021)
|
| 392 |
+
[2023-02-22 22:19:32,368][44357] Updated weights for policy 0, policy_version 1910 (0.0028)
|
| 393 |
+
[2023-02-22 22:19:45,622][44357] Updated weights for policy 0, policy_version 1920 (0.0014)
|
| 394 |
+
[2023-02-22 22:19:58,770][44357] Updated weights for policy 0, policy_version 1930 (0.0021)
|
| 395 |
+
[2023-02-22 22:20:09,516][44357] Updated weights for policy 0, policy_version 1940 (0.0013)
|
| 396 |
+
[2023-02-22 22:20:23,161][44357] Updated weights for policy 0, policy_version 1950 (0.0014)
|
| 397 |
+
[2023-02-22 22:20:35,289][44357] Updated weights for policy 0, policy_version 1960 (0.0012)
|
| 398 |
+
[2023-02-22 22:20:46,932][44357] Updated weights for policy 0, policy_version 1970 (0.0019)
|
| 399 |
+
[2023-02-22 22:21:00,847][44357] Updated weights for policy 0, policy_version 1980 (0.0020)
|
| 400 |
+
[2023-02-22 22:21:05,191][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001984_8126464.pth...
|
| 401 |
+
[2023-02-22 22:21:05,357][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001791_7335936.pth
|
| 402 |
+
[2023-02-22 22:21:12,097][44357] Updated weights for policy 0, policy_version 1990 (0.0012)
|
| 403 |
+
[2023-02-22 22:21:24,938][44357] Updated weights for policy 0, policy_version 2000 (0.0017)
|
| 404 |
+
[2023-02-22 22:21:38,170][44357] Updated weights for policy 0, policy_version 2010 (0.0018)
|
| 405 |
+
[2023-02-22 22:21:48,538][44357] Updated weights for policy 0, policy_version 2020 (0.0024)
|
| 406 |
+
[2023-02-22 22:21:50,174][44343] Saving new best policy, reward=15.986!
|
| 407 |
+
[2023-02-22 22:21:55,187][44343] Saving new best policy, reward=16.024!
|
| 408 |
+
[2023-02-22 22:22:02,271][44357] Updated weights for policy 0, policy_version 2030 (0.0027)
|
| 409 |
+
[2023-02-22 22:22:14,337][44357] Updated weights for policy 0, policy_version 2040 (0.0024)
|
| 410 |
+
[2023-02-22 22:22:26,220][44357] Updated weights for policy 0, policy_version 2050 (0.0022)
|
| 411 |
+
[2023-02-22 22:22:40,009][44357] Updated weights for policy 0, policy_version 2060 (0.0031)
|
| 412 |
+
[2023-02-22 22:22:50,319][44357] Updated weights for policy 0, policy_version 2070 (0.0014)
|
| 413 |
+
[2023-02-22 22:22:55,196][44343] Saving new best policy, reward=16.279!
|
| 414 |
+
[2023-02-22 22:23:03,597][44357] Updated weights for policy 0, policy_version 2080 (0.0018)
|
| 415 |
+
[2023-02-22 22:23:05,182][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002081_8523776.pth...
|
| 416 |
+
[2023-02-22 22:23:05,358][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001887_7729152.pth
|
| 417 |
+
[2023-02-22 22:23:05,365][44343] Saving new best policy, reward=16.580!
|
| 418 |
+
[2023-02-22 22:23:10,183][44343] Saving new best policy, reward=17.174!
|
| 419 |
+
[2023-02-22 22:23:17,504][44357] Updated weights for policy 0, policy_version 2090 (0.0042)
|
| 420 |
+
[2023-02-22 22:23:28,031][44357] Updated weights for policy 0, policy_version 2100 (0.0016)
|
| 421 |
+
[2023-02-22 22:23:41,630][44357] Updated weights for policy 0, policy_version 2110 (0.0023)
|
| 422 |
+
[2023-02-22 22:23:54,289][44357] Updated weights for policy 0, policy_version 2120 (0.0018)
|
| 423 |
+
[2023-02-22 22:24:05,664][44357] Updated weights for policy 0, policy_version 2130 (0.0017)
|
| 424 |
+
[2023-02-22 22:24:19,337][44357] Updated weights for policy 0, policy_version 2140 (0.0013)
|
| 425 |
+
[2023-02-22 22:24:30,018][44357] Updated weights for policy 0, policy_version 2150 (0.0026)
|
| 426 |
+
[2023-02-22 22:24:43,012][44357] Updated weights for policy 0, policy_version 2160 (0.0028)
|
| 427 |
+
[2023-02-22 22:24:56,533][44357] Updated weights for policy 0, policy_version 2170 (0.0014)
|
| 428 |
+
[2023-02-22 22:25:05,192][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002178_8921088.pth...
|
| 429 |
+
[2023-02-22 22:25:05,348][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000001984_8126464.pth
|
| 430 |
+
[2023-02-22 22:25:06,980][44357] Updated weights for policy 0, policy_version 2180 (0.0020)
|
| 431 |
+
[2023-02-22 22:25:20,932][44357] Updated weights for policy 0, policy_version 2190 (0.0038)
|
| 432 |
+
[2023-02-22 22:25:33,457][44357] Updated weights for policy 0, policy_version 2200 (0.0017)
|
| 433 |
+
[2023-02-22 22:25:44,759][44357] Updated weights for policy 0, policy_version 2210 (0.0029)
|
| 434 |
+
[2023-02-22 22:25:58,677][44357] Updated weights for policy 0, policy_version 2220 (0.0012)
|
| 435 |
+
[2023-02-22 22:26:09,960][44357] Updated weights for policy 0, policy_version 2230 (0.0019)
|
| 436 |
+
[2023-02-22 22:26:22,317][44357] Updated weights for policy 0, policy_version 2240 (0.0022)
|
| 437 |
+
[2023-02-22 22:26:36,070][44357] Updated weights for policy 0, policy_version 2250 (0.0022)
|
| 438 |
+
[2023-02-22 22:26:45,762][44357] Updated weights for policy 0, policy_version 2260 (0.0020)
|
| 439 |
+
[2023-02-22 22:26:59,512][44357] Updated weights for policy 0, policy_version 2270 (0.0018)
|
| 440 |
+
[2023-02-22 22:27:05,190][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002275_9318400.pth...
|
| 441 |
+
[2023-02-22 22:27:05,351][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002081_8523776.pth
|
| 442 |
+
[2023-02-22 22:27:05,375][44343] Saving new best policy, reward=17.231!
|
| 443 |
+
[2023-02-22 22:27:12,097][44357] Updated weights for policy 0, policy_version 2280 (0.0022)
|
| 444 |
+
[2023-02-22 22:27:23,133][44357] Updated weights for policy 0, policy_version 2290 (0.0024)
|
| 445 |
+
[2023-02-22 22:27:36,807][44357] Updated weights for policy 0, policy_version 2300 (0.0027)
|
| 446 |
+
[2023-02-22 22:27:48,016][44357] Updated weights for policy 0, policy_version 2310 (0.0012)
|
| 447 |
+
[2023-02-22 22:28:00,458][44357] Updated weights for policy 0, policy_version 2320 (0.0020)
|
| 448 |
+
[2023-02-22 22:28:14,185][44357] Updated weights for policy 0, policy_version 2330 (0.0023)
|
| 449 |
+
[2023-02-22 22:28:24,436][44357] Updated weights for policy 0, policy_version 2340 (0.0020)
|
| 450 |
+
[2023-02-22 22:28:38,101][44357] Updated weights for policy 0, policy_version 2350 (0.0029)
|
| 451 |
+
[2023-02-22 22:28:50,500][44357] Updated weights for policy 0, policy_version 2360 (0.0011)
|
| 452 |
+
[2023-02-22 22:29:01,834][44357] Updated weights for policy 0, policy_version 2370 (0.0032)
|
| 453 |
+
[2023-02-22 22:29:05,187][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002372_9715712.pth...
|
| 454 |
+
[2023-02-22 22:29:05,405][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002178_8921088.pth
|
| 455 |
+
[2023-02-22 22:29:15,519][44357] Updated weights for policy 0, policy_version 2380 (0.0022)
|
| 456 |
+
[2023-02-22 22:29:26,480][44357] Updated weights for policy 0, policy_version 2390 (0.0013)
|
| 457 |
+
[2023-02-22 22:29:39,172][44357] Updated weights for policy 0, policy_version 2400 (0.0019)
|
| 458 |
+
[2023-02-22 22:29:52,732][44357] Updated weights for policy 0, policy_version 2410 (0.0016)
|
| 459 |
+
[2023-02-22 22:30:02,719][44357] Updated weights for policy 0, policy_version 2420 (0.0013)
|
| 460 |
+
[2023-02-22 22:30:16,279][44357] Updated weights for policy 0, policy_version 2430 (0.0014)
|
| 461 |
+
[2023-02-22 22:30:28,389][44357] Updated weights for policy 0, policy_version 2440 (0.0012)
|
| 462 |
+
[2023-02-22 22:30:33,118][44343] Stopping Batcher_0...
|
| 463 |
+
[2023-02-22 22:30:33,120][44343] Loop batcher_evt_loop terminating...
|
| 464 |
+
[2023-02-22 22:30:33,121][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002443_10006528.pth...
|
| 465 |
+
[2023-02-22 22:30:33,206][44357] Weights refcount: 2 0
|
| 466 |
+
[2023-02-22 22:30:33,224][44357] Stopping InferenceWorker_p0-w0...
|
| 467 |
+
[2023-02-22 22:30:33,225][44357] Loop inference_proc0-0_evt_loop terminating...
|
| 468 |
+
[2023-02-22 22:30:33,256][44343] Removing /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002275_9318400.pth
|
| 469 |
+
[2023-02-22 22:30:33,264][44343] Saving /content/train_dir/doom_deadly_corridor/checkpoint_p0/checkpoint_000002443_10006528.pth...
|
| 470 |
+
[2023-02-22 22:30:33,385][44343] Stopping LearnerWorker_p0...
|
| 471 |
+
[2023-02-22 22:30:33,389][44343] Loop learner_proc0_evt_loop terminating...
|
| 472 |
+
[2023-02-22 22:30:33,492][44359] Stopping RolloutWorker_w1...
|
| 473 |
+
[2023-02-22 22:30:33,502][44359] Loop rollout_proc1_evt_loop terminating...
|
| 474 |
+
[2023-02-22 22:30:33,513][44365] Stopping RolloutWorker_w3...
|
| 475 |
+
[2023-02-22 22:30:33,522][44380] Stopping RolloutWorker_w6...
|
| 476 |
+
[2023-02-22 22:30:33,520][44365] Loop rollout_proc3_evt_loop terminating...
|
| 477 |
+
[2023-02-22 22:30:33,523][44374] Stopping RolloutWorker_w7...
|
| 478 |
+
[2023-02-22 22:30:33,528][44370] Stopping RolloutWorker_w5...
|
| 479 |
+
[2023-02-22 22:30:33,529][44370] Loop rollout_proc5_evt_loop terminating...
|
| 480 |
+
[2023-02-22 22:30:33,535][44374] Loop rollout_proc7_evt_loop terminating...
|
| 481 |
+
[2023-02-22 22:30:33,539][44380] Loop rollout_proc6_evt_loop terminating...
|
| 482 |
+
[2023-02-22 22:30:33,561][44372] Stopping RolloutWorker_w4...
|
| 483 |
+
[2023-02-22 22:30:33,575][44362] Stopping RolloutWorker_w2...
|
| 484 |
+
[2023-02-22 22:30:33,575][44362] Loop rollout_proc2_evt_loop terminating...
|
| 485 |
+
[2023-02-22 22:30:33,590][44358] Stopping RolloutWorker_w0...
|
| 486 |
+
[2023-02-22 22:30:33,590][44358] Loop rollout_proc0_evt_loop terminating...
|
| 487 |
+
[2023-02-22 22:30:33,568][44372] Loop rollout_proc4_evt_loop terminating...
|