Commit 42fd5a6
Parent(s): 4bdc863

Upload folder using huggingface_hub
Files changed:

- .summary/0/events.out.tfevents.1697542852.rhmmedcatt-proliant-ml350-gen10 +3 -0
- .summary/0/events.out.tfevents.1697545572.rhmmedcatt-proliant-ml350-gen10 +3 -0
- .summary/1/events.out.tfevents.1697542852.rhmmedcatt-proliant-ml350-gen10 +3 -0
- .summary/1/events.out.tfevents.1697545572.rhmmedcatt-proliant-ml350-gen10 +3 -0
- README.md +35 -16
- checkpoint_p0/best_000003328_851968_reward_6.150.pth +3 -0
- checkpoint_p0/checkpoint_000002976_761856.pth +3 -0
- checkpoint_p0/checkpoint_000003584_917504.pth +3 -0
- checkpoint_p1/best_000001696_434176_reward_6.290.pth +3 -0
- checkpoint_p1/checkpoint_000003040_778240.pth +3 -0
- checkpoint_p1/checkpoint_000003648_933888.pth +3 -0
- config.json +11 -9
- git.diff +2 -2
- replay.mp4 +0 -0
- sf_log.txt +0 -0
.summary/0/events.out.tfevents.1697542852.rhmmedcatt-proliant-ml350-gen10
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ee1d5e49d893ff354bef06bf11e0ca35dbc184f6461d7355b0064e5f31d3d79
+size 28212
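The `+3 -0` additions above (and the checkpoint files below) are Git LFS pointer files rather than the binaries themselves: three lines giving the pointer spec version, the SHA-256 of the stored object, and its size in bytes. A minimal sketch of parsing one such pointer (the pointer text is copied from the diff above):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7ee1d5e49d893ff354bef06bf11e0ca35dbc184f6461d7355b0064e5f31d3d79
size 28212
"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # sha256-prefixed object id
print(int(info["size"]))  # 28212
```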
.summary/0/events.out.tfevents.1697545572.rhmmedcatt-proliant-ml350-gen10
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:293d37db085e936373971881782e5d5df4e7de9b15b5453e9cf2c0e990bde9a0
+size 209301
.summary/1/events.out.tfevents.1697542852.rhmmedcatt-proliant-ml350-gen10
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0f95ee9fddd1c6ad6c44eab8c668c6e06936d0cd1de17fc07a53cab4a724ddd3
+size 19665
.summary/1/events.out.tfevents.1697545572.rhmmedcatt-proliant-ml350-gen10
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2f9cacb5ded5dc10b3cd59ef32cfb55dbe73b644a3fd6ad171b3c71df1b153f1
+size 151382
README.md
CHANGED
@@ -15,35 +15,38 @@ model-index:
       type: atari_asteroid
     metrics:
     - type: mean_reward
-      value:
+      value: 1218.00 +/- 462.14
       name: mean_reward
       verified: false
 ---
 
-
-This
-Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
-
-##
-
-```
-python -m sample_factory.huggingface.load_from_hub -r MattStammers/APPO-atari_asteroid
-```
-
-The aim is to reach state-of-the-art (SOTA) performance on each atari environment. I will flag the models with SOTA when they reach at or near these levels.
+## About the Project
+
+This project is an attempt to maximise the performance of high-sample-throughput APPO RL models in Atari environments in as carbon-efficient a manner as possible, using a single, not particularly high-performance machine. It is about demonstrating the generalisability of on-policy algorithms to achieve good performance quickly (by sacrificing sample efficiency), while also proving that this route to RL production is accessible even to hobbyists like me (I am a gastroenterologist, not a computer scientist).
+
+## Project Aims
+
+This model, like all the others in the benchmarks, was initially trained asynchronously and un-seeded to 10 million steps to set a Sample-Factory async baseline for this environment, but only 3/57 models came anywhere near SOTA performance.
+
+I then re-trained the models for 100 million timesteps; at this point two environments maxed out at SOTA performance (Pong and Freeway), with four more approaching it (Atlantis, Boxing, Tennis and FishingDerby), i.e. 6/57 near SOTA.
+
+The aim now is to try to reach state-of-the-art (SOTA) performance on a further block of Atari environments using up to 1 billion training timesteps, initially with APPO. I will flag the models with SOTA when they reach at or near these levels.
+
+After this I will switch on V-Trace to see whether the IMPALA variations perform any better with the same seed (I have seeded '1234').
+
+## About the Model
+
+The hyperparameters used in the model are described in my shell script on my fork of sample-factory: https://github.com/MattStammers/sample-factory. Given that https://huggingface.co/edbeeching has kindly shared his parameters, I saved time and energy by using many of his tuned hyperparameters to reduce carbon inefficiency:
 ```
 hyperparameters = {
+  "help": false,
+  "algo": "APPO",
+  "env": "atari_asteroid",
+  "experiment": "atari_asteroid_APPO",
+  "train_dir": "./train_atari",
+  "restart_behavior": "restart",
   "device": "gpu",
   "seed": 1234,
   "num_policies": 2,
@@ -141,12 +144,28 @@ hyperparameters = {
   "env_gpu_observations": true,
   "env_frameskip": 4,
   "env_framestack": 4,
-
+  "pixel_format": "CHW"
+}
 
 ```
 
+An **APPO** model trained on the **atari_asteroid** environment.
+
+This model was trained using Sample-Factory 2.0 (https://github.com/alex-petrenko/sample-factory), a high-throughput on-policy RL framework that I have been using throughout this project. Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
+
+## Downloading the model
+
+After installing Sample-Factory, download the model with:
+```
+python -m sample_factory.huggingface.load_from_hub -r MattStammers/APPO-atari_asteroid
+```
+
 ## Using the model
 
 To run the model after download, use the `enjoy` script corresponding to this environment:
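The `mean_reward` value added to the model card above reports evaluation episode returns as mean +/- standard deviation. A minimal sketch of how such a figure is computed, using hypothetical episode returns (the card's real figure, 1218.00 +/- 462.14, comes from the actual evaluation episodes; whether the reported spread is a population or sample standard deviation is an assumption here):

```python
import statistics

# Hypothetical evaluation returns, chosen only for illustration
returns = [800.0, 1100.0, 1500.0, 1472.0]

mean = statistics.mean(returns)
std = statistics.pstdev(returns)  # population std; some tools report sample std instead

print(f"{mean:.2f} +/- {std:.2f}")
```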
checkpoint_p0/best_000003328_851968_reward_6.150.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d3b1d34866f93ce669b44c875e32eb2a08db9599671de2381e8ecdf818ab4dc6
+size 20771187
checkpoint_p0/checkpoint_000002976_761856.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:00756fbccc5f403b76ea255d4e1354ed13a7b17ab5606488857562bfd6cb6139
+size 20771651
checkpoint_p0/checkpoint_000003584_917504.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:996fbf05ca461b48108ff8cc3fd98d9b424dc4f5c2801c572895b6b938e1bb74
+size 20771651
checkpoint_p1/best_000001696_434176_reward_6.290.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98eda7c41fba1c02fdc6c7384f58c0a892323a3bb6465d07e28b47e0501a3146
+size 20771187
checkpoint_p1/checkpoint_000003040_778240.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:af91ccea4ee73e81ec76e860944e3d40d0510d97fdc8aeb6252861c149391ff9
+size 20771651
checkpoint_p1/checkpoint_000003648_933888.pth
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fea6fd6f01e693c30be5fc076a9d7e46e2cb8b37cfbfce53ead3b2dd094e30f7
+size 20771651
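The checkpoint filenames above encode two counters (which I read as the training step and the env-frame count, per Sample-Factory's naming; this reading is an assumption, not an official spec), and the `best_*` checkpoints additionally embed the achieved reward. A sketch of picking the best checkpoint by its embedded reward:

```python
import re

# Matches e.g. "best_000003328_851968_reward_6.150.pth"
BEST_RE = re.compile(r"best_(\d+)_(\d+)_reward_([\d.]+)\.pth$")

def best_reward(filename: str):
    """Return the reward embedded in a best-checkpoint filename, or None."""
    m = BEST_RE.search(filename)
    return float(m.group(3)) if m else None

names = [
    "checkpoint_p0/best_000003328_851968_reward_6.150.pth",
    "checkpoint_p1/best_000001696_434176_reward_6.290.pth",
]
best = max(names, key=lambda n: best_reward(n) or float("-inf"))
print(best)  # the p1 checkpoint, whose embedded reward 6.29 beats 6.15
```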
config.json
CHANGED
@@ -4,7 +4,7 @@
   "env": "atari_asteroid",
   "experiment": "atari_asteroid_APPO",
   "train_dir": "./train_atari",
-  "restart_behavior": "
+  "restart_behavior": "resume",
   "device": "gpu",
   "seed": 1234,
   "num_policies": 2,
@@ -12,11 +12,11 @@
   "serial_mode": false,
   "batched_sampling": true,
   "num_batches_to_accumulate": 2,
-  "worker_num_splits":
+  "worker_num_splits": 2,
   "policy_workers_per_policy": 1,
   "max_policy_lag": 1000,
   "num_workers": 16,
-  "num_envs_per_worker":
+  "num_envs_per_worker": 8,
   "batch_size": 1024,
   "num_batches_per_epoch": 8,
   "num_epochs": 4,
@@ -64,10 +64,10 @@
   "experiment_summaries_interval": 3,
   "flush_summaries_interval": 30,
   "stats_avg": 100,
-  "summaries_use_frameskip":
+  "summaries_use_frameskip": false,
   "heartbeat_interval": 10,
   "heartbeat_reporting_interval": 60,
-  "train_for_env_steps":
+  "train_for_env_steps": 500000000,
   "train_for_seconds": 10000000000,
   "save_every_sec": 120,
   "keep_checkpoints": 2,
@@ -124,25 +124,27 @@
   "pbt_target_objective": "true_objective",
   "pbt_perturb_min": 1.1,
   "pbt_perturb_max": 1.5,
-  "command_line": "--algo=APPO --env=atari_asteroid --experiment=atari_asteroid_APPO --num_policies=2 --restart_behavior=
+  "command_line": "--algo=APPO --env=atari_asteroid --experiment=atari_asteroid_APPO --num_policies=2 --restart_behavior=resume --train_dir=./train_atari --train_for_env_steps=100000000 --seed=1234 --num_workers=16 --num_envs_per_worker=8 --num_batches_per_epoch=8 --worker_num_splits=2 --async_rl=true --batched_sampling=true --batch_size=1024 --max_grad_norm=0 --learning_rate=0.0003033891184 --heartbeat_interval=10 --heartbeat_reporting_interval=60 --save_milestones_sec=1200 --num_epochs=4 --exploration_loss_coeff=0.0004677351413 --summaries_use_frameskip=False --with_wandb=true --wandb_user=matt-stammers --wandb_project=atari_APPO --wandb_group=atari_asteroid --wandb_job_type=SF --wandb_tags=atari",
   "cli_args": {
     "algo": "APPO",
     "env": "atari_asteroid",
     "experiment": "atari_asteroid_APPO",
     "train_dir": "./train_atari",
-    "restart_behavior": "
+    "restart_behavior": "resume",
     "seed": 1234,
     "num_policies": 2,
     "async_rl": true,
     "batched_sampling": true,
+    "worker_num_splits": 2,
     "num_workers": 16,
-    "num_envs_per_worker":
+    "num_envs_per_worker": 8,
     "batch_size": 1024,
     "num_batches_per_epoch": 8,
     "num_epochs": 4,
     "exploration_loss_coeff": 0.0004677351413,
     "max_grad_norm": 0.0,
     "learning_rate": 0.0003033891184,
+    "summaries_use_frameskip": false,
     "heartbeat_interval": 10,
     "heartbeat_reporting_interval": 60,
     "train_for_env_steps": 100000000,
@@ -158,5 +160,5 @@
   },
   "git_hash": "5fff97c2f535da5987d358cdbe6927cccd43621e",
   "git_repo_name": "not a git repository",
-  "wandb_unique_id": "
+  "wandb_unique_id": "atari_asteroid_APPO_20231017_124050_252182"
 }
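The config.json diff above stores the same settings twice: once as the raw `command_line` string and once parsed into the `cli_args` object (Sample-Factory itself parses these with argparse into typed fields; the splitter below is a simplified illustration, not its real parser). A sketch of how `--key=value` tokens map onto the `cli_args` entries:

```python
def parse_command_line(cmd: str) -> dict:
    """Naively split '--key=value' tokens into a dict; bare '--flag' becomes True.
    String values are not coerced to int/bool, unlike a real argparse setup."""
    args = {}
    for token in cmd.split():
        if not token.startswith("--"):
            continue
        key, sep, value = token[2:].partition("=")
        args[key] = value if sep else True
    return args

# A shortened excerpt of the command_line from config.json above
cmd = ("--algo=APPO --env=atari_asteroid --num_policies=2 "
       "--restart_behavior=resume --seed=1234 --batch_size=1024")
args = parse_command_line(cmd)
print(args["env"])        # atari_asteroid
print(int(args["seed"]))  # 1234
```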
git.diff
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2464da1601e095c629f46a5ef1ef7322a64234d560931c130d8a8e640a96e217
+size 14449550
replay.mp4
CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
sf_log.txt
CHANGED
The diff for this file is too large to render. See raw diff.