MattStammers committed on
Commit 42fd5a6 · 1 Parent(s): 4bdc863

Upload folder using huggingface_hub
.summary/0/events.out.tfevents.1697542852.rhmmedcatt-proliant-ml350-gen10 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ee1d5e49d893ff354bef06bf11e0ca35dbc184f6461d7355b0064e5f31d3d79
+ size 28212
.summary/0/events.out.tfevents.1697545572.rhmmedcatt-proliant-ml350-gen10 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:293d37db085e936373971881782e5d5df4e7de9b15b5453e9cf2c0e990bde9a0
+ size 209301
.summary/1/events.out.tfevents.1697542852.rhmmedcatt-proliant-ml350-gen10 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f95ee9fddd1c6ad6c44eab8c668c6e06936d0cd1de17fc07a53cab4a724ddd3
+ size 19665
.summary/1/events.out.tfevents.1697545572.rhmmedcatt-proliant-ml350-gen10 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f9cacb5ded5dc10b3cd59ef32cfb55dbe73b644a3fd6ad171b3c71df1b153f1
+ size 151382
README.md CHANGED
@@ -15,35 +15,38 @@ model-index:
  type: atari_asteroid
  metrics:
  - type: mean_reward
- value: 2228.00 +/- 433.72
  name: mean_reward
  verified: false
 ---
 
- A(n) **APPO** model trained on the **atari_asteroid** environment.
 
- This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory.
- Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
 
 
- ## Downloading the model
 
- After installing Sample-Factory, download the model with:
- ```
- python -m sample_factory.huggingface.load_from_hub -r MattStammers/APPO-atari_asteroid
- ```
 
- 
- ## About the Model
 
- This model as with all the others in the benchmarks was trained initially asynchronously un-seeded to 10 million steps for the purposes of setting a sample factory async baseline for this model on this environment but only 3/57 made it.
 
- The aim is to reach state-of-the-art (SOTA) performance on each atari environment. I will flag the models with SOTA when they reach at or near these levels.
 
- The hyperparameters used in the model are the ones I have pushed to my fork of sample-factory: https://github.com/MattStammers/sample-factory. Given that https://huggingface.co/edbeeching has kindly shared his.
- I saved time and energy by using many of his tuned hyperparameters to maximise performance. However, he used 2 billion training steps. I have started as explained above at 10 million then moved to 100m to see how performance goes:
 ```
 hyperparameters = {
  "device": "gpu",
  "seed": 1234,
  "num_policies": 2,
@@ -141,12 +144,28 @@ hyperparameters = {
  "env_gpu_observations": true,
  "env_frameskip": 4,
  "env_framestack": 4,
- }
 
 ```
 
 
 ## Using the model
 
 To run the model after download, use the `enjoy` script corresponding to this environment:
  type: atari_asteroid
  metrics:
  - type: mean_reward
+ value: 1218.00 +/- 462.14
  name: mean_reward
  verified: false
 ---
 
+ ## About the Project
+ 
+ This project is an attempt to maximise the performance of high-sample-throughput APPO RL models in Atari environments in as carbon-efficient a manner as possible, using a single, not particularly high-performance machine. It aims to demonstrate the generalisability of on-policy algorithms in reaching good performance quickly (by sacrificing sample efficiency), while also proving that this route to RL production is accessible even to hobbyists like me (I am a gastroenterologist, not a computer scientist).
+ 
+ 
+ ## Project Aims
+ 
+ This model, like all the others in the benchmark, was initially trained asynchronously and un-seeded to 10 million steps to set a Sample-Factory async baseline for this environment, but only 3/57 models got anywhere near SOTA performance.
+ 
+ I then re-trained the models for 100 million timesteps. At this point two environments maxed out at SOTA performance (Pong and Freeway), with four more approaching it (Atlantis, Boxing, Tennis and FishingDerby), making 6/57 at or near SOTA.
+ 
+ The aim now is to reach state-of-the-art (SOTA) performance on a further block of Atari environments, using up to 1 billion training timesteps, initially with APPO. I will flag models as SOTA when they reach or approach these levels.
+ 
+ After this I will switch on V-Trace to see whether the IMPALA variations perform any better with the same seed (I have seeded '1234').
+ 
+ ## About the Model
+ 
+ The hyperparameters used in the model are described in the shell script on my fork of sample-factory: https://github.com/MattStammers/sample-factory. Since https://huggingface.co/edbeeching has kindly shared his tuned parameters, I saved time and energy by reusing many of them to reduce carbon inefficiency:
 ```
 hyperparameters = {
+ "help": false,
+ "algo": "APPO",
+ "env": "atari_asteroid",
+ "experiment": "atari_asteroid_APPO",
+ "train_dir": "./train_atari",
+ "restart_behavior": "restart",
  "device": "gpu",
  "seed": 1234,
  "num_policies": 2,
 
  "env_gpu_observations": true,
  "env_frameskip": 4,
  "env_framestack": 4,
+ "pixel_format": "CHW"
+ }
 
 ```
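The dict above mirrors the flags recorded in `command_line` within config.json. As a rough sketch of that correspondence (the `to_cli_flags` helper is my own illustration, not a Sample-Factory API), the dict-to-flag mapping looks like:

```python
def to_cli_flags(params: dict) -> list:
    """Render a hyperparameter dict as Sample-Factory style --key=value flags."""
    flags = []
    for key, value in params.items():
        if isinstance(value, bool):
            # Booleans appear lowercase on the command line, e.g. --async_rl=true
            value = str(value).lower()
        flags.append(f"--{key}={value}")
    return flags

hyperparameters = {
    "algo": "APPO",
    "env": "atari_asteroid",
    "experiment": "atari_asteroid_APPO",
    "seed": 1234,
    "batched_sampling": True,
}
print(" ".join(to_cli_flags(hyperparameters)))
# --algo=APPO --env=atari_asteroid --experiment=atari_asteroid_APPO --seed=1234 --batched_sampling=true
```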
 
+ An **APPO** model trained on the **atari_asteroid** environment.
+ 
+ This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory. Sample-Factory is a high-throughput on-policy RL framework, which I have been using throughout this project.
+ Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
+ 
+ 
+ ## Downloading the model
+ 
+ After installing Sample-Factory, download the model with:
+ ```
+ python -m sample_factory.huggingface.load_from_hub -r MattStammers/APPO-atari_asteroid
+ ```
+ 
+ 
 ## Using the model
 
 To run the model after download, use the `enjoy` script corresponding to this environment:
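The diff truncates before showing the command itself. For completeness, the enjoy invocation for an Atari environment typically looks like the following (the module path and flag names are my assumptions based on the Sample-Factory 2.0 examples layout, so check the repository before running):

```shell
# Assumed: sf_examples.atari.enjoy_atari is the Atari enjoy entry point
python -m sf_examples.atari.enjoy_atari --algo=APPO --env=atari_asteroid --train_dir=./train_dir --experiment=APPO-atari_asteroid
```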
checkpoint_p0/best_000003328_851968_reward_6.150.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3b1d34866f93ce669b44c875e32eb2a08db9599671de2381e8ecdf818ab4dc6
+ size 20771187
checkpoint_p0/checkpoint_000002976_761856.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00756fbccc5f403b76ea255d4e1354ed13a7b17ab5606488857562bfd6cb6139
+ size 20771651
checkpoint_p0/checkpoint_000003584_917504.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:996fbf05ca461b48108ff8cc3fd98d9b424dc4f5c2801c572895b6b938e1bb74
+ size 20771651
checkpoint_p1/best_000001696_434176_reward_6.290.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98eda7c41fba1c02fdc6c7384f58c0a892323a3bb6465d07e28b47e0501a3146
+ size 20771187
checkpoint_p1/checkpoint_000003040_778240.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af91ccea4ee73e81ec76e860944e3d40d0510d97fdc8aeb6252861c149391ff9
+ size 20771651
checkpoint_p1/checkpoint_000003648_933888.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fea6fd6f01e693c30be5fc076a9d7e46e2cb8b37cfbfce53ead3b2dd094e30f7
+ size 20771651
config.json CHANGED
@@ -4,7 +4,7 @@
  "env": "atari_asteroid",
  "experiment": "atari_asteroid_APPO",
  "train_dir": "./train_atari",
- "restart_behavior": "restart",
  "device": "gpu",
  "seed": 1234,
  "num_policies": 2,
@@ -12,11 +12,11 @@
  "serial_mode": false,
  "batched_sampling": true,
  "num_batches_to_accumulate": 2,
- "worker_num_splits": 1,
  "policy_workers_per_policy": 1,
  "max_policy_lag": 1000,
  "num_workers": 16,
- "num_envs_per_worker": 2,
  "batch_size": 1024,
  "num_batches_per_epoch": 8,
  "num_epochs": 4,
@@ -64,10 +64,10 @@
  "experiment_summaries_interval": 3,
  "flush_summaries_interval": 30,
  "stats_avg": 100,
- "summaries_use_frameskip": true,
  "heartbeat_interval": 10,
  "heartbeat_reporting_interval": 60,
- "train_for_env_steps": 100000000,
  "train_for_seconds": 10000000000,
  "save_every_sec": 120,
  "keep_checkpoints": 2,
@@ -124,25 +124,27 @@
  "pbt_target_objective": "true_objective",
  "pbt_perturb_min": 1.1,
  "pbt_perturb_max": 1.5,
- "command_line": "--algo=APPO --env=atari_asteroid --experiment=atari_asteroid_APPO --num_policies=2 --restart_behavior=restart --train_dir=./train_atari --train_for_env_steps=100000000 --seed=1234 --num_workers=16 --num_envs_per_worker=2 --num_batches_per_epoch=8 --async_rl=true --batched_sampling=true --batch_size=1024 --max_grad_norm=0 --learning_rate=0.0003033891184 --heartbeat_interval=10 --heartbeat_reporting_interval=60 --save_milestones_sec=1200 --num_epochs=4 --exploration_loss_coeff=0.0004677351413 --with_wandb=true --wandb_user=matt-stammers --wandb_project=atari_APPO --wandb_group=atari_asteroid --wandb_job_type=SF --wandb_tags=atari",
  "cli_args": {
  "algo": "APPO",
  "env": "atari_asteroid",
  "experiment": "atari_asteroid_APPO",
  "train_dir": "./train_atari",
- "restart_behavior": "restart",
  "seed": 1234,
  "num_policies": 2,
  "async_rl": true,
  "batched_sampling": true,
  "num_workers": 16,
- "num_envs_per_worker": 2,
  "batch_size": 1024,
  "num_batches_per_epoch": 8,
  "num_epochs": 4,
  "exploration_loss_coeff": 0.0004677351413,
  "max_grad_norm": 0.0,
  "learning_rate": 0.0003033891184,
  "heartbeat_interval": 10,
  "heartbeat_reporting_interval": 60,
  "train_for_env_steps": 100000000,
@@ -158,5 +160,5 @@
  },
  "git_hash": "5fff97c2f535da5987d358cdbe6927cccd43621e",
  "git_repo_name": "not a git repository",
- "wandb_unique_id": "atari_asteroid_APPO_20231008_114614_125146"
  }
 
  "env": "atari_asteroid",
  "experiment": "atari_asteroid_APPO",
  "train_dir": "./train_atari",
+ "restart_behavior": "resume",
  "device": "gpu",
  "seed": 1234,
  "num_policies": 2,
 
  "serial_mode": false,
  "batched_sampling": true,
  "num_batches_to_accumulate": 2,
+ "worker_num_splits": 2,
  "policy_workers_per_policy": 1,
  "max_policy_lag": 1000,
  "num_workers": 16,
+ "num_envs_per_worker": 8,
  "batch_size": 1024,
  "num_batches_per_epoch": 8,
  "num_epochs": 4,
 
  "experiment_summaries_interval": 3,
  "flush_summaries_interval": 30,
  "stats_avg": 100,
+ "summaries_use_frameskip": false,
  "heartbeat_interval": 10,
  "heartbeat_reporting_interval": 60,
+ "train_for_env_steps": 500000000,
  "train_for_seconds": 10000000000,
  "save_every_sec": 120,
  "keep_checkpoints": 2,
 
  "pbt_target_objective": "true_objective",
  "pbt_perturb_min": 1.1,
  "pbt_perturb_max": 1.5,
+ "command_line": "--algo=APPO --env=atari_asteroid --experiment=atari_asteroid_APPO --num_policies=2 --restart_behavior=resume --train_dir=./train_atari --train_for_env_steps=100000000 --seed=1234 --num_workers=16 --num_envs_per_worker=8 --num_batches_per_epoch=8 --worker_num_splits=2 --async_rl=true --batched_sampling=true --batch_size=1024 --max_grad_norm=0 --learning_rate=0.0003033891184 --heartbeat_interval=10 --heartbeat_reporting_interval=60 --save_milestones_sec=1200 --num_epochs=4 --exploration_loss_coeff=0.0004677351413 --summaries_use_frameskip=False --with_wandb=true --wandb_user=matt-stammers --wandb_project=atari_APPO --wandb_group=atari_asteroid --wandb_job_type=SF --wandb_tags=atari",
  "cli_args": {
  "algo": "APPO",
  "env": "atari_asteroid",
  "experiment": "atari_asteroid_APPO",
  "train_dir": "./train_atari",
+ "restart_behavior": "resume",
  "seed": 1234,
  "num_policies": 2,
  "async_rl": true,
  "batched_sampling": true,
+ "worker_num_splits": 2,
  "num_workers": 16,
+ "num_envs_per_worker": 8,
  "batch_size": 1024,
  "num_batches_per_epoch": 8,
  "num_epochs": 4,
  "exploration_loss_coeff": 0.0004677351413,
  "max_grad_norm": 0.0,
  "learning_rate": 0.0003033891184,
+ "summaries_use_frameskip": false,
  "heartbeat_interval": 10,
  "heartbeat_reporting_interval": 60,
  "train_for_env_steps": 100000000,
 
  },
  "git_hash": "5fff97c2f535da5987d358cdbe6927cccd43621e",
  "git_repo_name": "not a git repository",
+ "wandb_unique_id": "atari_asteroid_APPO_20231017_124050_252182"
  }
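The updated config raises `num_envs_per_worker` from 2 to 8 and `worker_num_splits` from 1 to 2. A quick sanity check of what that means for parallelism (assuming the usual Sample-Factory semantics: each worker simulates `num_envs_per_worker` environments, divided into `worker_num_splits` groups for double-buffered sampling):

```python
# Values taken from the new config.json above.
num_workers = 16
num_envs_per_worker = 8
worker_num_splits = 2

# Environments simulated simultaneously across all rollout workers.
total_envs = num_workers * num_envs_per_worker
# Environments stepped per split while the other split's observations
# are being processed by the policy workers.
envs_per_split = num_envs_per_worker // worker_num_splits

print(total_envs)      # 128
print(envs_per_split)  # 4
```

So the new run steps 128 environments in parallel (up from 32), which should raise sample throughput on the same 16-worker machine.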
git.diff CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3357904f421d3f4924836316b1741bf64d5dd0e807d5e80ac07059b4c52a7008
- size 14426734
+ oid sha256:2464da1601e095c629f46a5ef1ef7322a64234d560931c130d8a8e640a96e217
+ size 14449550
replay.mp4 CHANGED
Binary files a/replay.mp4 and b/replay.mp4 differ
 
sf_log.txt CHANGED
The diff for this file is too large to render. See raw diff