seynath committed
Commit 76780d1 · verified · 1 Parent(s): 59a9a2d

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -32,4 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
- replay.mp4 filter=lfs diff=lfs merge=lfs -text
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
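The attribute change above broadens LFS tracking from the single file `replay.mp4` to every `*.mp4` in the repo. As a rough sketch of the matching behavior (plain `fnmatch` globbing, not Git's full gitignore-style matcher), you can check which filenames each pattern catches:

```python
from fnmatch import fnmatch

old_pattern = "replay.mp4"  # rule before this commit: one exact filename
new_pattern = "*.mp4"       # rule after this commit: any .mp4 file

for name in ["replay.mp4", "episode-2.mp4", "README.md"]:
    # Only the new pattern routes every .mp4 through LFS
    print(name, fnmatch(name, old_pattern), fnmatch(name, new_pattern))
```

Note that real `.gitattributes` matching has extra rules (e.g. `**` and path anchoring) that `fnmatch` does not model; this only illustrates the widened glob.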
README.md CHANGED
@@ -1,143 +1,75 @@
  ---
- library_name: rl-algo-impls
  tags:
  - SpaceInvadersNoFrameskip-v4
- - dqn
  - deep-reinforcement-learning
  - reinforcement-learning
  model-index:
- - name: dqn
    results:
-   - metrics:
-     - type: mean_reward
-       value: 2510.31 +/- 474.33
-       name: mean_reward
-     task:
        type: reinforcement-learning
        name: reinforcement-learning
        dataset:
          name: SpaceInvadersNoFrameskip-v4
          type: SpaceInvadersNoFrameskip-v4
  ---
- # **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
-
- This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4** using the [/sgoodfriend/rl-algo-impls](https://github.com/sgoodfriend/rl-algo-impls) repo.
-
- All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/v7d6z818.
-
- ## Training Results
-
- This model was trained from 3 trainings of **DQN** agents using different initial seeds. These agents were trained by checking out [e8bc541](https://github.com/sgoodfriend/rl-algo-impls/tree/e8bc541d8b5e67bb4d3f2075282463fb61f5f2c6). The best and last models were kept from each training. This submission has loaded the best models from each training, reevaluates them, and selects the best model from these latest evaluations (mean - std).
-
- | algo | env | seed | reward_mean | reward_std | eval_episodes | best | wandb_url |
- |:-------|:----------------------------|-------:|--------------:|-------------:|----------------:|:-------|:-----------------------------------------------------------------------------|
- | dqn | SpaceInvadersNoFrameskip-v4 | 1 | 2510.31 | 474.333 | 16 | * | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/86sbwx1e) |
- | dqn | SpaceInvadersNoFrameskip-v4 | 2 | 1691.25 | 447.757 | 16 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/rcxjbkfx) |
- | dqn | SpaceInvadersNoFrameskip-v4 | 3 | 1319.38 | 88.4038 | 16 | | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/dw0zab13) |
-
- ### Prerequisites: Weights & Biases (WandB)
- Training and benchmarking assumes you have a Weights & Biases project to upload runs to.
- By default training goes to a rl-algo-impls project while benchmarks go to
- rl-algo-impls-benchmarks. During training and benchmarking runs, videos of the best
- models and the model weights are uploaded to WandB.
-
- Before doing anything below, you'll need to create a wandb account and run `wandb
- login`.
-
- ## Usage
- /sgoodfriend/rl-algo-impls: https://github.com/sgoodfriend/rl-algo-impls
-
- Note: While the model state dictionary and hyperaparameters are saved, the latest
- implementation could be sufficiently different to not be able to reproduce similar
- results. You might need to checkout the commit the agent was trained on:
- [e8bc541](https://github.com/sgoodfriend/rl-algo-impls/tree/e8bc541d8b5e67bb4d3f2075282463fb61f5f2c6).
  ```
- # Downloads the model, sets hyperparameters, and runs agent for 3 episodes
- python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/86sbwx1e
  ```
-
- Setup hasn't been completely worked out yet, so you might be best served by using Google
- Colab starting from the
- [colab_enjoy.ipynb](https://github.com/sgoodfriend/rl-algo-impls/blob/main/colab_enjoy.ipynb)
- notebook.
-
- ## Training
- If you want the highest chance to reproduce these results, you'll want to checkout the
- commit the agent was trained on: [e8bc541](https://github.com/sgoodfriend/rl-algo-impls/tree/e8bc541d8b5e67bb4d3f2075282463fb61f5f2c6). While
- training is deterministic, different hardware will give different results.
-
  ```
- python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 --seed 1
  ```
-
- Setup hasn't been completely worked out yet, so you might be best served by using Google
- Colab starting from the
- [colab_train.ipynb](https://github.com/sgoodfriend/rl-algo-impls/blob/main/colab_train.ipynb)
- notebook.
-
- ## Benchmarking (with Lambda Labs instance)
- This and other models from https://api.wandb.ai/links/sgoodfriend/v7d6z818 were generated by running a script on a Lambda
- Labs instance. In a Lambda Labs instance terminal:
  ```
- git clone git@github.com:sgoodfriend/rl-algo-impls.git
- cd rl-algo-impls
- bash ./lambda_labs/setup.sh
- wandb login
- bash ./lambda_labs/benchmark.sh
  ```
-
- ### Alternative: Google Colab Pro+
- As an alternative,
- [colab_benchmark.ipynb](https://github.com/sgoodfriend/rl-algo-impls/tree/main/benchmarks#:~:text=colab_benchmark.ipynb),
- can be used. However, this requires a Google Colab Pro+ subscription and running across
- 4 separate instances because otherwise running all jobs will exceed the 24-hour limit.
-
  ## Hyperparameters
- This isn't exactly the format of hyperparams in hyperparams/dqn.yml, but instead the Wandb Run Config. However, it's very
- close and has some additional data:
- ```
- algo: dqn
- algo_hyperparams:
-   batch_size: 32
-   buffer_size: 100000
-   exploration_final_eps: 0.01
-   exploration_fraction: 0.1
-   gradient_steps: 2
-   learning_rate: 0.0001
-   learning_starts: 100000
-   target_update_interval: 1000
-   train_freq: 8
- env: impala-SpaceInvadersNoFrameskip-v4
- env_hyperparams:
-   frame_stack: 4
-   n_envs: 8
-   no_reward_fire_steps: 500
-   no_reward_timeout_steps: 1000
-   vec_env_class: subproc
- env_id: SpaceInvadersNoFrameskip-v4
- eval_params:
-   deterministic: false
- n_timesteps: 10000000
- policy_hyperparams:
-   cnn_feature_dim: 256
-   cnn_layers_init_orthogonal: false
-   cnn_style: impala
-   init_layers_orthogonal: true
- seed: 1
- use_deterministic_algorithms: true
- wandb_entity: null
- wandb_project_name: rl-algo-impls-benchmarks
- wandb_tags:
- - benchmark_e8bc541
- - host_192-9-228-51
-
  ```
 
  ---
+ library_name: stable-baselines3
  tags:
  - SpaceInvadersNoFrameskip-v4
  - deep-reinforcement-learning
  - reinforcement-learning
+ - stable-baselines3
  model-index:
+ - name: QRDQN
    results:
+   - task:
        type: reinforcement-learning
        name: reinforcement-learning
      dataset:
        name: SpaceInvadersNoFrameskip-v4
        type: SpaceInvadersNoFrameskip-v4
+     metrics:
+     - type: mean_reward
+       value: 1059.00 +/- 439.89
+       name: mean_reward
+       verified: false
  ---
+
+ # **QRDQN** Agent playing **SpaceInvadersNoFrameskip-v4**
+ This is a trained model of a **QRDQN** agent playing **SpaceInvadersNoFrameskip-v4**
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
+ and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
+
+ The RL Zoo is a training framework for Stable Baselines3
+ reinforcement learning agents,
+ with hyperparameter optimization and pre-trained agents included.
+
+ ## Usage (with SB3 RL Zoo)
+
+ RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
+ SB3: https://github.com/DLR-RM/stable-baselines3<br/>
+ SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
+
+ Install the RL Zoo (with SB3 and SB3-Contrib):
+ ```bash
+ pip install rl_zoo3
+ ```
+
  ```
+ # Download model and save it into the logs/ folder
+ python -m rl_zoo3.load_from_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -orga Mihail-P -f logs/
+ python -m rl_zoo3.enjoy --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/
  ```
+
+ If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
  ```
+ python -m rl_zoo3.load_from_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -orga Mihail-P -f logs/
+ python -m rl_zoo3.enjoy --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/
  ```
+
+ ## Training (with the RL Zoo)
  ```
+ python -m rl_zoo3.train --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+ # Upload the model and generate video (when possible)
+ python -m rl_zoo3.push_to_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga Mihail-P
  ```
+
  ## Hyperparameters
+ ```python
+ OrderedDict([('batch_size', 64),
+              ('buffer_size', 150000),
+              ('env_wrapper',
+               ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
+              ('exploration_fraction', 0.025),
+              ('frame_stack', 4),
+              ('n_timesteps', 10000000.0),
+              ('optimize_memory_usage', False),
+              ('policy', 'CnnPolicy'),
+              ('normalize', False)])
  ```
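One notable hyperparameter in the new card is `exploration_fraction: 0.025`, i.e. epsilon is annealed over only the first 2.5% of the 10M training steps (250k steps). A small sketch of a linear epsilon schedule like the one SB3's DQN-family algorithms use, assuming the library defaults `exploration_initial_eps=1.0` and `exploration_final_eps=0.05` (neither is set in this card, so these are assumed values):

```python
def linear_epsilon(step: int,
                   total_timesteps: int = 10_000_000,
                   exploration_fraction: float = 0.025,
                   initial_eps: float = 1.0,   # assumed SB3 default
                   final_eps: float = 0.05) -> float:
    """Linearly anneal epsilon over the first exploration_fraction of training."""
    decay_steps = exploration_fraction * total_timesteps  # 250_000 steps here
    if step >= decay_steps:
        return final_eps
    return initial_eps + (step / decay_steps) * (final_eps - initial_eps)

print(linear_epsilon(0))          # start of training: fully random actions
print(linear_epsilon(125_000))    # halfway through the 250k-step decay window, ~0.525
print(linear_epsilon(1_000_000))  # decay long finished: stays at final_eps
```

This is an illustrative sketch of the schedule's shape, not a call into SB3 itself; the actual schedule lives inside the library's `DQN`/`QRDQN` implementation.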
args.yml ADDED
@@ -0,0 +1,83 @@
+ !!python/object/apply:collections.OrderedDict
+ - - - algo
+     - qrdqn
+   - - conf_file
+     - null
+   - - device
+     - auto
+   - - env
+     - SpaceInvadersNoFrameskip-v4
+   - - env_kwargs
+     - null
+   - - eval_episodes
+     - 25
+   - - eval_freq
+     - 50000
+   - - gym_packages
+     - []
+   - - hyperparams
+     - batch_size: 64
+       buffer_size: 150000
+       exploration_fraction: 0.025
+   - - log_folder
+     - logs/
+   - - log_interval
+     - -1
+   - - max_total_trials
+     - null
+   - - n_eval_envs
+     - 1
+   - - n_evaluations
+     - null
+   - - n_jobs
+     - 1
+   - - n_startup_trials
+     - 10
+   - - n_timesteps
+     - 10000000
+   - - n_trials
+     - 500
+   - - no_optim_plots
+     - false
+   - - num_threads
+     - -1
+   - - optimization_log_path
+     - null
+   - - optimize_hyperparameters
+     - false
+   - - progress
+     - false
+   - - pruner
+     - median
+   - - sampler
+     - tpe
+   - - save_freq
+     - 1000000
+   - - save_replay_buffer
+     - false
+   - - seed
+     - 1078690446
+   - - storage
+     - null
+   - - study_name
+     - null
+   - - tensorboard_log
+     - /tblogs/
+   - - track
+     - false
+   - - trained_agent
+     - ''
+   - - truncate_last_trajectory
+     - true
+   - - uuid
+     - false
+   - - vec_env
+     - dummy
+   - - verbose
+     - 1
+   - - wandb_entity
+     - null
+   - - wandb_project_name
+     - sb3
+   - - wandb_tags
+     - []
config.yml ADDED
@@ -0,0 +1,17 @@
+ !!python/object/apply:collections.OrderedDict
+ - - - batch_size
+     - 64
+   - - buffer_size
+     - 150000
+   - - env_wrapper
+     - - stable_baselines3.common.atari_wrappers.AtariWrapper
+   - - exploration_fraction
+     - 0.025
+   - - frame_stack
+     - 4
+   - - n_timesteps
+     - 10000000.0
+   - - optimize_memory_usage
+     - false
+   - - policy
+     - CnnPolicy
env_kwargs.yml ADDED
@@ -0,0 +1 @@
+ {}
qrdqn-SpaceInvadersNoFrameskip-v4.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa14b2afbb208318137d292f4016b9ba0b7b0f6c3b48d86bd76f152aee5e2da3
+ size 37018450
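The `.zip` above is stored as a Git LFS pointer file rather than the binary itself: three `key value` lines (`version`, `oid`, `size`) per the git-lfs pointer spec. A minimal stdlib-only sketch that parses such a pointer:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer content shown in this diff
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:aa14b2afbb208318137d292f4016b9ba0b7b0f6c3b48d86bd76f152aee5e2da3
size 37018450
"""

info = parse_lfs_pointer(pointer)
algo, _, digest = info["oid"].partition(":")
print(algo, len(digest), int(info["size"]))  # sha256 digest is 64 hex chars; size is bytes
```

The other `.pth`/`.zip` files added below use the same pointer format, differing only in `oid` and `size`.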
qrdqn-SpaceInvadersNoFrameskip-v4/_stable_baselines3_version ADDED
@@ -0,0 +1 @@
+ 2.0.0a5
qrdqn-SpaceInvadersNoFrameskip-v4/data ADDED
The diff for this file is too large to render. See raw diff
 
qrdqn-SpaceInvadersNoFrameskip-v4/policy.optimizer.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:154676a3145cd9fe89910a8f893cb58b0826abe0a786663b7014bc9de9c60e6c
+ size 18405835
qrdqn-SpaceInvadersNoFrameskip-v4/policy.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:001e2b0d93e6bfd8a0c88967236b44f464cc76ba8993425e3a0e8320f662e75e
+ size 18405545
qrdqn-SpaceInvadersNoFrameskip-v4/pytorch_variables.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d030ad8db708280fcae77d87e973102039acd23a11bdecc3db8eb6c0ac940ee1
+ size 431
qrdqn-SpaceInvadersNoFrameskip-v4/system_info.txt ADDED
@@ -0,0 +1,9 @@
+ - OS: Linux-5.10.147+-x86_64-with-glibc2.31 # 1 SMP Sat Dec 10 16:00:40 UTC 2022
+ - Python: 3.9.16
+ - Stable-Baselines3: 2.0.0a5
+ - PyTorch: 2.0.0+cu118
+ - GPU Enabled: True
+ - Numpy: 1.22.4
+ - Cloudpickle: 2.2.1
+ - Gymnasium: 0.28.1
+ - OpenAI Gym: 0.26.2
results.json ADDED
@@ -0,0 +1 @@
+ {"mean_reward": 1059.0, "std_reward": 439.89089556388865, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2023-04-17T06:17:02.113019"}
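The README's metric string `1059.00 +/- 439.89` is just the `mean_reward` and `std_reward` fields of this `results.json` rounded to two decimal places. A quick stdlib sketch reproducing it:

```python
import json

# results.json content as added in this commit
results = json.loads('{"mean_reward": 1059.0, "std_reward": 439.89089556388865, '
                     '"is_deterministic": false, "n_eval_episodes": 10, '
                     '"eval_datetime": "2023-04-17T06:17:02.113019"}')

# Format both values to two decimals, as the model-card metadata does
metric = f"{results['mean_reward']:.2f} +/- {results['std_reward']:.2f}"
print(metric)  # 1059.00 +/- 439.89
```

Note this evaluation used 10 stochastic episodes (`is_deterministic: false`), so the large std is expected for Atari Space Invaders.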
train_eval_metrics.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c462cb45c3f36e46ef59f3ac324b4caaab97d843b88a0ce88468d2b40dfb97c7
+ size 308560