erandaheshan43 committed on
Commit 4c7ee70 · verified · 1 Parent(s): fcc3944

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -25,6 +25,7 @@
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
@@ -32,4 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
-replay.mp4 filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
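The .gitattributes change above swaps the literal path `replay.mp4` for the glob `*.mp4`, so every video in the repo is tracked by Git LFS. A small illustration (using Python's `fnmatch` as a stand-in for Git's pattern matching, which differs in some edge cases) of why the glob widens coverage:

```python
from fnmatch import fnmatch

old_pattern, new_pattern = "replay.mp4", "*.mp4"
files = ["replay.mp4", "episode-0.mp4", "model.safetensors"]

# The old rule only caught the one hard-coded filename
print([f for f in files if fnmatch(f, old_pattern)])  # ['replay.mp4']

# The new rule catches any .mp4, e.g. freshly generated replay videos
print([f for f in files if fnmatch(f, new_pattern)])  # ['replay.mp4', 'episode-0.mp4']
```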
README.md CHANGED
@@ -1,138 +1,87 @@
 ---
-library_name: rl-algo-impls
+library_name: stable-baselines3
 tags:
 - SpaceInvadersNoFrameskip-v4
-- a2c
 - deep-reinforcement-learning
 - reinforcement-learning
+- stable-baselines3
 model-index:
-- name: a2c
+- name: DQN
   results:
-  - metrics:
-    - type: mean_reward
-      value: 1084.06 +/- 245.28
-      name: mean_reward
-    task:
+  - task:
       type: reinforcement-learning
       name: reinforcement-learning
     dataset:
       name: SpaceInvadersNoFrameskip-v4
       type: SpaceInvadersNoFrameskip-v4
+    metrics:
+    - type: mean_reward
+      value: 713.50 +/- 96.83
+      name: mean_reward
+      verified: false
 ---
-# **A2C** Agent playing **SpaceInvadersNoFrameskip-v4**
-
-This is a trained model of an **A2C** agent playing **SpaceInvadersNoFrameskip-v4** using the [/sgoodfriend/rl-algo-impls](https://github.com/sgoodfriend/rl-algo-impls) repo.
-
-All models trained at this commit can be found at https://api.wandb.ai/links/sgoodfriend/ysd5gj7p.
-
-## Training Results
-
-This model was trained from 3 trainings of **A2C** agents using different initial seeds. These agents were trained by checking out [983cb75](https://github.com/sgoodfriend/rl-algo-impls/tree/983cb75e43e51cf4ef57f177194ab9a4a1a8808b). The best and last models were kept from each training. This submission loads the best model from each training, reevaluates them, and selects the best of these latest evaluations (mean - std).
-
-| algo | env                         | seed | reward_mean | reward_std | eval_episodes | best | wandb_url |
-|:-----|:----------------------------|-----:|------------:|-----------:|--------------:|:-----|:----------|
-| a2c  | SpaceInvadersNoFrameskip-v4 |    1 |     1084.06 |    245.282 |            16 | *    | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/1xnaha2m) |
-| a2c  | SpaceInvadersNoFrameskip-v4 |    2 |     911.562 |    127.937 |            16 |      | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/2ikwmahh) |
-| a2c  | SpaceInvadersNoFrameskip-v4 |    3 |     1034.06 |    354.746 |            16 |      | [wandb](https://wandb.ai/sgoodfriend/rl-algo-impls-benchmarks/runs/n9pubrvf) |
-
-### Prerequisites: Weights & Biases (WandB)
-Training and benchmarking assume you have a Weights & Biases project to upload runs to.
-By default, training goes to the rl-algo-impls project while benchmarks go to
-rl-algo-impls-benchmarks. During training and benchmarking runs, videos of the best
-models and the model weights are uploaded to WandB.
-
-Before doing anything below, you'll need to create a wandb account and run `wandb login`.
-
-## Usage
-/sgoodfriend/rl-algo-impls: https://github.com/sgoodfriend/rl-algo-impls
-
-Note: While the model state dictionary and hyperparameters are saved, the latest
-implementation could be sufficiently different to not reproduce similar results. You
-might need to check out the commit the agent was trained on:
-[983cb75](https://github.com/sgoodfriend/rl-algo-impls/tree/983cb75e43e51cf4ef57f177194ab9a4a1a8808b).
+# **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
+This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4**
+using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
+and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
+
+The RL Zoo is a training framework for Stable Baselines3
+reinforcement learning agents,
+with hyperparameter optimization and pre-trained agents included.
+
+## Usage (with SB3 RL Zoo)
+
+RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
+SB3: https://github.com/DLR-RM/stable-baselines3<br/>
+SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib<br/>
+SBX (SB3 + Jax): https://github.com/araffin/sbx
+
+Install the RL Zoo (with SB3 and SB3-Contrib):
+```bash
+pip install rl_zoo3
+```
+
 ```
-# Downloads the model, sets hyperparameters, and runs agent for 3 episodes
-python enjoy.py --wandb-run-path=sgoodfriend/rl-algo-impls-benchmarks/1xnaha2m
+# Download model and save it into the logs/ folder
+python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga Kolosok -f logs/
+python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
 ```
 
-Setup hasn't been completely worked out yet, so you might be best served by using Google
-Colab, starting from the
-[colab_enjoy.ipynb](https://github.com/sgoodfriend/rl-algo-impls/blob/main/colab_enjoy.ipynb)
-notebook.
-
-## Training
-If you want the highest chance of reproducing these results, you'll want to check out the
-commit the agent was trained on: [983cb75](https://github.com/sgoodfriend/rl-algo-impls/tree/983cb75e43e51cf4ef57f177194ab9a4a1a8808b). While
-training is deterministic, different hardware will give different results.
-
+If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
 ```
-python train.py --algo a2c --env SpaceInvadersNoFrameskip-v4 --seed 1
+python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga Kolosok -f logs/
+python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
 ```
 
-Setup hasn't been completely worked out yet, so you might be best served by using Google
-Colab, starting from the
-[colab_train.ipynb](https://github.com/sgoodfriend/rl-algo-impls/blob/main/colab_train.ipynb)
-notebook.
-
-## Benchmarking (with Lambda Labs instance)
-This and other models from https://api.wandb.ai/links/sgoodfriend/ysd5gj7p were generated by running a script on a Lambda
-Labs instance. In a Lambda Labs instance terminal:
+## Training (with the RL Zoo)
 ```
-git clone git@github.com:sgoodfriend/rl-algo-impls.git
-cd rl-algo-impls
-bash ./lambda_labs/setup.sh
-wandb login
-bash ./lambda_labs/benchmark.sh [-a {"ppo a2c dqn vpg"}] [-e ENVS] [-j {6}] [-p {rl-algo-impls-benchmarks}] [-s {"1 2 3"}]
+python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+# Upload the model and generate video (when possible)
+python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga Kolosok
 ```
 
-### Alternative: Google Colab Pro+
-As an alternative,
-[colab_benchmark.ipynb](https://github.com/sgoodfriend/rl-algo-impls/tree/main/benchmarks#:~:text=colab_benchmark.ipynb)
-can be used. However, this requires a Google Colab Pro+ subscription and running across
-4 separate instances because otherwise running all jobs will exceed the 24-hour limit.
-
 ## Hyperparameters
-This isn't exactly the format of the hyperparams in hyperparams/a2c.yml, but instead the Wandb Run Config. However, it's very
-close and has some additional data:
-```
-additional_keys_to_log: []
-algo: a2c
-algo_hyperparams:
-  ent_coef: 0.01
-  vf_coef: 0.25
-device: auto
-env: SpaceInvadersNoFrameskip-v4
-env_hyperparams:
-  frame_stack: 4
-  n_envs: 16
-  no_reward_fire_steps: 500
-  no_reward_timeout_steps: 1000
-  vec_env_class: async
-env_id: null
-eval_hyperparams: {}
-microrts_reward_decay_callback: false
-n_timesteps: 10000000
-policy_hyperparams:
-  activation_fn: relu
-seed: 1
-use_deterministic_algorithms: true
-wandb_entity: null
-wandb_group: null
-wandb_project_name: rl-algo-impls-benchmarks
-wandb_tags:
-- benchmark_983cb75
-- host_129-159-43-75
-- branch_main
-- v0.0.9
-```
+```python
+OrderedDict([('batch_size', 32),
+             ('buffer_size', 100000),
+             ('env_wrapper',
+              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
+             ('exploration_final_eps', 0.01),
+             ('exploration_fraction', 0.1),
+             ('frame_stack', 4),
+             ('gradient_steps', 1),
+             ('learning_rate', 0.0001),
+             ('learning_starts', 100000),
+             ('n_timesteps', 1000000.0),
+             ('optimize_memory_usage', False),
+             ('policy', 'CnnPolicy'),
+             ('target_update_interval', 1000),
+             ('train_freq', 4),
+             ('normalize', False)])
+```
+
+# Environment Arguments
+```python
+{'render_mode': 'rgb_array'}
+```
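The new README's hyperparameters include `exploration_fraction: 0.1` and `exploration_final_eps: 0.01` over `n_timesteps: 1000000`. As a minimal sketch (not SB3's own code) of the linear epsilon-greedy schedule those values conventionally describe in SB3's DQN — decay from 1.0 to the final value over the first fraction of training, then hold flat:

```python
def linear_eps(step: int,
               n_timesteps: int = 1_000_000,
               exploration_fraction: float = 0.1,
               initial_eps: float = 1.0,
               final_eps: float = 0.01) -> float:
    """Epsilon decays linearly from initial_eps to final_eps over the first
    exploration_fraction * n_timesteps steps, then stays at final_eps."""
    progress = min(step / (exploration_fraction * n_timesteps), 1.0)
    return initial_eps + progress * (final_eps - initial_eps)

print(linear_eps(0))        # 1.0 at the start of training
print(linear_eps(50_000))   # roughly 0.505, halfway through the decay window
print(linear_eps(200_000))  # 0.01, decay finished after the first 100k steps
```

With these settings the agent is nearly fully random until `learning_starts` (100k steps) is reached, which is also exactly when the decay window ends.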
args.yml ADDED
@@ -0,0 +1,85 @@
+!!python/object/apply:collections.OrderedDict
+- - - algo
+    - dqn
+  - - conf_file
+    - dqn.yml
+  - - device
+    - auto
+  - - env
+    - SpaceInvadersNoFrameskip-v4
+  - - env_kwargs
+    - null
+  - - eval_env_kwargs
+    - null
+  - - eval_episodes
+    - 5
+  - - eval_freq
+    - 25000
+  - - gym_packages
+    - []
+  - - hyperparams
+    - null
+  - - log_folder
+    - logs/
+  - - log_interval
+    - -1
+  - - max_total_trials
+    - null
+  - - n_eval_envs
+    - 1
+  - - n_evaluations
+    - null
+  - - n_jobs
+    - 1
+  - - n_startup_trials
+    - 10
+  - - n_timesteps
+    - -1
+  - - n_trials
+    - 500
+  - - no_optim_plots
+    - false
+  - - num_threads
+    - -1
+  - - optimization_log_path
+    - null
+  - - optimize_hyperparameters
+    - false
+  - - progress
+    - false
+  - - pruner
+    - median
+  - - sampler
+    - tpe
+  - - save_freq
+    - -1
+  - - save_replay_buffer
+    - false
+  - - seed
+    - 381900577
+  - - storage
+    - null
+  - - study_name
+    - null
+  - - tensorboard_log
+    - ''
+  - - track
+    - false
+  - - trained_agent
+    - ''
+  - - trial_id
+    - null
+  - - truncate_last_trajectory
+    - true
+  - - uuid
+    - false
+  - - vec_env
+    - dummy
+  - - verbose
+    - 1
+  - - wandb_entity
+    - null
+  - - wandb_project_name
+    - sb3
+  - - wandb_tags
+    - []
config.yml ADDED
@@ -0,0 +1,29 @@
+!!python/object/apply:collections.OrderedDict
+- - - batch_size
+    - 32
+  - - buffer_size
+    - 100000
+  - - env_wrapper
+    - - stable_baselines3.common.atari_wrappers.AtariWrapper
+  - - exploration_final_eps
+    - 0.01
+  - - exploration_fraction
+    - 0.1
+  - - frame_stack
+    - 4
+  - - gradient_steps
+    - 1
+  - - learning_rate
+    - 0.0001
+  - - learning_starts
+    - 100000
+  - - n_timesteps
+    - 1000000.0
+  - - optimize_memory_usage
+    - false
+  - - policy
+    - CnnPolicy
+  - - target_update_interval
+    - 1000
+  - - train_freq
+    - 4
dqn-SpaceInvadersNoFrameskip-v4.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f19ccbc80cb66b553a31d2a9002979b6a26b042528dd4c29fbe45c099a632b7d
+size 27219597
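The three-line ADDED bodies for the .zip and .pth files are Git LFS pointer stubs, not the binaries themselves: a `version` line, a `sha256` object id, and the byte size. A minimal sketch of parsing one such pointer (field names per the LFS pointer spec; the `pointer` string here is the one from this commit):

```python
# A Git LFS pointer file: each line is "key value", separated by one space
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f19ccbc80cb66b553a31d2a9002979b6a26b042528dd4c29fbe45c099a632b7d
size 27219597
"""

fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)

print(algo)                 # sha256
print(int(fields["size"]))  # 27219597 (size of the real dqn-...zip in bytes)
```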
dqn-SpaceInvadersNoFrameskip-v4/_stable_baselines3_version ADDED
@@ -0,0 +1 @@
+2.7.0
dqn-SpaceInvadersNoFrameskip-v4/data ADDED
The diff for this file is too large to render. See raw diff
 
dqn-SpaceInvadersNoFrameskip-v4/policy.optimizer.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0af50457e5905a4c41e80a1f79a6714359b5a6a8928a5473cbeef81e03dec23c
+size 13506569
dqn-SpaceInvadersNoFrameskip-v4/policy.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:84dc331d0e18ccb562ff95fc23280a9bb6e7398b33cb0cc551bb53d1fd073dc5
+size 13505767
dqn-SpaceInvadersNoFrameskip-v4/pytorch_variables.pth ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:07c7431cf6005e7d8f367d79e995f63e2f9b981a37e3437b795d058f9af4308b
+size 1261
dqn-SpaceInvadersNoFrameskip-v4/system_info.txt ADDED
@@ -0,0 +1,9 @@
+- OS: Linux-6.6.105+-x86_64-with-glibc2.35 # 1 SMP Thu Oct 2 10:42:05 UTC 2025
+- Python: 3.12.12
+- Stable-Baselines3: 2.7.0
+- PyTorch: 2.8.0+cu126
+- GPU Enabled: True
+- Numpy: 2.0.2
+- Cloudpickle: 3.1.1
+- Gymnasium: 1.2.1
+- OpenAI Gym: 0.25.2
env_kwargs.yml ADDED
@@ -0,0 +1 @@
+render_mode: rgb_array
results.json ADDED
@@ -0,0 +1 @@
+{"mean_reward": 713.5, "std_reward": 96.82587464102764, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2025-10-15T11:24:26.433149"}
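The `713.50 +/- 96.83` metric in the model card's YAML header is just results.json rendered to two decimal places. A minimal sketch of that derivation (the exact formatting code in `rl_zoo3.push_to_hub` is an assumption here):

```python
import json

# results.json as uploaded in this commit (eval metadata trimmed for brevity)
raw = '{"mean_reward": 713.5, "std_reward": 96.82587464102764, "n_eval_episodes": 10}'
results = json.loads(raw)

# Round both values to two decimals to get the card's mean_reward string
metric = f"{results['mean_reward']:.2f} +/- {results['std_reward']:.2f}"
print(metric)  # 713.50 +/- 96.83
```

Note `is_deterministic` is false and only 10 episodes were evaluated, so the reported mean carries the fairly wide std shown.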
train_eval_metrics.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d9cff95733983b1ee63368d93b9973880c7c408c0fc4bf47eadc8f6d95c6e3a1
+size 36876