LuckLin committed
Commit 1ace38f · verified · 1 Parent(s): dd22249

Initial commit
README.md CHANGED
@@ -16,22 +16,72 @@ model-index:
   type: SpaceInvadersNoFrameskip-v4
   metrics:
   - type: mean_reward
-  value: 614.50 +/- 240.75
+  value: 662.00 +/- 175.72
   name: mean_reward
   verified: false
 ---
 
 # **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
 This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
+and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
 
-## Usage (with Stable-baselines3)
-TODO: Add your code
+The RL Zoo is a training framework for Stable Baselines3
+reinforcement learning agents,
+with hyperparameter optimization and pre-trained agents included.
 
+## Usage (with SB3 RL Zoo)
+
+RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
+SB3: https://github.com/DLR-RM/stable-baselines3<br/>
+SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
+SBX (SB3 + Jax): https://github.com/araffin/sbx
+
+Install the RL Zoo (with SB3 and SB3-Contrib):
+```bash
+pip install rl_zoo3
+```
+
+```
+# Download model and save it into the logs/ folder
+python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
+python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+```
+
+If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
+```
+python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
+python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+```
+
+## Training (with the RL Zoo)
+```
+python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+# Upload the model and generate video (when possible)
+python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga LuckLin
+```
+
+## Hyperparameters
 ```python
-from stable_baselines3 import ...
-from huggingface_sb3 import load_from_hub
-
-...
+OrderedDict([('batch_size', 32),
+             ('buffer_size', 100000),
+             ('env_wrapper',
+              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
+             ('exploration_final_eps', 0.01),
+             ('exploration_fraction', 0.1),
+             ('frame_stack', 4),
+             ('gradient_steps', 1),
+             ('learning_rate', 0.0001),
+             ('learning_starts', 100000),
+             ('n_timesteps', 1000000.0),
+             ('optimize_memory_usage', False),
+             ('policy', 'CnnPolicy'),
+             ('target_update_interval', 1000),
+             ('train_freq', 4),
+             ('normalize', False)])
+```
+
+# Environment Arguments
+```python
+{'render_mode': 'rgb_array'}
 ```
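The exploration settings in the hyperparameter table (`exploration_fraction`, `exploration_final_eps`, `n_timesteps`) describe DQN's linear epsilon-greedy schedule: epsilon decays from 1.0 to 0.01 over the first 10% of the 1M training steps, then stays flat. A minimal sketch of that schedule, reconstructed from those values rather than taken from the SB3 source:

```python
def epsilon(step, n_timesteps=1_000_000, exploration_fraction=0.1,
            final_eps=0.01, initial_eps=1.0):
    """Linear epsilon-greedy schedule sketch matching the hyperparameters above."""
    # Fraction of the decay window already elapsed, clipped to [0, 1]
    progress = min(step / (exploration_fraction * n_timesteps), 1.0)
    # Linear interpolation from initial_eps down to final_eps
    return initial_eps + progress * (final_eps - initial_eps)

print(epsilon(0))        # 1.0 at the start of training
print(epsilon(50_000))   # 0.505, halfway through the decay window
print(epsilon(100_000))  # 0.01, decay finished; stays flat afterwards
```

With `exploration_fraction = 0.1`, the decay window is 100,000 steps; note this is independent of `learning_starts`, which only delays gradient updates, not the schedule.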
dqn-SpaceInvadersNoFrameskip-v4.zip CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d135fbb0977f1bcf610b15e5c581f86c4f38e62131a09b7e72800cf152dc0e90
-size 27219623
+oid sha256:bc28ccf17017ae83d337c8fae0722e2999f2b5030dae07a419addda05852f22f
+size 27219601
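The change above is to a Git LFS pointer file: the zip's actual bytes live in LFS storage, and the repo tracks only a version line, a sha256 `oid`, and a byte `size`. A quick sketch of reading that three-line key-value format (a hypothetical helper, not part of any tool used here):

```python
def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents as shown in the diff above (the new side)
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:bc28ccf17017ae83d337c8fae0722e2999f2b5030dae07a419addda05852f22f\n"
    "size 27219601\n"
)
fields = parse_lfs_pointer(pointer)
```

Because only the pointer is versioned, the diff changes just the hash and size even though the whole 27 MB artifact was replaced.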
dqn-SpaceInvadersNoFrameskip-v4/data CHANGED
@@ -4,16 +4,16 @@
 ":serialized:": "gAWVMAAAAAAAAACMHnN0YWJsZV9iYXNlbGluZXMzLmRxbi5wb2xpY2llc5SMCUNublBvbGljeZSTlC4=",
 "__module__": "stable_baselines3.dqn.policies",
 "__doc__": "\n Policy class for DQN when using images as input.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param features_extractor_class: Features extractor to use.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
-"__init__": "<function CnnPolicy.__init__ at 0x7c2cf03bce00>",
+"__init__": "<function CnnPolicy.__init__ at 0x7907df8cc680>",
 "__abstractmethods__": "frozenset()",
-"_abc_impl": "<_abc._abc_data object at 0x7c2cf03ad200>"
+"_abc_impl": "<_abc._abc_data object at 0x7907df8c9080>"
 },
 "verbose": 1,
 "policy_kwargs": {},
 "num_timesteps": 1000000,
 "_total_timesteps": 1000000,
 "_num_timesteps_at_start": 0,
-"seed": 2476167096,
+"seed": 0,
 "action_noise": null,
 "start_time": 1767249084442169849,
 "learning_rate": {
@@ -64,7 +64,7 @@
 },
 "action_space": {
 ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
-":serialized:": "gAWVuwIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihFXsBFGEbJJUSr3FOjcD639AIwDaW5jlIoQPfAfyf7mW+ycdVs0yO2oZ3WMCmhhc191aW50MzKUSwCMCHVpbnRlZ2VylEsAdYwabnVtcHkucmFuZG9tLmJpdF9nZW5lcmF0b3KUjBtfX3B5eF91bnBpY2tsZV9TZWVkU2VxdWVuY2WUk5RoMYwMU2VlZFNlcXVlbmNllJOUSiKi6gNOh5RSlCiKBbhPl5MASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAACGr/iyWSMNgDTmhsfrW4rJlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
+":serialized:": "gAWVtQIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihDjYZWmt15YCS1Fllk0taEajANpbmOUihCpc3hEvDOBWIIa9zrb2o1BdYwKaGFzX3VpbnQzMpRLAIwIdWludGVnZXKUSwB1jBpudW1weS5yYW5kb20uYml0X2dlbmVyYXRvcpSMG19fcHl4X3VucGlja2xlX1NlZWRTZXF1ZW5jZZSTlGgxjAxTZWVkU2VxdWVuY2WUk5RKIqLqA06HlFKUKEsASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAAAH60D+Njo2T50Ask6neprIlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
 "dtype": "int64",
 "n": "6",
 "start": "0",
@@ -72,7 +72,7 @@
 "_np_random": "Generator(PCG64)"
 },
 "n_envs": 1,
-"buffer_size": 100000,
+"buffer_size": 1,
 "batch_size": 32,
 "learning_starts": 100000,
 "tau": 1.0,
@@ -85,13 +85,13 @@
 "__module__": "stable_baselines3.common.buffers",
 "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'next_observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'dones': <class 'numpy.ndarray'>, 'timeouts': <class 'numpy.ndarray'>}",
 "__doc__": "\n Replay buffer used in off-policy algorithms like SAC/TD3.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param n_envs: Number of parallel environments\n :param optimize_memory_usage: Enable a memory efficient variant\n of the replay buffer which reduces by almost a factor two the memory used,\n at a cost of more complexity.\n See https://github.com/DLR-RM/stable-baselines3/issues/37#issuecomment-637501195\n and https://github.com/DLR-RM/stable-baselines3/pull/28#issuecomment-637559274\n Cannot be used in combination with handle_timeout_termination.\n :param handle_timeout_termination: Handle timeout termination (due to timelimit)\n separately and treat the task as infinite horizon task.\n https://github.com/DLR-RM/stable-baselines3/issues/284\n ",
-"__init__": "<function ReplayBuffer.__init__ at 0x7c2cf02ef1a0>",
-"add": "<function ReplayBuffer.add at 0x7c2cf02ef2e0>",
-"sample": "<function ReplayBuffer.sample at 0x7c2cf02ef380>",
-"_get_samples": "<function ReplayBuffer._get_samples at 0x7c2cf02ef420>",
-"_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7c2cf02ef4c0>)>",
+"__init__": "<function ReplayBuffer.__init__ at 0x7907dfa068e0>",
+"add": "<function ReplayBuffer.add at 0x7907dfa06a20>",
+"sample": "<function ReplayBuffer.sample at 0x7907dfa06ac0>",
+"_get_samples": "<function ReplayBuffer._get_samples at 0x7907dfa06b60>",
+"_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7907dfa06c00>)>",
 "__abstractmethods__": "frozenset()",
-"_abc_impl": "<_abc._abc_data object at 0x7c2cb92c7f40>"
+"_abc_impl": "<_abc._abc_data object at 0x7907df9c1600>"
 },
 "replay_buffer_kwargs": {},
 "n_steps": 1,
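The `ReplayBuffer` docstring in the diff above describes a fixed-capacity buffer that off-policy algorithms sample minibatches from (here `batch_size: 32`); the serialized `buffer_size` drops to 1 in the uploaded file, which is consistent with the replay buffer contents not being shipped with the model. As a toy illustration of the ring-buffer semantics (my own sketch, not stable-baselines3's implementation):

```python
import random

class RingReplayBuffer:
    """Toy fixed-capacity replay buffer: once full, new transitions
    overwrite the oldest ones. Not the SB3 implementation."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.storage = []
        self.pos = 0  # next write position; wraps around when full

    def add(self, transition):
        if len(self.storage) < self.buffer_size:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition  # overwrite oldest slot
        self.pos = (self.pos + 1) % self.buffer_size

    def sample(self, batch_size):
        # Uniform sampling of distinct stored transitions
        return random.sample(self.storage, batch_size)

buf = RingReplayBuffer(buffer_size=3)
for t in range(5):          # add 5 transitions into a size-3 buffer
    buf.add(t)
batch = buf.sample(2)       # a random minibatch of stored transitions
```

The real buffer stores observation/action/reward/done arrays per slot and, with `optimize_memory_usage`, shares storage between `observations` and `next_observations`; this sketch keeps only the eviction and sampling behavior.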
results.json CHANGED
@@ -1 +1 @@
-{"mean_reward": 614.5, "std_reward": 240.75350464738827, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T08:01:54.625638"}
+{"mean_reward": 662.0, "std_reward": 175.7156794369814, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T08:16:52.395079"}
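The `results.json` above summarizes `n_eval_episodes: 10` episode returns as a mean and standard deviation (the `+/-` figure in the card's metadata). A sketch of how such a summary is computed, using synthetic returns rather than the actual evaluation episodes; the population standard deviation is assumed here, matching numpy's `std` default:

```python
import statistics

# Synthetic episode returns -- NOT the real evaluation data
returns = [600.0, 700.0, 650.0, 550.0, 500.0]

mean_reward = statistics.fmean(returns)
std_reward = statistics.pstdev(returns)  # population std (ddof=0)

summary = {
    "mean_reward": round(mean_reward, 2),
    "std_reward": round(std_reward, 2),
    "n_eval_episodes": len(returns),
}
```

With a standard deviation this large relative to the mean (175.72 vs. 662.00 in the new evaluation), the improvement over the previous upload (614.50 +/- 240.75) is well within evaluation noise for 10 episodes.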