LuckLin committed on
Commit d199eb0 · verified · 1 Parent(s): 155e1b6

Successfully uploaded with game replay video

README.md CHANGED
@@ -16,72 +16,22 @@ model-index:
       type: SpaceInvadersNoFrameskip-v4
     metrics:
     - type: mean_reward
-      value: 662.00 +/- 175.72
+      value: 614.50 +/- 240.75
       name: mean_reward
       verified: false
 ---
 
 # **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
 This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
-and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
+using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
 
-The RL Zoo is a training framework for Stable Baselines3
-reinforcement learning agents,
-with hyperparameter optimization and pre-trained agents included.
-
-## Usage (with SB3 RL Zoo)
-
-RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
-SB3: https://github.com/DLR-RM/stable-baselines3<br/>
-SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
-SBX (SB3 + Jax): https://github.com/araffin/sbx
-
-Install the RL Zoo (with SB3 and SB3-Contrib):
-```bash
-pip install rl_zoo3
-```
-
-```
-# Download model and save it into the logs/ folder
-python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
-python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
-```
-
-If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
-```
-python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
-python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
-```
-
-## Training (with the RL Zoo)
-```
-python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
-# Upload the model and generate video (when possible)
-python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga LuckLin
-```
-
-## Hyperparameters
+## Usage (with Stable-baselines3)
+TODO: Add your code
+
 ```python
-OrderedDict([('batch_size', 32),
-             ('buffer_size', 100000),
-             ('env_wrapper',
-              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
-             ('exploration_final_eps', 0.01),
-             ('exploration_fraction', 0.1),
-             ('frame_stack', 4),
-             ('gradient_steps', 1),
-             ('learning_rate', 0.0001),
-             ('learning_starts', 100000),
-             ('n_timesteps', 1000000.0),
-             ('optimize_memory_usage', False),
-             ('policy', 'CnnPolicy'),
-             ('target_update_interval', 1000),
-             ('train_freq', 4),
-             ('normalize', False)])
-```
+from stable_baselines3 import ...
+from huggingface_sb3 import load_from_hub
 
-# Environment Arguments
-```python
-{'render_mode': 'rgb_array'}
+...
 ```
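The new README leaves the usage section as `TODO: Add your code`, with only placeholder imports. A minimal sketch of what that section could contain, assuming the `huggingface_sb3` helper package; the `hub_ids` helper is my own name, and the repo id and filename are inferred from the removed `rl_zoo3` commands (`--algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin`), not stated in this commit:

```python
def hub_ids(algo: str, env_id: str, orga: str) -> tuple:
    """Build the Hub repo id and checkpoint filename following the
    naming the zoo's push_to_hub command produces (inferred, hypothetical)."""
    name = f"{algo}-{env_id}"
    return f"{orga}/{name}", f"{name}.zip"

repo_id, filename = hub_ids("dqn", "SpaceInvadersNoFrameskip-v4", "LuckLin")
# repo_id  -> "LuckLin/dqn-SpaceInvadersNoFrameskip-v4"
# filename -> "dqn-SpaceInvadersNoFrameskip-v4.zip"

# To actually download and load the checkpoint (requires
# `pip install stable-baselines3 huggingface_sb3`, network access):
#
#   from huggingface_sb3 import load_from_hub
#   from stable_baselines3 import DQN
#
#   checkpoint = load_from_hub(repo_id=repo_id, filename=filename)
#   model = DQN.load(checkpoint)
```

Running the model on the Atari env would additionally need the AtariWrapper and 4-frame stacking listed in the removed hyperparameters section.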
config.json ADDED
The diff for this file is too large to render. See raw diff
 
dqn-SpaceInvadersNoFrameskip-v4.zip CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:23dfc149be953d72b26aba273fbb8deda6d44f744d58b4e7c89e54e789b19e95
-size 27219601
+oid sha256:d135fbb0977f1bcf610b15e5c581f86c4f38e62131a09b7e72800cf152dc0e90
+size 27219623
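The `.zip` diff above is a Git LFS pointer file, not the binary itself: only the object hash and byte size change. A small sketch (the parser function name is mine) of how such a pointer can be read, using the new pointer from the diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size of the real payload in bytes
    return fields

# The new pointer content shown in the diff above:
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d135fbb0977f1bcf610b15e5c581f86c4f38e62131a09b7e72800cf152dc0e90\n"
    "size 27219623\n"
)
info = parse_lfs_pointer(pointer)
# info["size"] == 27219623, the checkpoint is ~27 MB
```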
dqn-SpaceInvadersNoFrameskip-v4/data CHANGED
@@ -4,16 +4,16 @@
     ":serialized:": "gAWVMAAAAAAAAACMHnN0YWJsZV9iYXNlbGluZXMzLmRxbi5wb2xpY2llc5SMCUNublBvbGljeZSTlC4=",
     "__module__": "stable_baselines3.dqn.policies",
     "__doc__": "\n Policy class for DQN when using images as input.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param features_extractor_class: Features extractor to use.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
-    "__init__": "<function CnnPolicy.__init__ at 0x7c3d231d4680>",
+    "__init__": "<function CnnPolicy.__init__ at 0x7c2cf03bce00>",
     "__abstractmethods__": "frozenset()",
-    "_abc_impl": "<_abc._abc_data object at 0x7c3d231d1300>"
+    "_abc_impl": "<_abc._abc_data object at 0x7c2cf03ad200>"
   },
   "verbose": 1,
   "policy_kwargs": {},
   "num_timesteps": 1000000,
   "_total_timesteps": 1000000,
   "_num_timesteps_at_start": 0,
-  "seed": 0,
+  "seed": 2476167096,
   "action_noise": null,
   "start_time": 1767249084442169849,
   "learning_rate": {
@@ -64,7 +64,7 @@
   },
   "action_space": {
     ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
-    ":serialized:": "gAWVtQIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihDjYZWmt15YCS1Fllk0taEajANpbmOUihCpc3hEvDOBWIIa9zrb2o1BdYwKaGFzX3VpbnQzMpRLAIwIdWludGVnZXKUSwB1jBpudW1weS5yYW5kb20uYml0X2dlbmVyYXRvcpSMG19fcHl4X3VucGlja2xlX1NlZWRTZXF1ZW5jZZSTlGgxjAxTZWVkU2VxdWVuY2WUk5RKIqLqA06HlFKUKEsASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAAAH60D+Njo2T50Ask6neprIlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
+    ":serialized:": "gAWVuwIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihFXsBFGEbJJUSr3FOjcD639AIwDaW5jlIoQPfAfyf7mW+ycdVs0yO2oZ3WMCmhhc191aW50MzKUSwCMCHVpbnRlZ2VylEsAdYwabnVtcHkucmFuZG9tLmJpdF9nZW5lcmF0b3KUjBtfX3B5eF91bnBpY2tsZV9TZWVkU2VxdWVuY2WUk5RoMYwMU2VlZFNlcXVlbmNllJOUSiKi6gNOh5RSlCiKBbhPl5MASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAACGr/iyWSMNgDTmhsfrW4rJlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
     "dtype": "int64",
     "n": "6",
     "start": "0",
@@ -72,7 +72,7 @@
     "_np_random": "Generator(PCG64)"
   },
   "n_envs": 1,
-  "buffer_size": 1,
+  "buffer_size": 100000,
   "batch_size": 32,
   "learning_starts": 100000,
   "tau": 1.0,
@@ -85,13 +85,13 @@
     "__module__": "stable_baselines3.common.buffers",
     "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'next_observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'dones': <class 'numpy.ndarray'>, 'timeouts': <class 'numpy.ndarray'>}",
     "__doc__": "\n Replay buffer used in off-policy algorithms like SAC/TD3.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param n_envs: Number of parallel environments\n :param optimize_memory_usage: Enable a memory efficient variant\n of the replay buffer which reduces by almost a factor two the memory used,\n at a cost of more complexity.\n See https://github.com/DLR-RM/stable-baselines3/issues/37#issuecomment-637501195\n and https://github.com/DLR-RM/stable-baselines3/pull/28#issuecomment-637559274\n Cannot be used in combination with handle_timeout_termination.\n :param handle_timeout_termination: Handle timeout termination (due to timelimit)\n separately and treat the task as infinite horizon task.\n https://github.com/DLR-RM/stable-baselines3/issues/284\n ",
-    "__init__": "<function ReplayBuffer.__init__ at 0x7c3d2330e8e0>",
-    "add": "<function ReplayBuffer.add at 0x7c3d2330ea20>",
-    "sample": "<function ReplayBuffer.sample at 0x7c3d2330eac0>",
-    "_get_samples": "<function ReplayBuffer._get_samples at 0x7c3d2330eb60>",
-    "_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7c3d2330ec00>)>",
+    "__init__": "<function ReplayBuffer.__init__ at 0x7c2cf02ef1a0>",
+    "add": "<function ReplayBuffer.add at 0x7c2cf02ef2e0>",
+    "sample": "<function ReplayBuffer.sample at 0x7c2cf02ef380>",
+    "_get_samples": "<function ReplayBuffer._get_samples at 0x7c2cf02ef420>",
+    "_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7c2cf02ef4c0>)>",
     "__abstractmethods__": "frozenset()",
-    "_abc_impl": "<_abc._abc_data object at 0x7c3d232d5980>"
+    "_abc_impl": "<_abc._abc_data object at 0x7c2cb92c7f40>"
   },
   "replay_buffer_kwargs": {},
   "n_steps": 1,
results.json CHANGED
@@ -1 +1 @@
-{"mean_reward": 662.0, "std_reward": 175.7156794369814, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T07:42:28.085142"}
+{"mean_reward": 614.5, "std_reward": 240.75350464738827, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T08:01:54.625638"}