LuckLin committed
Commit 1ace38f · verified · 1 Parent(s): dd22249

Initial commit
README.md CHANGED
@@ -16,22 +16,72 @@ model-index:
   type: SpaceInvadersNoFrameskip-v4
   metrics:
   - type: mean_reward
-  value: 614.50 +/- 240.75
+  value: 662.00 +/- 175.72
   name: mean_reward
   verified: false
 ---
 
 # **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
 This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
+using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
+and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
 
-## Usage (with Stable-baselines3)
-TODO: Add your code
+The RL Zoo is a training framework for Stable Baselines3
+reinforcement learning agents,
+with hyperparameter optimization and pre-trained agents included.
 
+## Usage (with SB3 RL Zoo)
+
+RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
+SB3: https://github.com/DLR-RM/stable-baselines3<br/>
+SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
+SBX (SB3 + Jax): https://github.com/araffin/sbx
+
+Install the RL Zoo (with SB3 and SB3-Contrib):
+```bash
+pip install rl_zoo3
+```
+
+```
+# Download model and save it into the logs/ folder
+python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
+python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+```
+
+If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
+```
+python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
+python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+```
+
+## Training (with the RL Zoo)
+```
+python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
+# Upload the model and generate video (when possible)
+python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga LuckLin
+```
+
+## Hyperparameters
 ```python
-from stable_baselines3 import ...
-from huggingface_sb3 import load_from_hub
-
-...
+OrderedDict([('batch_size', 32),
+             ('buffer_size', 100000),
+             ('env_wrapper',
+              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
+             ('exploration_final_eps', 0.01),
+             ('exploration_fraction', 0.1),
+             ('frame_stack', 4),
+             ('gradient_steps', 1),
+             ('learning_rate', 0.0001),
+             ('learning_starts', 100000),
+             ('n_timesteps', 1000000.0),
+             ('optimize_memory_usage', False),
+             ('policy', 'CnnPolicy'),
+             ('target_update_interval', 1000),
+             ('train_freq', 4),
+             ('normalize', False)])
+```
+
+# Environment Arguments
+```python
+{'render_mode': 'rgb_array'}
 ```
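The exploration settings in the hyperparameter table (`exploration_fraction`, `exploration_final_eps`, `n_timesteps`) describe DQN's linear epsilon-greedy schedule: epsilon decays from 1.0 to 0.01 over the first 10% of the 1M training steps, then stays flat. A minimal sketch of that schedule, reconstructed from those values rather than taken from the SB3 source:

```python
def epsilon(step, n_timesteps=1_000_000, exploration_fraction=0.1,
            final_eps=0.01, initial_eps=1.0):
    """Linear epsilon-greedy schedule sketch matching the hyperparameters above."""
    # Fraction of the decay window already elapsed, clipped to [0, 1]
    progress = min(step / (exploration_fraction * n_timesteps), 1.0)
    # Linear interpolation from initial_eps down to final_eps
    return initial_eps + progress * (final_eps - initial_eps)

print(epsilon(0))        # 1.0 at the start of training
print(epsilon(50_000))   # 0.505, halfway through the decay window
print(epsilon(100_000))  # 0.01, decay finished; stays flat afterwards
```

With `exploration_fraction = 0.1`, the decay window is 100,000 steps; note this is independent of `learning_starts`, which only delays gradient updates, not the schedule.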
dqn-SpaceInvadersNoFrameskip-v4.zip CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d135fbb0977f1bcf610b15e5c581f86c4f38e62131a09b7e72800cf152dc0e90
-size 27219623
+oid sha256:bc28ccf17017ae83d337c8fae0722e2999f2b5030dae07a419addda05852f22f
+size 27219601
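The change above is to a Git LFS pointer file: the zip's actual bytes live in LFS storage, and the repo tracks only a version line, a sha256 `oid`, and a byte `size`. A quick sketch of reading that three-line key-value format (a hypothetical helper, not part of any tool used here):

```python
def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents as shown in the diff above (the new side)
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:bc28ccf17017ae83d337c8fae0722e2999f2b5030dae07a419addda05852f22f\n"
    "size 27219601\n"
)
fields = parse_lfs_pointer(pointer)
```

Because only the pointer is versioned, the diff changes just the hash and size even though the whole 27 MB artifact was replaced.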
dqn-SpaceInvadersNoFrameskip-v4/data CHANGED
@@ -4,16 +4,16 @@
 ":serialized:": "gAWVMAAAAAAAAACMHnN0YWJsZV9iYXNlbGluZXMzLmRxbi5wb2xpY2llc5SMCUNublBvbGljeZSTlC4=",
 "__module__": "stable_baselines3.dqn.policies",
 "__doc__": "\n Policy class for DQN when using images as input.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param features_extractor_class: Features extractor to use.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
-"__init__": "<function CnnPolicy.__init__ at 0x7c2cf03bce00>",
+"__init__": "<function CnnPolicy.__init__ at 0x7907df8cc680>",
 "__abstractmethods__": "frozenset()",
-"_abc_impl": "<_abc._abc_data object at 0x7c2cf03ad200>"
+"_abc_impl": "<_abc._abc_data object at 0x7907df8c9080>"
 },
 "verbose": 1,
 "policy_kwargs": {},
 "num_timesteps": 1000000,
 "_total_timesteps": 1000000,
 "_num_timesteps_at_start": 0,
-"seed": 2476167096,
+"seed": 0,
 "action_noise": null,
 "start_time": 1767249084442169849,
 "learning_rate": {
@@ -64,7 +64,7 @@
 },
 "action_space": {
 ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
-":serialized:": "gAWVuwIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihFXsBFGEbJJUSr3FOjcD639AIwDaW5jlIoQPfAfyf7mW+ycdVs0yO2oZ3WMCmhhc191aW50MzKUSwCMCHVpbnRlZ2VylEsAdYwabnVtcHkucmFuZG9tLmJpdF9nZW5lcmF0b3KUjBtfX3B5eF91bnBpY2tsZV9TZWVkU2VxdWVuY2WUk5RoMYwMU2VlZFNlcXVlbmNllJOUSiKi6gNOh5RSlCiKBbhPl5MASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAACGr/iyWSMNgDTmhsfrW4rJlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
+":serialized:": "gAWVtQIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihDjYZWmt15YCS1Fllk0taEajANpbmOUihCpc3hEvDOBWIIa9zrb2o1BdYwKaGFzX3VpbnQzMpRLAIwIdWludGVnZXKUSwB1jBpudW1weS5yYW5kb20uYml0X2dlbmVyYXRvcpSMG19fcHl4X3VucGlja2xlX1NlZWRTZXF1ZW5jZZSTlGgxjAxTZWVkU2VxdWVuY2WUk5RKIqLqA06HlFKUKEsASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAAAH60D+Njo2T50Ask6neprIlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
 "dtype": "int64",
 "n": "6",
 "start": "0",
@@ -72,7 +72,7 @@
 "_np_random": "Generator(PCG64)"
 },
 "n_envs": 1,
-"buffer_size": 100000,
+"buffer_size": 1,
 "batch_size": 32,
 "learning_starts": 100000,
 "tau": 1.0,
@@ -85,13 +85,13 @@
 "__module__": "stable_baselines3.common.buffers",
 "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'next_observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'dones': <class 'numpy.ndarray'>, 'timeouts': <class 'numpy.ndarray'>}",
 "__doc__": "\n Replay buffer used in off-policy algorithms like SAC/TD3.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param n_envs: Number of parallel environments\n :param optimize_memory_usage: Enable a memory efficient variant\n of the replay buffer which reduces by almost a factor two the memory used,\n at a cost of more complexity.\n See https://github.com/DLR-RM/stable-baselines3/issues/37#issuecomment-637501195\n and https://github.com/DLR-RM/stable-baselines3/pull/28#issuecomment-637559274\n Cannot be used in combination with handle_timeout_termination.\n :param handle_timeout_termination: Handle timeout termination (due to timelimit)\n separately and treat the task as infinite horizon task.\n https://github.com/DLR-RM/stable-baselines3/issues/284\n ",
-"__init__": "<function ReplayBuffer.__init__ at 0x7c2cf02ef1a0>",
-"add": "<function ReplayBuffer.add at 0x7c2cf02ef2e0>",
-"sample": "<function ReplayBuffer.sample at 0x7c2cf02ef380>",
-"_get_samples": "<function ReplayBuffer._get_samples at 0x7c2cf02ef420>",
-"_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7c2cf02ef4c0>)>",
+"__init__": "<function ReplayBuffer.__init__ at 0x7907dfa068e0>",
+"add": "<function ReplayBuffer.add at 0x7907dfa06a20>",
+"sample": "<function ReplayBuffer.sample at 0x7907dfa06ac0>",
+"_get_samples": "<function ReplayBuffer._get_samples at 0x7907dfa06b60>",
+"_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7907dfa06c00>)>",
 "__abstractmethods__": "frozenset()",
-"_abc_impl": "<_abc._abc_data object at 0x7c2cb92c7f40>"
+"_abc_impl": "<_abc._abc_data object at 0x7907df9c1600>"
 },
 "replay_buffer_kwargs": {},
 "n_steps": 1,
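The `ReplayBuffer` docstring in the diff above describes a fixed-capacity buffer that off-policy algorithms sample minibatches from (here `batch_size: 32`); the serialized `buffer_size` drops to 1 in the uploaded file, which is consistent with the replay buffer contents not being shipped with the model. As a toy illustration of the ring-buffer semantics (my own sketch, not stable-baselines3's implementation):

```python
import random

class RingReplayBuffer:
    """Toy fixed-capacity replay buffer: once full, new transitions
    overwrite the oldest ones. Not the SB3 implementation."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.storage = []
        self.pos = 0  # next write position; wraps around when full

    def add(self, transition):
        if len(self.storage) < self.buffer_size:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition  # overwrite oldest slot
        self.pos = (self.pos + 1) % self.buffer_size

    def sample(self, batch_size):
        # Uniform sampling of distinct stored transitions
        return random.sample(self.storage, batch_size)

buf = RingReplayBuffer(buffer_size=3)
for t in range(5):          # add 5 transitions into a size-3 buffer
    buf.add(t)
batch = buf.sample(2)       # a random minibatch of stored transitions
```

The real buffer stores observation/action/reward/done arrays per slot and, with `optimize_memory_usage`, shares storage between `observations` and `next_observations`; this sketch keeps only the eviction and sampling behavior.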
results.json CHANGED
@@ -1 +1 @@
-{"mean_reward": 614.5, "std_reward": 240.75350464738827, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T08:01:54.625638"}
+{"mean_reward": 662.0, "std_reward": 175.7156794369814, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T08:16:52.395079"}
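The `results.json` above summarizes `n_eval_episodes: 10` episode returns as a mean and standard deviation (the `+/-` figure in the card's metadata). A sketch of how such a summary is computed, using synthetic returns rather than the actual evaluation episodes; the population standard deviation is assumed here, matching numpy's `std` default:

```python
import statistics

# Synthetic episode returns -- NOT the real evaluation data
returns = [600.0, 700.0, 650.0, 550.0, 500.0]

mean_reward = statistics.fmean(returns)
std_reward = statistics.pstdev(returns)  # population std (ddof=0)

summary = {
    "mean_reward": round(mean_reward, 2),
    "std_reward": round(std_reward, 2),
    "n_eval_episodes": len(returns),
}
```

With a standard deviation this large relative to the mean (175.72 vs. 662.00 in the new evaluation), the improvement over the previous upload (614.50 +/- 240.75) is well within evaluation noise for 10 episodes.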