LuckLin committed on
Commit d199eb0 · verified · 1 Parent(s): 155e1b6

Successfully uploaded with game replay video

README.md CHANGED
@@ -16,72 +16,22 @@ model-index:
       type: SpaceInvadersNoFrameskip-v4
     metrics:
     - type: mean_reward
-      value: 662.00 +/- 175.72
+      value: 614.50 +/- 240.75
       name: mean_reward
       verified: false
 ---
 
 # **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
 This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
-and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
+using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
 
-The RL Zoo is a training framework for Stable Baselines3
-reinforcement learning agents,
-with hyperparameter optimization and pre-trained agents included.
-
-## Usage (with SB3 RL Zoo)
-
-RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
-SB3: https://github.com/DLR-RM/stable-baselines3<br/>
-SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
-SBX (SB3 + Jax): https://github.com/araffin/sbx
-
-Install the RL Zoo (with SB3 and SB3-Contrib):
-```bash
-pip install rl_zoo3
-```
-
-```
-# Download model and save it into the logs/ folder
-python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
-python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
-```
-
-If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
-```
-python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin -f logs/
-python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
-```
-
-## Training (with the RL Zoo)
-```
-python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
-# Upload the model and generate video (when possible)
-python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga LuckLin
-```
-
-## Hyperparameters
+## Usage (with Stable-baselines3)
+TODO: Add your code
+
 ```python
-OrderedDict([('batch_size', 32),
-             ('buffer_size', 100000),
-             ('env_wrapper',
-              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
-             ('exploration_final_eps', 0.01),
-             ('exploration_fraction', 0.1),
-             ('frame_stack', 4),
-             ('gradient_steps', 1),
-             ('learning_rate', 0.0001),
-             ('learning_starts', 100000),
-             ('n_timesteps', 1000000.0),
-             ('optimize_memory_usage', False),
-             ('policy', 'CnnPolicy'),
-             ('target_update_interval', 1000),
-             ('train_freq', 4),
-             ('normalize', False)])
-```
+from stable_baselines3 import ...
+from huggingface_sb3 import load_from_hub
 
-# Environment Arguments
-```python
-{'render_mode': 'rgb_array'}
+...
 ```
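The new README leaves the usage section as `TODO: Add your code`, with only placeholder imports. A minimal sketch of what that section could contain, assuming the `huggingface_sb3` helper package; the `hub_ids` helper is my own name, and the repo id and filename are inferred from the removed `rl_zoo3` commands (`--algo dqn --env SpaceInvadersNoFrameskip-v4 -orga LuckLin`), not stated in this commit:

```python
def hub_ids(algo: str, env_id: str, orga: str) -> tuple:
    """Build the Hub repo id and checkpoint filename following the
    naming the zoo's push_to_hub command produces (inferred, hypothetical)."""
    name = f"{algo}-{env_id}"
    return f"{orga}/{name}", f"{name}.zip"

repo_id, filename = hub_ids("dqn", "SpaceInvadersNoFrameskip-v4", "LuckLin")
# repo_id  -> "LuckLin/dqn-SpaceInvadersNoFrameskip-v4"
# filename -> "dqn-SpaceInvadersNoFrameskip-v4.zip"

# To actually download and load the checkpoint (requires
# `pip install stable-baselines3 huggingface_sb3`, network access):
#
#   from huggingface_sb3 import load_from_hub
#   from stable_baselines3 import DQN
#
#   checkpoint = load_from_hub(repo_id=repo_id, filename=filename)
#   model = DQN.load(checkpoint)
```

Running the model on the Atari env would additionally need the AtariWrapper and 4-frame stacking listed in the removed hyperparameters section.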
config.json ADDED
The diff for this file is too large to render. See raw diff
 
dqn-SpaceInvadersNoFrameskip-v4.zip CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:23dfc149be953d72b26aba273fbb8deda6d44f744d58b4e7c89e54e789b19e95
-size 27219601
+oid sha256:d135fbb0977f1bcf610b15e5c581f86c4f38e62131a09b7e72800cf152dc0e90
+size 27219623
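The `.zip` diff above is a Git LFS pointer file, not the binary itself: only the object hash and byte size change. A small sketch (the parser function name is mine) of how such a pointer can be read, using the new pointer from the diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size of the real payload in bytes
    return fields

# The new pointer content shown in the diff above:
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:d135fbb0977f1bcf610b15e5c581f86c4f38e62131a09b7e72800cf152dc0e90\n"
    "size 27219623\n"
)
info = parse_lfs_pointer(pointer)
# info["size"] == 27219623, the checkpoint is ~27 MB
```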
dqn-SpaceInvadersNoFrameskip-v4/data CHANGED
@@ -4,16 +4,16 @@
     ":serialized:": "gAWVMAAAAAAAAACMHnN0YWJsZV9iYXNlbGluZXMzLmRxbi5wb2xpY2llc5SMCUNublBvbGljeZSTlC4=",
     "__module__": "stable_baselines3.dqn.policies",
     "__doc__": "\n Policy class for DQN when using images as input.\n\n :param observation_space: Observation space\n :param action_space: Action space\n :param lr_schedule: Learning rate schedule (could be constant)\n :param net_arch: The specification of the policy and value networks.\n :param activation_fn: Activation function\n :param features_extractor_class: Features extractor to use.\n :param normalize_images: Whether to normalize images or not,\n dividing by 255.0 (True by default)\n :param optimizer_class: The optimizer to use,\n ``th.optim.Adam`` by default\n :param optimizer_kwargs: Additional keyword arguments,\n excluding the learning rate, to pass to the optimizer\n ",
-    "__init__": "<function CnnPolicy.__init__ at 0x7c3d231d4680>",
+    "__init__": "<function CnnPolicy.__init__ at 0x7c2cf03bce00>",
     "__abstractmethods__": "frozenset()",
-    "_abc_impl": "<_abc._abc_data object at 0x7c3d231d1300>"
+    "_abc_impl": "<_abc._abc_data object at 0x7c2cf03ad200>"
   },
   "verbose": 1,
   "policy_kwargs": {},
   "num_timesteps": 1000000,
   "_total_timesteps": 1000000,
   "_num_timesteps_at_start": 0,
-  "seed": 0,
+  "seed": 2476167096,
   "action_noise": null,
   "start_time": 1767249084442169849,
   "learning_rate": {
@@ -64,7 +64,7 @@
   },
   "action_space": {
     ":type:": "<class 'gymnasium.spaces.discrete.Discrete'>",
-    ":serialized:": "gAWVtQIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihDjYZWmt15YCS1Fllk0taEajANpbmOUihCpc3hEvDOBWIIa9zrb2o1BdYwKaGFzX3VpbnQzMpRLAIwIdWludGVnZXKUSwB1jBpudW1weS5yYW5kb20uYml0X2dlbmVyYXRvcpSMG19fcHl4X3VucGlja2xlX1NlZWRTZXF1ZW5jZZSTlGgxjAxTZWVkU2VxdWVuY2WUk5RKIqLqA06HlFKUKEsASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAAAH60D+Njo2T50Ask6neprIlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
+    ":serialized:": "gAWVuwIAAAAAAACMGWd5bW5hc2l1bS5zcGFjZXMuZGlzY3JldGWUjAhEaXNjcmV0ZZSTlCmBlH2UKIwFZHR5cGWUjAVudW1weZSMBWR0eXBllJOUjAJpOJSJiIeUUpQoSwOMATyUTk5OSv////9K/////0sAdJRijAFulIwWbnVtcHkuX2NvcmUubXVsdGlhcnJheZSMBnNjYWxhcpSTlGgIjAJpOJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYkMIBgAAAAAAAACUhpRSlIwFc3RhcnSUaBFoFEMIAAAAAAAAAACUhpRSlIwGX3NoYXBllCmMCl9ucF9yYW5kb22UjBRudW1weS5yYW5kb20uX3BpY2tsZZSMEF9fZ2VuZXJhdG9yX2N0b3KUk5RoH4wUX19iaXRfZ2VuZXJhdG9yX2N0b3KUk5SME251bXB5LnJhbmRvbS5fcGNnNjSUjAVQQ0c2NJSTlIWUUpR9lCiMDWJpdF9nZW5lcmF0b3KUjAVQQ0c2NJSMBXN0YXRllH2UKGgsihFXsBFGEbJJUSr3FOjcD639AIwDaW5jlIoQPfAfyf7mW+ycdVs0yO2oZ3WMCmhhc191aW50MzKUSwCMCHVpbnRlZ2VylEsAdYwabnVtcHkucmFuZG9tLmJpdF9nZW5lcmF0b3KUjBtfX3B5eF91bnBpY2tsZV9TZWVkU2VxdWVuY2WUk5RoMYwMU2VlZFNlcXVlbmNllJOUSiKi6gNOh5RSlCiKBbhPl5MASwCME251bXB5Ll9jb3JlLm51bWVyaWOUjAtfZnJvbWJ1ZmZlcpSTlCiWEAAAAAAAAACGr/iyWSMNgDTmhsfrW4rJlGgIjAJ1NJSJiIeUUpQoSwNoDE5OTkr/////Sv////9LAHSUYksEhZSMAUOUdJRSlEsEKXSUYoaUYoWUUpR1Yi4=",
     "dtype": "int64",
     "n": "6",
     "start": "0",
@@ -72,7 +72,7 @@
     "_np_random": "Generator(PCG64)"
   },
   "n_envs": 1,
-  "buffer_size": 1,
+  "buffer_size": 100000,
   "batch_size": 32,
   "learning_starts": 100000,
   "tau": 1.0,
@@ -85,13 +85,13 @@
     "__module__": "stable_baselines3.common.buffers",
     "__annotations__": "{'observations': <class 'numpy.ndarray'>, 'next_observations': <class 'numpy.ndarray'>, 'actions': <class 'numpy.ndarray'>, 'rewards': <class 'numpy.ndarray'>, 'dones': <class 'numpy.ndarray'>, 'timeouts': <class 'numpy.ndarray'>}",
     "__doc__": "\n Replay buffer used in off-policy algorithms like SAC/TD3.\n\n :param buffer_size: Max number of element in the buffer\n :param observation_space: Observation space\n :param action_space: Action space\n :param device: PyTorch device\n :param n_envs: Number of parallel environments\n :param optimize_memory_usage: Enable a memory efficient variant\n of the replay buffer which reduces by almost a factor two the memory used,\n at a cost of more complexity.\n See https://github.com/DLR-RM/stable-baselines3/issues/37#issuecomment-637501195\n and https://github.com/DLR-RM/stable-baselines3/pull/28#issuecomment-637559274\n Cannot be used in combination with handle_timeout_termination.\n :param handle_timeout_termination: Handle timeout termination (due to timelimit)\n separately and treat the task as infinite horizon task.\n https://github.com/DLR-RM/stable-baselines3/issues/284\n ",
-    "__init__": "<function ReplayBuffer.__init__ at 0x7c3d2330e8e0>",
-    "add": "<function ReplayBuffer.add at 0x7c3d2330ea20>",
-    "sample": "<function ReplayBuffer.sample at 0x7c3d2330eac0>",
-    "_get_samples": "<function ReplayBuffer._get_samples at 0x7c3d2330eb60>",
-    "_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7c3d2330ec00>)>",
+    "__init__": "<function ReplayBuffer.__init__ at 0x7c2cf02ef1a0>",
+    "add": "<function ReplayBuffer.add at 0x7c2cf02ef2e0>",
+    "sample": "<function ReplayBuffer.sample at 0x7c2cf02ef380>",
+    "_get_samples": "<function ReplayBuffer._get_samples at 0x7c2cf02ef420>",
+    "_maybe_cast_dtype": "<staticmethod(<function ReplayBuffer._maybe_cast_dtype at 0x7c2cf02ef4c0>)>",
     "__abstractmethods__": "frozenset()",
-    "_abc_impl": "<_abc._abc_data object at 0x7c3d232d5980>"
+    "_abc_impl": "<_abc._abc_data object at 0x7c2cb92c7f40>"
   },
   "replay_buffer_kwargs": {},
   "n_steps": 1,
results.json CHANGED
@@ -1 +1 @@
-{"mean_reward": 662.0, "std_reward": 175.7156794369814, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T07:42:28.085142"}
+{"mean_reward": 614.5, "std_reward": 240.75350464738827, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2026-01-01T08:01:54.625638"}