SB3 PPO, 16 vectorized environments, ~9_000_000 training timesteps. mean_reward = 163 +/- 103. Training for an additional 50_000_000 timesteps resulted in a worse reward at evaluation.
28a0b97
---
library_name: stable-baselines3
tags:
- Pixelcopter-PLE-v0
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: ppo
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pixelcopter-PLE-v0
      type: Pixelcopter-PLE-v0
    metrics:
    - type: mean_reward
      value: 162.90 +/- 102.90
      name: mean_reward
      verified: false
---
# **ppo** Agent playing **Pixelcopter-PLE-v0**
This is a trained model of a **ppo** agent playing **Pixelcopter-PLE-v0**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

## Usage (with Stable-baselines3)
TODO: Add your code

```python
from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub

...
```
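
Until the usage code above is filled in, the following is a minimal sketch of how a checkpoint like this one might be loaded and evaluated. It assumes the model was saved with SB3's `PPO` class, as the card's name suggests. The function names `format_mean_reward` and `load_and_evaluate` are hypothetical helpers, not part of either library. The repository id and checkpoint filename are not stated in this card, so they are left as parameters, and creating the `Pixelcopter-PLE-v0` environment is left to the caller because it depends on the locally installed environment packages.

```python
def format_mean_reward(mean: float, std: float) -> str:
    """Format a result like the card's metadata, e.g. '162.90 +/- 102.90'."""
    return f"{mean:.2f} +/- {std:.2f}"


def load_and_evaluate(repo_id: str, filename: str, env, n_eval_episodes: int = 10) -> str:
    """Download a checkpoint from the Hub, load it as PPO, and evaluate it on env.

    repo_id and filename are supplied by the caller; this card does not state them.
    env must be an already-created environment compatible with the checkpoint.
    """
    # Imported here so the formatting helper above works without the RL stack installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # load_from_hub returns the local path of the downloaded checkpoint file.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    model = PPO.load(checkpoint_path)

    # evaluate_policy returns (mean_reward, std_reward) over the evaluation episodes.
    mean, std = evaluate_policy(
        model, env, n_eval_episodes=n_eval_episodes, deterministic=True
    )
    return format_mean_reward(mean, std)
```

A call would look like `load_and_evaluate("<user>/<repo>", "<checkpoint>.zip", env)`, where both strings are placeholders to be replaced with the actual repository id and file name. With a standard deviation as large as the one reported here, a small number of evaluation episodes can give noticeably different means from run to run, so raising `n_eval_episodes` may be worthwhile.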