---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
license: apache-2.0
---

# **ppo** Agent playing **SnowballTarget**

This is a trained model of a **ppo** agent playing **SnowballTarget** using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)

Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

### Watch the Agent play

You can watch the agent **playing directly in your browser**:

1. Go to https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget
2. Step 1: Find the model_id: `Francesco-A/ppo-SnowballTarget-v1`
3. Step 2: Select the `*.nn` / `*.onnx` file
4. Click on **Watch the agent play**

## Training hyperparameters

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 55000
    max_steps: 250000
    time_horizon: 64
    threaded: true
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```

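
To get a feel for what these settings imply, the following is a rough back-of-the-envelope sketch in Python. It assumes that one policy update runs each time `buffer_size` experiences have been collected, that each of the `num_epoch` passes splits the buffer into minibatches of `batch_size`, and that the `linear` schedule decays the learning rate toward zero at `max_steps`. The helper `linear_lr` is hypothetical and only illustrates an idealized schedule; the actual ML-Agents trainer may differ in details.

```python
# Values copied from the training configuration.
max_steps = 250_000
buffer_size = 2048
batch_size = 128
num_epoch = 3
learning_rate = 0.0003
checkpoint_interval = 55_000

# Assumption: one policy update happens each time the buffer fills.
policy_updates = max_steps // buffer_size

# Assumption: each epoch splits the buffer into equally sized minibatches.
minibatches_per_epoch = buffer_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * num_epoch


def linear_lr(step: int) -> float:
    """Idealized linear decay from learning_rate at step 0 to 0 at max_steps."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return learning_rate * fraction_left


# Steps at which a periodic checkpoint would fall within the run.
checkpoint_steps = list(range(checkpoint_interval, max_steps + 1, checkpoint_interval))

print(policy_updates)             # 122
print(gradient_steps_per_update)  # 48
print(linear_lr(125_000))         # 0.00015
print(checkpoint_steps)           # [55000, 110000, 165000, 220000]
```

Under these assumptions the run performs roughly 122 policy updates of 48 gradient steps each, and four periodic checkpoints fall inside the 250000-step budget.
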
## Training details

| Step | Time Elapsed | Mean Reward | Std of Reward | Status |
|---------|--------------|-------------|---------------|-----------|
| 10000 | 29.079 s | 3.636 | 1.746 | Training |
| 20000 | 55.042 s | 7.164 | 2.661 | Training |
| 30000 | 77.884 s | 9.818 | 2.534 | Training |
| 40000 | 103.229 s | 11.509 | 2.263 | Training |
| 50000 | 127.046 s | 14.659 | 2.495 | Training |
| 60000 | 150.811 s | 15.655 | 2.414 | Training |
| 70000 | 174.292 s | 16.955 | 2.540 | Training |
| 80000 | 198.938 s | 18.091 | 2.481 | Training |
| 90000 | 221.915 s | 19.182 | 3.143 | Training |
| 100000 | 246.203 s | 21.182 | 2.724 | Training |
| 110000 | 271.024 s | 22.463 | 2.250 | Training |
| 120000 | 292.551 s | 24.044 | 2.190 | Training |
| 130000 | 317.539 s | 24.291 | 2.103 | Training |
| 140000 | 340.057 s | 24.455 | 4.423 | Training |
| 150000 | 366.645 s | 25.236 | 2.358 | Training |
| 160000 | 390.192 s | 25.000 | 1.895 | Training |
| 170000 | 414.326 s | 25.273 | 2.482 | Training |
| 180000 | 438.103 s | 25.750 | 1.798 | Training |
| 190000 | 462.837 s | 25.673 | 1.888 | Training |
| 200000 | 485.258 s | 25.295 | 2.380 | Training |
| 210000 | 509.542 s | 25.855 | 2.066 | Training |
| 220000 | 535.202 s | 26.111 | 1.931 | Training |
| 230000 | 556.965 s | 25.644 | 2.252 | Training |
| 240000 | 582.135 s | 26.018 | 2.673 | Training |
| 250000 | 604.248 s | 26.091 | 1.917 | Training |

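
The training log can be summarized with a few lines of Python. This is a minimal sketch that copies the step, elapsed time, and mean reward columns from the table and derives the overall throughput, the best summary, and the first summary at which the mean reward reached a chosen threshold. The helper `first_step_reaching` and the threshold of 25 are illustrative choices, not part of the original training setup.

```python
# (step, elapsed seconds, mean reward) copied from the training log table.
log = [
    (10000, 29.079, 3.636),
    (20000, 55.042, 7.164),
    (30000, 77.884, 9.818),
    (40000, 103.229, 11.509),
    (50000, 127.046, 14.659),
    (60000, 150.811, 15.655),
    (70000, 174.292, 16.955),
    (80000, 198.938, 18.091),
    (90000, 221.915, 19.182),
    (100000, 246.203, 21.182),
    (110000, 271.024, 22.463),
    (120000, 292.551, 24.044),
    (130000, 317.539, 24.291),
    (140000, 340.057, 24.455),
    (150000, 366.645, 25.236),
    (160000, 390.192, 25.000),
    (170000, 414.326, 25.273),
    (180000, 438.103, 25.750),
    (190000, 462.837, 25.673),
    (200000, 485.258, 25.295),
    (210000, 509.542, 25.855),
    (220000, 535.202, 26.111),
    (230000, 556.965, 25.644),
    (240000, 582.135, 26.018),
    (250000, 604.248, 26.091),
]


def first_step_reaching(threshold):
    """Return the first logged step whose mean reward is >= threshold, or None."""
    for step, _elapsed, reward in log:
        if reward >= threshold:
            return step
    return None


# Average environment steps per second over the whole run.
final_step, final_time, _final_reward = log[-1]
steps_per_second = final_step / final_time

# Summary with the highest mean reward.
best_step, _best_time, best_reward = max(log, key=lambda row: row[2])

print(round(steps_per_second, 1))   # 413.7
print(best_step, best_reward)       # 220000 26.111
print(first_step_reaching(25.0))    # 150000
```

Read this way, the mean reward rises quickly until about 150000 steps and then plateaus around 25 to 26 for the remaining summaries.
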