---
library_name: ml-agents
tags:
  - Pyramids
  - deep-reinforcement-learning
  - reinforcement-learning
  - ML-Agents-Pyramids
license: apache-2.0
---

# **PPO** Agent playing **Pyramids**

This is a trained model of a **PPO** (Proximal Policy Optimization) agent playing **Pyramids** using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Watch the agent play

You can watch the agent play directly in your browser:

Go to https://huggingface.co/spaces/unity/ML-Agents-Pyramids

Step 1: Find the model_id: `Francesco-A/ppo-Pyramids-v1`

Step 2: Select the `.nn` or `.onnx` file

Step 3: Click on **Watch the agent play**

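## Resume the training

To resume training from the last saved checkpoint, run `mlagents-learn` with the same `--run-id` as the original run and add `--resume`:

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```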
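
If you prefer to launch training from a Python script, one option is to build the same command as an argument list and pass it to `subprocess.run`, which avoids going through a shell. This is a minimal sketch that assumes `mlagents-learn` is on your `PATH`. The helper names, the config path `config/Pyramids.yaml`, and the run ID `Pyramids1` are hypothetical placeholders, so substitute your own file and the run ID of the run you want to resume.

```python
import shlex
import subprocess


def build_resume_command(config_path: str, run_id: str) -> list[str]:
    """Return the mlagents-learn argv for resuming run `run_id` (no shell involved)."""
    return ["mlagents-learn", config_path, f"--run-id={run_id}", "--resume"]


def resume_training(config_path: str, run_id: str) -> None:
    """Launch mlagents-learn; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_resume_command(config_path, run_id), check=True)


if __name__ == "__main__":
    # Hypothetical values: replace with your own config file and original run ID.
    cmd = build_resume_command("config/Pyramids.yaml", "Pyramids1")
    # Print the equivalent shell command instead of starting a long training job.
    print(shlex.join(cmd))
```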

## Training hyperparameters

The agent was trained with the following ML-Agents configuration:

```yaml
behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      rnd:
        gamma: 0.99
        strength: 0.01
        network_settings:
          hidden_units: 64
          num_layers: 3
        learning_rate: 0.0001
    keep_checkpoints: 5
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
```

Besides the extrinsic (environment) reward, the `rnd` reward signal adds an intrinsic curiosity bonus based on Random Network Distillation. It is weighted at `strength: 0.01`, compared with `strength: 1.0` for the extrinsic reward.
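
The sketch below shows how several of these PPO settings interact. It is a minimal illustration and not ML-Agents' own implementation. It assumes three things: each policy update consumes one full `buffer_size` of experiences, split into `batch_size` minibatches and iterated `num_epoch` times; the `linear` schedule decays the learning rate from its initial value to 0 at `max_steps`; and `epsilon` is the clipping range of the PPO surrogate objective. The function names are hypothetical.

```python
# Minimal sketch (not ML-Agents code) of how the PPO settings above interact.
# Assumptions: each update uses one full buffer, split into minibatches of
# batch_size and iterated num_epoch times; "linear" decays the learning rate
# from its initial value to 0 at max_steps; epsilon is the PPO clip range.

BATCH_SIZE = 128
BUFFER_SIZE = 2048
NUM_EPOCH = 3
LEARNING_RATE = 0.0003
EPSILON = 0.2
MAX_STEPS = 1_000_000


def gradient_steps_per_update(buffer_size: int, batch_size: int, num_epoch: int) -> int:
    """Minibatch gradient steps taken each time the buffer fills up."""
    return (buffer_size // batch_size) * num_epoch


def num_updates(max_steps: int, buffer_size: int) -> int:
    """Number of complete buffers (policy updates) collected over training."""
    return max_steps // buffer_size


def linear_learning_rate(step: int, initial: float, max_steps: int) -> float:
    """Learning rate decayed linearly from `initial` at step 0 to 0 at `max_steps`."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return initial * fraction_left


def clipped_surrogate(ratio: float, advantage: float, epsilon: float) -> float:
    """Per-sample PPO clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(gradient_steps_per_update(BUFFER_SIZE, BATCH_SIZE, NUM_EPOCH))  # 48
    print(num_updates(MAX_STEPS, BUFFER_SIZE))  # 488
    # Halfway through training the learning rate is about half its initial value.
    print(linear_learning_rate(500_000, LEARNING_RATE, MAX_STEPS))
    # A probability ratio of 1.5 with a positive advantage is capped at 1 + epsilon.
    print(clipped_surrogate(1.5, 1.0, EPSILON))
```

Under these assumptions, each update performs 48 minibatch gradient steps, and about 488 updates fit into the 1,000,000-step budget.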

## Training details

Training summaries, logged every 30,000 steps (`summary_freq`):

| Step | Time elapsed (s) | Mean reward | Std. of reward | Status |
|---------|--------------|-------------|---------------|-----------|
| 30000 | 59.481 | -1.000 | 0.000 | Training |
| 60000 | 118.648 | -0.798 | 0.661 | Training |
| 90000 | 180.684 | -0.701 | 0.808 | Training |
| 120000 | 240.734 | -0.931 | 0.373 | Training |
| 150000 | 300.978 | -0.851 | 0.588 | Training |
| 180000 | 360.137 | -0.934 | 0.361 | Training |
| 210000 | 424.326 | -1.000 | 0.000 | Training |
| 240000 | 484.774 | -0.849 | 0.595 | Training |
| 270000 | 546.089 | -0.377 | 1.029 | Training |
| 300000 | 614.797 | -0.735 | 0.689 | Training |
| 330000 | 684.241 | -0.926 | 0.405 | Training |
| 360000 | 745.790 | -0.819 | 0.676 | Training |
| 390000 | 812.573 | -0.715 | 0.755 | Training |
| 420000 | 877.836 | -0.781 | 0.683 | Training |
| 450000 | 944.423 | -0.220 | 1.114 | Training |
| 480000 | 1010.918 | -0.484 | 0.962 | Training |
| 510000 | 1074.058 | -0.003 | 1.162 | Training |
| 540000 | 1138.848 | -0.021 | 1.222 | Training |
| 570000 | 1204.326 | 0.384 | 1.231 | Training |
| 600000 | 1276.488 | 0.690 | 1.174 | Training |
| 630000 | 1345.297 | 0.943 | 1.058 | Training |
| 660000 | 1412.791 | 1.014 | 1.043 | Training |
| 690000 | 1482.712 | 0.927 | 1.054 | Training |
| 720000 | 1548.726 | 0.900 | 1.128 | Training |
| 750000 | 1618.284 | 1.379 | 0.701 | Training |
| 780000 | 1692.080 | 1.567 | 0.359 | Training |
| 810000 | 1762.159 | 1.475 | 0.567 | Training |
| 840000 | 1832.166 | 1.438 | 0.648 | Training |
| 870000 | 1907.191 | 1.534 | 0.536 | Training |
| 900000 | 1977.521 | 1.552 | 0.478 | Training |
| 930000 | 2051.259 | 1.458 | 0.633 | Training |
| 960000 | 2126.498 | 1.545 | 0.586 | Training |
| 990000 | 2198.591 | 1.565 | 0.591 | Training |
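
To summarize the log, here is a minimal sketch that copies the table rows above into a list. It then computes the average throughput and the first logged step at which the mean reward exceeds a given threshold. The function names are hypothetical, and all numbers come directly from the table.

```python
# (step, time elapsed in seconds, mean reward), copied from the table above.
ROWS = [
    (30000, 59.481, -1.000),
    (60000, 118.648, -0.798),
    (90000, 180.684, -0.701),
    (120000, 240.734, -0.931),
    (150000, 300.978, -0.851),
    (180000, 360.137, -0.934),
    (210000, 424.326, -1.000),
    (240000, 484.774, -0.849),
    (270000, 546.089, -0.377),
    (300000, 614.797, -0.735),
    (330000, 684.241, -0.926),
    (360000, 745.790, -0.819),
    (390000, 812.573, -0.715),
    (420000, 877.836, -0.781),
    (450000, 944.423, -0.220),
    (480000, 1010.918, -0.484),
    (510000, 1074.058, -0.003),
    (540000, 1138.848, -0.021),
    (570000, 1204.326, 0.384),
    (600000, 1276.488, 0.690),
    (630000, 1345.297, 0.943),
    (660000, 1412.791, 1.014),
    (690000, 1482.712, 0.927),
    (720000, 1548.726, 0.900),
    (750000, 1618.284, 1.379),
    (780000, 1692.080, 1.567),
    (810000, 1762.159, 1.475),
    (840000, 1832.166, 1.438),
    (870000, 1907.191, 1.534),
    (900000, 1977.521, 1.552),
    (930000, 2051.259, 1.458),
    (960000, 2126.498, 1.545),
    (990000, 2198.591, 1.565),
]


def steps_per_second(rows) -> float:
    """Average throughput from the last logged row (steps / elapsed seconds)."""
    step, seconds, _ = rows[-1]
    return step / seconds


def first_step_above(rows, threshold: float):
    """First logged step whose mean reward exceeds `threshold`, or None."""
    for step, _, mean_reward in rows:
        if mean_reward > threshold:
            return step
    return None


def best_row(rows):
    """Row with the highest mean reward."""
    return max(rows, key=lambda row: row[2])


if __name__ == "__main__":
    print(f"{steps_per_second(ROWS):.0f} steps/s")  # 450 steps/s
    print(first_step_above(ROWS, 0.0))  # 570000
    print(first_step_above(ROWS, 1.0))  # 660000
    print(best_row(ROWS))  # (780000, 1692.08, 1.567)
```

On this log, training averaged about 450 steps per second. The mean reward first turned positive at step 570,000 and first exceeded 1.0 at step 660,000.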