---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
license: apache-2.0
---

![8s6tgwmc.png](https://cdn-uploads.huggingface.co/production/uploads/6493577a357b252af725bf67/wQNbXcvUaoEuV6FtWu9rS.png)

# **ppo** Agent playing **SnowballTarget**
This is a trained model of a **ppo** agent playing **SnowballTarget**
using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)
Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

### Watch the Agent play
You can watch the agent **playing directly in your browser**:

1. Go to https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget
2. Find the model_id: `Francesco-A/ppo-SnowballTarget-v1`
3. Select the `*.nn` or `*.onnx` file
4. Click on **Watch the agent play**

## Training hyperparameters

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 55000
    max_steps: 250000
    time_horizon: 64
    threaded: true
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```
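
As a rough sanity check on this config, the sketch below derives a few quantities from it: the number of minibatch gradient updates per filled buffer ((2048 / 128) × 3 = 48), roughly how many full buffers (and hence PPO updates) fit within `max_steps`, how many summary rows and interval checkpoints to expect, and the learning rate under the linear schedule. It is a minimal sketch, not ML-Agents code: the helper names are hypothetical, and the 1e-10 floor for the linear decay is an assumption about how ML-Agents implements `learning_rate_schedule: linear`.

```python
# Minimal sketch deriving a few quantities from the SnowballTarget config.
# Helper names are hypothetical. ASSUMED_LR_FLOOR is an assumption about the
# minimum value ML-Agents decays to under `learning_rate_schedule: linear`.

CONFIG = {
    "max_steps": 250_000,
    "summary_freq": 10_000,
    "checkpoint_interval": 55_000,
    "batch_size": 128,
    "buffer_size": 2048,
    "num_epoch": 3,
    "learning_rate": 3e-4,
}

ASSUMED_LR_FLOOR = 1e-10  # assumed floor of the linear schedule


def minibatch_updates_per_buffer(cfg: dict) -> int:
    """Gradient steps taken each time the experience buffer fills."""
    return (cfg["buffer_size"] // cfg["batch_size"]) * cfg["num_epoch"]


def buffer_fills(cfg: dict) -> int:
    """Full buffers collected before max_steps is reached (approximate update count)."""
    return cfg["max_steps"] // cfg["buffer_size"]


def linear_lr(step: int, cfg: dict, floor: float = ASSUMED_LR_FLOOR) -> float:
    """Linearly decay the learning rate from its initial value to `floor` at max_steps."""
    progress = min(step, cfg["max_steps"]) / cfg["max_steps"]
    return (cfg["learning_rate"] - floor) * (1.0 - progress) + floor


if __name__ == "__main__":
    print("minibatch updates per buffer:", minibatch_updates_per_buffer(CONFIG))  # 48
    print("buffer fills:", buffer_fills(CONFIG))  # 122
    print("summary rows:", CONFIG["max_steps"] // CONFIG["summary_freq"])  # 25
    print("interval checkpoints:", CONFIG["max_steps"] // CONFIG["checkpoint_interval"])  # 4
    for step in (0, 125_000, 250_000):
        print(step, f"{linear_lr(step, CONFIG):.3e}")
```

The 25 expected summary rows match the 25 rows of the training log, and since `keep_checkpoints: 10` exceeds the four interval checkpoints (at steps 55000, 110000, 165000 and 220000), none of them should be pruned during this run.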

## Training details

| Step    | Time Elapsed | Mean Reward | Std of Reward | Status    |
|---------|--------------|-------------|---------------|-----------|
| 10000   | 29.079 s     | 3.636       | 1.746         | Training  |
| 20000   | 55.042 s     | 7.164       | 2.661         | Training  |
| 30000   | 77.884 s     | 9.818       | 2.534         | Training  |
| 40000   | 103.229 s    | 11.509      | 2.263         | Training  |
| 50000   | 127.046 s    | 14.659      | 2.495         | Training  |
| 60000   | 150.811 s    | 15.655      | 2.414         | Training  |
| 70000   | 174.292 s    | 16.955      | 2.540         | Training  |
| 80000   | 198.938 s    | 18.091      | 2.481         | Training  |
| 90000   | 221.915 s    | 19.182      | 3.143         | Training  |
| 100000  | 246.203 s    | 21.182      | 2.724         | Training  |
| 110000  | 271.024 s    | 22.463      | 2.250         | Training  |
| 120000  | 292.551 s    | 24.044      | 2.190         | Training  |
| 130000  | 317.539 s    | 24.291      | 2.103         | Training  |
| 140000  | 340.057 s    | 24.455      | 4.423         | Training  |
| 150000  | 366.645 s    | 25.236      | 2.358         | Training  |
| 160000  | 390.192 s    | 25.000      | 1.895         | Training  |
| 170000  | 414.326 s    | 25.273      | 2.482         | Training  |
| 180000  | 438.103 s    | 25.750      | 1.798         | Training  |
| 190000  | 462.837 s    | 25.673      | 1.888         | Training  |
| 200000  | 485.258 s    | 25.295      | 2.380         | Training  |
| 210000  | 509.542 s    | 25.855      | 2.066         | Training  |
| 220000  | 535.202 s    | 26.111      | 1.931         | Training  |
| 230000  | 556.965 s    | 25.644      | 2.252         | Training  |
| 240000  | 582.135 s    | 26.018      | 2.673         | Training  |
| 250000  | 604.248 s    | 26.091      | 1.917         | Training  |
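
To read the log quantitatively, the sketch below parses the table rows (copied verbatim) and reports the best mean reward, the average throughput in environment steps per second, and how the reward gain splits before and after step 150000. The function names are hypothetical; only the numbers come from the table.

```python
# Minimal sketch: parse the training-log rows above and summarise them.
# Function names are hypothetical; the rows are copied from the table.

LOG = """\
| 10000   | 29.079 s     | 3.636       | 1.746         | Training  |
| 20000   | 55.042 s     | 7.164       | 2.661         | Training  |
| 30000   | 77.884 s     | 9.818       | 2.534         | Training  |
| 40000   | 103.229 s    | 11.509      | 2.263         | Training  |
| 50000   | 127.046 s    | 14.659      | 2.495         | Training  |
| 60000   | 150.811 s    | 15.655      | 2.414         | Training  |
| 70000   | 174.292 s    | 16.955      | 2.540         | Training  |
| 80000   | 198.938 s    | 18.091      | 2.481         | Training  |
| 90000   | 221.915 s    | 19.182      | 3.143         | Training  |
| 100000  | 246.203 s    | 21.182      | 2.724         | Training  |
| 110000  | 271.024 s    | 22.463      | 2.250         | Training  |
| 120000  | 292.551 s    | 24.044      | 2.190         | Training  |
| 130000  | 317.539 s    | 24.291      | 2.103         | Training  |
| 140000  | 340.057 s    | 24.455      | 4.423         | Training  |
| 150000  | 366.645 s    | 25.236      | 2.358         | Training  |
| 160000  | 390.192 s    | 25.000      | 1.895         | Training  |
| 170000  | 414.326 s    | 25.273      | 2.482         | Training  |
| 180000  | 438.103 s    | 25.750      | 1.798         | Training  |
| 190000  | 462.837 s    | 25.673      | 1.888         | Training  |
| 200000  | 485.258 s    | 25.295      | 2.380         | Training  |
| 210000  | 509.542 s    | 25.855      | 2.066         | Training  |
| 220000  | 535.202 s    | 26.111      | 1.931         | Training  |
| 230000  | 556.965 s    | 25.644      | 2.252         | Training  |
| 240000  | 582.135 s    | 26.018      | 2.673         | Training  |
| 250000  | 604.248 s    | 26.091      | 1.917         | Training  |
"""


def parse_rows(text):
    """Return (step, seconds, mean_reward, std_reward) tuples, one per row."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the row into its five cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        step = int(cells[0])
        seconds = float(cells[1].split()[0])  # "29.079 s" -> 29.079
        rows.append((step, seconds, float(cells[2]), float(cells[3])))
    return rows


def summarize(rows):
    """Best row, average throughput, and reward gain before/after step 150000."""
    best_step, _, best_mean, _ = max(rows, key=lambda r: r[2])
    final_step, final_seconds, final_mean, _ = rows[-1]
    mean_at = {step: mean for step, _, mean, _ in rows}
    return {
        "best_step": best_step,
        "best_mean_reward": best_mean,
        "steps_per_second": final_step / final_seconds,
        "gain_10k_to_150k": mean_at[150_000] - rows[0][2],
        "gain_150k_to_end": final_mean - mean_at[150_000],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_rows(LOG)).items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

On these rows, the best mean reward is 26.111 at step 220000, throughput averages roughly 414 steps per second, and about 21.6 of the roughly 22.5 reward points gained over the run are reached by step 150000; after that the mean reward stays between about 25.0 and 26.1.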