This is a trained model of a **PPO** agent playing **LunarLander-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

Hyperparameters used for training were optimized with [Optuna](https://pypi.org/project/optuna/).

```python
{
    "learning_rate": 0.00038779746460731866,
    "n_steps": 2048,
    "batch_size": 128,
    "n_epochs": 13,
    "gamma": 0.9927390555180292,
    "gae_lambda": 0.9353501463066322,
    "clip_range": clip_range,  # value defined elsewhere in the training script
    "ent_coef": 0.007068533587811773,
    "policy_kwargs": {
        "net_arch": {'pi': [512, 512], 'vf': [512, 512]},
        "activation_fn": nn.Tanh  # torch.nn.Tanh
    },
}
```

> Learning rate was used as the initial value for a linear scheduler during training. See [this GitHub issue](https://github.com/DLR-RM/stable-baselines3/issues/246) for more information.

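The scheduler note above can be sketched as a plain function: stable-baselines3 accepts a callable that maps the remaining training progress (1.0 at the start, 0.0 at the end) to a learning rate. The `linear_schedule` helper below is an illustrative assumption, not the exact code used to train this model.

```python
def linear_schedule(initial_value: float):
    """Build an SB3-style schedule; progress_remaining runs from 1.0 down to 0.0."""
    def schedule(progress_remaining: float) -> float:
        # Decay linearly from initial_value at the start of training to 0 at the end.
        return progress_remaining * initial_value
    return schedule

# Seed the schedule with the tuned learning rate from the hyperparameters above.
lr_schedule = linear_schedule(0.00038779746460731866)
```

Passing such a callable as `learning_rate` when constructing `PPO` yields the linear decay described in the linked issue.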
## Usage