Update README.md
Model Card: Training hyperparameters
README.md
CHANGED
@@ -88,6 +88,25 @@ Use the code below to get started with the model.
 
 ## Training Details
 
+## Training hyperparameters
+```
+vocab_size=len(tokenizer),
+num_attention_heads=8,
+num_hidden_layers=16,
+hidden_size=512,
+intermediate_size=2048,
+hidden_act='gelu',
+hidden_dropout_prob=0.15,
+relative_attention=True,
+pos_att_type='c2p|p2c',
+max_relative_positions=-1,
+position_biased_input=False,
+attention_probs_dropout_prob=0.15,
+initializer_range=0.02,
+layer_norm_eps=1e-7,
+
+````
+
 ### Training Data
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
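Review note: the added block lists keyword arguments without saying which class receives them. The sketch below is one way to sanity-check the values, not the author's actual training code. It collects the values from the diff into a dict and derives the per-head dimension and feed-forward expansion ratio. The helper names (`make_hparams`, `derived_shapes`, `build_config`) and the vocabulary size of 32000 are hypothetical. The README uses `len(tokenizer)`, and the tokenizer is not part of this change. `build_config` assumes the names are meant for `DebertaV2Config` from `transformers`, since they match its parameters, but the diff does not confirm this.

```python
def make_hparams(vocab_size: int) -> dict:
    """Collect the hyperparameters listed in the README into keyword arguments."""
    return {
        "vocab_size": vocab_size,  # the README passes len(tokenizer) here
        "num_attention_heads": 8,
        "num_hidden_layers": 16,
        "hidden_size": 512,
        "intermediate_size": 2048,
        "hidden_act": "gelu",
        "hidden_dropout_prob": 0.15,
        "relative_attention": True,
        "pos_att_type": "c2p|p2c",
        "max_relative_positions": -1,
        "position_biased_input": False,
        "attention_probs_dropout_prob": 0.15,
        "initializer_range": 0.02,
        "layer_norm_eps": 1e-7,
    }


def derived_shapes(hparams: dict) -> dict:
    """Check that the hidden size splits evenly across heads and report derived sizes."""
    hidden = hparams["hidden_size"]
    heads = hparams["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError(f"hidden_size={hidden} is not divisible by num_attention_heads={heads}")
    return {
        "head_dim": hidden // heads,
        "ffn_expansion": hparams["intermediate_size"] / hidden,
    }


def build_config(hparams: dict):
    """Assumption: the arguments target DebertaV2Config; the diff does not name the class."""
    from transformers import DebertaV2Config  # imported lazily, only needed for this step

    return DebertaV2Config(**hparams)


if __name__ == "__main__":
    hparams = make_hparams(vocab_size=32000)  # hypothetical vocabulary size
    print(derived_shapes(hparams))
```

With these values, each of the 8 heads gets a 64-dimensional slice of the 512-dimensional hidden state, and the feed-forward layer is 4 times the hidden size. `build_config` is not called in the script, so the check runs without `transformers` installed.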