TruptiG committed on
Commit 9fd9900 · verified · 1 Parent(s): d83c08e

Update README.md

Model Card: Training hyperparameters

Files changed (1)
  1. README.md +19 -0
README.md CHANGED
@@ -88,6 +88,25 @@ Use the code below to get started with the model.
 
 ## Training Details
 
+## Training hyperparameters
+```
+vocab_size=len(tokenizer),
+num_attention_heads=8,
+num_hidden_layers=16,
+hidden_size=512,
+intermediate_size=2048,
+hidden_act='gelu',
+hidden_dropout_prob=0.15,
+relative_attention=True,
+pos_att_type='c2p|p2c',
+max_relative_positions=-1,
+position_biased_input=False,
+attention_probs_dropout_prob=0.15,
+initializer_range=0.02,
+layer_norm_eps=1e-7,
+
+````
+
 ### Training Data
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
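
The values added in this commit read like keyword arguments to a DeBERTa-style configuration (`relative_attention`, `pos_att_type`, and `position_biased_input` are typical of that family), although the commit does not name the model class. As a minimal sketch under that assumption, the snippet below collects the values in a plain dictionary and derives a few quantities they imply. `VOCAB_SIZE` and `check_config` are hypothetical names introduced only for illustration, and the vocabulary size is a placeholder because the commit gives only `len(tokenizer)`.

```python
# Hyperparameters copied from the commit. VOCAB_SIZE is a placeholder:
# the commit only says vocab_size=len(tokenizer), so the real value is unknown.
VOCAB_SIZE = 32_000  # hypothetical, not taken from the model card

hyperparams = {
    "vocab_size": VOCAB_SIZE,
    "num_attention_heads": 8,
    "num_hidden_layers": 16,
    "hidden_size": 512,
    "intermediate_size": 2048,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.15,
    "relative_attention": True,
    "pos_att_type": "c2p|p2c",
    "max_relative_positions": -1,
    "position_biased_input": False,
    "attention_probs_dropout_prob": 0.15,
    "initializer_range": 0.02,
    "layer_norm_eps": 1e-7,
}


def check_config(cfg):
    """Run basic consistency checks and return quantities derived from cfg."""
    hidden = cfg["hidden_size"]
    heads = cfg["num_attention_heads"]
    # Multi-head attention splits the hidden vector evenly across heads.
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    # Dropout values are probabilities.
    for key in ("hidden_dropout_prob", "attention_probs_dropout_prob"):
        if not 0.0 <= cfg[key] < 1.0:
            raise ValueError(f"{key} must be in [0, 1)")
    return {
        "head_dim": hidden // heads,
        "ffn_ratio": cfg["intermediate_size"] / hidden,
        "pos_att_types": cfg["pos_att_type"].split("|"),
    }


derived = check_config(hyperparams)
print(derived)
```

If the model is indeed built with the `transformers` library, a dictionary like this could presumably be unpacked into the matching configuration class (for example `DebertaV2Config(**hyperparams)`), but that is an assumption and not something the commit states.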