Commit 9bb0c8c
Parent(s): ff189ac
Update README.md

README.md CHANGED

@@ -116,22 +116,9 @@ The training corpus consists of several corpora gathered from web crawling and p
 ### Training procedure
 
 The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2)
-used in the original [RoBERTA](https://github.com/
-
-### Author
-Text Mining Unit (TeMU) at the Barcelona Supercomputing Center (bsc-temu@bsc.es)
-
-### Contact information
-For further information, send an email to <plantl-gob-es@bsc.es>
-
-### Copyright
-Copyright by the Spanish State Secretariat for Digitalization and Artificial Intelligence (SEDIA) (2022)
-
-### Licensing informationytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,262 tokens.
+used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,262 tokens.
 The RoBERTa-ca-v2 pretraining consists of a masked language model training that follows the approach employed for the RoBERTa base model
-with the same hyperparameters as in the original work.
-The training lasted a total of 96 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
+with the same hyperparameters as in the original work. The training lasted a total of 96 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
 
 ## Evaluation
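The corrected README line describes tokenization with a byte-level version of Byte-Pair Encoding. As an illustration only, the core BPE merge loop over raw bytes can be sketched in a few lines; this is not the GPT-2/fairseq implementation (which adds pre-tokenization, a learned 50,262-entry vocabulary, and special tokens), just the idea of repeatedly merging the most frequent adjacent symbol pair.

```python
from collections import Counter

def byte_pair_merges(text: bytes, num_merges: int):
    """Illustrative byte-level BPE: learn up to `num_merges` merges over raw bytes."""
    # Byte-level BPE starts from the 256 single-byte symbols, so no
    # character is ever out-of-vocabulary.
    seq = [bytes([b]) for b in text]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        # Pick the most frequent adjacent pair (first-seen wins ties,
        # since dicts preserve insertion order).
        (a, b), count = max(pairs.items(), key=lambda kv: kv[1])
        if count < 2:
            break
        merges.append((a, b))
        # Replace every occurrence of the pair with the merged symbol.
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges, seq
```

For example, `byte_pair_merges(b"low lower lowest", 3)` first merges `l`+`o`, then `lo`+`w`, so `low` becomes a single token.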
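The masked language model training the README refers to follows the BERT/RoBERTa recipe: select roughly 15% of input positions, replace 80% of the selected ones with the mask token, 10% with a random token, and leave 10% unchanged, then train the model to predict the original ids (RoBERTa re-samples the mask on every pass, known as dynamic masking). A minimal sketch of that corruption step, where `mask_id` and `vocab_size` are caller-supplied placeholders rather than the model's actual values:

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, rng, mask_prob=0.15):
    """BERT/RoBERTa-style masking sketch: returns (corrupted_inputs, labels).

    labels is -100 (a common ignore value for the MLM loss) everywhere except
    selected positions, where it holds the original token id to be predicted.
    """
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i in range(len(inputs)):
        if rng.random() < mask_prob:
            labels[i] = inputs[i]          # model must reconstruct this id
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = mask_id        # 80%: replace with the mask token
            elif roll < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Because the corruption is re-sampled from `rng` on each call, applying it once per epoch yields RoBERTa-style dynamic masking.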