Commit 29dd32b (parent: ee3923d): Update README.md

README.md CHANGED
@@ -106,7 +106,7 @@ The training corpus consists of several corpora gathered from web crawling and p
 ### Training Procedure
 
 The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2)
-used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of
+used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,262 tokens.
 The RoBERTa-ca-v2 pretraining consists of a masked language model training that follows the approach employed for the RoBERTa base model
 with the same hyperparameters as in the original work.
 The training lasted a total of 96 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
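For readers of the edited section: the byte-level BPE tokenizer and the RoBERTa-style masked-language-model objective it describes can be inspected with the Hugging Face `transformers` API. The sketch below is illustrative only; the Hub identifier `projecte-aina/roberta-base-ca-v2` and the 15% masking probability (RoBERTa's default) are assumptions not stated in this commit.

```python
# Sketch: inspect the byte-level BPE tokenizer and reproduce the RoBERTa-style
# masking described in the README section above.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Assumed Hub identifier for RoBERTa-ca-v2 (byte-level BPE vocabulary).
tokenizer = AutoTokenizer.from_pretrained("projecte-aina/roberta-base-ca-v2")
print(len(tokenizer))  # vocabulary size; the edited line above quotes 50,262 tokens

# Masked-language-model collator: randomly masks tokens in each batch,
# mirroring the RoBERTa base pretraining objective mentioned in the README.
# 0.15 is RoBERTa's default masking probability (an assumption here).
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

encoding = tokenizer("Exemple de frase en català.", return_special_tokens_mask=True)
batch = collator([encoding])
print(batch["input_ids"][0])  # some ids replaced by tokenizer.mask_token_id
print(batch["labels"][0])     # -100 everywhere except the masked positions
```

The collator reproduces only the masking step of the objective; the 96-hour pretraining run itself used the hyperparameters of the original RoBERTa base work, as the README states.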