Commit 97066bf
1 Parent(s): 90a670b
Update README.md
README.md CHANGED
@@ -28,7 +28,7 @@ The training corpus has been tokenized using a byte version of [Byte-Pair Encodi
 used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 52,000 tokens.
 The RoBERTa-ca-v2 pretraining consists of a masked language model training that follows the approach employed for the RoBERTa base model
 with the same hyperparameters as in the original work.
-The training lasted a total of
+The training lasted a total of 96 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.

 ## Training corpora and preprocessing

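For context on the tokenization described in the edited README lines (a byte-level BPE vocabulary of 52,000 tokens, following the original RoBERTa setup), here is a minimal sketch using the Hugging Face `tokenizers` library. The corpus file path, `min_frequency` value, and output directory are illustrative assumptions and are not taken from this commit.

```python
# Minimal sketch: train a byte-level BPE tokenizer with a 52,000-token
# vocabulary, mirroring the RoBERTa-style tokenization the README describes.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["catalan_corpus.txt"],  # hypothetical cleaned training corpus
    vocab_size=52_000,             # vocabulary size stated in the README
    min_frequency=2,               # assumed cutoff, not specified in the commit
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],  # RoBERTa special tokens
)
# Writes vocab.json and merges.txt for later use with a RoBERTa-style model.
tokenizer.save_model("roberta-ca-v2-tokenizer")
```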