gonzalez-agirre committed
Commit 29dd32b · Parent: ee3923d

Update README.md

Files changed (1):
  README.md (+1 -1)
README.md CHANGED
@@ -106,7 +106,7 @@ The training corpus consists of several corpora gathered from web crawling and p
 ### Training Procedure
 
 The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2)
-used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 52,000 tokens.
+used in the original [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 50,262 tokens.
 The RoBERTa-ca-v2 pretraining consists of a masked language model training that follows the approach employed for the RoBERTa base model
 with the same hyperparameters as in the original work.
 The training lasted a total of 96 hours with 16 NVIDIA V100 GPUs of 16GB DDRAM.
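For anyone checking the corrected figure, here is a minimal sketch of how the tokenizer's vocabulary size could be verified with the Hugging Face `transformers` library. The model id `projecte-aina/roberta-base-ca-v2` is an assumption inferred from the RoBERTa-ca-v2 name in the diff; the actual Hub repository id may differ.

```python
from transformers import AutoTokenizer

# Assumed model id; the README in this commit belongs to RoBERTa-ca-v2,
# but the exact Hub repository name is not stated in the diff.
MODEL_ID = "projecte-aina/roberta-base-ca-v2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Per the corrected README line, this should report 50,262 tokens
# (including special tokens), not the 52,000 stated before this commit.
print(len(tokenizer))
```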