Commit 99a589c (parent: 4413966): Update README.md

README.md (CHANGED)

@@ -69,7 +69,7 @@ This model is a [RoBERTa-based](https://github.com/pytorch/fairseq/tree/master/e
| 69 | - ELRC datasets
| 70 |
| 71 | The training corpus has been tokenized using a byte version of [Byte-Pair Encoding (BPE)](https://github.com/openai/gpt-2)
| 72 | used in the original [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta) model with a vocabulary size of 52,000 tokens. The pretraining consists of masked language model training at the subword level, following the approach employed for the RoBERTa base model with the same hyperparameters as in the original work.
| 73 |
| 74 |
| 75 | ### Training corpora and preprocessing
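The byte-level BPE scheme referenced in the hunk above works by starting from raw UTF-8 bytes (so any input is representable without an `<unk>` token) and repeatedly merging the most frequent adjacent pair until the vocabulary reaches its target size, 52,000 tokens for this model. A minimal, illustrative sketch of one merge step (not the model's actual tokenizer code) could look like:

```python
# Minimal sketch of byte-level BPE, the GPT-2-style scheme referenced above.
# Real tokenizers repeat the merge loop until the vocabulary reaches its
# target size (52,000 for this model); one merge is shown for illustration.
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent token pair."""
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from raw bytes, so any input is representable.
text = "la la land"
tokens = [bytes([b]) for b in text.encode("utf-8")]
pair = most_frequent_pair(tokens)   # (b'l', b'a') occurs three times
tokens = merge_pair(tokens, pair)
print(tokens)  # [b'la', b' ', b'la', b' ', b'la', b'n', b'd']
```

In practice this training is done with an off-the-shelf tokenizer library rather than by hand; the sketch only shows why a byte-level vocabulary never needs an unknown-token fallback.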
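The masked-language-model objective mentioned in the diff can be sketched as follows. This is the standard BERT/RoBERTa masking recipe (15% of positions selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged), assumed here from the README's statement that pretraining follows the RoBERTa base approach; the token IDs are hypothetical:

```python
# Sketch of the masked-LM objective used for RoBERTa-style pretraining.
# Assumptions: MASK_ID is a hypothetical <mask> id; VOCAB_SIZE matches the
# 52,000-token vocabulary stated in the README; -100 marks positions the
# loss ignores (the convention used by common training frameworks).
import random

MASK_ID = 4
VOCAB_SIZE = 52_000

def mask_tokens(ids, rng, mask_prob=0.15):
    inputs, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() < mask_prob:
            labels[i] = tok                      # model must predict the original
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID              # 80%: replace with <mask>
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels

rng = random.Random(0)
inputs, labels = mask_tokens(list(range(100, 120)), rng)
```

The model is then trained to recover the original token at every position where the label is set, which is what "masked language model training at the subword level" refers to.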