Update README.md
README.md CHANGED
@@ -120,7 +120,7 @@ The BERT model was pretrained on pre-k to HS math curriculum (engageNY, Utah Mat
#### Training procedure
-The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,522. The inputs of the model are then of the form:
```
[CLS] Sentence A [SEP] Sentence B [SEP]
```
+The texts are lowercased and tokenized using WordPiece with a customized vocabulary of size 30,522. We use the `bert_tokenizer` from the Hugging Face `tokenizers` library to generate a custom vocab file from our raw math training texts. The inputs of the model are then of the form:
```
[CLS] Sentence A [SEP] Sentence B [SEP]
```