Update README.md
README.md CHANGED
@@ -120,7 +120,7 @@ The BERT model was pretrained on pre-k to HS math curriculum (engageNY, Utah Mat
#### Training procedure
-The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,522. The inputs of the model are then of the form:
```
[CLS] Sentence A [SEP] Sentence B [SEP]
```
+The texts are lowercased and tokenized using WordPiece with a customized vocabulary of size 30,522. We use the `bert_tokenizer` from the Hugging Face `tokenizers` library to generate a custom vocab file from our raw math training texts. The inputs of the model are then of the form:
```
[CLS] Sentence A [SEP] Sentence B [SEP]
```