Vít Novotný committed
Commit · f353573
Parent(s): c5f9bdf
Update link to repository
README.md CHANGED

@@ -14,13 +14,13 @@ CLEF 2022 and first released in [this repository][2]. This model is case-sensiti
 it makes a difference between english and English.
 
 [1]: https://www.cs.rit.edu/~dprl/ARQMath/
-[2]: https://github.com/witiko/scm-at-arqmath3
+[2]: https://github.com/witiko/scm-at-arqmath3
 
 ## Model description
 
-MathBERTa is [the RoBERTa base transformer model][3] whose tokenizer has been
-extended with LaTeX math symbols and which has been fine-tuned on a large
-corpus of English mathematical texts.
+MathBERTa is [the RoBERTa base transformer model][3] whose [tokenizer has been
+extended with LaTeX math symbols][7] and which has been [fine-tuned on a large
+corpus of English mathematical texts][8].
 
 Like RoBERTa, MathBERTa has been fine-tuned with the Masked language modeling
 (MLM) objective. Taking a sentence, the model randomly masks 15% of the words
@@ -30,6 +30,8 @@ learns an inner representation of the English language and the language of
 LaTeX that can then be used to extract features useful for downstream tasks.
 
 [3]: https://huggingface.co/roberta-base
+[7]: https://github.com/Witiko/scm-at-arqmath3/blob/main/02-train-tokenizers.ipynb
+[8]: https://github.com/witiko/scm-at-arqmath3/blob/main/03-finetune-roberta.ipynb
 
 ## Intended uses & limitations
 
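The masking step that the card describes (randomly masking 15% of the input tokens for the MLM objective) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the actual training code: real RoBERTa/MathBERTa preprocessing operates on subword token IDs, and the standard MLM recipe also replaces some selected positions with random tokens or leaves them unchanged. The names `mask_tokens` and `MASK_PROB` are invented for this sketch.

```python
import random

MASK_TOKEN = "<mask>"  # RoBERTa-style mask token
MASK_PROB = 0.15       # fraction of tokens to mask, as stated in the card

def mask_tokens(tokens, prob=MASK_PROB, seed=0):
    """Return a copy of `tokens` with roughly `prob` of them replaced
    by MASK_TOKEN (the input side of masked language modeling)."""
    rng = random.Random(seed)  # fixed seed for reproducibility in this sketch
    return [MASK_TOKEN if rng.random() < prob else tok for tok in tokens]

# Example: mask a (word-level, for illustration) mathematical sentence.
sentence = "the derivative of x ^ 2 with respect to x is 2 x".split()
masked = mask_tokens(sentence * 10)  # longer input so some positions get masked
```

During fine-tuning, the model is then trained to predict the original tokens at the masked positions from the unmasked context.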