Vít Novotný committed
Commit 12b87ed · Parent(s): 204c0b9

Document huggingface/transformers#16936 in `README.md`
README.md CHANGED

````diff
@@ -6,7 +6,7 @@ datasets:
 - math-stackexchange
 ---
 
-# MathBERTa
+# MathBERTa model
 
 Pretrained model on English language and LaTeX using a masked language modeling
 (MLM) objective. It was developed for [the ARQMath-3 shared task evaluation][1]
@@ -48,6 +48,11 @@ text generation you should look at model like GPT2.
 
 ### How to use
 
+
+*Due to the large number of added LaTeX tokens, MathBERTa is affected by [a
+software bug in the 🤗 Transformers library][9] that causes it to load for tens
+of minutes. The bug is [to be fixed in 🤗 Transformers 4.20.0][10].*
+
 You can use this model directly with a pipeline for masked language modeling:
 
 ```python
@@ -99,3 +104,5 @@ Together theses datasets weight 52GB of text and LaTeX.
 
 [5]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-2020/
 [6]: https://www.cs.rit.edu/~dprl/ARQMath/arqmath-resources.html
+[9]: https://github.com/huggingface/transformers/issues/16936
+[10]: https://github.com/huggingface/transformers/pull/17119
````