aubmindlab/bert-base-arabertv02

wissamantoun committed on Nov 15, 2022

Commit 590a87e · 1 Parent(s): c655777

Update README.md

Files changed (1)

README.md +3 -3

@@ -46,9 +46,9 @@ All models are available in the `HuggingFace` model page under the [aubmindlab](
 We identified an issue with AraBERTv1's wordpiece vocabulary. The issue came from punctuations and numbers that were still attached to words when learned the wordpiece vocab. We now insert a space between numbers and characters and around punctuation characters.
-The new vocabulary was learnt using the `BertWordpieceTokenizer` from the `tokenizers` library, and should now support the Fast tokenizer implementation from the `transformers` library.
+The new vocabulary was learned using the `BertWordpieceTokenizer` from the `tokenizers` library, and should now support the Fast tokenizer implementation from the `transformers` library.
-**P.S.**: All the old BERT codes should work with the new BERT, just change the model name and check the new preprocessing dunction
+**P.S.**: All the old BERT codes should work with the new BERT, just change the model name and check the new preprocessing function
 **Please read the section on how to use the [preprocessing function](#Preprocessing)**
 ## Bigger Dataset and More Compute
@@ -86,7 +86,7 @@ It is recommended to apply our preprocessing function before training/testing on
 ```python
 from arabert.preprocess import ArabertPreprocessor
-model_name="bert-base-arabertv02"
+model_name="aubmindlab/bert-large-arabertv02"
 arabert_prep = ArabertPreprocessor(model_name=model_name)
 text = "ولن نبالغ إذا قلنا: إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"