fadi77 committed on
Commit 8ca4f72 · verified · 1 Parent(s): 8f5d729

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -35,7 +35,7 @@ The collection includes three models:
 
 All models were initially trained on a cleaned version of the Arabic Wikipedia dataset. The dataset is available at [fadi77/wikipedia_20231101.ar.phonemized](https://huggingface.co/datasets/fadi77/wikipedia_20231101.ar.phonemized).
 
-For the **mlm_only_with_diacritics** model, a random sample of 200,000 entries (out of approximately 1.2 million) was selected from the Wikipedia Arabic dataset and fully diacritized using the state-of-the-art CATT diacritizer ([Abjad AI, 2024](https://github.com/abjadai/catt)), introduced in [this paper](https://arxiv.org/abs/2407.03236).
+For the **mlm_only_with_diacritics** model, a random sample of 200,000 entries (out of approximately 1.2 million) was selected from the Wikipedia Arabic dataset and fully diacritized using the state-of-the-art CATT diacritizer ([Abjad AI, 2024](https://github.com/abjadai/catt)), introduced in [this paper](https://arxiv.org/abs/2407.03236) and licensed under CC BY-NC 4.0.
 
 ### Training Procedure
 