All models were initially trained on a cleaned version of the Arabic Wikipedia dataset. The dataset is available at [fadi77/wikipedia_20231101.ar.phonemized](https://huggingface.co/datasets/fadi77/wikipedia_20231101.ar.phonemized).

For the **mlm_only_with_diacritics** model, a random sample of 200,000 entries (out of approximately 1.2 million) was selected from the Arabic Wikipedia dataset and fully diacritized using the state-of-the-art CATT diacritizer ([Abjad AI, 2024](https://github.com/abjadai/catt)), introduced in [this paper](https://arxiv.org/abs/2407.03236) and licensed under CC BY-NC 4.0.
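The sampling step above can be sketched as follows. This is a minimal illustration only: the function name, the seed, and the uniform-sampling assumption are ours, and the actual selection procedure used for the 200,000 entries is not specified in this README.

```python
import random

def sample_entries(corpus, k, seed=0):
    """Draw a reproducible uniform random sample of k indices from the corpus.

    In the real pipeline the corpus would be the ~1.2M-entry Arabic Wikipedia
    dataset and k would be 200_000; the selected entries would then be passed
    to the CATT diacritizer.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(range(len(corpus)), k)

# Tiny stand-in corpus to show the call shape:
toy_corpus = [f"entry_{i}" for i in range(1_000)]
indices = sample_entries(toy_corpus, k=200)
subset = [toy_corpus[i] for i in indices]
```

With the real dataset, `corpus` could be the result of `datasets.load_dataset("fadi77/wikipedia_20231101.ar.phonemized")` and the sampled subset would be the input to diacritization.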
### Training Procedure