Update README.md

README.md +19 -7

@@ -18,16 +18,28 @@ library_name: Transformers
 # NeoAraBERT
 NeoAraBERT is a state-of-the-art open-source Arabic text-embedding model built on the NeoBERT architecture. We pretrained NeoAraBERT on diverse open-source and internal datasets covering modern standard, classical, and dialectal Arabic. We guided our design choices with Arabic-tailored ablation studies, including text normalization, light stemming, and diacritics-aware tokenization handling. We also performed POS-aware token masking and learning-rate scheduling ablation studies. We benchmarked NeoAraBERT against five top-performing Arabic models on 23 tasks, including a synonym-based task, [Muradif](https://huggingface.co/datasets/U4RASD/Muradif), that directly assesses embedding quality with no additional fine-tuning. NeoAraBERT variants rank first in 18 tasks and improve average performance across the full benchmark suite.
-This is the NeoAraBERT_Mix checkpoint, our best-performing checkpoint overall. This model was introduced at the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). For more information, visit our website: https://acr.ps/neoarabert.
 The available NeoAraBERT checkpoints:
-  | Model | Description | Link |
-  |---|---|---|
-  | NeoAraBERT     | Trained on both Modern Standard Arabic and Dialectal Arabic. | this repository ✅ |
-  | NeoAraBERT_MSA | Trained on Modern Standard Arabic. | [link](https://huggingface.co/U4RASD/NeoAraBERT_MSA) |
-  | NeoAraBERT_DA  | Trained on Dialectal Arabic. | [link](https://huggingface.co/U4RASD/NeoAraBERT_DA) |
-![bench](https://cdn-uploads.huggingface.co/production/uploads/65338533a78e70d19c850120/1Hmc13qHxygG2bQl98xv9.png)
 ### How to Use
 Install these libraries:

 # NeoAraBERT
 NeoAraBERT is a state-of-the-art open-source Arabic text-embedding model built on the NeoBERT architecture. We pretrained NeoAraBERT on diverse open-source and internal datasets covering modern standard, classical, and dialectal Arabic. We guided our design choices with Arabic-tailored ablation studies, including text normalization, light stemming, and diacritics-aware tokenization handling. We also performed POS-aware token masking and learning-rate scheduling ablation studies. We benchmarked NeoAraBERT against five top-performing Arabic models on 23 tasks, including a synonym-based task, [Muradif](https://huggingface.co/datasets/U4RASD/Muradif), that directly assesses embedding quality with no additional fine-tuning. NeoAraBERT variants rank first in 18 tasks and improve average performance across the full benchmark suite.
+This is the **NeoAraBERT_Mix** checkpoint, our best-performing checkpoint overall. This model was introduced at the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). For more information, visit our website: https://acr.ps/neoarabert.
 The available NeoAraBERT checkpoints:
+| Model | Description | Link |
+|---|---|---|
+| NeoAraBERT (**NeoAraBERT_Mix**)    | Trained on both Modern Standard Arabic and Dialectal Arabic. | this repository ✅ |
+| NeoAraBERT_MSA | Trained on Modern Standard Arabic. | [link](https://huggingface.co/U4RASD/NeoAraBERT_MSA) |
+| NeoAraBERT_DA  | Trained on Dialectal Arabic. | [link](https://huggingface.co/U4RASD/NeoAraBERT_DA) |
+| Model              | Average Score |
+| ------------------ | ------------: |
+| **NeoAraBERT_Mix** |     **83.79** |
+| NeoAraBERT_DA      |         83.44 |
+| NeoAraBERT_MSA     |         83.30 |
+| AraModernBERT      |         81.04 |
+| AraBERTv2          |         80.75 |
+| MARBERTv2          |         80.45 |
+| ARBERTv2           |         80.31 |
+| CAMeLBERT-mix      |         80.04 |
+For detailed benchmarking, see https://acr.ps/neoarabert.
 ### How to Use
 Install these libraries: