Fill-Mask
Transformers
PyTorch
xlm-roberta
Dialectal Arabic
Arabic
sequence labeling
Named entity recognition
Part-of-speech tagging
Zero-shot transfer learning
bert
Commit 1f87701 (parent: 9e11364), committed by Abdellah EL MEKKI
Update README.md

README.md (changed)
# Dialectal Arabic XLM-R Base

This repository hosts the language model used in "AdaSL: An Unsupervised Domain Adaptation framework for Arabic multi-dialectal Sequence Labeling", which the authors describe as the state-of-the-art method for sequence labeling on multi-dialect Arabic.

### About the Dialectal-Arabic-XLM-R-Base model

We built a corpus of 5 million tweets crawled from Twitter. The tweets cover the dialects of four regions of the Arab world, Egyptian (EGY), Gulf (GLF), Levantine (LEV), and Maghrebi (MAG), as well as Modern Standard Arabic (MSA). The corpus contains one million (1M) tweets per Arabic variety. We did not pre-process the tweets, except to remove short ones (tweets containing fewer than four words).
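
The length filter described above can be sketched as follows. This is only an illustration, not the authors' code: the function names `keep_tweet` and `filter_short_tweets` are hypothetical, and counting words by splitting on whitespace is an assumption, since the README does not say how words were counted.

```python
def keep_tweet(tweet: str, min_words: int = 4) -> bool:
    """Return True if the tweet has at least `min_words` whitespace-separated words."""
    # Assumption: a "word" is any whitespace-delimited token.
    return len(tweet.split()) >= min_words


def filter_short_tweets(tweets, min_words=4):
    """Drop tweets with fewer than `min_words` words; no other cleaning is applied."""
    return [tweet for tweet in tweets if keep_tweet(tweet, min_words)]


# Two sample tweets: the first has two words, the second has five.
sample = [
    "صباح الخير",
    "هذا مثال لنص باللغة العربية",
]

# Only the five-word tweet passes the filter.
print(filter_short_tweets(sample))
```

Keeping the threshold as a parameter makes it easy to try other minimum lengths on your own data.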

### Usage

The model weights can be loaded with the Hugging Face `transformers` library.

```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the encoder weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("3ebdola/Dialectal-Arabic-XLM-R-Base")
model = AutoModel.from_pretrained("3ebdola/Dialectal-Arabic-XLM-R-Base")

# "This is an example of Arabic text; you can also use Arabic dialects"
text = "هذا مثال لنص باللغة العربية, يمكنك استعمال اللهجات العربية أيضا"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)  # output.last_hidden_state holds the token embeddings
```
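
The model is tagged for fill-mask on the Hub, so it may also be useful to query it for masked-token predictions. The sketch below assumes the checkpoint ships a masked-language-modeling head and uses the XLM-R mask token `<mask>`. The helper names `mask_word`, `predict_masked`, and `demo` are hypothetical and not part of the repository.

```python
def mask_word(text: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` in `text` with the mask token."""
    if word not in text:
        raise ValueError(f"{word!r} does not occur in the text")
    return text.replace(word, mask_token, 1)


def predict_masked(text: str, top_k: int = 5):
    """Return the top_k fill-mask predictions for a text containing one mask token."""
    # Imported lazily so that mask_word can be used without transformers installed.
    from transformers import pipeline

    # Assumption: the checkpoint includes a masked-LM head usable by this pipeline.
    fill_mask = pipeline("fill-mask", model="3ebdola/Dialectal-Arabic-XLM-R-Base")
    return fill_mask(text, top_k=top_k)


def demo():
    """Mask one word of an Arabic sentence and print the model's candidates."""
    sentence = "هذا مثال لنص باللغة العربية"
    masked = mask_word(sentence, "العربية")
    for prediction in predict_masked(masked):
        print(prediction["token_str"], round(prediction["score"], 3))
```

Calling `demo()` downloads the weights on first use and prints each candidate token with its score. The predictions depend on the checkpoint, so no expected output is shown here.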

### Citation

```
@article{ELMEKKI2022102964,
  title = {AdaSL: An Unsupervised Domain Adaptation framework for Arabic multi-dialectal Sequence Labeling},
  journal = {Information Processing \& Management},
  volume = {59},
  number = {4},
  pages = {102964},
  year = {2022},
  issn = {0306-4573},
  doi = {https://doi.org/10.1016/j.ipm.2022.102964},
  url = {https://www.sciencedirect.com/science/article/pii/S0306457322000814},
  author = {Abdellah {El Mekki} and Abdelkader {El Mahdaouy} and Ismail Berrada and Ahmed Khoumsi},
  keywords = {Dialectal Arabic, Arabic natural language processing, Domain adaptation, Multi-dialectal sequence labeling, Named entity recognition, Part-of-speech tagging, Zero-shot transfer learning}
}
```