Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ widget:
 - text: "للوقايه من انتشار [MASK]"
 ---
 # arabert_c19: An Arabert model pretrained on 1.5 million COVID-19 multi-dialect Arabic tweets
-**mBERT COVID-19** is a pretrained (fine-tuned) version of the mBERT model (https://huggingface.co/bert-base-multilingual-cased). The pretraining was done using 1.5 million multi-dialect Arabic tweets regarding the COVID-19 pandemic from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315).
+**mBERT COVID-19** [Arxiv URL](https://arxiv.org/pdf/2105.03143.pdf) is a pretrained (fine-tuned) version of the mBERT model (https://huggingface.co/bert-base-multilingual-cased). The pretraining was done using 1.5 million multi-dialect Arabic tweets regarding the COVID-19 pandemic from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315).
 The model can achieve better results for the tasks that deal with multi-dialect Arabic tweets in relation to the COVID-19 pandemic.
 
 # Classification results for multiple tasks including fake-news and hate speech detection when using arabert_c19 and mbert_ar_c19:
@@ -25,5 +25,21 @@ arabert_prep = ArabertPreprocessor(model_name=model_name)
 text = "للوقايه من عدم انتشار كورونا عليك اولا غسل اليدين بالماء والصابون وتكون عملية الغسل دقيقه تشمل راحة اليد الأصابع التركيز على الإبهام"
 arabert_prep.preprocess(text)
 ```
+
+# Citation
+
+Please cite as:
+
+``` bibtex
+@misc{ameur2021aracovid19mfh,
+title={AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset},
+author={Mohamed Seghir Hadj Ameur and Hassina Aliane},
+year={2021},
+eprint={2105.03143},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```
+
 # Contacts
 **Hadj Ameur**: [Github](https://github.com/MohamedHadjAmeur) | <mohamedhadjameur@gmail.com> | <mhadjameur@cerist.dz>