3ebdola/Dialectal-Arabic-XLM-R-Base

Abdellah EL MEKKI committed on Sep 7, 2022

Commit 330dea7 · 1 Parent(s): e14a9e0

Update README.md

Files changed (1)

README.md +4 -0

@@ -111,6 +111,10 @@ This is a repo of the language model used for "AdaSL: An Unsupervised Domain Ada
 ### About the Dialectal-Arabic-XLM-R-Base model
+This model is a further pre-trained version of XLM-RoBERTa base, trained with masked language modeling (MLM) on a dialectal Arabic corpus.
+### About the Dialectal-Arabic-XLM-R-Base model training corpora
 We have built a corpus of 5 million tweets from Twitter. The crawled tweets cover the dialects of the four Arabic world regions (EGY, GLF, LEV, and MAG), as well as MSA. The collected corpus contains one million (1M) tweets per Arabic variant. We did not pre-process the tweets, except for removing short tweets (those containing fewer than four words).
 ### Usage
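
The only filtering step the README describes for the corpus is dropping tweets with fewer than four words. A minimal sketch of such a filter is shown below. It assumes that "words" means whitespace-separated tokens, which the README does not specify, and the names `keep_tweet` and `filter_short_tweets` are hypothetical helpers, not part of the repository.

```python
from typing import Iterable, List


def keep_tweet(text: str, min_words: int = 4) -> bool:
    """Return True if the tweet has at least `min_words` tokens.

    Assumption: tokens are split on whitespace; the README does not
    say how words were counted.
    """
    return len(text.split()) >= min_words


def filter_short_tweets(tweets: Iterable[str], min_words: int = 4) -> List[str]:
    """Keep only tweets that pass the minimum-length check."""
    return [t for t in tweets if keep_tweet(t, min_words)]


# Illustrative input only, not drawn from the actual corpus.
sample = [
    "one two three",        # 3 tokens: dropped
    "one two three four",   # 4 tokens: kept
    "  spaced   out   ",    # 2 tokens: dropped
    "a b c d e",            # 5 tokens: kept
]
kept = filter_short_tweets(sample)
```

Counting tokens with `str.split()` ignores repeated or leading whitespace, so padding alone cannot push a tweet over the threshold.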