# character-bert_NArabizi

This is a **CharacterBERT model trained from scratch** on **raw NArabizi data**: code-switched dialectal Arabic collected from social media.

It was introduced in the paper:

- [Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?](https://aclanthology.org/2021.wnut-1.7/) (Riabi et al., 2021, W-NUT)

---

## Description

The model follows the **CharacterBERT** architecture (El Boukkouri et al., 2020), which replaces WordPiece tokenization with a **Character-CNN module** that builds a single representation for each word from its raw character sequence.
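
To illustrate why a character-level input suits noisy, non-standardized spelling, here is a minimal sketch of how a word could be turned into a fixed-length sequence of character ids before reaching a Character-CNN. The constants (`MAX_WORD_LEN`, `PAD_ID`, `BOW_ID`, `EOW_ID`), the function names, and the example words are assumptions made for illustration; they are not the values or the API of the actual CharacterBERT code.

```python
# Hypothetical constants, chosen for illustration only.
MAX_WORD_LEN = 50   # fixed number of character slots per word
PAD_ID = 0          # padding slot
BOW_ID = 257        # beginning-of-word marker
EOW_ID = 258        # end-of-word marker


def word_to_char_ids(word, max_len=MAX_WORD_LEN):
    """Map one word to a fixed-length list of character (byte) ids."""
    # Shift each UTF-8 byte by 1 so that 0 stays reserved for padding.
    byte_ids = [b + 1 for b in word.encode("utf-8")]
    # Truncate long words so the two boundary markers still fit.
    byte_ids = byte_ids[: max_len - 2]
    ids = [BOW_ID] + byte_ids + [EOW_ID]
    # Right-pad to the fixed length.
    return ids + [PAD_ID] * (max_len - len(ids))


def sentence_to_char_ids(sentence):
    """Encode a sentence as one row of character ids per word."""
    # Whitespace splitting stands in for any upstream word-level tokenizer.
    return [word_to_char_ids(w) for w in sentence.split()]


# Two made-up spelling variants: both get a full encoding, with no
# out-of-vocabulary token and no dependence on a subword vocabulary.
for variant in ["kifach", "kifash"]:
    ids = word_to_char_ids(variant)
    print(variant, ids[:8], len(ids))
```

Because every word maps to the same fixed-size grid of character ids, unseen or inconsistently spelled words are handled the same way as frequent ones, which is the property the Character-CNN relies on.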

---

## Citation

If you use this model, please cite both the paper that introduced it and the original CharacterBERT architecture:

```bibtex
@inproceedings{riabi-2021-can,
    title = "Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?",
    author = {Riabi, Arij and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
    booktitle = "Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic (Online)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wnut-1.7/"
}

@inproceedings{el-boukkouri-etal-2020-characterbert,
    title = "{C}haracter{BERT}: Reconciling {ELM}o and {BERT} for Word-Level Open-Vocabulary Representations From Characters",
    author = "El Boukkouri, Hicham and
      Ferret, Olivier and
      Lavergne, Thomas and
      Noji, Hiroshi and
      Zweigenbaum, Pierre and
      Tsujii, Jun{'}ichi",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.609",
    doi = "10.18653/v1/2020.coling-main.609",
    pages = "6903--6915"
}