# character-bert_NArabizi
This model is a **CharacterBERT-based model trained from scratch** on raw **NArabizi** data: user-generated, code-switched North African dialectal Arabic written in Latin script, collected from social media.
It was introduced in the paper:
- [Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?](https://aclanthology.org/2021.wnut-1.7/)
(Riabi et al., 2021, W-NUT)
---
## 📝 Description
The model follows the **CharacterBERT** architecture (El Boukkouri et al., 2020), which removes WordPiece tokenization in favor of a **Character-CNN module** that generates full word representations from raw character sequences.
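To make the idea concrete, here is a rough PyTorch sketch of a Character-CNN-style word encoder. It is illustrative only, not the released implementation: the module name and all hyperparameters (`n_chars`, `char_dim`, `conv_widths`, `n_filters`, `word_dim`) are assumptions. It embeds the characters of each word, applies 1D convolutions of several widths, max-pools over the character axis, and projects the result to one vector per word.

```python
import torch
import torch.nn as nn


class ToyCharacterCNN(nn.Module):
    """Illustrative Character-CNN word encoder (assumed hyperparameters,
    not the released checkpoint): character embeddings -> multi-width
    1D convolutions -> max-pooling -> projection to a word-level vector."""

    def __init__(self, n_chars=262, char_dim=16, word_dim=768,
                 conv_widths=(1, 2, 3, 4, 5), n_filters=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_filters, kernel_size=w) for w in conv_widths]
        )
        self.proj = nn.Linear(n_filters * len(conv_widths), word_dim)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_chars_per_word) integer character IDs
        b, s, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * s, c)).transpose(1, 2)   # (b*s, char_dim, c)
        feats = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        words = self.proj(torch.cat(feats, dim=-1))                  # (b*s, word_dim)
        return words.view(b, s, -1)                                  # (batch, seq_len, word_dim)


# Toy usage: 2 sentences, 5 words each, up to 20 characters per word
char_ids = torch.randint(1, 262, (2, 5, 20))
print(ToyCharacterCNN()(char_ids).shape)  # torch.Size([2, 5, 768])
```

In CharacterBERT, the word vectors produced by the Character-CNN replace the WordPiece embeddings at the input of an otherwise standard BERT Transformer stack, so no subword vocabulary is needed.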
---
## 📖 Citation
If you use this model, please cite both the paper that introduced it and the original CharacterBERT paper:
```bibtex
@inproceedings{riabi-2021-can,
    title = "Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?",
    author = {Riabi, Arij and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
    booktitle = "Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic (Online)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wnut-1.7/"
}

@inproceedings{el-boukkouri-etal-2020-characterbert,
    title = "{C}haracter{BERT}: Reconciling {ELM}o and {BERT} for Word-Level Open-Vocabulary Representations From Characters",
    author = "El Boukkouri, Hicham and
      Ferret, Olivier and
      Lavergne, Thomas and
      Noji, Hiroshi and
      Zweigenbaum, Pierre and
      Tsujii, Jun{'}ichi",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.609",
    doi = "10.18653/v1/2020.coling-main.609",
    pages = "6903--6915"
}
```