character-bert_NArabizi

This is a CharacterBERT model trained from scratch on raw NArabizi data, i.e. code-switched dialectal Arabic from social media.

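It was introduced in the paper "Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?" (Riabi et al., 2021).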


📝 Description

The model follows the CharacterBERT architecture (El Boukkouri et al., 2020), which removes WordPiece tokenization in favor of a Character-CNN module that generates full word representations from raw character sequences.
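To make the idea of word-level inputs built from raw characters concrete, here is a minimal sketch of how a sentence could be turned into per-word character-id sequences, the kind of input a Character-CNN consumes instead of WordPiece ids. This is not the actual CharacterBERT indexer: the function names, the maximum word length, the special ids, and the example sentence are all assumptions made for illustration.

```python
from typing import List

# Illustrative sketch only: NOT the real CharacterBERT indexer.
# MAX_WORD_LEN and the special ids below are hypothetical values.
MAX_WORD_LEN = 12               # assumed cap on ids per word, markers included
BOW, EOW, PAD = 256, 257, 258   # assumed ids outside the byte range 0..255


def word_to_char_ids(word: str) -> List[int]:
    """Map one word to a fixed-length list of character (UTF-8 byte) ids."""
    # Keep room for the begin/end-of-word markers; overly long words are
    # truncated at the byte level (which may cut a multi-byte character).
    raw = list(word.encode("utf-8"))[: MAX_WORD_LEN - 2]
    ids = [BOW] + raw + [EOW]
    # Right-pad so every word has the same length.
    return ids + [PAD] * (MAX_WORD_LEN - len(ids))


def sentence_to_char_ids(sentence: str) -> List[List[int]]:
    """Split on whitespace and index each word independently."""
    return [word_to_char_ids(w) for w in sentence.split()]


if __name__ == "__main__":
    # Invented spelling variants: each one is representable as-is,
    # because no fixed subword vocabulary is consulted.
    for row in sentence_to_char_ids("salam slm salaam"):
        print(row)
```

Because every word is indexed from its own characters, unseen or noisy spellings never fall back to an unknown-token id. In a full model, each row would then be passed through character embeddings and convolutions to produce a single vector per word.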


📖 Citation

If you use this model, please cite both the paper that introduced it and the original CharacterBERT architecture:

@inproceedings{riabi-2021-can,
    title = "Can Character-based Language Models Improve Downstream Task Performances in Low-Resource and Noisy Language Scenarios?",
    author = {Riabi, Arij and Sagot, Beno{\^i}t and Seddah, Djam{\'e}},
    booktitle = "Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic (Online)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wnut-1.7/"
}

@inproceedings{el-boukkouri-etal-2020-characterbert,
    title = "{C}haracter{BERT}: Reconciling {ELM}o and {BERT} for Word-Level Open-Vocabulary Representations From Characters",
    author = "El Boukkouri, Hicham  and
      Ferret, Olivier  and
      Lavergne, Thomas  and
      Noji, Hiroshi  and
      Zweigenbaum, Pierre  and
      Tsujii, Jun{'}ichi",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.609",
    doi = "10.18653/v1/2020.coling-main.609",
    pages = "6903--6915"
}