## Description
Afroscope-model is a language identification (LID) model from the AfroScope project. It was built by fine-tuning [Serengeti](https://huggingface.co/UBC-NLP/serengeti) and covers 713 African languages.
For more details on the supported languages and performance, as well as significant changes from previous versions, please refer to LINK_HERE.
- **Dataset:** [dataset](https://huggingface.co/datasets/14kwonss/afroscope-data)
- **Repository:** [github](https://github.com/skwon01-UBC/AfroScope?tab=readme-ov-file)
- **Paper:** [arXiv](https://www.arxiv.org/pdf/2601.13346)
---
## How to use
Here is how to use this model to detect the language of a given text:
```python
from transformers import pipeline
afroscope_model = pipeline("text-classification", model='UBC-NLP/afroscope-model')
input_text="Ninyepuní íne εtɩε, bε ewǐe Jesi ɔnʋ lεfε kʋkʋkpɔ cε."
result = afroscope_model(input_text)
# Extract the label and score from the first result
language = result[0]['label']
score = result[0]['score']
print(f"detected language: {language}\tscore: {round(score*100, 2)}")
```
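For a single string, the pipeline returns a list holding one `{'label': ..., 'score': ...}` dict. Passing a list of strings returns one such dict per input. The sketch below shows one way to post-process that output for a batch, treating low-confidence predictions as undetermined. The helper name `pick_languages` and the `min_score=0.5` cutoff are hypothetical choices for illustration and are not part of the model or the AfroScope release. The labels in the demo are also made up, because the model's actual label strings are not listed here.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# One pipeline prediction: {"label": <language label>, "score": <probability>}
Prediction = Dict[str, Union[str, float]]


def pick_languages(
    predictions: Sequence[Prediction],
    min_score: float = 0.5,  # hypothetical cutoff; tune it on your own data
) -> List[Tuple[Optional[str], float]]:
    """Return (label, score) per input; label is None when score < min_score."""
    picked: List[Tuple[Optional[str], float]] = []
    for pred in predictions:
        score = float(pred["score"])
        # Keep the predicted label only if the model is confident enough
        label = str(pred["label"]) if score >= min_score else None
        picked.append((label, score))
    return picked


if __name__ == "__main__":
    # Stand-in for afroscope_model(["text 1", "text 2"]); labels are invented.
    fake_output = [
        {"label": "lang_a", "score": 0.97},
        {"label": "lang_b", "score": 0.31},
    ]
    for label, score in pick_languages(fake_output):
        print(label, round(score * 100, 2))
```

With the real model, you would replace `fake_output` with `afroscope_model([text_1, text_2, ...])`. The right threshold depends on your data and on how much you care about precision versus coverage.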
## Citation
```bibtex
@article{kwon2026afroscope,
title={AfroScope: A Framework for Studying the Linguistic Landscape of Africa},
author={Kwon, Sang Yun and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad},
journal={arXiv preprint arXiv:2601.13346},
year={2026}
}
```