Yoruba-English Code-Switching Language Identification (LID) - Mini
This model is a highly efficient, fine-tuned version of Davlan/afro-xlmr-mini designed for token-level Language Identification (LID) in Yoruba-English code-switched text.
Research Highlights
- High Accuracy, Low Footprint: Achieves an Overall F1-score of 99.05%, nearly matching a "Large" baseline (550M parameters) while using a much smaller architecture (~17M parameters).
- Efficiency: Optimized for deployment in resource-constrained environments or high-throughput real-time applications.
- African Language Focus: Built upon the AfroXLM-R-Mini base, leveraging pre-training specifically tailored for African linguistic structures.
Performance Evaluation (Test Set)
The following results were obtained on the held-out test set (80,085 tokens); a short script for reproducing metrics of this kind is sketched after the metrics list below.
| Language | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Overall | 0.990 | 0.991 | 0.991 | 80,085 |
| English | 0.994 | 0.995 | 0.994 | 63,016 |
| Yoruba | 0.976 | 0.978 | 0.977 | 17,069 |
Evaluation Metrics
- Overall Accuracy: 99.56%
- Evaluation Loss: 0.0947
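Per-language figures like those in the table above can be reproduced from flattened token-level predictions with scikit-learn. A minimal sketch, assuming hypothetical gold/predicted label lists (`y_true`, `y_pred`; the "en"/"yo" tag names are illustrative) obtained by aligning model outputs with the test-set annotations:

```python
from sklearn.metrics import classification_report

# Hypothetical gold and predicted token-level labels; in practice these
# come from aligning the model's token predictions with the test-set
# annotations.
y_true = ["en", "en", "yo", "yo", "en", "yo"]
y_pred = ["en", "en", "yo", "en", "en", "yo"]

# Per-language precision/recall/F1 plus averages, analogous to the
# performance table above.
print(classification_report(y_true, y_pred, digits=3))
```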
Model Comparison (Ablation Study)
In our research, we compared this "Mini" architecture against a "Large" baseline to evaluate the trade-off between size and accuracy (a throughput-measurement sketch follows the table):
| Model | Parameters | Overall F1 | Speed (Samples/sec) |
|---|---|---|---|
| AfroXLM-R Large | 550M | 99.07% | ~300 |
| AfroXLM-R Mini | 17M | 99.05% | 1712 |
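The throughput column can be approximated by timing the pipeline over a batch of inputs. The sketch below is illustrative only: the sentence, repeat count, and batch size are arbitrary choices, and absolute numbers depend on hardware.

```python
import time
from transformers import pipeline

# GPU device 0; use device=-1 to benchmark on CPU instead.
lid_model = pipeline(
    "token-classification",
    model="Professor/yoruba-en-ner-model-small",
    device=0,
)

# Illustrative workload: 1,000 copies of one code-switched sentence.
samples = ["Ẹ jẹ́ kí á lọ si cinema to watch the latest movie."] * 1000

start = time.perf_counter()
lid_model(samples, batch_size=64)
elapsed = time.perf_counter() - start
print(f"Throughput: {len(samples) / elapsed:.0f} samples/sec")
```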
Training Procedure
Training Narrative
The model was trained for 5 epochs on an A100 GPU. We used a large global batch size (256) and mixed-precision training (BF16) to ensure stable, fast convergence. Unlike larger models, which can overfit rapidly on LID tasks, the Mini architecture showed a healthy learning curve, with validation loss decreasing steadily throughout training (a configuration sketch follows the hyperparameter list below).
Hyperparameters
- Learning Rate: 3e-05
- Global Batch Size: 256
- Optimizer: AdamW (Fused)
- LR Scheduler: Cosine Decay
- Warmup Ratio: 0.1
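For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments`. The per-device batch size / gradient-accumulation split and the output path are assumptions; any combination yielding a global batch of 256 matches the reported setup.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="yoruba-en-lid-mini",  # hypothetical output path
    num_train_epochs=5,
    learning_rate=3e-5,
    per_device_train_batch_size=64,   # assumption: 64 x 4 accumulation
    gradient_accumulation_steps=4,    #   = global batch size of 256
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                        # mixed precision, as reported
    eval_strategy="epoch",            # `evaluation_strategy` on older versions
    logging_steps=500,
)
```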
Training Logs
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.0 | 313 | 0.2852 |
| 0.4325 | 2.0 | 626 | 0.1622 |
| 0.4325 | 3.0 | 939 | 0.1153 |
| 0.1482 | 4.0 | 1252 | 0.1000 |
| 0.1039 | 5.0 | 1565 | 0.0977 |
Usage
```python
from transformers import pipeline

# Load the model directly from the Hub
lid_model = pipeline("token-classification", model="Professor/yoruba-en-ner-model-small")

text = "Ẹ jẹ́ kí á lọ si cinema to watch the latest movie."
results = lid_model(text)

for entity in results:
    print(f"Token: {entity['word']}, Language: {entity['entity']}")
```
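By default the pipeline returns one prediction per subword token. If whole-word spans are more convenient, the pipeline's built-in `aggregation_strategy` can merge consecutive pieces that share a label; note that aggregated outputs expose `entity_group` instead of `entity`.

```python
from transformers import pipeline

# Same model, with consecutive subword pieces that share a label merged
# into whole-word spans via the "simple" aggregation strategy.
lid_words = pipeline(
    "token-classification",
    model="Professor/yoruba-en-ner-model-small",
    aggregation_strategy="simple",
)

for span in lid_words("Ẹ jẹ́ kí á lọ si cinema to watch the latest movie."):
    # Aggregated outputs expose 'entity_group' instead of 'entity'.
    print(f"Span: {span['word']}, Language: {span['entity_group']}")
```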
Intended Uses & Limitations
This model is intended for researchers and developers working on bilingual text processing for Nigerian English and Yoruba. While the model is highly accurate on in-domain data, performance may vary on text with non-standard orthography or on code-switching that involves a third language (e.g., Nigerian Pidgin).
Citation
If you use this model in your research, please cite the original AfroXLM-R paper and this specific fine-tuned release.