---
license: mit
metrics:
- bleu
- chrf
base_model:
- facebook/nllb-200-distilled-600M
tags:
- nlp
- low-resource
- efik
- african-language
- translation
---

# Efik ↔ English Translation Model

This model provides **machine translation between English and Efik** in both directions. It was fine-tuned from [`facebook/nllb-200-distilled-600M`](https://huggingface.co/facebook/nllb-200-distilled-600M) (NLLB architecture) on **18k+ parallel sentences**. It can be used for direct translation or as a component in multilingual NLP pipelines.

### Uses

- Translate text between English and Efik.
- Assist in educational or localization projects involving Efik.
- Support research in low-resource language NLP.

### Limitations

- Because the training data is limited (about 18k sentence pairs), translation quality may degrade on **long, complex, or domain-specific text**.

### How to Get Started

Load the model with 🤗 Transformers and prefix each input sentence with the tag for its source language:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("offiongbassey/efik-mt")
model = AutoModelForSeq2SeqLM.from_pretrained("offiongbassey/efik-mt")

# English → Efik: English input is prefixed with the eng_Latn tag
text = "My child is very sick and I need to take him to the hospital for treatment."
inputs = tokenizer(f"eng_Latn {text}", return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)  # cap the output at 128 tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Efik → English: Efik input is prefixed with ibo_Latn (NLLB's Igbo code),
# which is the tag this model uses for Efik
text = "Okon ama adaha utom tọñọ usenubọk."
inputs = tokenizer(f"ibo_Latn {text}", return_tensors="pt")
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
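
To translate many sentences at once, you can batch the calls above. The sketch below is an assumption-based helper and is not part of the released model code. `LANG_TAGS`, `build_inputs`, and `translate_batch` are hypothetical names. The helper reuses the same tag-prefix convention as the example, including `ibo_Latn` for Efik, and pads the batch so that several sentences share one `generate` call.

```python
from typing import Dict, List, Optional

# Hypothetical mapping that mirrors the tags used in the example above:
# English input is prefixed with "eng_Latn", Efik input with "ibo_Latn".
LANG_TAGS: Dict[str, str] = {"en": "eng_Latn", "efi": "ibo_Latn"}


def build_inputs(texts: List[str], src: str) -> List[str]:
    """Prefix every sentence with its source-language tag, as in the single-sentence example."""
    if src not in LANG_TAGS:
        raise ValueError(f"unknown source language {src!r}; expected one of {sorted(LANG_TAGS)}")
    tag = LANG_TAGS[src]
    return [f"{tag} {text.strip()}" for text in texts]


def translate_batch(texts: List[str], src: str, model, tokenizer,
                    max_length: int = 128, num_beams: Optional[int] = None) -> List[str]:
    """Translate a list of sentences in one padded call to generate()."""
    import torch  # imported here so build_inputs works without torch installed

    batch = tokenizer(build_inputs(texts, src), return_tensors="pt", padding=True)
    gen_kwargs = {"max_length": max_length}
    if num_beams is not None:
        # Only override beam search when asked; otherwise keep the model's generation defaults.
        gen_kwargs["num_beams"] = num_beams
    with torch.no_grad():
        outputs = model.generate(**batch, **gen_kwargs)
    # batch_decode returns one string per input sentence, in input order.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the example, `translate_batch(["Good morning.", "Where is the market?"], "en", model, tokenizer)` returns a list of translations. The card's own example does not set `num_beams`. Passing a value such as `num_beams=4` turns on beam search, which usually trades speed for quality.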

### Training Details

- Base model: `facebook/nllb-200-distilled-600M` (NLLB architecture)
- Epochs trained: 8
- Learning rate: 5e-5
- BLEU:
  - English → Efik: 29.58
  - Efik → English: 32.14
- chrF:
  - English → Efik: 54.29
  - Efik → English: 48.78