---
language:
- en
- wo
license: mit
tags:
- translation
- machine-translation
- low-resource
- english
- wolof
datasets:
- custom
metrics:
- bleu
library_name: transformers
pipeline_tag: translation
model-index:
- name: localenlp-en-wol
  results:
  - task:
      name: Translation
      type: translation
    dataset:
      name: English-Wolof Custom Dataset
      type: custom
      size: 84k
    metrics:
    - name: BLEU
      type: bleu
      value: 76.12
---

# localenlp-en-wol

Fine-tuned MarianMT model for English-to-Wolof translation.

# Model Card for `LOCALENLP/english-wolof`

This is a machine translation model for **English → Wolof**, developed by the **LOCALENLP** organization.
It is based on the pretrained `Helsinki-NLP/opus-mt-en-mul` MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.

---

## Model Details

### Model Description
- **Developed by:** LOCALENLP
- **Funded by:** N/A
- **Shared by:** LOCALENLP
- **Model type:** Seq2Seq Transformer (MarianMT)
- **Languages:** English → Wolof
- **License:** MIT
- **Finetuned from model:** [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul)

### Model Sources
- **Repository:** https://huggingface.co/LOCALENLP/english-wolof
- **Demo:** [To be integrated in Gradio / Web app](https://huggingface.co/spaces/LocaleNLP/eng_wol)

---

## Uses

### Direct Use
- Translate English text into Wolof for research, education, and communication.
- Useful for low-resource NLP tasks, digital content creation, and cultural preservation.

### Downstream Use
- Can be integrated into translation apps, chatbots, and education platforms.
- Serves as a base for further fine-tuning on domain-specific Wolof corpora.

### Out-of-Scope Use
- Not suitable for legal or medical translation (e.g., contracts, prescriptions, medical records).
- Like any automated system, the model can mistranslate, so human review of its output is recommended.

---

## Bias, Risks, and Limitations
- Training data is from a custom collection of parallel sentences (~84k pairs).
- Some informal or culturally nuanced expressions may not be accurately translated.
- Wolof spelling and grammar variation (Latin script) may lead to inconsistencies.
- Model may underperform on domain-specific or long, complex texts.

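
One possible way to work around the long-text limitation is to translate sentence by sentence. The sketch below is only an illustration: the helper names `split_sentences` and `prepare_inputs` are hypothetical, the regex splitter is deliberately naive (it will wrongly break after abbreviations such as "Dr."), and the `>>wol<<` prefix is taken from the quick-start example in this card.

```python
import re

# Target-language token, as used in this card's quick-start example.
TARGET_TOKEN = ">>wol<<"


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def prepare_inputs(text, token=TARGET_TOKEN):
    """Return one prefixed source string per sentence, ready to pass to the tokenizer."""
    return [f"{token} {sentence}" for sentence in split_sentences(text)]


batch = prepare_inputs("Good evening. How was your day? I hope it went well!")
print(batch)
```

The resulting list can be passed to the tokenizer as a batch (with `padding=True`), and the generated sequences can be decoded with `tokenizer.batch_decode`. Whether sentence-level translation actually improves quality for this model is untested here and should be checked on your own data.
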

### Recommendations
- Use human post-editing for high-stakes use cases.
- Evaluate performance on your target domain before deployment.

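
As a rough illustration of a domain check, the sketch below scores model outputs against human references with a simplified corpus-level BLEU. It is an assumption-laden stand-in, not the procedure used for the score reported in this card: it tokenizes on whitespace, uses one reference per sentence, and applies no smoothing, so its numbers are not comparable with the reported BLEU or with standard tools. The function names and the toy strings are hypothetical placeholders for your own hypotheses and references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU: whitespace tokens, single reference, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy strings stand in for model outputs and human references from your own domain.
hyps = ["a b c d e", "f g h i"]
refs = ["a b c d e", "f g h x"]
print(round(corpus_bleu(hyps, refs), 2))
```

For results you intend to report, an established BLEU implementation is the better choice; this sketch is only meant to show the shape of an evaluation loop over held-out in-domain sentence pairs.
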

---

## How to Get Started with the Model

```python
from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
model_name = "LOCALENLP/english-wolof"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Good evening, how was your day?"
# Prefix the source text with the target-language token for Wolof.
inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)
# Decode with beam search (4 beams), capping the output at 512 tokens.
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Wolof:", translation)