---
language:
- en
- wo
license: mit
tags:
- translation
- machine-translation
- low-resource
- english
- wolof
datasets:
- custom
metrics:
- bleu
library_name: transformers
pipeline_tag: translation
model-index:
- name: localenlp-en-wol
results:
- task:
name: Translation
type: translation
dataset:
name: English-Wolof Custom Dataset
type: custom
size: 84k
metrics:
- name: BLEU
type: bleu
value: 76.12
---
# Model Card for `LOCALENLP/english-wolof`
`localenlp-en-wol` is a fine-tuned MarianMT machine translation model for **English → Wolof**, developed by the **LOCALENLP** organization.
It is based on the pretrained `Helsinki-NLP/opus-mt-en-mul` MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.
---
## Model Details
### Model Description
- **Developed by:** LOCALENLP
- **Funded by:** N/A
- **Shared by:** LOCALENLP
- **Model type:** Seq2Seq Transformer (MarianMT)
- **Languages:** English → Wolof
- **License:** MIT
- **Finetuned from model:** [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul)
### Model Sources
- **Repository:** https://huggingface.co/LOCALENLP/english-wolof
- **Demo:** [Gradio / web app (to be integrated)](https://huggingface.co/spaces/LocaleNLP/eng_wol)
---
## Uses
### Direct Use
- Translate English text into Wolof for research, education, and communication.
- Useful for low-resource NLP tasks, digital content creation, and cultural preservation.
### Downstream Use
- Can be integrated into translation apps, chatbots, and education platforms.
- Serves as a base for further fine-tuning on domain-specific Wolof corpora.
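For further fine-tuning, parallel data is often stored in the nested-`"translation"` JSONL layout used by the Hugging Face translation examples. A minimal sketch of preparing such a file, using hypothetical sample pairs (not drawn from the actual training corpus):

```python
import json

def write_translation_jsonl(pairs, path):
    """Write (English, Wolof) sentence pairs as JSONL records with a
    nested "translation" object, one record per line."""
    with open(path, "w", encoding="utf-8") as f:
        for en, wo in pairs:
            record = {"translation": {"en": en, "wo": wo}}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Illustrative pairs only -- not taken from the model's training data
pairs = [
    ("Good morning", "Jàmm nga fanaane"),
    ("Thank you", "Jërëjëf"),
]
write_translation_jsonl(pairs, "train.jsonl")
```

Each line can then be loaded with `datasets.load_dataset("json", data_files="train.jsonl")` for a standard seq2seq fine-tuning run.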
### Out-of-Scope Use
- Not suitable for high-stakes legal or medical translations (e.g., contracts, prescriptions, medical records) without expert review.
- Like any automated system, the model can mistranslate; review its output before relying on it.
---
## Bias, Risks, and Limitations
- Training data is from a custom collection of parallel sentences (~84k pairs).
- Some informal or culturally nuanced expressions may not be accurately translated.
- Wolof spelling and grammar variation (Latin script) may lead to inconsistencies.
- Model may underperform on domain-specific or long, complex texts.
### Recommendations
- Use human post-editing for high-stakes use cases.
- Evaluate performance on your target domain before deployment.
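Domain evaluation is typically done with BLEU, the metric reported in this card. A minimal pure-Python sketch of sentence-level BLEU (modified n-gram precision plus a brevity penalty) to illustrate the idea; real evaluations should use a maintained library such as sacreBLEU, and the example sentences below are placeholders:

```python
import math
from collections import Counter

def simple_bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) scaled by a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        # Tiny floor avoids log(0) when an n-gram order has no matches
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = simple_bleu("this is a test sentence", "this is a test sentence")
print(f"BLEU: {score:.2f}")  # identical sentences score 1.00
```

Corpus-level BLEU (as reported above) aggregates n-gram counts over all sentence pairs rather than averaging per-sentence scores.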
---
## How to Get Started with the Model
```python
from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/english-wolof"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Prepend the Wolof target-language token, inherited from the
# multilingual opus-mt-en-mul base model
text = "Good evening, how was your day?"
inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)

# Beam search usually yields better translations than greedy decoding
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Wolof:", translation)
```