---
language:
- en
- wo
license: mit
tags:
- translation
- machine-translation
- low-resource
- english
- wolof
datasets:
- custom
metrics:
- bleu
library_name: transformers
pipeline_tag: translation
model-index:
- name: localenlp-en-wol
results:
- task:
name: Translation
type: translation
dataset:
name: English-Wolof Custom Dataset
type: custom
size: 84k
metrics:
- name: BLEU
type: bleu
value: 76.12
---
# Model Card for `LOCALENLP/english-wolof`
`localenlp-en-wol` is a fine-tuned MarianMT machine translation model for **English → Wolof**, developed by the **LOCALENLP** organization.
It is based on the pretrained `Helsinki-NLP/opus-mt-en-mul` MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.
---
## Model Details
### Model Description
- **Developed by:** LOCALENLP
- **Funded by:** N/A
- **Shared by:** LOCALENLP
- **Model type:** Seq2Seq Transformer (MarianMT)
- **Languages:** English → Wolof
- **License:** MIT
- **Finetuned from model:** [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul)
### Model Sources
- **Repository:** https://huggingface.co/LOCALENLP/english-wolof
- **Demo:** [Gradio / web app (to be integrated)](https://huggingface.co/spaces/LocaleNLP/eng_wol)
---
## Uses
### Direct Use
- Translate English text into Wolof for research, education, and communication.
- Useful for low-resource NLP tasks, digital content creation, and cultural preservation.
### Downstream Use
- Can be integrated into translation apps, chatbots, and education platforms.
- Serves as a base for further fine-tuning on domain-specific Wolof corpora.
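For further fine-tuning, parallel data is often stored in the nested-`"translation"` JSONL layout used by the Hugging Face translation examples. A minimal sketch of preparing such a file, using hypothetical sample pairs (not drawn from the actual training corpus):

```python
import json

def write_translation_jsonl(pairs, path):
    """Write (English, Wolof) sentence pairs as JSONL records with a
    nested "translation" object, one record per line."""
    with open(path, "w", encoding="utf-8") as f:
        for en, wo in pairs:
            record = {"translation": {"en": en, "wo": wo}}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Illustrative pairs only -- not taken from the model's training data
pairs = [
    ("Good morning", "Jàmm nga fanaane"),
    ("Thank you", "Jërëjëf"),
]
write_translation_jsonl(pairs, "train.jsonl")
```

Each line can then be loaded with `datasets.load_dataset("json", data_files="train.jsonl")` for a standard seq2seq fine-tuning run.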
### Out-of-Scope Use
- Not suitable for high-stakes legal or medical translations (e.g., contracts, prescriptions, medical records) without expert review.
- Like any automated system, the model can mistranslate; review its output before relying on it.
---
## Bias, Risks, and Limitations
- Training data is from a custom collection of parallel sentences (~84k pairs).
- Some informal or culturally nuanced expressions may not be accurately translated.
- Wolof spelling and grammar variation (Latin script) may lead to inconsistencies.
- Model may underperform on domain-specific or long, complex texts.
### Recommendations
- Use human post-editing for high-stakes use cases.
- Evaluate performance on your target domain before deployment.
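Domain evaluation is typically done with BLEU, the metric reported in this card. A minimal pure-Python sketch of sentence-level BLEU (modified n-gram precision plus a brevity penalty) to illustrate the idea; real evaluations should use a maintained library such as sacreBLEU, and the example sentences below are placeholders:

```python
import math
from collections import Counter

def simple_bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) scaled by a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        # Tiny floor avoids log(0) when an n-gram order has no matches
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty punishes hypotheses shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = simple_bleu("this is a test sentence", "this is a test sentence")
print(f"BLEU: {score:.2f}")  # identical sentences score 1.00
```

Corpus-level BLEU (as reported above) aggregates n-gram counts over all sentence pairs rather than averaging per-sentence scores.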
---
## How to Get Started with the Model
```python
from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/english-wolof"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Prepend the Wolof target-language token, inherited from the
# multilingual opus-mt-en-mul base model
text = "Good evening, how was your day?"
inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)

# Beam search usually yields better translations than greedy decoding
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Wolof:", translation)
```