---
language:
- en
- wo
license: mit
tags:
- translation
- machine-translation
- low-resource
- english
- wolof
datasets:
- custom
metrics:
- bleu
library_name: transformers
pipeline_tag: translation
model-index:
- name: localenlp-en-wol
  results:
  - task:
      name: Translation
      type: translation
    dataset:
      name: English-Wolof Custom Dataset
      type: custom
      size: 84k
    metrics:
    - name: BLEU
      type: bleu
      value: 76.12
---

# localenlp-en-wol

Fine-tuned MarianMT model for English-to-Wolof translation.

# Model Card for `LOCALENLP/english-wolof`

This is a machine translation model for **English → Wolof**, developed by the **LOCALENLP** organization. It is based on the pretrained `Helsinki-NLP/opus-mt-en-mul` MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.

---

## Model Details

### Model Description

- **Developed by:** LOCALENLP
- **Funded by:** N/A
- **Shared by:** LOCALENLP
- **Model type:** Seq2Seq Transformer (MarianMT)
- **Languages:** English → Wolof
- **License:** MIT
- **Finetuned from model:** [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul)

### Model Sources

- **Repository:** https://huggingface.co/LOCALENLP/english-wolof
- **Demo:** [To be integrated in Gradio / Web app](https://huggingface.co/spaces/LocaleNLP/eng_wol)

---

## Uses

### Direct Use

- Translate English text into Wolof for research, education, and communication.
- Useful for low-resource NLP tasks, digital content creation, and cultural preservation.

### Downstream Use

- Can be integrated into translation apps, chatbots, and education platforms.
- Serves as a base for further fine-tuning on domain-specific Wolof corpora.

### Out-of-Scope Use

- Not suitable for legal or medical translations (e.g., contracts, prescriptions, medical records).
- As with any automated system, mistranslations can occur, so human review of the output is recommended.

---

## Bias, Risks, and Limitations

- Training data comes from a custom collection of parallel sentences (~84k pairs).
- Some informal or culturally nuanced expressions may not be translated accurately.
- Variation in Wolof spelling and grammar (Latin script) may lead to inconsistencies.
- The model may underperform on domain-specific or long, complex texts.

### Recommendations

- Use human post-editing for high-stakes use cases.
- Evaluate performance on your target domain before deployment.

---

## How to Get Started with the Model

```python
from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_name = "LOCALENLP/english-wolof"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Good evening, how was your day?"

# Prefix the input with the target-language token used by the multilingual base model
inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)

# Generate the translation with beam search
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Wolof:", translation)