	---
	language:
	- en
	- wo
	license: mit
	tags:
	- translation
	- machine-translation
	- low-resource
	- english
	- wolof
	datasets:
	- custom
	metrics:
	- bleu
	library_name: transformers
	pipeline_tag: translation
	model-index:
	- name: localenlp-en-wol
	  results:
	  - task:
	      name: Translation
	      type: translation
	    dataset:
	      name: English-Wolof Custom Dataset
	      type: custom
	      size: 84k
	    metrics:
	    - name: BLEU
	      type: bleu
	      value: 76.12
	---
	# localenlp-en-wol

	Fine-tuned MarianMT model for English-to-Wolof translation.

	# Model Card for `LOCALENLP/english-wolof`

	This is a machine translation model for English → Wolof, developed by the LOCALENLP organization.
	It is based on the pretrained `Helsinki-NLP/opus-mt-en-mul` MarianMT model and fine-tuned on a custom parallel corpus of ~84k sentence pairs.

	---

	## Model Details

	### Model Description
	- Developed by: LOCALENLP
	- Funded by: N/A
	- Shared by: LOCALENLP
	- Model type: Seq2Seq Transformer (MarianMT)
	- Languages: English → Wolof
	- License: MIT
	- Finetuned from model: [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul)

	### Model Sources
	- Repository: https://huggingface.co/LOCALENLP/english-wolof
	- Demo: [To be integrated in a Gradio / web app](https://huggingface.co/spaces/LocaleNLP/eng_wol)

	---

	## Uses

	### Direct Use
	- Translate English text into Wolof for research, education, and communication.
	- Useful for low-resource NLP tasks, digital content creation, and cultural preservation.

	### Downstream Use
	- Can be integrated into translation apps, chatbots, and education platforms.
	- Serves as a base for further fine-tuning on domain-specific Wolof corpora.

	### Out-of-Scope Use
	- Not suitable for legal or medical translations (e.g., contracts, prescriptions, medical records).
	- Like any automated system, the model can mistranslate, so have a human review the output before relying on it.

	---

	## Bias, Risks, and Limitations
	- Training data is from a custom collection of parallel sentences (~84k pairs).
	- Some informal or culturally nuanced expressions may not be accurately translated.
	- Variation in Wolof spelling and grammar (Latin script) may lead to inconsistent output.
	- Model may underperform on domain-specific or long, complex texts.
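
One possible mitigation for the long-text limitation is to translate sentence by sentence. The sketch below is an assumption, not part of the released model: `split_sentences` uses a rough punctuation heuristic, and `translate_fn` is a hypothetical callable (string in, string out) that you would implement around the tokenizer and `model.generate`.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long(text, translate_fn):
    """Translate each sentence separately and join the results with spaces.

    translate_fn is a hypothetical callable (str -> str), for example a thin
    wrapper around the tokenizer and model.generate.
    """
    return " ".join(translate_fn(sentence) for sentence in split_sentences(text))
```

Splitting on punctuation will mishandle abbreviations and decimal numbers, and translating sentences in isolation loses cross-sentence context, so check the results on your own data.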

	### Recommendations
	- Use human post-editing for high-stakes use cases.
	- Evaluate performance on your target domain before deployment.
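
A domain evaluation can be as simple as scoring model output against a small set of reference translations you trust. The following is a minimal sketch of corpus-level BLEU under stated assumptions: whitespace tokenization, one reference per sentence, no smoothing. It is not the implementation behind the BLEU score reported in this card (the card does not say which one was used), so its numbers are not directly comparable. The function names and the placeholder strings are illustrative only.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumes one reference per hypothesis, whitespace tokenization, no smoothing.
    """
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_order + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Brevity penalty: penalize output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Placeholder tokens stand in for real model output and reference translations.
references = ["w1 w2 w3 w4 w5", "w6 w7 w8 w9"]
hypotheses = ["w1 w2 w3 w4 w5", "w6 w7 w8 w9"]
print(corpus_bleu(hypotheses, references))
```

For results you intend to report, an established tool such as sacreBLEU is a safer choice, since tokenization and smoothing choices change BLEU noticeably.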

	---

	## How to Get Started with the Model

	```python
	from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

	# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
	model_name = "LOCALENLP/english-wolof"
	tokenizer = MarianTokenizer.from_pretrained(model_name)
	model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

	text = "Good evening, how was your day?"
	# The multilingual base model expects a target-language token; ">>wol<<" selects Wolof.
	inputs = tokenizer(">>wol<< " + text, return_tensors="pt", padding=True, truncation=True)
	# Beam search with 4 beams; max_length caps the output length in tokens.
	outputs = model.generate(**inputs, max_length=512, num_beams=4)
	translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print("English:", text)
	print("Wolof:", translation)