
localenlp-en-hau

Fine-tuned MarianMT model for English-to-Hausa translation.

Model Card for LOCALENLP/eng-hau

This is a machine translation model for English → Hausa, developed by the LOCALENLP organization.
It is based on the pretrained Helsinki-NLP/opus-mt-en-mul MarianMT model and fine-tuned on a custom parallel corpus of ~15k sentence pairs.


Model Details

Model Description

  • Developed by: Mgolo
  • Shared by: Mgolo
  • Model type: Seq2Seq Transformer (MarianMT)
  • Languages: English → Hausa
  • License: MIT
  • Finetuned from model: Helsinki-NLP/opus-mt-en-mul

Uses

Direct Use

  • Translate English text into Hausa for research, education, and communication.
  • Useful for low-resource NLP tasks, digital content creation, and cultural preservation.

Downstream Use

  • Can be integrated into translation apps, chatbots, and education platforms.
  • Serves as a base for further fine-tuning on domain-specific Hausa corpora.

Out-of-Scope Use

  • Not suitable for legal or medical translations (e.g., contracts, prescriptions, medical records).
  • Like any automated system, it may mistranslate; human review of the output is recommended.

Bias, Risks, and Limitations

  • Training data is from a custom collection of parallel sentences (~15k pairs).
  • Some informal or culturally nuanced expressions may not be accurately translated.
  • Hausa spelling and grammar variation (Latin script) may lead to inconsistencies.
  • Model may underperform on domain-specific or long, complex texts.

Recommendations

  • Use human post-editing for high-stakes use cases.
  • Evaluate performance on your target domain before deployment.
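To evaluate performance on a target domain, a quick sanity check against held-out reference translations is enough to start. The sketch below implements a simplified chrF-style character n-gram F-score; it is a toy stand-in for proper evaluation tools such as sacreBLEU, and the function names (`char_ngrams`, `chrf`) are illustrative, not part of this repository:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after collapsing runs of whitespace."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average character n-gram precision and recall,
    combined with an F-beta score (beta=2 weights recall higher)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # strings shorter than n contribute nothing at this order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0.0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Identical strings score 1.0; strings with no shared character n-grams score 0.0.
print(chrf("Ina kwana", "Ina kwana"))  # 1.0
print(chrf("abc", "xyz"))              # 0.0
```

Averaging the score over a few hundred in-domain sentence pairs gives a rough baseline before committing to deployment.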

How to Get Started with the Model

from transformers import MarianTokenizer, AutoModelForSeq2SeqLM

model_name = "LOCALENLP/eng_hau"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The multilingual base checkpoint expects a target-language token
# (">>hau<<") prepended to the source text.
text = "Good evening, how was your day?"
inputs = tokenizer(">>hau<< " + text, return_tensors="pt", padding=True, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("English:", text)
print("Hausa:", translation)
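Because the base checkpoint (opus-mt-en-mul) is multilingual, every source sentence must carry the `>>hau<<` target-language token. A minimal helper for preparing batches (the name `prepare_batch` is illustrative, not part of this repository):

```python
def prepare_batch(sentences, lang_token=">>hau<<"):
    """Prepend the Marian target-language token to each source sentence."""
    return [f"{lang_token} {s.strip()}" for s in sentences]

batch = prepare_batch(["Good evening.", "How was your day?"])
print(batch)  # ['>>hau<< Good evening.', '>>hau<< How was your day?']

# The prepared batch then goes through the tokenizer as usual, e.g.:
# inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
```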
Model size: 77M parameters (Safetensors, F32)